Tuesday, August 16, 2011

GSoC Final Weeks Status Report

     Google Summer of Code is coming to an end, and my project has gone quite smoothly!  I have been working on the open-source speech recognition program simon under the mentorship of Peter Grasch.  I have developed a context-evaluation system that will improve the performance and versatility of speech recognition scenarios in simon.  In short, the system gathers the context of the user (which programs are running, which window is active, etc.) and then lets scenarios use that context to decide whether or not they should be active.  You can read more about the details in my proposal.


     At this point, everything that was proposed has been implemented except acoustic model switching (which has been considered a side project from the beginning).  I will continue to work on acoustic model switching after GSoC, but in the meantime I would like to focus on the implementation of the context gathering and grammar switching.


     The context-gathering system is built around plugins, so it will be easily expandable.  The abstract interface for a context-gathering plugin is the Condition class.  The Scenario class in simon has been expanded to include a group of these Condition objects, and the scenario will automatically deactivate (its vocabulary and other features will be excluded from the language model) whenever any of its Conditions is not satisfied.
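
Here is a rough sketch of that idea in Python.  simon itself is written in C++, and everything in this snippet other than the names Condition and Scenario (the is_satisfied and is_active methods, the FixedCondition stand-in) is made up for illustration and is not simon's actual API.

```python
from abc import ABC, abstractmethod


class Condition(ABC):
    """Abstract interface that every context-gathering plugin implements."""

    @abstractmethod
    def is_satisfied(self) -> bool:
        """Return True when the monitored context currently matches."""


class FixedCondition(Condition):
    """Hypothetical stand-in plugin whose result is set by hand."""

    def __init__(self, value: bool):
        self.value = value

    def is_satisfied(self) -> bool:
        return self.value


class Scenario:
    """A scenario is active only while all of its conditions are satisfied."""

    def __init__(self, name: str, conditions=()):
        self.name = name
        self.conditions = list(conditions)

    def is_active(self) -> bool:
        # A single unsatisfied condition deactivates the whole scenario.
        return all(c.is_satisfied() for c in self.conditions)
```

In this sketch a scenario with no conditions at all is always active, which is an assumption on my part about the sensible default.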


     For example, the ActiveWindow plugin that I made monitors the active window, which allows a scenario to be activated only if the active window is controlled by a specific process or has a specific name.  A condition could require, for instance, that the active window is controlled by firefox.exe and has a title matching ".*Mozilla Firefox" (Qt regular expressions can be used for the window title).
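
To make the matching concrete, here is a small Python sketch of an active-window check.  The WindowInfo and ActiveWindowCondition names are hypothetical, Python's re module stands in for Qt's regular expressions (the two syntaxes differ in places), and matching the pattern against the whole title is my assumption here rather than a statement about how the plugin behaves.

```python
import re
from dataclasses import dataclass


@dataclass
class WindowInfo:
    """Hypothetical snapshot of the currently active window."""
    process_name: str
    title: str


class ActiveWindowCondition:
    """Satisfied when the active window belongs to a given process
    and its title matches a regular expression."""

    def __init__(self, process_name: str, title_pattern: str):
        self.process_name = process_name
        self.title_regex = re.compile(title_pattern)

    def is_satisfied(self, window: WindowInfo) -> bool:
        # Assumption: the pattern must match the entire window title.
        return (window.process_name == self.process_name
                and self.title_regex.fullmatch(window.title) is not None)


firefox_only = ActiveWindowCondition("firefox.exe", r".*Mozilla Firefox")
```

With this condition, a Firefox window titled "GSoC Status - Mozilla Firefox" would satisfy it, while a window owned by any other process would not.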

     The other context-gathering plugins completed so far are one that checks whether or not a process is currently running, and one that checks whether or not at least one condition from a group of other conditions is satisfied.  I have thoroughly documented all aspects of the context system with Doxygen-style comments, so adding more context-gathering plugins should be fairly easy (feel free to ask me for help if you want it, though!).
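
These two plugins compose naturally, which the following Python sketch tries to show.  The class names and the idea of passing in a set of running process names are assumptions made for the example and are not taken from simon's code.

```python
class ProcessRunningCondition:
    """Satisfied when a process with the given name is running."""

    def __init__(self, process_name: str):
        self.process_name = process_name

    def is_satisfied(self, running_processes: set) -> bool:
        return self.process_name in running_processes


class AnyOfCondition:
    """Satisfied when at least one condition in the group is satisfied."""

    def __init__(self, conditions):
        self.conditions = list(conditions)

    def is_satisfied(self, running_processes: set) -> bool:
        # Note: with an empty group this returns False.
        return any(c.is_satisfied(running_processes) for c in self.conditions)


# Active when either browser is running.
any_browser = AnyOfCondition([
    ProcessRunningCondition("rekonq"),
    ProcessRunningCondition("firefox.exe"),
])
```

Since a scenario requires all of its own conditions to hold, a group condition like this is what makes "A or B" style rules possible.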


This screenshot shows the context specifications for a rekonq child scenario that is only active when rekonq is open and has the active window.  This child scenario could, for example, be used exclusively for hotkey commands that only work while rekonq is the active window.


     My original design idea for the grammar switching involved actually changing the grammar of the scenarios based on the context, but this would have been awkward to implement.  It was much easier to deactivate a whole scenario than just a part of it.  Unfortunately, the desired use case is a single scenario that reacts differently in many contexts.  To get around this, my mentor, Peter, came up with the idea of a scenario hierarchy in which a single "super-scenario" contains a number of child scenarios that are activated under different contexts.  This way, the structure of each scenario stays simple, but scenarios can be organized in a way that produces the same result as my original design.
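
The hierarchy can be pictured with a short Python sketch.  The names used here (active_words, the context dictionary and its "active_process" key) are invented for the example, and the rule that a child is only considered while its parent is active is my assumption for the sketch, not a description of simon's internals.

```python
class Scenario:
    """A scenario with its own words, an activation test, and child scenarios."""

    def __init__(self, name, words, condition=lambda context: True, children=()):
        self.name = name
        self.words = set(words)
        self.condition = condition
        self.children = list(children)

    def active_words(self, context: dict) -> set:
        # An inactive scenario contributes nothing, and (by assumption)
        # neither do its children.
        if not self.condition(context):
            return set()
        words = set(self.words)
        for child in self.children:
            words |= child.active_words(context)
        return words


rekonq_hotkeys = Scenario(
    "rekonq hotkeys",
    {"new tab"},
    condition=lambda context: context.get("active_process") == "rekonq",
)
browser = Scenario("browser", {"back", "forward"}, children=[rekonq_hotkeys])
```

Each scenario is still switched on or off as a whole, yet the super-scenario ends up with a vocabulary that changes with the context.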


     I am currently in the process of cleaning up my code, adding documentation, and making some tutorial YouTube videos.  Pictures and video links are on the way!  It's gonna be sweet!