Posts

GSoC 2012 Final Report

I had a great summer working on a voice control extension for the Banshee media player. My main goal was to enhance the usability and accessibility of Banshee. For that I used the pocketsphinx and sphinxbase packages of the CMUSphinx speech recognition toolkit. Integrating it with Banshee was challenging because Banshee is built on Mono technologies while CMUSphinx is written in C. I used a GStreamer pipeline to get input continuously from the microphone. To enable the extension, PocketSphinx first has to be added to your system. These are the steps to follow on Ubuntu. First download the pocketsphinx and sphinxbase packages and unzip them. Then build and install them with % ./configure % make % make install and do this in both folders. Then run export LD_LIBRARY_PATH=/usr/local/lib and export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig in the terminal. After that, the pocketsphinx_continuous command worked for me. Please refer to this
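The build steps above can be collected into a small shell sketch. The function name build_and_install and the directory names in the usage comments are placeholders of mine, not names from the actual release archives, and the sketch assumes the default /usr/local install prefix that the two export lines imply.

```shell
#!/bin/sh
# Sketch only: build and install one unpacked source tree.
# "$1" is the directory of the unpacked package (placeholder name).
build_and_install() {
    (
        cd "$1" || exit 1
        # Same three steps as in the post; "make install" may need root
        # privileges when installing under /usr/local.
        ./configure && make && make install
    )
}

# Let the dynamic linker and pkg-config find the newly installed libraries.
export LD_LIBRARY_PATH=/usr/local/lib
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

# Usage (sphinxbase first, since pocketsphinx requires it):
#   build_and_install sphinxbase
#   build_and_install pocketsphinx
#   pocketsphinx_continuous
```

The exports only last for the current terminal session, so they would need to be repeated (or added to a shell startup file) in every new terminal.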

GUADEC 2012 and my Lightning Talk

The GUADEC 2012 talks and main events have ended, and now it is time for hackfests and BoFs. I'm hacking on my GSoC project to enhance the speech recognition accuracy. I met Arun Raghavan, one of the main contributors to PulseAudio, and he made some important suggestions for my work. I'm going to follow his suggestions on removing noise from the audio and extracting only the voice attributes. It was a brilliant chance to be at GUADEC 2012 and meet so many hackers from different backgrounds. I had the chance to listen to their experiences, the problems they faced and how they were able to overcome them. This will help me to be a more productive professional :) I gave my lightning talk on "Enhancing the accessibility through voice control for Banshee". Download the presentation slides here. Your suggestions and comments about my work and my talk are highly appreciated, as they will help to improve my future work. :)

Integration of the Pocketsphinx to Banshee

I was able to write some C code and build the pipeline. In the beginning, though, the (Gst.Pipeline) Gst.parse_launch("gconfaudiosrc ! audioconvert ! audioresample ! vader name=vad auto-threshold=true ! pocketsphinx name=asr ! fakesink"); code didn't work on my Ubuntu 11.10 system because there was a problem with "gconfaudiosrc". After I installed some packages and tried again, it worked well, and I could build the GStreamer pipeline from C code. Then I implemented the makefiles with the help of my mentor Alex K. He gave me permission to commit to the community extensions of the Banshee media player, so I created a branch called voicecontrol and committed the makefiles and the basic file structure. He also guided me through the basics of the Git commands. I implemented C# code that invokes the C functions and was able to get output results. When I ran the code, the microphone listened continuously, detected sounds and converted them to text.
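A problem like the one with "gconfaudiosrc" can be narrowed down from the terminal before touching any C code. The sketch below is a hedged illustration, not the code from the project: it assumes the GStreamer 0.10 command-line tools are installed under the names gst-inspect-0.10 and gst-launch-0.10, and the helper names pipeline_elements and missing_elements are made up for this example.

```shell
#!/bin/sh
# The pipeline description quoted in the post.
PIPELINE='gconfaudiosrc ! audioconvert ! audioresample ! vader name=vad auto-threshold=true ! pocketsphinx name=asr ! fakesink'

# Print the element names of the pipeline, one per line
# (split on "!" and keep the first word of each stage).
pipeline_elements() {
    printf '%s\n' "$PIPELINE" | tr '!' '\n' | awk '{print $1}'
}

# Print every element that the local GStreamer installation cannot find.
# Assumption: gst-inspect-0.10 exits with a non-zero status for unknown elements.
missing_elements() {
    for element in $(pipeline_elements); do
        gst-inspect-0.10 "$element" >/dev/null 2>&1 || echo "$element"
    done
}

# Usage:
#   missing_elements            # lists elements whose plugin is not installed
#   gst-launch-0.10 $PIPELINE   # runs the same pipeline without any C code
```

If missing_elements prints nothing, every stage of the pipeline is available and the description should also be accepted when it is passed to the parse-launch call in code.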

Update Voice Control for Banshee

I tested a few freely available speech recognition frameworks, Simon and GNOME Voice Control, and both of them were difficult to use with Banshee's C# code. GNOME Voice Control was developed as a GSoC 2007 project and it seems not much work has been done on it recently. CMUSphinx is a group of speech recognition systems developed at Carnegie Mellon University. The CMUSphinx toolkit is a leading speech recognition toolkit with various tools used to build speech applications. It has a number of packages for different tasks and applications, and it is sometimes confusing what to choose. Pocketsphinx: a lightweight recognizer library written in C. Sphinxbase: a support library required by Pocketsphinx. I'm going to integrate these two libraries into the Banshee code base. In the Ubuntu 11.10 environment, to build and install SphinxBase, I downloaded it directly from the repository and ran % ./autogen.sh then compiled and installed: % ./configure % make

Progress of the Voice Control For Banshee

Week 03 & Week 04: I started coding to record a voice clip through the microphone. I got added to GNOME Planet and created the project details in the wiki. There are several ways to add voice control to Banshee. Which one is best is still to be found out, so I should try several options and choose the best. Speech recognition can be added by using a framework. The Simon project is implemented in C/C++, and I tried to connect it to Banshee. Simon uses software called the Hidden Markov Model Toolkit (HTK) to generate the speech model. This software is free of charge, but its licence prohibits its distribution with Simon, so it would be a main dependency for the project. GNOME had a project that was implemented for GSoC 2007. That project, GnomeVoiceControl, was implemented in C in conjunction with CMU Sphinx, which is an open source tool created to convert speech to text. Banshee was developed using the Mono IDE and .NET platform, so I have to write C# code to c

Hello GNOME Planet

Voice Control For Banshee: The idea is to add voice control to the Banshee media player to improve accessibility. Many portable media players can be controlled by the human voice, which gives their users a rich accessibility interface. The plan is to let the user perform basic controls by voice, which will be helpful for people who can't use the normal mouse controls.