Archive for June, 2009

Merapi AIR/Java Speech Recognition and Voice Control

While playing around with Merapi to add voice recognition to the talking head, I created a little Merapi application that lets me move a target on the screen by saying the voice commands LEFT, RIGHT and MIDDLE.

I had seen Rich Tretola's post on his blog Everything Flex about text-to-speech through Merapi and thought it would make sense to do it the other way around: voice recognition and voice control.

This is fairly simple to achieve. I decided to use the Sphinx 4 speech recognizer, an open source Java project by Carnegie Mellon University and others.

I then wrote a client for that framework, added the Merapi JAR files and broadcast a Merapi message whenever the Sphinx client detected speech.
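In outline, the client loop looks something like this. This is only a minimal sketch: it assumes the standard Sphinx 4 setup driven by an XML configuration file (whose grammar would define LEFT, RIGHT and MIDDLE) and Merapi's Java Message API. The configuration file name and the SPEECH_COMMAND message type are illustrative names, not taken from the actual project.

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import merapi.messages.Message;

public class SphinxClient {

    public static void main(String[] args) {
        // Load a Sphinx 4 configuration whose grammar defines the
        // commands LEFT, RIGHT and MIDDLE (file name is illustrative).
        ConfigurationManager cm = new ConfigurationManager(
                SphinxClient.class.getResource("commands.config.xml"));

        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        Microphone microphone = (Microphone) cm.lookup("microphone");
        microphone.startRecording();

        while (true) {
            // Blocks until Sphinx has recognized an utterance.
            Result result = recognizer.recognize();
            if (result != null) {
                String command = result.getBestFinalResultNoFiller();

                // Broadcast the recognized text to the AIR side as a
                // Merapi message (the type string is an arbitrary example).
                Message message = new Message();
                message.setType("SPEECH_COMMAND");
                message.setData(command);
                message.send();
            }
        }
    }
}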

You can download the full source code by clicking here.


AIR/Java Augmented Reality Talking Head

I have been working on this project for the past few weeks and it is now coming together nicely, so I thought I would write a little blog entry about it. The idea is to create an Augmented Reality head with which one can have a conversation.

So far I can type something and have the augmented reality head answer me back.

Here is what takes place: an AIR client (built using Cairngorm) communicates with a Java server side using Remote Objects over BlazeDS. The typed text is sent to the Java server application, where a text response is generated using AIML and a Java chatbot framework. This text response is passed to a text-to-speech (TTS) socket server, which generates both an mp3 byte array and something called MBROLA input format. MBROLA input format is a stream of text symbols (phonemes), each with a duration in milliseconds, that represent visemes (mouth shapes).
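To make the round trip concrete, the remote service on the Java side could be shaped roughly as follows. All the names here (ChatService, ask, SpeechResponse and the two helper interfaces) are hypothetical stand-ins for illustration; the actual interfaces and the chatbot/TTS wiring differ, and BlazeDS would expose the class through a destination in remoting-config.xml.

/** Minimal stand-ins for the chatbot and the TTS socket-server client. */
interface Chatbot { String respond(String input); }
interface TtsClient {
    byte[] synthesizeMp3(String text);
    String synthesizeMbrola(String text);
}

/** Hypothetical shape of the BlazeDS-exposed remote object. */
public class ChatService {

    private final Chatbot chatbot;
    private final TtsClient tts;

    public ChatService(Chatbot chatbot, TtsClient tts) {
        this.chatbot = chatbot;
        this.tts = tts;
    }

    /** Value object serialized back to the AIR client over AMF. */
    public static class SpeechResponse {
        public byte[] mp3;     // synthesized speech audio
        public String mbrola;  // phoneme/duration stream for lip sync
    }

    public SpeechResponse ask(String userText) {
        // 1. Generate a text reply with the AIML chatbot framework.
        String reply = chatbot.respond(userText);

        // 2. Hand the reply to the TTS socket server, which returns both
        //    an mp3 byte array and the MBROLA input stream.
        SpeechResponse response = new SpeechResponse();
        response.mp3 = tts.synthesizeMp3(reply);
        response.mbrola = tts.synthesizeMbrola(reply);
        return response;
    }
}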

The whole lot is packaged and sent back over the wire via BlazeDS to an augmented reality viewer, built as an advanced Flex visual component using Papervision3D and FLARToolkit. The model head was created in Maya and is an animated Collada model with 13 different mouth shapes that have been mapped to the output of the MBROLA stream.
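The mapping itself can be as simple as a lookup table from phoneme symbols to mouth-shape indices. The entries below are my own guesses purely for illustration; which MBROLA phonemes map onto which of the 13 shapes is specific to the model.

import java.util.HashMap;
import java.util.Map;

/** Illustrative phoneme-to-viseme lookup (example entries only). */
public class VisemeMap {

    private static final Map<String, Integer> VISEMES = new HashMap<String, Integer>();
    static {
        VISEMES.put("_", 0);  // silence: closed mouth
        VISEMES.put("p", 1);  // bilabials p/b/m share one shape
        VISEMES.put("b", 1);
        VISEMES.put("m", 1);
        VISEMES.put("f", 2);  // labiodentals f/v
        VISEMES.put("v", 2);
        VISEMES.put("A", 3);  // open vowel
        VISEMES.put("O", 4);  // rounded vowel
        // ...remaining phonemes mapped onto the other mouth shapes
    }

    public static int visemeFor(String phoneme) {
        Integer index = VISEMES.get(phoneme);
        return index != null ? index : 0;  // unknown: default to closed mouth
    }
}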

To play the speech response in AIR, the mp3 byte array is written to a temporary file, read into a Sound object and played back. At the same time, the MBROLA stream is parsed into an ArrayCollection of frames (mouth shapes for the model head) and durations, which is then iterated over in the handler method of a timer.
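For reference, the parsing step itself is straightforward. Here is a small sketch of it, written in Java for consistency with the rest of this post rather than the ActionScript the AIR client actually uses. It assumes the standard MBROLA .pho line format: a phoneme symbol, its duration in milliseconds, then optional pitch points, which lip sync can ignore.

import java.util.ArrayList;
import java.util.List;

/** One mouth-animation frame: which phoneme to show and for how long. */
class VisemeFrame {
    final String phoneme;
    final int durationMs;

    VisemeFrame(String phoneme, int durationMs) {
        this.phoneme = phoneme;
        this.durationMs = durationMs;
    }
}

public class MbrolaParser {

    /** Parses MBROLA input lines such as "a 120 50 110" into frames.
        Token 0 is the phoneme, token 1 its duration in milliseconds;
        any further tokens are pitch points, irrelevant for visemes. */
    public static List<VisemeFrame> parse(String mbrolaStream) {
        List<VisemeFrame> frames = new ArrayList<VisemeFrame>();
        for (String line : mbrolaStream.split("\n")) {
            line = line.trim();
            if (line.length() == 0 || line.startsWith(";")) {
                continue;  // skip blank and comment lines
            }
            String[] tokens = line.split("\\s+");
            frames.add(new VisemeFrame(tokens[0], Integer.parseInt(tokens[1])));
        }
        return frames;
    }
}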

Coming soon, hopefully, will be speech recognition via the Merapi Java/AIR bridge, so that you can talk to the head.