Sunday, May 4, 2008

Glove-TalkII-a neural-network interface which maps gestures to parallel formant speech synthesizer controls.

Fels and Hinton create a hand-based artificial speech system using neural networks. They build three networks that determine the inputs to a parallel formant speech synthesizer. The first decides whether the left hand is forming a vowel or a consonant. The second determines which vowel is being formed, based on the hand's horizontal and vertical position. The last determines which consonant the user is forming with the fingers of the right hand. The first is a fully connected feedforward network trained on 2600 samples, while the other two are RBF networks whose centers are set to the class averages. Each network shows low expected error (under 6%), and a user trained for 100 hours can speak intelligibly.
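The RBF-with-class-average-centers idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy 2-D "gesture" data, the width parameter `sigma`, and the least-squares output layer are all assumptions made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hand-position features for two gesture classes.
X0 = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
X1 = rng.normal(loc=[1.5, 1.0], scale=0.3, size=(50, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

# Hidden-unit centers fixed at per-class feature averages,
# as the summary describes for the vowel and consonant networks.
centers = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
sigma = 0.5  # assumed Gaussian width; the paper sets this differently

def rbf_features(X, centers, sigma):
    # Gaussian activation of each center for each sample.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Fit a linear output layer on one-hot targets by least squares.
H = rbf_features(X, centers, sigma)
T = np.eye(2)[y]
W, *_ = np.linalg.lstsq(H, T, rcond=None)

pred = (H @ W).argmax(axis=1)
accuracy = (pred == y).mean()
```

Because the centers are just class means, the hidden layer needs no gradient training at all; only the small output layer is fitted, which is part of what makes this style of network fast to adapt to a new user.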

Discussion
100 hours of training? For someone they expect to learn easily because of prior experience? How long would someone with no experience take to learn to speak with this system? It is also odd that they chose phoneme-level signing rather than interpreting sign language to text and feeding that to a text-to-speech converter. That approach would undoubtedly add a small translation lag, but it could be used immediately by someone who already knew how to sign alphabetically.

Reference

Fels, S. S. and G. E. Hinton (1998). "Glove-TalkII-a neural-network interface which maps gestures to parallel formant speech synthesizer controls." IEEE Transactions on Neural Networks 9(1): 205-212.
