Wednesday, January 30, 2008

A Dynamic Gesture Interface for Virtual Environments Based on Hidden Markov Models - Qing et al.

Summary
Qing et al. use HMMs to recognize hand gestures. First, to combat the difficulty of spotting a gesture in a continuous stream of input, they reduce the time sequence of each sensor to its standard deviation, though they don't say how this segments the gestures. Then the data is vector quantized. Next, the 20 sensor standard deviations are used as the observation sequence that is input to the HMM. Initially, the HMM for each gesture consists of 20 states, one per sensor, with the transition probability from state i to state i+1 equal to 1 and all others equal to 0, always starting in the first state. They train the HMMs using 10 examples of each of three gestures: index finger bending, thumb bending, and index/middle fingers bending (difficult to separate, no?). The state transition and initial state probabilities are trained in addition to the observation probabilities. Lastly, they note that they successfully rotate a 3-D cube about 3 axes using this system and its 3 gestures.
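To make the setup concrete, here is a rough sketch of the two pieces described above: collapsing each sensor's time series to a standard deviation, and building the initial left-to-right transition matrix. This is my reading of the summary, not the authors' code. The function names are made up, the use of the population standard deviation is an assumption, and so is the self-loop on the last state (the description doesn't say where state 20 goes). Vector quantization and training are left out.

```python
from statistics import pstdev

NUM_SENSORS = 20  # the summary mentions 20 sensor standard deviations


def sensor_stddevs(frames):
    """Collapse a recording (list of frames, one reading per sensor per frame)
    into a single standard deviation per sensor.
    Assumption: population std dev; the paper's exact choice is not stated here."""
    columns = zip(*frames)  # transpose: one tuple of readings per sensor
    return [pstdev(col) for col in columns]


def initial_transitions(n_states=NUM_SENSORS):
    """Left-to-right chain: P(i -> i+1) = 1, everything else 0."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i + 1] = 1.0
    A[-1][-1] = 1.0  # assumption: last state loops on itself so the row sums to 1
    return A


def initial_state_probs(n_states=NUM_SENSORS):
    """Always start in the first state."""
    return [1.0] + [0.0] * (n_states - 1)
```

Note that once the transitions are fixed like this, the state at step t is always state t, which is what the discussion below complains about.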

Discussion
I don't like this paper. There is no reason to use an HMM in this setup. Some kind of probability distribution estimate for the standard deviation of each sensor value for each gesture class, maybe, but an HMM this is not. They already know what each state is, so "Hidden" is out. They aren't modeling a process over time with the states anymore, so why even use state transitions? Don't tell me you're going to use an HMM and then boil away all the complexity until you're left with something you could just use nearest neighbors or a linear/quadratic classifier for.
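For comparison, this is roughly how little it takes to do nearest neighbors on the standard-deviation vectors. It's a sketch of the alternative I have in mind, not anything from the paper. The gesture labels and the 3-element feature vectors are invented toy values for illustration (the real vectors would have 20 entries), and Euclidean distance is my choice.

```python
import math


def nearest_neighbor(train, query):
    """Return the label of the training vector closest to `query`.
    `train` is a list of (feature_vector, label) pairs.
    Distance is Euclidean (math.dist needs Python 3.8+)."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label


# Toy, made-up per-sensor standard deviations (3 sensors instead of 20).
train = [
    ([0.9, 0.1, 0.1], "index"),
    ([0.1, 0.8, 0.1], "thumb"),
    ([0.9, 0.1, 0.9], "index+middle"),
]

guess = nearest_neighbor(train, [0.8, 0.2, 0.2])
```

With 10 examples per gesture you would just put all 30 vectors in `train`; there is nothing to train and no transition matrix to explain away.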

Reference
Qing, C., A. El-Sawah, et al. (2005). A dynamic gesture interface for virtual environments based on hidden Markov models. Haptic Audio Visual Environments and their Applications, 2005. IEEE International Workshop on.

1 comment:

Test said...

lol. But it's dynamic ;-)

But really, your point is an excellent one.