Wednesday, January 30, 2008

A Dynamic Gesture Interface for Virtual Environments Based on Hidden Markov Models - Qing et al.

Summary
Qing et al. use HMMs to recognize hand gestures. First, to combat the difficulty of spotting a gesture in a continuous stream of input, they reduce the time sequence of each sensor to its standard deviation, though they do not say how this segments the gestures. The data are then vector quantized, and the 20 sensor standard deviations are used as the observation sequence input to the HMM. Initially, the HMM for each gesture consists of 20 states, one corresponding to each sensor, with the transition probability from state i to i+1 equal to 1 and the rest equal to 0, always starting in the first state. They train the HMMs using 10 examples of each of three gestures: index finger bending, thumb bending, and index/middle fingers bending (difficult to separate, no?). The state transition and initial state probabilities are trained in addition to the observation probabilities. Lastly, they note that they successfully rotate a 3-D cube along 3 axes using this system and its 3 gestures.
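
To make the pipeline concrete, here is a minimal NumPy sketch of the preprocessing and the initial model structure as I read it from the paper; the window shape, codebook size, and number of emission symbols are my assumptions, not the authors' values.

```python
import numpy as np

def extract_observations(window, codebook):
    """Collapse a (T x 20) window of glove samples to one symbol per sensor:
    each sensor's time series becomes its standard deviation, which is then
    vector quantized to the nearest codebook entry."""
    stds = window.std(axis=0)                                  # (20,) one value per sensor
    return np.abs(stds[:, None] - codebook[None, :]).argmin(axis=1)

def initial_left_to_right_hmm(n_states=20, n_symbols=8):
    """Initial model described in the paper: 20 states (one per sensor),
    P(i -> i+1) = 1 with everything else 0, and the first state always the
    start state. Emission probabilities here are a uniform placeholder."""
    A = np.zeros((n_states, n_states))
    A[np.arange(n_states - 1), np.arange(1, n_states)] = 1.0   # strict left-to-right chain
    A[-1, -1] = 1.0                                             # last state absorbs
    pi = np.zeros(n_states)
    pi[0] = 1.0
    B = np.full((n_states, n_symbols), 1.0 / n_symbols)
    return A, B, pi
```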

Discussion
I don't like this paper. There is no reason to use an HMM in this setup. Some kind of probability distribution estimate for the standard deviation of each sensor value for each gesture class, maybe, but an HMM this is not. They already know what each state is, so "hidden" is out. They aren't modeling a process over time with the states anymore, so why use state transitions at all? Don't tell me you're going to use an HMM and then boil away all the complexity until it's something you could just use nearest neighbors or a linear/quadratic classifier for.
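
To illustrate that last point, this is roughly all the alternative would take: nearest neighbor on the 20 per-sensor standard deviations, with no hidden states involved. This is purely illustrative, not anything from the paper.

```python
import numpy as np

def nearest_neighbor_gesture(features, train_features, train_labels):
    """1-NN over 20-dimensional standard-deviation feature vectors."""
    dists = np.linalg.norm(train_features - features, axis=1)
    return train_labels[int(dists.argmin())]
```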

Reference
Qing, C., A. El-Sawah, et al. (2005). A dynamic gesture interface for virtual environments based on hidden Markov models. Haptic Audio Visual Environments and their Applications, 2005. IEEE International Workshop on.

Online, Interactive Learning of Gestures for Human/Robot Interfaces - Lee and Xu

Summary
Lee and Xu seek to create an HMM-based system that recognizes hand gestures with little up-front training and that can learn from its mistakes and add new gestures on the fly. First, they segment the input stream from a CyberGlove into discrete symbols using a fast Fourier transform and vector quantization. They collect one example of each gesture in a set and train several left-to-right HMMs to recognize them. Next, they classify several test gestures using a confidence measure. If this measure is below a threshold, the classifier is certain of its classification and an action is taken; otherwise, it is uncertain and prompts the user for the correct classification. The uncertain example is then either used to create a new HMM and class or to update the appropriate HMM by iterating through Baum-Welch with the additional example. Their iterative method achieves high accuracy (>99%) after a small number of examples and performs on par with batch methods (based on the likelihood that the HMMs would generate the training data).
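
Here is a sketch of the interactive loop as the summary describes it. The model interface (log_likelihood, baum_welch_update), the new_model and ask_user hooks, and the confidence measure are stand-ins of mine, not the authors' formulation.

```python
def classify_interactively(obs, models, new_model, ask_user, threshold=-5.0):
    """models: dict name -> HMM exposing log_likelihood(obs) and
    baum_welch_update(obs) (hypothetical interface). new_model(obs) builds and
    trains a fresh HMM; ask_user(obs) returns the true label."""
    scores = {name: m.log_likelihood(obs) for name, m in models.items()}
    best = max(scores, key=scores.get)
    runner_up = max((s for n, s in scores.items() if n != best), default=float("-inf"))
    # Stand-in confidence measure: more negative = winner more clearly separated.
    measure = runner_up - scores[best]
    if measure < threshold:                      # below the threshold -> certain, act on it
        return best
    label = ask_user(obs)                        # uncertain -> ask the user for the truth
    if label in models:
        models[label].baum_welch_update(obs)     # fold the example into the existing HMM
    else:
        models[label] = new_model(obs)           # start a brand-new gesture class
    return label
```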

Discussion
This is a good extension of HMMs, allowing the system to be tuned to a user while in use; however, they do not provide the test accuracy of batch-trained HMMs for comparison, making it difficult to determine which performs more accurately. Their idea of probabilistically determining the certainty of a classification seems like a very good (useful) one. I'd like to know if their evaluation function is just something that they thought up that works pretty well or if it has some statistical basis.

Reference
Lee, C. and Y. Xu (1996). Online, interactive learning of gestures for human/robot interfaces. Robotics and Automation, 1996. Proceedings., 1996 IEEE International Conference on.

Wednesday, January 23, 2008

An Introduction to Hidden Markov Models

Summary
Rabiner and Juang provide an excellent beginner's guide to hidden Markov models. They begin with a bit of background information about HMMs before describing what an HMM is through an example. An HMM is essentially a set of hidden states about which probabilistic observations can be made and a set of rules governing how to move between states. Given an HMM, we can produce a series of observations by moving between states according to the rules and then probabilistically generating an observation at each step. Additionally, Rabiner and Juang detail three other problems that can be solved using HMMs. The first is determining the likelihood of a sequence of observations given an HMM; this can be done using the forward or backward procedure, detailed on page 9. Next, given an HMM and an observation sequence, the most likely state sequence can be determined using the Viterbi algorithm on page 11. Then, an HMM can be estimated from one or more observation sequences using Baum-Welch re-estimation, also on page 11. Lastly, they provide an example application of HMMs: recognition of single spoken words.
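
Since the forward procedure and Viterbi are the workhorses here, a compact NumPy rendering of both may help; this is the standard textbook formulation, with notation of my choosing rather than taken line-for-line from the paper.

```python
import numpy as np

def forward_likelihood(A, B, pi, obs):
    """Forward procedure: P(observation sequence | model).
    A : (N, N) transition matrix, A[i, j] = P(state j at t+1 | state i at t)
    B : (N, M) emission matrix, B[i, k] = P(symbol k | state i)
    pi: (N,)   initial state distribution
    obs: sequence of symbol indices."""
    alpha = pi * B[:, obs[0]]                  # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]          # induction step
    return alpha.sum()                         # termination

def viterbi(A, B, pi, obs):
    """Most likely hidden state sequence for obs (log-space Viterbi)."""
    logA, logB = np.log(A + 1e-12), np.log(B + 1e-12)
    delta = np.log(pi + 1e-12) + logB[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = delta[:, None] + logA         # scores[i, j]: best path ending i -> j
        back.append(scores.argmax(axis=0))     # best predecessor of each state
        delta = scores.max(axis=0) + logB[:, o]
    path = [int(delta.argmax())]
    for bp in reversed(back):                  # trace the predecessors backwards
        path.append(int(bp[path[-1]]))
    return list(reversed(path))
```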

Discussion
This paper is a great reference for HMMs. The algorithms are described in a straightforward, understandable manner. The only hard part is deciding when and how to apply an HMM to a given problem.

Reference
Rabiner, L. and B. Juang (1986). "An introduction to hidden Markov models." ASSP Magazine, IEEE 3(1): 4-16.

American Sign Language Finger Spelling Recognition System

Summary
Allen et al. seek to create a system to allow improved communication between the deaf community and the general public. To this end, they first seek to create an automated translator from the alphabetic portion of American Sign Language to written and spoken letters. They use an 18-sensor CyberGlove to measure the position and orientation of the user's fingers and the orientation of the hand with respect to the rest of the arm. They trained a perceptron-based neural network to translate a single person's signs. With ten examples of each letter, they achieved a 90% accuracy rate for translation for a single user.
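
The exact network topology isn't given here, so purely as a sketch, this is what a single-layer multi-class perceptron over the 18 raw sensor values might look like; it is illustrative only, not the authors' implementation.

```python
import numpy as np

def train_perceptron(X, y, n_classes, epochs=50, lr=0.1):
    """Minimal multi-class perceptron over raw glove readings.
    X: (n_samples, 18) CyberGlove sensor values, y: integer letter labels."""
    W = np.zeros((n_classes, X.shape[1] + 1))         # one weight row per letter (+ bias)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])     # append a constant bias input
    for _ in range(epochs):
        for x, target in zip(Xb, y):
            pred = int((W @ x).argmax())
            if pred != target:                        # perceptron rule: update on mistakes only
                W[target] += lr * x
                W[pred] -= lr * x
    return W

def classify(W, x):
    """Predict the letter index for one 18-value glove reading."""
    return int((W @ np.append(x, 1.0)).argmax())
```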

Discussion
Not much to say about this one. It's essentially CyberGlove + neural network = translator. It's a good first step, but faces a few problems, starting with the hardware being somewhat expensive. Training to a specific user isn't too big of a problem, since it could be marketed to an individual user, but a version that achieves high accuracy for multiple users would be nice.

Reference

Tuesday, January 22, 2008

Flexible Gesture Recognition for Immersive Virtual Environments - Deller, Ebert, et al

Summary
Deller et al. create a framework for interaction involving a data glove, analogous to the LADDER framework for geometric sketches. They use a P5 data glove for their system, but it is adaptable to any type of hardware. The data glove provides hand position and orientation information as well as finger flexion. Additionally, the glove has several buttons for additional input. Gestures are defined as a sequence of postures and orientations rather than as motions over time. Postures rely mainly on the flexion of the fingers, though orientation may be important as well; therefore, posture information contains both flexion and orientation, as well as a relevance value for orientation. New postures are created by example: a user simply moves their hand to the correct position to define the posture. Alternately, variations of the posture can be input to create an average posture.

Recognition is divided into two phases: data acquisition and gesture management. As the data glove is very noisy, the data must be filtered to obtain adequate values. First a deadband filter is applied and extreme changes are rejected, then a dynamic average is taken to smooth the data. Next, matching posture candidates are found from the posture library, and if the posture is held briefly, a PostureChanged event is created. This event contains both the previous and current posture as well as position and orientation. GloveMove and ButtonPressed events are also created when the glove position changes enough or a button is pressed. Gesture management matches posture data to stored postures by treating the flexion values as a five-dimensional vector and finding the closest stored posture. If the posture is close enough to the stored one and the orientation is satisfied, it is assigned that posture class. Gestures are defined as a sequence of one or more postures, and the sequence of past postures is matched against possible gestures. The gesture system was demonstrated using a virtual desktop: users naturally interacted with the environment, grasping objects by making a fist or pointing at objects to examine them more closely.
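
A rough sketch of the filtering and posture-matching steps described above; the specific constants, thresholds, and the orientation-weighting scheme are my guesses, not values from the paper.

```python
import numpy as np

def smooth(prev, raw, deadband=2.0, spike=60.0, alpha=0.3):
    """Deadband filter plus rejection of extreme jumps, followed by a dynamic
    (exponential) average. All constants are illustrative only."""
    delta = raw - prev
    delta[(np.abs(delta) < deadband) | (np.abs(delta) > spike)] = 0.0  # drop jitter and spikes
    return prev + alpha * delta

def angle_between(a, b):
    """Angle between two unit orientation vectors."""
    return float(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)))

def match_posture(flexion, orientation, library, max_dist=15.0):
    """Nearest stored posture on the 5-D flexion vector, with the orientation
    difference weighted by each posture's relevance value."""
    best_name, best_d = None, float("inf")
    for p in library:       # p: {'name', 'flexion', 'orientation', 'relevance'}
        d = np.linalg.norm(flexion - p["flexion"])
        d += p["relevance"] * angle_between(orientation, p["orientation"])
        if d < best_d:
            best_name, best_d = p["name"], d
    return best_name if best_d <= max_dist else None
```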

Discussion
Though the system seems relatively simple, the authors do not test recognition accuracy extensively. Also, their demonstration uses only a handful of postures, all of which seem fairly distinct, making posture recognition easy. It would be more interesting to see how accurate posture recognition is for a more expansive posture set, such as the sign language mentioned by the authors. A more robust posture recognizer may be required in the face of a greater number of possibly ambiguous postures.

Reference
Deller, M., A. Ebert, et al. (2006). Flexible Gesture Recognition for Immersive Virtual Environments. Information Visualization, 2006. IV 2006. Tenth International Conference on.

Environmental technology: making the real world virtual - Myron

Summary
Myron summarizes his work in shifting from a world in which users must learn how to use computers and software up front to one in which they learn by interacting with the system as they do with the real world. Unlike other researchers of that time, who used bulky hardware to measure how a user was interacting, Myron focused on interaction through observation, using video and floor pressure sensors to perceive user actions. In creating VideoPlace, Myron built a virtual shared space that overlapped video of the users' hands with virtual objects, in which multiple users could interact with each other as well as with shared objects via teleconference. In this environment, Myron observed that users reacted to and interacted with virtual objects much as they would with real ones. Myron's next project created a virtual world through which users could move based on the movement of their hands and bodies. This led to a variety of VideoPlace applications such as range-of-motion therapy, virtual tutoring, and other virtual educational experiences. Myron next moved from a large-scale setup to a smaller one, creating the more contained VideoDesk and associated applications such as virtual modeling and sculpting. Throughout his research, Myron sees teleconferencing as the primary beneficiary of haptic interaction.

Discussion


Reference
Myron, W. K. (1993). "Environmental technology: making the real world virtual." Commun. ACM 36(7): 36-37.