Wednesday, February 6, 2008

A Similarity Measure for Motion Stream Segmentation and Recognition - Li and Prabhakaran

Summary
Li and Prabhakaran define a metric that measures how similar two gestures are to one another, based on singular value decomposition. After collecting a matrix A of sensor observations (a column for each sensor, a row for each observation), they find the eigenvalues and eigenvectors of the square matrix A^T A. Using only the first k eigenvectors and eigenvalues, they compute a weighted sum of the dot products of the corresponding eigenvectors of the two gestures. The metric ranges from 0 (not similar) to 1 (identical). To recognize and segment gestures from a stream, they begin at the start of the stream (or at the end of the last recognized gesture) and scan a section of the stream whose size varies between a minimum and a maximum window. For each window size, the similarity to the stored isolated gestures is computed, and the window most similar to a stored gesture is classified as that gesture. After testing with both CyberGlove and motion capture data, on isolated gestures as well as gesture sequences, they determined that this metric was much more accurate than previous metrics, especially on data streams, while taking time comparable to the fastest previous metric.
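A minimal sketch of how the eigenvector-based similarity and the sliding-window segmentation might look (the exact weighting and window handling in the paper may differ; the function names, the 50/50 averaging of the two weight sets, and the greedy commit-to-best-window loop are my assumptions):

```python
import numpy as np

def svd_similarity(A, B, k=3):
    """Similarity of two observation matrices (rows = samples, cols = sensors).

    Compares the top-k eigenvectors of A^T A and B^T B, weighting each
    dot product by the normalized eigenvalues. Result lies in [0, 1].
    """
    A, B = np.asarray(A, float), np.asarray(B, float)

    def top_eig(M, k):
        vals, vecs = np.linalg.eigh(M.T @ M)      # symmetric, so eigh applies
        order = np.argsort(vals)[::-1]            # largest eigenvalues first
        return vals[order][:k], vecs[:, order][:, :k]

    k = min(k, A.shape[1], B.shape[1])
    la, U = top_eig(A, k)
    lb, V = top_eig(B, k)
    wa, wb = la / la.sum(), lb / lb.sum()         # normalized eigenvalue weights
    # |u_i . v_i| <= 1 for unit eigenvectors and the weights sum to 1,
    # so the score stays between 0 and 1.
    return float(sum(0.5 * (wa[i] + wb[i]) * abs(U[:, i] @ V[:, i])
                     for i in range(k)))

def segment_stream(stream, templates, min_win, max_win):
    """Greedy segmentation sketch: from the current position, try every window
    size between min_win and max_win, score each against every stored isolated
    gesture, and commit to the best-scoring (label, window) pair."""
    stream = np.asarray(stream, float)
    pos, labels = 0, []
    while pos + min_win <= len(stream):
        best = (-1.0, None, min_win)
        for w in range(min_win, min(max_win, len(stream) - pos) + 1):
            window = stream[pos:pos + w]
            for label, template in templates.items():
                score = svd_similarity(window, template)
                if score > best[0]:
                    best = (score, label, w)
        labels.append(best[1])
        pos += best[2]
    return labels
```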

Discussion
I liked that they not only gave overall accuracy comparisons between the three metrics, but also compared accuracy over a wide range of k values. However, while they discuss dead time (no gesture being performed) between two gestures, saying that it causes noise, they don't segment the "no gesture" periods out; instead they incorporate them into the following gesture. The windowing procedure could also have flaws related to aggressive recognition (the first part of a gesture resembling another gesture), where the beginning of a gesture could be misclassified and the remainder lumped into the next segment.

Reference
Li, C. and Prabhakaran, B. A similarity measure for motion stream segmentation and recognition.

Monday, February 4, 2008

A multi-class pattern recognition system for practical finger spelling translation

Summary
Hernandez et al. create a simple, cheap, accelerometer-based glove to track hand postures and classify gestures using dimensionality reduction and a decision tree. The glove consists of five accelerometers attached to the fingers between the second and third joints. By relating each accelerometer's reading to the pull of gravity, the overall orientation of a finger (or at least of that segment of the finger) can be estimated. By summing the x-components of the accelerometers and, separately, the y-components, they form a global x and y position. The y-position of the index finger is taken as a measure of the
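A rough sketch of the per-finger orientation estimate and the global x/y features described above (the axis conventions, the tilt formula, and the function names are my assumptions, not the paper's):

```python
import numpy as np

def finger_tilt(acc):
    """Tilt of a finger segment from its accelerometer's gravity reading.

    acc: (x, y, z) in g, measured while the hand is roughly still so the
    reading is dominated by gravity. Returns the angle (degrees) between
    the segment's axis and the horizontal plane.
    """
    x, y, z = acc
    return float(np.degrees(np.arctan2(y, np.hypot(x, z))))

def global_features(readings):
    """Collapse the five per-finger readings into the two global features the
    summary mentions: the sums of the x-components and of the y-components."""
    readings = np.asarray(readings, float)   # shape (5, 3): thumb..pinky, (x, y, z)
    global_x = float(readings[:, 0].sum())   # spread-like feature
    global_y = float(readings[:, 1].sum())   # overall-bend-like feature
    return global_x, global_y
```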

Discussion
The amount of data reduction would seem to oversimplify hand posture, at least in general. I'm still not convinced that it's adequate to describe the position of the fingers simply by an average or overall curvature and spread, when two distinct gestures may differ only in the bend of a single joint. While this hand measurement system seems to work well for signing, it doesn't seem useful for general gesturing, since you could in theory have two different gestures with the fingers in the same orientation but different hand positioning. Also, accelerometers tend to be very sensitive to noise, making dynamic, moving gestures difficult.

Reference
Hernandez-Rebollar, J. L., Lindeman, R. W., et al. (2002). A multi-class pattern recognition system for practical finger spelling translation. Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces (ICMI 2002).

Hand tension as a gesture segmentation cue

Summary
Rather than just build a gesture recognizer, Harling and Edwards want to create a gesture framework (or interface). They begin by grouping gestures into broad classes, arriving at four groups that contain most gestures: static posture, static location; dynamic posture, static location; static posture, dynamic location; and dynamic posture, dynamic location. Each class in this sequence is more complex than the previous one and builds upon the less complex classes. Since the first class has been solved adequately by previous work, they focus on the second. The key difference between these two classes is the problem of gesture segmentation: separating one posture from the next. They define a "hand tension" metric as a way to perform this segmentation. When assuming a hand posture, a person must exert effort to hold that posture rather than return to a natural resting hand position, and between two postures the hand first tends to move back toward this rest position. The hand tension metric increases as the hand moves away from the rest position. Gestures can therefore be segmented by finding the minima of hand tension and taking the point of maximal tension between consecutive minima as the intended posture. They provide two graphs of hand tension during sequences of gestures that suggest hand tension can indeed segment postures.
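A small sketch of how such tension-based segmentation might be implemented (the tension formula here, a simple sum of joint deviations from a rest pose, and the local-minima detection are my assumptions; the paper's exact metric may differ):

```python
import numpy as np

def hand_tension(joint_angles, rest_angles):
    """Hypothetical tension metric: total deviation of the joint angles from
    the relaxed hand pose."""
    return float(np.sum(np.abs(np.asarray(joint_angles, float) -
                               np.asarray(rest_angles, float))))

def segment_by_tension(tension):
    """Segment a stream of tension values: find local minima, then take the
    frame of maximal tension between consecutive minima as the intended posture."""
    t = np.asarray(tension, float)
    minima = [i for i in range(1, len(t) - 1)
              if t[i] <= t[i - 1] and t[i] <= t[i + 1]]
    postures = [a + int(np.argmax(t[a:b + 1]))     # peak tension between two rests
                for a, b in zip(minima, minima[1:])]
    return minima, postures
```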

Discussion
I like the four gesture classes presented here. It seems to me that most of the gestures we perform fall into the two middle categories, though the most complex class is certainly not negligible. The first class, SPSL (static posture, static location), sounds too cumbersome to use (make the posture, then hit the recognize key). This paper provides what could be a fairly useful metric for the second class, DPSL (dynamic posture, static location), though it could use some updating for modern tools that could give more fine-tuned tension readings. We've previously seen an example of the third class, SPDL (static posture, dynamic location): simply use recognizers from SPSL and add location tracking. The fourth class is harder, though a complex gesture could possibly be modeled as a sequence of sub-gestures from the second or third class.

Reference
Harling, P. A. and Edwards, A. D. N. (1997). Hand tension as a gesture segmentation cue. Progress in Gestural Interaction: Proceedings of Gesture Workshop '96, pages 75-87. Springer, Berlin.