Monday, February 4, 2008

Hand tension as gesture segmentation cue

Summary
Rather than just build a gesture recognizer, Harling and Edwards want to create a gesture framework (or interface). They begin by grouping gestures into broad classes, initially arriving at four groups that cover most gestures: static posture, static location (SPSL); dynamic posture, static location (DPSL); static posture, dynamic location (SPDL); and dynamic posture, dynamic location (DPDL). Each class is more complex than the one before it and builds on the simpler classes. Since the first class has been solved adequately by previous work, they focus on the second. The key difference between the two is the problem of gesture segmentation, which for this class means separating one posture from the next. They define a "hand tension" metric as a way to do this. Holding a hand posture takes effort, because the hand would otherwise return to a natural resting position, and between two postures the hand tends to move back toward that rest position first. The hand tension metric therefore increases as the hand moves away from the rest position. Gestures can be segmented by finding the minima of hand tension and taking the point of maximal tension between consecutive minima as the intended posture. They provide two graphs of hand tension during sequences of gestures that suggest hand tension can segment postures.
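A minimal Python sketch of the segmentation idea, under loud assumptions: the paper's actual tension metric is not reproduced here. `hand_tension` is a hypothetical stand-in that measures the Euclidean distance of a vector of joint angles from an assumed rest posture. `local_minima` and `segment_postures` are illustrative names, and the `min_rise` noise threshold is my own addition, not something from the paper.

```python
import math
from typing import List, Sequence, Tuple


def hand_tension(joint_angles: Sequence[float], rest_angles: Sequence[float]) -> float:
    # Hypothetical stand-in metric (not the paper's): Euclidean distance of the
    # current joint-angle vector from an assumed rest posture, so tension grows
    # as the hand moves away from rest.
    return math.sqrt(sum((a - r) ** 2 for a, r in zip(joint_angles, rest_angles)))


def local_minima(signal: Sequence[float]) -> List[int]:
    # Indices of local minima; both endpoints are treated as boundaries so the
    # first and last postures in the stream are bounded.
    n = len(signal)
    if n < 2:
        return list(range(n))
    minima = [0]
    for i in range(1, n - 1):
        # "<=" on the left picks the last frame of a flat valley.
        if signal[i] <= signal[i - 1] and signal[i] < signal[i + 1]:
            minima.append(i)
    minima.append(n - 1)
    return minima


def segment_postures(
    tension: Sequence[float], min_rise: float = 0.0
) -> List[Tuple[int, int, int]]:
    # Split the stream at tension minima and return (start, end, peak) for each
    # segment, where peak is the frame of maximal tension, taken as the
    # intended posture. Segments whose peak does not rise more than min_rise
    # above both bounding minima are dropped as noise (assumed heuristic).
    bounds = local_minima(tension)
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        if end - start < 2:
            continue  # no interior frame between the two minima
        peak = max(range(start, end + 1), key=lambda i: tension[i])
        if tension[peak] - max(tension[start], tension[end]) > min_rise:
            segments.append((start, end, peak))
    return segments
```

For the tension trace `[0, 1, 3, 2, 0.5, 2, 4, 1, 0]`, `segment_postures` finds minima at frames 0, 4 and 8 and returns `[(0, 4, 2), (4, 8, 6)]`, i.e. two postures peaking at frames 2 and 6. Real glove data would be noisier, which is why the sketch exposes a threshold rather than trusting every small dip.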

Discussion
I like the four gesture classes presented here. It seems to me that most of the gestures we perform fall into the two middle categories, though the most complex class is certainly not negligible. The first class, SPSL, sounds too cumbersome to use (make the posture, then hit the recognize key). This paper provides what could be a fairly useful metric for the second class, DPSL, though it could use some updating for modern tools that can give finer-grained tension readings. We've previously seen an example of the third class, SPDL: simply use recognizers from SPSL and add location tracking. The fourth class, DPDL, is harder, though a complex gesture could possibly be modeled as a sequence of sub-gestures from the second or third class.
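A rough Python sketch of the sub-gesture idea for the fourth class, assuming lower-level recognizers already emit a stream of labeled sub-gestures (a posture, plus an optional motion from location tracking). `SubGesture`, `find_gesture` and the labels `"flat"`, `"fist"`, `"left"`, `"right"` are all hypothetical. Exact contiguous matching is only illustrative; a practical recognizer would likely need tolerance for noisy or missing sub-gestures.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SubGesture:
    # Hypothetical output of the lower-level recognizers: a posture label
    # (e.g. from a DPSL segmenter) plus an optional motion label (SPDL-style
    # location tracking). motion=None means the hand stayed in place.
    posture: str
    motion: Optional[str] = None


def find_gesture(template: Sequence[SubGesture], observed: Sequence[SubGesture]) -> int:
    # Return the index where the template first appears as a contiguous run
    # in the observed sub-gesture stream, or -1 if it never does.
    m = len(template)
    for i in range(len(observed) - m + 1):
        if list(observed[i:i + m]) == list(template):
            return i
    return -1
```

For example, a hypothetical template `[SubGesture("flat"), SubGesture("flat", "left"), SubGesture("flat", "right")]` is found at index 1 in the stream `[SubGesture("fist"), SubGesture("flat"), SubGesture("flat", "left"), SubGesture("flat", "right")]`.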

Reference
Philip A. Harling and Alistair D. N. Edwards. Hand tension as a gesture segmentation cue. Progress in Gestural Interaction: Proceedings of Gesture Workshop '96, pages 75–87, Springer, Berlin et al., 1997

2 comments:

Paul Taele said...

That's actually a very interesting solution you propose, modeling the most difficult fourth class from sub-gestures derived from the second and third classes. I also found the partitioning of gestures into those four classes sensible. Attacking the fourth class head-on would be very difficult, but I think some progress can be made with the idea you brought up.

Test said...

I felt that the division was mostly a division by hardware. The posture carried the information from the CyberGlove, which was either moving or not, and the location carried the information from the Flock of Birds tracker, which was also either static or dynamic. I don't feel there was much perceptual examination of whether this division was logical; it seems more an outcome of the chosen hardware. I would be interested to hear more about your solution for the fourth class of gestures.