This work presents the study and implementation of Hierarchical Hidden Markov Models (HHMMs) for real-time gesture segmentation, recognition, and following. The model provides a 2-level hierarchical (segmental) representation of gestures that allows for hybrid control of sound synthesis.
Music can be represented with hierarchical time structures. For example, one could intuitively subdivide a musical sequence into motifs, notes, transients, etc. In this work, we aim to integrate such hierarchical time representations into the design of gesture-to-sound mappings for interactive musical systems.
As illustrated in the figure below, our approach is based on multilevel segmentations of gesture and sound, allowing the design of complex relationships that span several time scales.
We propose a particular implementation of this approach using Hierarchical Hidden Markov Models (HHMMs). The model extends traditional HMMs to represent multilevel time structures. Here, it represents gestures as sequences of segments by introducing a high-level transition structure. The model thus has 2 levels:
- The signal level encodes the fine temporal structure of a short gesture “segment”, using the approach developed for Gesture Follower
- The segment level models the high-level transition structure governing how these segments can be sequenced
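To make the two levels concrete, the following is a minimal sketch and not the implementation described here. It assumes each segment is a left-to-right chain of signal-level states with a fixed self-transition probability, and that leaving a segment's last state hands control to the segment level, which selects the next segment and enters it at its first state. The names `left_to_right`, `flatten`, and `p_stay` are hypothetical, and flattening both levels into one transition matrix is a simplification chosen for illustration.

```python
import numpy as np

def left_to_right(n_states, p_stay=0.5):
    """Signal level for one segment: each state either stays or advances by one.

    Returns (A, exit_prob), where A is the n x n within-segment transition
    matrix and exit_prob[i] is the probability of leaving the segment from
    state i (non-zero only for the last state). Assumed topology.
    """
    A = np.zeros((n_states, n_states))
    exit_prob = np.zeros(n_states)
    for i in range(n_states):
        A[i, i] = p_stay
        if i + 1 < n_states:
            A[i, i + 1] = 1.0 - p_stay
        else:
            exit_prob[i] = 1.0 - p_stay
    return A, exit_prob

def flatten(segment_lengths, segment_trans, p_stay=0.5):
    """Combine both levels into one transition matrix over all signal states.

    segment_trans[s, t] is the segment-level probability of moving from
    segment s to segment t once s has been exited (rows must sum to 1).
    """
    offsets = np.concatenate(([0], np.cumsum(segment_lengths)))
    total = offsets[-1]
    flat = np.zeros((total, total))
    for s, n in enumerate(segment_lengths):
        A, exit_prob = left_to_right(n, p_stay)
        lo = offsets[s]
        # Within-segment (signal-level) transitions.
        flat[lo:lo + n, lo:lo + n] = A
        # On exit, the segment level picks the next segment,
        # which is entered at its first signal-level state.
        for t in range(len(segment_lengths)):
            flat[lo:lo + n, offsets[t]] += exit_prob * segment_trans[s, t]
    return flat

# Two segments with 3 and 2 signal-level states (illustrative values only).
segment_trans = np.array([[0.0, 1.0],
                          [0.5, 0.5]])
flat = flatten([3, 2], segment_trans)
```

With a row-stochastic `segment_trans`, every row of `flat` sums to 1, so standard HMM forward inference could in principle be run on the flattened matrix. A full hierarchical model would normally keep the levels explicit rather than flattening them.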
We propose a proof-of-concept application to the control of audio processing. The application uses a mapping-by-demonstration paradigm: the user demonstrates gestures associated with particular sounds in order to train a mapping system. We propose a particular decomposition of a gesture into a sequence of 4 segments: PASR, for Preparation-Attack-Sustain-Release.
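As an illustration of how a segment-level transition structure over the four PASR segments might be encoded, here is a hedged sketch. The topology in `ALLOWED` is an assumption made for this example and is not taken from this work; the uniform transition probabilities and the names `segment_transition_matrix` and `is_valid_sequence` are likewise hypothetical.

```python
import numpy as np

SEGMENTS = ["P", "A", "S", "R"]  # Preparation, Attack, Sustain, Release

# Hypothetical topology: which segment may follow which.
# This is an assumption for illustration, not the structure used in this work.
ALLOWED = {
    "P": ["A"],
    "A": ["S", "R"],
    "S": ["R"],
    "R": ["P", "A"],
}

def segment_transition_matrix(allowed, segments=SEGMENTS):
    """Build a row-stochastic matrix with uniform weight on allowed successors."""
    index = {name: i for i, name in enumerate(segments)}
    T = np.zeros((len(segments), len(segments)))
    for src, targets in allowed.items():
        for dst in targets:
            T[index[src], index[dst]] = 1.0 / len(targets)
    return T

def is_valid_sequence(labels, allowed=ALLOWED):
    """Check that every consecutive pair of segment labels is an allowed transition."""
    return all(b in allowed[a] for a, b in zip(labels, labels[1:]))

T = segment_transition_matrix(ALLOWED)
ok = is_valid_sequence(["P", "A", "S", "R"])
skipped_attack = is_valid_sequence(["P", "S", "R"])
```

In a trained system the transition probabilities would be set by design or estimated from demonstrations rather than fixed to uniform values as they are here.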