PhD Thesis: "Motion-Sound Mapping by Demonstration"

My PhD work focuses on developing the approach and computational models for Motion-Sound Mapping by Demonstration. The approach lies at the intersection of the design principle of mapping through listening and interactive machine learning, allowing users to craft the mapping between motion and sound from movements performed while listening.

I completed my PhD in the {Sound Music Movement} Interaction team at Ircam, supervised by Frédéric Bevilacqua and Thierry Artières. My doctoral grant was awarded by the EDITE doctoral school at Université Pierre et Marie Curie.


Designing the relationship between motion and sound is essential to the creation of interactive systems. This thesis proposes an approach to the design of the mapping between motion and sound called Mapping-by-Demonstration. Mapping-by-Demonstration is a framework for crafting sonic interactions from demonstrations of embodied associations between motion and sound. It draws upon existing literature emphasizing the importance of bodily experience in sound perception and cognition. It uses an interactive machine learning approach to build the mapping iteratively from user demonstrations.

Drawing upon related work in the fields of animation, speech processing and robotics, we propose to fully exploit the generative nature of probabilistic models, from continuous gesture recognition to continuous sound parameter generation. We studied several probabilistic models in the light of continuous interaction. We examined both instantaneous models (Gaussian Mixture Models) and temporal models (Hidden Markov Models) for recognition, regression and parameter generation. We adopted an Interactive Machine Learning perspective, with a focus on learning sequence models from few examples and on continuously performing recognition and mapping. The models either focus on movement or integrate a joint representation of motion and sound. In movement models, the system learns the association between the input movement and an output modality, such as gesture labels or movement characteristics. In motion-sound models, we model motion and sound jointly, and the learned mapping directly generates sound parameters from input movements.
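For the motion-sound models described above, a common way to turn a joint model into a mapping is to condition it on the motion input. The equations below give this conditioning for a joint GMM in its standard textbook form (Gaussian Mixture Regression). The notation (x for motion features, y for sound parameters, K components) is our assumption and may differ from the thesis.

```latex
\begin{aligned}
p(x, y) &= \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(\begin{bmatrix} x \\ y \end{bmatrix};\ \mu_k, \Sigma_k\right),
\qquad
\mu_k = \begin{bmatrix} \mu_k^{x} \\ \mu_k^{y} \end{bmatrix},
\qquad
\Sigma_k = \begin{bmatrix} \Sigma_k^{xx} & \Sigma_k^{xy} \\ \Sigma_k^{yx} & \Sigma_k^{yy} \end{bmatrix} \\
h_k(x) &= \frac{\pi_k \, \mathcal{N}\!\left(x;\ \mu_k^{x}, \Sigma_k^{xx}\right)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}\!\left(x;\ \mu_j^{x}, \Sigma_j^{xx}\right)} \\
\hat{y}(x) &= \mathbb{E}[\, y \mid x \,] = \sum_{k=1}^{K} h_k(x) \left( \mu_k^{y} + \Sigma_k^{yx} \left(\Sigma_k^{xx}\right)^{-1} \left(x - \mu_k^{x}\right) \right)
\end{aligned}
```

In the temporal case (Hidden Markov Regression), the weights h_k(x) are typically replaced by the forward state probabilities computed from the motion sequence observed so far, so that the generated sound parameters also depend on where the performer is within the gesture.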

We explored a set of applications and experiments relating to real-world problems in movement practice, sonic interaction design, and music. We proposed two approaches to movement analysis, based on Hidden Markov Models and Hidden Markov Regression, respectively. Through a use case in Tai Chi performance, we showed how the models help characterize movement sequences across trials and performers. We presented two generic systems for movement sonification. The first system, based on Gaussian Mixture Regression, allows users to craft hand gesture control strategies for the exploration of sound textures. The second system exploits the temporal modeling of Hidden Markov Regression to associate vocalizations with continuous gestures. Both systems led to interactive installations presented to a wide public, and we have begun investigating their potential to support gesture learning.
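The sound-texture system is described as based on Gaussian Mixture Regression. As a hedged illustration of the technique only, not the thesis implementation, the sketch below computes the GMR prediction for a single motion feature and a single sound parameter. The `JointGaussian` class, the `gmr_predict` function and all parameter values are hypothetical, and training (typically EM on joint motion-sound frames recorded during demonstration) is omitted.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JointGaussian:
    """One component of a joint GMM over a motion feature x and a sound parameter y (1-D each)."""
    weight: float  # mixture weight pi_k
    mu_x: float    # mean of the motion feature
    mu_y: float    # mean of the sound parameter
    var_x: float   # variance of the motion feature
    cov_xy: float  # covariance between motion feature and sound parameter


def log_gaussian(x: float, mu: float, var: float) -> float:
    """Log-density of a 1-D Gaussian."""
    return -0.5 * ((x - mu) ** 2 / var + math.log(2.0 * math.pi * var))


def gmr_predict(components: Sequence[JointGaussian], x: float) -> float:
    """Gaussian Mixture Regression: conditional expectation E[y | x] under the joint GMM."""
    # Responsibilities h_k(x), computed in the log domain to avoid underflow.
    logs = [math.log(c.weight) + log_gaussian(x, c.mu_x, c.var_x) for c in components]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    # Blend the per-component linear regressions of y on x.
    return sum(
        (u / total) * (c.mu_y + c.cov_xy / c.var_x * (x - c.mu_x))
        for u, c in zip(unnorm, components)
    )
```

With two hypothetical components associating x = 0 with a sound parameter of 100 and x = 1 with 1000 (equal weights and variances, no covariance), an input halfway between the demonstrated poses, x = 0.5, yields exactly 550: the mixture interpolates smoothly between demonstrated associations instead of switching abruptly.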

French Abstract (English translation)

Apprentissage des Relations entre Mouvement et Son par Démonstration (Learning the Relationships between Motion and Sound by Demonstration)

Designing the mapping (or coupling) between motion and sound is essential to the creation of interactive sonic and musical systems. This thesis proposes an approach called mapping by demonstration, which allows users to create interactions between motion and sound through examples of gestures performed while listening. The approach builds upon existing studies in sound perception and cognition, and aims to integrate the action-perception loop more coherently into interaction design. Mapping by demonstration is a conceptual and technical framework for creating sonic interactions from demonstrations of associations between motion and sound. It uses interactive machine learning to build the mapping from the user’s demonstrations.

Drawing upon recent work in animation, speech processing, and robotics, we propose to exploit the generative nature of probabilistic models, from continuous gesture recognition to the generation of sound parameters. We studied several probabilistic models, both instantaneous (Gaussian Mixture Models) and temporal (Hidden Markov Models), for recognition, regression, and sound parameter generation. We adopted an interactive machine learning perspective, with a particular focus on learning from a small number of examples and on real-time inference. The models either represent movement only, or integrate a joint representation of gestural and sound processes; in the latter case, they can continuously generate sound parameter trajectories from movement.

We explored a set of applications in movement practice and dance, sonic interaction design, and music. We propose two approaches to movement analysis, based on Hidden Markov Models and Hidden Markov Regression, respectively. Through a case study in Tai Chi, we show that the models can characterize movement sequences across multiple performances and different participants. We developed two generic systems for movement sonification. The first system allows novice users to personalize gestural control strategies for sound textures, and is based on Gaussian Mixture Regression. The second system associates vocalizations with continuous movements. Both systems led to public installations, and we have begun to study their application to movement sonification in support of motor learning.

Supplementary Material

Chapter 4 – Probabilistic Movement Models

4.2 – Designing Sonic Interactions with GMMs

This video presents a system that uses GMMs to recognize different modes of “scratching” captured by a contact microphone. We trained three GMMs on recordings of three scratching modes. In performance, we use the posterior probabilities of the three models to mix the outputs of different resonant filters applied to the input audio.
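The recognition-and-mixing step described above can be sketched as follows, as a minimal illustration under simplifying assumptions: one scalar audio feature per frame instead of a real descriptor vector, hand-set GMM parameters instead of trained ones, equal priors over the three modes, and the resonant filters reduced to one precomputed output sample per mode. The mode names, the feature, and every function name are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical 1-D GMM: a list of (weight, mean, variance) triples.
Gmm = List[Tuple[float, float, float]]


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gmm_log_likelihood(gmm: Gmm, x: float) -> float:
    """log p(x | mode) under a 1-D Gaussian mixture."""
    return log_sum_exp([
        math.log(w) - 0.5 * ((x - mu) ** 2 / var + math.log(2.0 * math.pi * var))
        for w, mu, var in gmm
    ])


def mode_posteriors(models: Dict[str, Gmm], x: float) -> Dict[str, float]:
    """p(mode | x) via Bayes' rule, assuming equal priors over modes."""
    logs = {name: gmm_log_likelihood(gmm, x) for name, gmm in models.items()}
    norm = log_sum_exp(list(logs.values()))
    return {name: math.exp(v - norm) for name, v in logs.items()}


def mix_filtered(posteriors: Dict[str, float], filtered: Dict[str, float]) -> float:
    """Weight each mode's resonant-filter output sample by that mode's posterior."""
    return sum(p * filtered[name] for name, p in posteriors.items())


# Hypothetical hand-set models for three scratching modes (normally trained with EM).
MODELS: Dict[str, Gmm] = {
    "slow": [(1.0, 0.0, 1.0)],
    "medium": [(0.5, 4.0, 1.0), (0.5, 5.0, 1.0)],
    "fast": [(1.0, 10.0, 1.0)],
}
```

In a real-time loop, `mode_posteriors` would be called once per analysis frame and its values used as crossfade gains. With equal priors this amounts to a softmax over the per-mode log-likelihoods, so the gains always sum to one and the mix moves continuously between filters rather than switching on a hard decision.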

This application builds upon research in the ISMM team, notably by Nicolas Rasamimanana and Julien Bloit at Phonotonic, which was extended by Bruno Zamborlin with Mogees.

4.6 – Segment-level Mapping with the HHMM

This video presents an application of the Hierarchical HMM to the control of sound synthesis, as described in section 4.6.3. The video accompanies the article presented at SMC 2012.
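Segment-level mapping relies on following the performer through each segment in real time. The HHMM adds a top level that governs transitions between segments; as a simplified sketch of the lower level only, the code below runs causal forward filtering in a single left-to-right HMM built from one template gesture and returns the estimated progression within that segment. The class name, the one-state-per-frame construction, the 1-D feature, and the transition and variance values are all hypothetical assumptions, not the thesis implementation.

```python
import math
from typing import List, Sequence


class LeftRightHMM:
    """Left-to-right HMM with 1-D Gaussian emissions, one state per frame of a
    recorded template gesture (a hypothetical, deliberately minimal setup)."""

    def __init__(self, template: Sequence[float], variance: float, p_stay: float = 0.5):
        self.means = list(template)   # state i emits around template frame i
        self.var = variance           # shared emission variance (assumed)
        self.p_stay = p_stay          # self-transition probability (assumed)
        self.alpha: List[float] = []  # normalized forward variables

    def reset(self) -> None:
        self.alpha = []

    def _emission(self, i: int, x: float) -> float:
        d = x - self.means[i]
        return math.exp(-0.5 * d * d / self.var) / math.sqrt(2.0 * math.pi * self.var)

    def update(self, x: float) -> float:
        """One causal forward step; returns the estimated progression in [0, 1]."""
        n = len(self.means)
        if not self.alpha:
            prior = [1.0] + [0.0] * (n - 1)  # always start in the first state
        else:
            prior = [0.0] * n
            for i, a in enumerate(self.alpha):
                if i + 1 < n:
                    prior[i] += self.p_stay * a              # stay in state i
                    prior[i + 1] += (1.0 - self.p_stay) * a  # advance to state i + 1
                else:
                    prior[i] += a                            # last state absorbs
        alpha = [p * self._emission(i, x) for i, p in enumerate(prior)]
        total = sum(alpha)
        if total == 0.0:
            # Observation underflows under every state: keep the prediction instead.
            alpha, total = prior, sum(prior)
        self.alpha = [a / total for a in alpha]
        # Normalized expected state index: estimated position within the segment.
        return sum(i * a for i, a in enumerate(self.alpha)) / max(n - 1, 1)
```

Feeding the template back frame by frame yields a progression that rises from 0 toward 1. A slower or faster performance changes how quickly it advances, which is what makes this kind of tracking usable for time-aligned control, for instance of the playback position within a sound segment.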

Chapter 6 - Probabilistic Models for Parameter Generation

6.3 – HMR for Gesture-based Control of Physical Modeling Sound Synthesis

This video presents a system that uses HMR to learn the relationship between gestures and the trajectories of input parameters to a physical model. This video accompanies the demonstration presented at ACM Multimedia 2013.
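HMR is named here but not spelled out. The sketch below shows one common formulation, under stated assumptions: a left-to-right chain of states, each holding a 1-D joint Gaussian over a motion feature and one synthesis parameter. The forward variables are updated from the motion only, and the output is the state-weighted mix of per-state linear regressions. The class names and parameter values are hypothetical, and training is omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class RegressionState:
    mu_x: float    # mean motion feature in this state
    mu_y: float    # mean synthesis parameter in this state
    var_x: float   # variance of the motion feature
    cov_xy: float  # motion/parameter covariance


class HiddenMarkovRegression:
    def __init__(self, states: Sequence[RegressionState], p_stay: float = 0.5):
        self.states = list(states)
        self.p_stay = p_stay  # self-transition probability (assumed)
        self.alpha: Optional[List[float]] = None

    def _predict_prior(self) -> List[float]:
        """Propagate the forward variables through left-to-right transitions."""
        n = len(self.states)
        if self.alpha is None:
            return [1.0] + [0.0] * (n - 1)
        prior = [0.0] * n
        for i, a in enumerate(self.alpha):
            if i + 1 < n:
                prior[i] += self.p_stay * a
                prior[i + 1] += (1.0 - self.p_stay) * a
            else:
                prior[i] += a
        return prior

    def step(self, x: float) -> float:
        """Consume one motion frame and return the estimated synthesis parameter."""
        prior = self._predict_prior()
        # The forward update uses only the motion marginal of each state.
        lik = [
            p * math.exp(-0.5 * (x - s.mu_x) ** 2 / s.var_x) / math.sqrt(2.0 * math.pi * s.var_x)
            for p, s in zip(prior, self.states)
        ]
        total = sum(lik)
        if total == 0.0:
            lik, total = prior, sum(prior)
        self.alpha = [v / total for v in lik]
        # Output: state-posterior-weighted mixture of per-state linear regressions.
        return sum(
            a * (s.mu_y + s.cov_xy / s.var_x * (x - s.mu_x))
            for a, s in zip(self.alpha, self.states)
        )
```

For example, with three hypothetical states pairing motion values 0, 1, 0 with parameter values 10, 20, 30, the input sequence 0.0, 1.0, 0.0 produces roughly 10, 20, 30. The same input value x = 0.0 maps to about 10 at the start and about 30 at the end, because the forward variables carry temporal context that a static regression such as GMR, which maps each x to a single value, does not have.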

Chapter 8 – Playing Sound Textures

8.3 – SIGGRAPH’14 Installation

Demo Video:

The following recording presents the eight sound examples used for the SIGGRAPH’14 installation.

Screenshot of the application used in the installation:

8.4 – Gesture Imitation with Sonification

The following video illustrates the four demonstration gestures and the associated sounds that participants had to reproduce in the experiment.

Chapter 9 – Motion-Sound Interaction through Vocalization

9.2 – Vocalization System Overview

This demonstration video, supporting our proposal for SIGGRAPH’14 Emerging Technologies, illustrates the system for performing vocalizations through continuous gestures.

This video illustrates Wired Gestures, developed by Greg Beller in the “Synekine” project. More information can be found on Greg Beller’s website.

9.3 – The Imitation Game

These sound examples were recorded during the SIGGRAPH’14 installation “The Imitation Game”. We first present the vocalization performed by Player 1 as a demonstration, then Player 2’s attempts to reproduce this vocalization through gesture interaction.
Demonstration (Player 1):

Attempts to imitate (Player 2):



A.3 – Towards Continuous Parametric Synthesis

These examples present the developments of Pablo Arias’s Master’s thesis, “Description et synthèse sonore dans le cadre de l’apprentissage mouvement-son par démonstration” (Sound Description and Synthesis for Motion-Sound Learning by Demonstration), which I co-supervised with Norbert Schnell and Frédéric Bevilacqua.

Granular with Transient Conservation

This video demonstrates gesture-based synthesis of vocalizations using the granular engine with transient conservation. Courtesy of Pablo Arias.

Hybrid Synthesis

This video demonstrates gesture-based synthesis of vocalizations using the four proposed approaches. Courtesy of Pablo Arias.