NEWS from IRCAM
Time | Stravinsky Room – Conference room |
---|---|
9am-9:30am | Registration |
9:30am-10am | Gregory BELLER and Paola PALUMBO (IRCAM) Welcome Session |
10am-10:30am | Hugues VINET and Brigitte d'ANDRÉA-NOVEL (IRCAM) IRCAM research and development news |
10:30am-11am | Axel ROEBEL and Charles PICASSO (IRCAM) News from the Analysis Synthesis Team |
11am-11:30am | Break |
11:30am-12pm | Olivier WARUSFEL, Markus NOISTERNIG and Thibaut CARPENTIER (IRCAM) News from EAC team |
12pm-12:30pm | Thomas HÉLIE, Robert PIECHAUD, Jean LOCHARD (IRCAM) and Hans Peter STUBBE News from S3AM Team |
12:30pm-1pm | Jérôme NIKA, Jean-Louis GIAVITTO, Philippe ESLING and Gérard ASSAYAG (IRCAM) News from RepMus Team |
1pm-2:30pm | Lunch buffet |
2:30pm-3pm | Frédéric BEVILACQUA, Diemo SCHWARZ, Riccardo BORGHESI and Benjamin MATUSZEWSKI (IRCAM) News from ISMM Team |
3pm-4pm | Jean-Julien AUCOUTURIER, Marco LIUNI and Pablo ARIAS (IRCAM) News from the CREAM project team

Conveners: Laura RACHMAN, STMS Lab (IRCAM/CNRS/UPMC UMR 9912), Brain and Spine Institute (CNRS UMR 7225 / UPMC / INSERM U 1127), Paris, France; Jean-Julien AUCOUTURIER, STMS Lab (IRCAM/CNRS/UPMC UMR 9912), Paris, France

In recent years, the experimental sciences of emotion perception and production have greatly benefited from software tools able to synthesize realistic facial expressions, which can be used as stimuli in experimental paradigms such as reverse correlation. In the audio modality, however, tools to similarly control or synthesize the acoustic characteristics of emotional speech or music typically do not exist. The objective of this symposium is to present four new open-source software tools, developed over the past two years in the context of the ERC CREAM project ("Cracking the Emotional Code of Music"), that attempt to fill this methodological gap.
In more detail, the tools presented here are designed as transformation techniques: they do not synthesize artificial sound, but rather work on genuine audio recordings, or sometimes even on real-time audio streams, which they parametrically manipulate to make them sound more or less emotional. Three of the tools (DAVID, ZYGi, ANGUS) are computational models of a specific vocal behavior, such as the shape of the speaker's mouth conveying the sound of a smile, or the roughness of a voice expressing arousal. The fourth tool (CLEESE) was developed not to generate emotional speech per se, but rather to generate infinite prosodic variations, which can then be used as stimuli in reverse-correlation paradigms to uncover people's mental representations of specific emotional or attitudinal vocal expressions. In the four presentations of this symposium, each tool will be presented along with a demonstration of possible applications in experimental research. All the tools presented in this symposium are made available open-source to the community (http://forumnet.ircam.fr/), in the hope that they will foster new ideas and experimental paradigms to study emotion processing in speech and music.

Authors: Laura RACHMAN, Marco LIUNI and Jean-Julien AUCOUTURIER, STMS Lab (IRCAM/CNRS/UPMC UMR 9912) and Brain and Spine Institute (CNRS UMR 7225 / UPMC / INSERM U 1127), Paris, France

DAVID is a tool developed to apply infra-segmental cues related to emotional expressions, such as pitch inflections, vibrato, and spectral changes, to any pre-existing audio stimuli or to direct vocal input through a microphone. Users can control the audio effects in a modular manner to create customized transformations. Three emotion presets (happy, sad, afraid) have been thoroughly validated in English, French, Swedish and Japanese, showing that they are reliably recognized as emotional and not typically detected as artificially produced1. When the emotion effects are applied to real-time speech, the latency of the software is less than 20 milliseconds, short enough to leave continuous speech unaffected by any latency effect. This notably makes the tool useful for vocal feedback studies2 and investigations of emotional speech in interpersonal communication. DAVID can be controlled through a graphical user interface, which is practical for exploring different combinations of the audio effects, and it can also be piloted from experimental software via the pyDAVID Python module. This extension allows for trial-by-trial control of, for example, the onset or the intensity of the emotion effects. Finally, time stamps can be stored with pyDAVID, making the tool not only appropriate for various behavioral paradigms, but also ideally suited for use in conjunction with neurophysiological recordings, such as electroencephalography (EEG).

1. Rachman, L., Liuni, M., Arias, P., Lind, A., Johansson, P., Hall, L., Richardson, D., Watanabe, K., Dubal, S. & Aucouturier, J.J. (2017). DAVID: An open-source platform for real-time transformation of infra-segmental emotional cues in running speech. Behavior Research Methods. doi: 10.3758/s13428-017-0873-y
2. Aucouturier, J.J., Johansson, P., Hall, L., Segnini, R., Mercadié, L. & Watanabe, K. (2016). Covert digital manipulation of vocal emotion alter speakers' emotional state in a congruent direction. Proceedings of the National Academy of Sciences, 113(4). doi: 10.1073/pnas.1506552113
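As an informal illustration of the kind of infra-segmental cue DAVID manipulates, the sketch below applies a simple vibrato (a slow sinusoidal pitch modulation, implemented as a modulated resampling delay) to a mono recording. This is a minimal offline approximation written for this summary, not the DAVID/pyDAVID implementation; the file name and parameter values are placeholders.

```python
# Minimal offline sketch of a vibrato cue, in the spirit of DAVID's
# infra-segmental transformations (NOT the actual DAVID/pyDAVID code).
import numpy as np
import soundfile as sf  # assumed available; any WAV I/O library would do

def apply_vibrato(x, sr, rate_hz=8.0, depth_ms=0.6):
    """Modulate pitch by reading the signal through a slowly varying delay."""
    n = np.arange(len(x))
    # Sinusoidal delay (in samples) -> periodic pitch deviation
    delay = depth_ms * 1e-3 * sr * np.sin(2 * np.pi * rate_hz * n / sr)
    read_pos = np.clip(n - delay, 0, len(x) - 1)
    return np.interp(read_pos, n, x)  # linear-interpolation resampling

if __name__ == "__main__":
    x, sr = sf.read("speech.wav")            # placeholder file name
    x = x.mean(axis=1) if x.ndim > 1 else x  # mix down to mono
    sf.write("speech_vibrato.wav", apply_vibrato(x, sr), sr)
```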
Authors: Pablo ARIAS and Jean-Julien AUCOUTURIER, STMS Lab (IRCAM/CNRS/UPMC UMR 9912), Paris, France

ZYGi is a digital audio processing algorithm designed to model the acoustic consequences of smiling (Facial Action Unit 12) in speech. The algorithm simulates the subtle acoustic consequences of zygomatic contraction in the voice while leaving other linguistic and paralinguistic dimensions, such as semantic content and prosodic features, unchanged. The algorithm, which is based on a phase-vocoder technique, uses spectral transformations (frequency warping and dynamic spectral filtering) to implement the formant movements and high-frequency enhancements that characterize smiled speech. Concretely, the algorithm can either shift the first formants of the voice towards the high frequencies to give the impression of a smile during production, or shift them towards the low frequencies, giving the impression of a closed/rounded mouth. In a series of recent studies, we showed that such manipulated acoustic cues are not only recognized as smiled and as more positive, but that they can also trigger unconscious facial imitation1. ZYGi exists as a Python wrapper around IRCAM's SuperVP voice transformation software and is open to the research community.

1. Arias, P., Belin, P., & Aucouturier, J.-J. (2017). Auditory smiles trigger unconscious facial imitation. In review.
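To make the frequency-warping idea concrete, here is a rough offline sketch that warps the spectral envelope of each STFT frame by a constant factor (envelope estimated with a crude cepstral lifter), leaving the fine harmonic structure untouched. It is only a hand-rolled approximation of the general technique described above, not ZYGi's SuperVP-based processing; file names and parameters are placeholders.

```python
# Crude spectral-envelope warping sketch (NOT ZYGi's SuperVP-based algorithm):
# shifts formant-like envelope peaks up (alpha > 1, "smile") or down (alpha < 1).
import numpy as np
import soundfile as sf
from scipy.signal import stft, istft

def warp_envelope(x, sr, alpha=1.1, n_fft=1024, n_ceps=40):
    f, t, X = stft(x, sr, nperseg=n_fft)
    mag = np.abs(X) + 1e-12
    # Cepstral smoothing: keep low quefrencies as a spectral-envelope estimate
    ceps = np.fft.irfft(np.log(mag), axis=0)
    ceps[n_ceps:-n_ceps, :] = 0.0
    env = np.fft.rfft(ceps, axis=0).real
    # Warp the envelope along frequency: env_warped(f) = env(f / alpha)
    bins = np.arange(mag.shape[0])
    env_warped = np.stack(
        [np.interp(bins / alpha, bins, env[:, k]) for k in range(env.shape[1])],
        axis=1,
    )
    # Apply the envelope difference as a time-varying filter on the STFT
    Y = X * np.exp(env_warped - env)
    _, y = istft(Y, sr, nperseg=n_fft)
    return y

if __name__ == "__main__":
    x, sr = sf.read("speech.wav")            # placeholder file name
    x = x.mean(axis=1) if x.ndim > 1 else x  # mix down to mono
    sf.write("speech_smile.wav", warp_envelope(x, sr), sr)
```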
Authors: Emmanuel PONSOT, Laboratoire des Systèmes Perceptifs (CNRS UMR 8248) and Département d'études cognitives, École Normale Supérieure, PSL Research University, Paris, France; Juan-Jose BURRED, Independent Researcher, Paris, France; Jean-Julien AUCOUTURIER, STMS Lab (IRCAM/CNRS/UPMC UMR 9912), Paris, France

CLEESE (Combinatorial Expressive Speech Engine) is a tool designed to generate an infinite number of natural-sounding, expressive variations around any speech recording. It consists of a voice-processing algorithm based on the phase-vocoder architecture. It operates by generating a set of breakpoints in a given recording (e.g. every 100 ms in the file) and applying a different audio transformation to each segment. In doing so, it can modify the temporal dynamics of any recorded voice's original contour of pitch, loudness, timbre (spectral envelope) and speed (i.e., roughly, its prosody), in a way that is both fully parametric and realistic. Notably, it can be used to generate thousands of novel, natural-sounding variants of the same word utterance, each with the relevant dimensions randomly manipulated. Such stimuli can then be used to access humans' high-level representations of speech (e.g., emotional or social traits) using psychophysical reverse-correlation methods. By providing a computational account of such high-level auditory "filtering", we believe this tool will open a vast range of experimental possibilities for future research seeking to decipher the acoustic bases of human social and emotional communication1, hopefully as successfully as analogous tools have done in vision science2. CLEESE is available open-source as both a Matlab and a Python toolbox.

1. Ponsot, E., Burred, J.J., Belin, P. & Aucouturier, J.J. (2017). Cracking the social code of speech prosody using reverse correlation. In review.
2. Yu, H., Garrod, O. G., & Schyns, P. G. (2012). Perception-driven facial expression synthesis. Computers & Graphics, 36(3), 152-162.
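The breakpoint logic described above lends itself to a compact illustration. The sketch below cuts a recording into fixed-length segments and applies an independent random transformation to each one, here on the speed dimension only, one of the several dimensions (pitch, loudness, timbre, speed) that CLEESE can manipulate. It is a simplified stand-in written for this summary, not the actual CLEESE toolbox; file names and parameters are placeholders.

```python
# Simplified illustration of CLEESE-style stimulus generation (speed dimension
# only): random, segment-wise transformations around a base recording.
# Note: naive resampling also shifts pitch; the real toolbox uses a phase
# vocoder so that speed and pitch can be manipulated independently.
import numpy as np
import soundfile as sf

def random_speed_variant(x, sr, seg_ms=100, sigma=0.15, rng=None):
    """Resample each ~100 ms segment by a random factor drawn around 1.0."""
    rng = np.random.default_rng() if rng is None else rng
    seg_len = int(seg_ms * 1e-3 * sr)
    out = []
    for start in range(0, len(x), seg_len):
        seg = x[start:start + seg_len]
        factor = np.exp(rng.normal(0.0, sigma))   # log-normal stretch factor
        new_len = max(1, int(round(len(seg) * factor)))
        pos = np.linspace(0, len(seg) - 1, new_len)
        out.append(np.interp(pos, np.arange(len(seg)), seg))
    return np.concatenate(out)

if __name__ == "__main__":
    x, sr = sf.read("utterance.wav")          # placeholder file name
    x = x.mean(axis=1) if x.ndim > 1 else x   # mix down to mono
    rng = np.random.default_rng(0)
    # Generate a small bank of random variants for a reverse-correlation block
    for i in range(10):
        sf.write(f"variant_{i:03d}.wav", random_speed_variant(x, sr, rng=rng), sr)
```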
Authors: Marco LIUNI, Luc ARDAILLON and Jean-Julien AUCOUTURIER, STMS Lab (IRCAM/CNRS/UPMC UMR 9912), Paris, France

ANGUS is a software tool for high-quality transformation of natural voice with parametric control of roughness. Recent psychophysical and imaging studies suggest that rough sounds, characterized by specific spectro-temporal modulations, target neural circuits involved in fear/danger processing; the brain extracts such features from human voices to infer socio-emotional traits of their speakers1. Our software aims to enable reproducible psychophysical experiments by imposing a parametric, scream-inspired effect on natural sounds, with the goal of investigating the emotional response to this sound feature. Analysing and synthesizing rough vocals is challenging, as roughness is generated by highly unstable modes in the vocal folds and tract: compared to standard production, rough vocals present additional sub-harmonics as well as nonlinear components. Our approach is based on multiple amplitude modulations of the incoming sound that are automatically adapted to the sound's fundamental frequency, which leads to a realistic, but also highly efficient, parametric effect well suited for real-time applications.
1. Arnal, L. H., Flinker, A., Kleinschmidt, A., Giraud, A. L., & Poeppel, D. (2015). Human screams occupy a privileged niche in the communication soundscape. Current Biology, 25(15), 2051-2056. |
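The core of the approach described above, amplitude modulation locked to the voice's fundamental frequency, can be sketched in a few lines. The toy version below uses a single global f0 estimate and one sub-harmonic modulator, whereas the abstract describes multiple modulations adapted over time, so treat it as an illustration only; file names and parameters are placeholders.

```python
# Toy illustration of roughness by f0-locked amplitude modulation
# (NOT the ANGUS implementation, which adapts multiple modulators over time).
import numpy as np
import soundfile as sf

def estimate_f0(x, sr, fmin=70.0, fmax=400.0):
    """Very crude global f0 estimate from the autocorrelation of a short excerpt."""
    seg = x[: int(0.5 * sr)]
    seg = seg - seg.mean()
    ac = np.correlate(seg, seg, mode="full")[len(seg) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

def add_roughness(x, sr, depth=0.6, subharmonic=0.5):
    """Amplitude-modulate the signal at a fraction of its fundamental frequency."""
    f0 = estimate_f0(x, sr)
    t = np.arange(len(x)) / sr
    mod = 1.0 + depth * np.cos(2 * np.pi * subharmonic * f0 * t)
    return x * mod / (1.0 + depth)   # normalize to avoid clipping

if __name__ == "__main__":
    x, sr = sf.read("vocal.wav")             # placeholder file name
    x = x.mean(axis=1) if x.ndim > 1 else x  # mix down to mono
    sf.write("vocal_rough.wav", add_roughness(x, sr), sr)
```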
4pm-4:30pm | Break |
4:30pm-5:30pm | Marta GENTILUCCI with Jérôme NIKA, Axel ROEBEL and Marco LIUNI End of artistic research residency with demo: Female singing voice's vibrato and tremolo: analysis, mapping and improvisation |
5:30pm-6:00pm | Rama GOTTFRIED and RepMus Team (IRCAM) Introducing Symbolist, a graphic notation environment for music and multimedia, developed by Rama Gottfried and Jean Bresson (IRCAM – Musical Representations) as part of Rama's 2017-18 IRCAM-ZKM Musical Research Residency. Symbolist was designed to be flexible in purpose and function: capable of controlling computer rendering processes such as spatial movement, and serving as an open workspace for developing symbolic representations for performance with new gestural interfaces. The system is based on an Open Sound Control (OSC) encoding of symbols representing multi-rate and multidimensional control data, which can be streamed as control messages to audio processing, or to any kind of media rendering system that speaks OSC. Symbols can be designed and composed graphically, and brought into relationship with other symbols. The environment provides tools for creating symbol groups and stave references, by which symbols may be timed and used to constitute a structured and executable multimedia score.
|
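For readers unfamiliar with OSC, the snippet below shows what streaming symbol data as OSC control messages can look like in Python using the python-osc package. The host, port and address names are invented for the example and do not reflect Symbolist's actual message namespace.

```python
# Generic example of streaming control data over OSC with python-osc.
# Host, port and addresses are hypothetical, not Symbolist's real namespace.
import math
import time
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)   # assumed renderer address

# Stream a slow circular spatial trajectory as (x, y) control messages
for i in range(200):
    angle = 2 * math.pi * i / 200
    client.send_message("/symbol/1/position", [math.cos(angle), math.sin(angle)])
    time.sleep(0.05)                           # ~20 messages per second
```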
2:30pm-3:30pm | Shannon Room – Classroom: Thibaut CARPENTIER and Eric DAUBRESSE (IRCAM) |
6:30pm-7:30pm | Announcement of the laureates of the Artistic Research Residency Program 2018-2019. Drinks under the glass roof |
8:30pm-10:00pm | IRCAM LIVE CONCERT (Centre Pompidou, Grande Salle) |