The Ministry of Silly Talks

Infinite numbers of prosodic variations with CLEESE.

CLEESE (“Ministry of Silly Speech”) is a Python toolbox for performing random or deterministic pitch, timescale, filtering and gain transformations on an input sound. It was originally designed to generate many random variations of a single speech utterance, to be used as stimuli in listening tests for reverse correlation experiments. The modifications can be either static or time-varying. Beyond this original purpose, the toolbox can also be used to produce individual, user-defined modifications.

CLEESE operates by generating a set of random breakpoint functions (BPFs) in the appropriate format for each treatment. These are then passed, with the corresponding parameters, to the included spectral processing engine, which is based on a phase vocoder. Alternatively, the user can create the BPFs externally, so CLEESE can also be used as a phase-vocoder-based effects unit.
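As a rough illustration of the first step, the sketch below draws a random breakpoint function as a list of (time, value) pairs. The helper name, its parameters, the equal spacing of the breakpoints and the clipping rule are all assumptions made for illustration; they are not CLEESE's actual API or BPF file format.

```python
import random

def random_bpf(duration, num_segments, std, trunc=2.0, seed=None):
    """Return a list of (time_in_seconds, value) breakpoints.

    Hypothetical helper, not part of CLEESE: breakpoints are equally spaced
    over `duration`, and values are Gaussian with standard deviation `std`,
    clipped to +/- trunc * std.
    """
    rng = random.Random(seed)
    limit = trunc * std
    bpf = []
    for i in range(num_segments + 1):
        t = duration * i / float(num_segments)
        v = rng.gauss(0.0, std)
        v = max(-limit, min(limit, v))  # keep extreme draws within bounds
        bpf.append((t, v))
    return bpf

# Seven breakpoints spread over a 1.5 s sound
bpf = random_bpf(duration=1.5, num_segments=6, std=100.0, seed=0)
for t, v in bpf:
    print("%.3f %.2f" % (t, v))
```

Passing a fixed seed makes a given set of stimuli reproducible, which can be convenient when an experiment has to be regenerated.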

CLEESE is a free, standalone Python module, distributed under an open-source MIT Licence on the IRCAM Forumnet platform. It was designed by Juan José Burred, Emmanuel Ponsot and Jean-Julien Aucouturier (STMS, IRCAM/CNRS/Sorbonne Université, Paris), in collaboration with Pascal Belin (Institut des Neurosciences de la Timone, Aix-Marseille Université), and with generous funding from the European Research Council (CREAM 335536, 2014-2019, PI: JJ Aucouturier). The current version of the toolbox has been developed and tested on Python 2.7.13. It requires NumPy and SciPy.

Sound examples

Random pitch variations around the same recording (French sentence: “Je suis en route pour la réunion” – I’m on my way to the meeting).

Same recording, with random speed variations around the original speed contour:

All this is obviously language-independent. “We’ll stop in a couple of minutes”, in Japanese, with random pitch:


1. Python / Jupyter Notebook

Since CLEESE is a Python package, you will first need a working installation of Python (2.7). In addition, to run the included tutorial, you will need Jupyter Notebook as well as a number of commonly used packages for scientific computing. For new users, we highly recommend installing Anaconda, which conveniently installs Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science.

Use the following installation steps:

  • Download Anaconda. We recommend downloading Anaconda’s Python 2.7 version.
  • Install the version of Anaconda which you downloaded, following the instructions on the download page.


  • Download the CLEESE folder from the download link on this page, and unzip it on your computer.
  • Launch Jupyter Notebook from your shell/command line by running jupyter notebook, then navigate to the tutorial.ipynb file.


CLEESE can be used in several different modes, depending on how the main processing function is called. Examples of several typical usage scenarios are included in the example script. A Jupyter notebook tutorial with basic usage scenarios is also available in the project folder (see also here for a quick peek).

In batch mode, CLEESE generates many random modifications from a single input sound file, called the base sound.
It can be launched as follows:

Two parameters have to be set by the user:

  • inputFile: the path to the base sound, which has to be a mono sound in WAV format.
  • configFile: the path to the configuration script
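Since the base sound has to be a mono WAV file, it can be worth checking the input before launching a batch. The sketch below does this with Python's standard wave module; the helper name is hypothetical and is not part of CLEESE. Note that the wave module only reads PCM-encoded files, so other WAV encodings are reported here as not usable even though another reader might accept them.

```python
import contextlib
import wave

def is_mono_wav(path):
    """Return True if `path` is a PCM WAV file with exactly one channel.

    Hypothetical pre-flight check, not part of CLEESE. Files that the
    standard `wave` module cannot parse are reported as False.
    """
    try:
        # contextlib.closing keeps this compatible with Python 2.7,
        # where wave objects are not context managers.
        with contextlib.closing(wave.open(path, "rb")) as wav:
            return wav.getnchannels() == 1
    except (wave.Error, EOFError):
        return False
```

A stereo recording would have to be mixed down or split into channels before being used as a base sound.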

All generation parameters for all treatments are set in a configuration script that the user has to create or edit. An example configuration script with parameters for all treatments is included with the toolbox:

For each run in batch mode, the toolbox generates the following folder structure, where <outPath> is specified in
the parameter file:

  • <outPath>/<currentExperimentFolder>: main folder for the current generation experiment. The name
    <currentExperimentFolder> is automatically created from the current date and time. This folder contains:

    • <baseSound>.wav: a copy of the base sound used for the current experiment
    • *.py: a copy of the configuration script used for the current experiment
    • One subfolder for each one of the performed treatments, which can be either pitch, eq, stretch or gain.
      Each of them contains, for each generated stimulus:

      •  <baseSound>.xxxx.<treatment>.wav: the generated stimulus, where xxxx is a running number (e.g.:
      • <baseSound>.xxxx.<treatment>BPF.txt: the generated BPF, in ASCII format, for the generated
        stimulus (e.g.: cage.0001.stretchBPF.txt)
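The naming pattern above can be reproduced with a small helper, which is handy when pairing each stimulus with its BPF during analysis. This is only a sketch: the function name is hypothetical, and the four-digit zero padding is inferred from the xxxx placeholder and the 0001 example.

```python
def stimulus_names(base_sound, treatment, index):
    """Build the (wav_name, bpf_name) pair for one generated stimulus.

    Hypothetical helper following the <baseSound>.xxxx.<treatment> pattern;
    the running number is assumed to be zero-padded to four digits.
    """
    stem = "%s.%04d.%s" % (base_sound, index, treatment)
    return stem + ".wav", stem + "BPF.txt"

wav_name, bpf_name = stimulus_names("cage", "stretch", 1)
print(wav_name)  # cage.0001.stretch.wav
print(bpf_name)  # cage.0001.stretchBPF.txt
```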

Available manipulations

1. Time stretching (stretch)

This manipulation locally stretches or compresses the sound according to the stretching factor (oscillating around 1) at the current timestamp. This is the only treatment that changes the duration of the output compared to the base sound. The algorithm used is a phase vocoder with phase locking based on frame-wise peak picking.
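To see why a time-varying stretch changes the output duration, the sketch below integrates a stretch BPF over the input time axis. It rests on two assumptions that the text above does not confirm: that a factor above 1 lengthens the sound locally, and that the factor varies linearly between breakpoints. The function name is hypothetical.

```python
def stretched_duration(bpf):
    """Estimate the output duration for a stretch BPF.

    `bpf` is a list of (input_time_in_seconds, stretch_factor) pairs with
    increasing times. Assumes linear interpolation between breakpoints and
    that a factor above 1 lengthens the sound (both are assumptions).
    """
    total = 0.0
    for (t0, f0), (t1, f1) in zip(bpf[:-1], bpf[1:]):
        # Trapezoid rule: mean factor over the segment times its length
        total += 0.5 * (f0 + f1) * (t1 - t0)
    return total

# A 1.5 s input, slowed down in its middle part
bpf = [(0.0, 1.0), (0.5, 1.5), (1.0, 1.5), (1.5, 1.0)]
print(stretched_duration(bpf))  # 2.0
```

A BPF whose factor stays at 1 throughout leaves the duration unchanged, which matches the idea of a factor oscillating around 1.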

2. Pitch shifting (pitch)

The BPF is used to transpose the pitch of the sound up or down. The algorithm used is a phase vocoder with phase locking based on frame-wise peak picking, followed by resampling on a window-by-window basis.
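A transposition is ultimately applied as a frequency ratio, which also sets the resampling factor for each window. The sketch below converts a transposition value to such a ratio, assuming the value is expressed in cents (hundredths of a semitone). The unit is an assumption here, as the text above does not state it, and the function name is hypothetical.

```python
def cents_to_ratio(cents):
    """Convert a transposition in cents to a frequency ratio.

    Assumes the transposition is given in cents: 1200 cents is one octave,
    so the ratio is 2 ** (cents / 1200).
    """
    return 2.0 ** (cents / 1200.0)

for cents in (-1200, 0, 100, 1200):
    print("%6d cents -> ratio %.4f" % (cents, cents_to_ratio(cents)))
```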

3. Time-varying equalization (eq)

This manipulation divides the spectrum into a set of frequency bands, and applies random amplitudes to the bands. The band edges are fixed, while the amplitudes can vary over time. The corresponding BPF is thus two-dimensional. There are two possible ways to define the band division:

  • Linear division into a given number of bands between 0 Hz and Nyquist.
  • Division according to a mel scale into a given number of bands. Note that it is possible to specify any number of filters (fewer or more than the traditional 40 filters for mel cepstra).
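The two band divisions can be sketched as follows. The function names are hypothetical, and the mel conversion uses one common formula, mel = 2595 log10(1 + f / 700); whether CLEESE uses this exact variant is an assumption.

```python
import math

def hz_to_mel(f):
    """Convert Hz to mel (one common variant of the formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def linear_band_edges(num_bands, sample_rate):
    """Edges (in Hz) of equal-width bands between 0 Hz and Nyquist."""
    nyquist = sample_rate / 2.0
    return [nyquist * i / num_bands for i in range(num_bands + 1)]

def mel_band_edges(num_bands, sample_rate):
    """Edges (in Hz) of bands that are equally wide on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    return [mel_to_hz(top * i / num_bands) for i in range(num_bands + 1)]

print(linear_band_edges(4, 44100))
print([round(f, 1) for f in mel_band_edges(4, 44100)])
```

With the mel division, the low bands are much narrower than the high ones, so the same number of bands gives finer control over the low-frequency region.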

4. Time-varying gain (gain)

For gain or level randomization, the BPF is interpolated and interpreted as an amplitude modulator. Note that the corresponding standard deviation is specified in base-10 logarithm. If the resulting output exceeds the maximum float amplitude of 1.0, the whole output signal is normalized.
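The steps described above can be sketched in plain Python. The function names are hypothetical, and two points are assumptions: the BPF values are taken as base-10 logarithms of an amplitude factor (so a value g multiplies the signal by 10 ** g), and the interpolation between breakpoints is linear.

```python
def interpolate(bpf, t):
    """Piecewise-linear interpolation of (time, value) breakpoints at time t.

    Breakpoint times are assumed to be strictly increasing.
    """
    if t <= bpf[0][0]:
        return bpf[0][1]
    for (t0, v0), (t1, v1) in zip(bpf[:-1], bpf[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return bpf[-1][1]

def apply_gain(samples, sample_rate, bpf):
    """Modulate `samples` by a log10 gain BPF, normalizing if the peak exceeds 1.0."""
    out = []
    for n, x in enumerate(samples):
        log_gain = interpolate(bpf, n / float(sample_rate))
        out.append(x * 10.0 ** log_gain)  # assumed mapping: factor = 10 ** value
    peak = max(abs(y) for y in out) if out else 0.0
    if peak > 1.0:
        # Scale the whole signal so that the peak sits at exactly 1.0
        out = [y / peak for y in out]
    return out

# Gain ramps from x1 to x10 over one second; the result is normalized
ramp = apply_gain([0.5] * 5, 4, [(0.0, 0.0), (1.0, 1.0)])
print(max(abs(y) for y in ramp))
```

Because normalization rescales the whole signal, the relative gain contour is preserved even when the absolute level is reduced.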

The Science behind CLEESE

Examples of reverse correlation experiments using CLEESE:

  • Ponsot, E., Arias, P. & Aucouturier, JJ. (2018). Uncovering mental representations of smiled speech using reverse correlation. J. Acoust. Soc. Am. 143 (1).
  • Ponsot, E., Burred, JJ., Belin, P. & Aucouturier, JJ. (2018). Cracking the social code of speech prosody using reverse correlation. Proceedings of the National Academy of Sciences.