Available modules

The modules are found either in vt_server_modules, or defined individually as Python modules. Each module is called with a keyword, which is the value given to the module field in the query.


channel-patch

“channel-patch” copies the input signal onto various channels. The arguments are:

coefs

An array of coefficients applied to each channel. If the input signal is X, and the coefficients are [a1, a2], the output will be [a1⋅X, a2⋅X]. In a two-channel (stereo) file, the first channel is the left channel, and the second channel is the right channel.
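
As a minimal sketch of the coefs behaviour described above (not the module's actual code), the following pure-Python function scales a mono signal once per coefficient. The name channel_patch and the list-of-channels layout are assumptions made for illustration.

```python
def channel_patch(x, coefs):
    """Return one scaled copy of the mono signal `x` per coefficient.

    Hypothetical helper: the output is a list of channels, each a list
    of samples, so coefs=[a1, a2] yields [a1*X, a2*X].
    """
    return [[c * sample for sample in x] for c in coefs]


x = [0.5, -0.25, 1.0]
# Full level on the left channel, half level on the right channel.
left, right = channel_patch(x, [1.0, 0.5])
```

With coefs set to [1, 0], the same sketch would place the signal on the left channel only.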


gibberish

This module contains a function to create a gibberish masker, created out of random sentence chunks, that can be used in the CRM experiment.

{
    "module": "gibberish",
    "seed":   8,
    "files": ["sp1F/cat_8_red.wav", "sp1F/cat_9_black.wav", "..."],
    "chunk_dur_min": 0.2,
    "chunk_dur_max": 0.7,
    "total_dur": 1.2,
    "prevent_chunk_overlap": true,
    "ramp": 0.05,
    "force_nb_channels": 1,
    "force_fs": 44100,
    "stack": [
        {
            "module": "world",
            "f0":     "*2",
            "vtl":    "-3.8st"
        }
    ]
}

This module is intended to be used at the top of the stack.

If the source files have different sampling frequencies, the sampling frequency of the first chunk will be used as reference, and all the following segments will be resampled to that sampling frequency. Alternatively, it is possible to specify force_fs to impose a sampling frequency. If force_fs is 0 or None, the default method is used.

A similar mechanism is used for stereo vs. mono files. The number of channels can be imposed with force_nb_channels. Again, if force_nb_channels is 0 or None, the default method based on the first chunk is used. If a segment has more channels than the output, all its channels are averaged and the average is duplicated to the appropriate number of channels. This is fine for stereo/mono conversion, but keep it in mind if you ever use files with more channels. If a segment has fewer channels than needed, the extra channels are created by recycling the existing values. Again, for stereo/mono conversion this is fine, but it might not be what you want for multi-channel audio.
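
The channel conversion rule above can be sketched as follows. This is an illustration under assumptions, not the module's code: match_channels is a hypothetical name, signals are stored as a list of channels, and "recycling" is taken to mean cycling through the existing channels in order.

```python
def match_channels(channels, n_out):
    """Convert `channels` (list of equal-length sample lists) to n_out channels."""
    n_in = len(channels)
    if n_in == n_out:
        return [list(ch) for ch in channels]
    if n_in > n_out:
        # Too many channels: average all of them, then duplicate the average.
        mean = [sum(frame) / n_in for frame in zip(*channels)]
        return [list(mean) for _ in range(n_out)]
    # Too few channels: recycle the existing ones in order (assumption).
    return [list(channels[i % n_in]) for i in range(n_out)]


stereo = [[1.0, 3.0], [3.0, 5.0]]
mono = match_channels(stereo, 1)           # average of left and right
back_to_stereo = match_channels(mono, 2)   # the mono channel, duplicated
```

Note that with three input channels and two output channels, both outputs would carry the average of all three inputs, which is the multi-channel caveat mentioned above.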

Files

The module will look through the provided files to generate the output. As much as possible, it will try to not reuse a file, but will recycle the list if necessary.

If the module is first in the stack, the filenames provided in files (or shell_pattern, or re_pattern) are relative to the folder specified in the file field of the query. Make sure that the folder name ends with a /.

However, note that if the module is not used at the top of the stack, but lower, there may be unexpected results as the folder will be the cache folder of the previous module.

The list will be shuffled randomly based on the seed parameter.
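
A minimal sketch of seeded shuffling with recycling, assuming Python's random module and a hypothetical pick_files helper. The module's actual shuffling and selection logic may differ.

```python
import random
from itertools import cycle, islice


def pick_files(files, n_chunks, seed):
    """Shuffle `files` reproducibly, then recycle the list if it is too short."""
    rng = random.Random(seed)   # local generator: same seed, same order
    order = list(files)
    rng.shuffle(order)
    return list(islice(cycle(order), n_chunks))


picks = pick_files(["a.wav", "b.wav", "c.wav"], 5, seed=8)
```

Every file is used once before any file is reused, and the same seed always yields the same sequence, which is what makes caching of the output possible.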

Instead of files, shell_pattern can be used to define shell-like patterns as an object:

{
    "module": "gibberish",
    "seed":   8,
    "shell_pattern": {
        "include": "sp1F/cat*.wav",
        "exclude": ["sp1F/cat_8_*.wav", "sp1F/cat_*_red.wav"]
    },
    "...": "..."
}

If a list of patterns is provided, the outcome is cumulative.
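
The include/exclude logic can be illustrated with Python's fnmatch module. This is a sketch that filters an in-memory list of names; the select and as_list helpers are hypothetical, and the module itself may resolve patterns against the file system instead.

```python
from fnmatch import fnmatchcase


def as_list(patterns):
    """Accept None, a single pattern string, or a list of patterns."""
    if patterns is None:
        return []
    return [patterns] if isinstance(patterns, str) else list(patterns)


def select(filenames, include, exclude=None):
    """Keep names matching any include pattern and no exclude pattern."""
    inc, exc = as_list(include), as_list(exclude)
    return [
        name for name in filenames
        if any(fnmatchcase(name, p) for p in inc)
        and not any(fnmatchcase(name, p) for p in exc)
    ]


names = [
    "sp1F/cat_8_red.wav",
    "sp1F/cat_9_black.wav",
    "sp1F/cat_9_red.wav",
    "sp1F/dog_2_blue.wav",
]
kept = select(
    names,
    include="sp1F/cat*.wav",
    exclude=["sp1F/cat_8_*.wav", "sp1F/cat_*_red.wav"],
)
```

Here the two exclude patterns are cumulative: a file is dropped as soon as it matches either of them.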

Alternatively, a regular expression can be used as re_pattern.

If more than one of files, shell_pattern and re_pattern is provided, only one is used, with priority given in the order they are presented here.
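
The priority rule can be sketched as a simple lookup in a fixed order. The function name file_source is hypothetical, and treating empty values as absent is an assumption.

```python
def file_source(params):
    """Return the (key, value) pair that wins among the three file options."""
    for key in ("files", "shell_pattern", "re_pattern"):
        if params.get(key):          # missing or empty entries are skipped
            return key, params[key]
    raise ValueError("one of files, shell_pattern or re_pattern is required")


winner = file_source({
    "shell_pattern": {"include": "sp1F/cat*.wav"},
    "files": ["sp1F/cat_8_red.wav"],
})
```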

Segment properties

chunk_dur_min and chunk_dur_max define the minimum and maximum segment duration. total_dur is the total duration we are aiming to generate. ramp defines the duration of the ramps applied to each segment. prevent_chunk_overlap defines whether the algorithm tries to select intervals that do not overlap (default is true). This is only relevant if all the sound files have a similar structure (like in the CRM).

Stack

stack is an optional processing stack that will be applied to all the selected files before concatenation.

Seed

The seed parameter is mandatory, to make sure the cache is managed properly.


mixin

“mixin” adds another sound file (B) to the input file (A). The arguments are:

file

The file that needs to be added to the input file.

levels =[0,0]

A 2-element array containing the gains, in dB, applied to A and B.

pad =[0,0,0,0]

A 4-element array that specifies the before and after padding of A and B (in seconds): [A.before, A.after, B.before, B.after]. Note that this could also be done with sub-queries, but doing it here will reduce the number of cache files generated.

align =’left’

‘left’, ‘center’, or ‘right’. When the two sound files are not the same length, the shorter one is padded so that it is aligned with the other one as specified. This is applied after padding.

If the two sound files are not of the same sampling frequency, they are resampled to the max of the two.

If the two sound files are not the same shape (number of channels), the one with fewer channels is duplicated to have the same number of channels as the one with the most.
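
To make the order of operations concrete, here is a sketch for two mono signals that already share a sampling frequency. It assumes that the levels are amplitude gains (10^(dB/20)) and that 'left' means both sounds start together; the function names are hypothetical, and resampling and channel matching are left out.

```python
def db_to_gain(level_db):
    """Convert a level in dB to a linear amplitude gain (assumed convention)."""
    return 10 ** (level_db / 20)


def align_pad(n_short, n_long, align):
    """Number of zeros to add (before, after) the shorter sound."""
    extra = n_long - n_short
    if align == "left":
        return 0, extra
    if align == "right":
        return extra, 0
    if align == "center":
        return extra // 2, extra - extra // 2
    raise ValueError(f"unknown align value: {align!r}")


def mixin(a, b, fs, levels=(0, 0), pad=(0, 0, 0, 0), align="left"):
    """Add sound B to sound A: gain and padding first, then alignment, then sum."""
    def padded(x, before, after, gain):
        return [0.0] * round(before * fs) + [gain * s for s in x] + [0.0] * round(after * fs)

    a = padded(a, pad[0], pad[1], db_to_gain(levels[0]))
    b = padded(b, pad[2], pad[3], db_to_gain(levels[1]))
    n = max(len(a), len(b))

    def aligned(x):
        before, after = align_pad(len(x), n, align)
        return [0.0] * before + x + [0.0] * after

    return [sa + sb for sa, sb in zip(aligned(a), aligned(b))]


long_a, short_b = [1.0, 1.0, 1.0, 1.0], [1.0, 1.0]
mixed_right = mixin(long_a, short_b, fs=1, align="right")
```

In this sketch, delaying B by one second would be written pad=(0, 0, 1, 0), which avoids a separate pad sub-query for B.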


pad

“pad” adds silence before and/or after the sound. It takes before and/or after as arguments, specifying the duration of silence in seconds.


ramp

“ramp” smooths the onset and/or offset of a signal by applying a ramp. The parameters are:

duration

In seconds. If a single number, it is applied to both onset and offset. If a vector is given, then it specifies [onset, offset]. A value of zero means no ramp.

shape

Either ‘linear’ (default) or ‘cosine’.
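
The two shapes can be sketched as gain curves rising from 0 towards 1. The exact endpoint convention, the raised-cosine formula and the function names below are assumptions for illustration; the module may sample the ramp differently.

```python
import math


def ramp_gains(n, shape="linear"):
    """Gains rising from 0 towards 1 over n samples (n == 0 means no ramp)."""
    if shape == "linear":
        return [i / n for i in range(n)]
    if shape == "cosine":
        return [0.5 * (1 - math.cos(math.pi * i / n)) for i in range(n)]
    raise ValueError(f"unknown shape: {shape!r}")


def apply_ramp(x, fs, duration, shape="linear"):
    """Apply onset and offset ramps; `duration` is a number or [onset, offset]."""
    if isinstance(duration, (int, float)):
        onset, offset = duration, duration
    else:
        onset, offset = duration
    y = list(x)
    n_on = min(round(onset * fs), len(y))
    n_off = min(round(offset * fs), len(y))
    for i, g in enumerate(ramp_gains(n_on, shape)):
        y[i] *= g            # fade in from the first sample
    for i, g in enumerate(ramp_gains(n_off, shape)):
        y[-1 - i] *= g       # fade out towards the last sample
    return y


both = apply_ramp([1.0] * 8, fs=4, duration=1.0)
onset_only = apply_ramp([1.0] * 8, fs=4, duration=[0.5, 0])
```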


slice

“slice” selects a portion of a sound. It takes the following arguments:

start

The onset point, in seconds. [0 if omitted.]

end

The offset point, in seconds. Negative numbers are counted from the end. Values exceeding the length of the file will lead to zero padding. [The end of the sound if omitted.]

If the start time is larger than the end time, an error is raised.
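
The start and end rules can be sketched on a list of samples as follows. slice_sound is a hypothetical name, start is assumed to be non-negative, and times are rounded to the nearest sample, which may not match the module's rounding.

```python
def slice_sound(x, fs, start=0.0, end=None):
    """Return the portion of `x` between `start` and `end` (in seconds)."""
    n = len(x)
    i0 = round(start * fs)
    if end is None:
        i1 = n                       # default: end of the sound
    elif end < 0:
        i1 = n + round(end * fs)     # negative values count from the end
    else:
        i1 = round(end * fs)
    if i0 > i1:
        raise ValueError("start is larger than end")
    out = list(x[i0:min(i1, n)])
    # Zero padding when `end` exceeds the length of the sound.
    return out + [0.0] * (i1 - max(n, i0))


x = [1.0, 2.0, 3.0, 4.0]
middle = slice_sound(x, fs=1, start=1, end=3)
trimmed = slice_sound(x, fs=1, end=-1)
extended = slice_sound(x, fs=1, end=6)
```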


time-reverse

“time-reverse” reverses the input in time. It does not take any arguments.


vocoder

This module defines the vocoder processor, based on vocoder, a MATLAB vocoder designed to be highly programmable.

Here is an example of module instructions:

{
    "module": "vocoder",
    "fs": 44100,
    "analysis_filters": {
        "f": { "fmin": 100, "fmax": 8000, "n": 8, "scale": "greenwood" },
        "method": { "family": "butterworth", "order": 3, "zero-phase": true }
        },
    "synthesis_filters": "analysis_filters",
    "envelope": {
        "method": "low-pass",
        "rectify": "half-wave",
        "order": 2,
        "fc": 160,
        "modifiers": "spread"
        },
    "synthesis": {
        "carrier": "sin",
        "filter_before": false,
        "filter_after": true
        }
}

The fs attribute is optional but can be used to speed up processing. The filter definitions depend on the sampling frequency, so it has to be known to generate the filters. If the argument is not passed, it is read from the file that needs processing. Passing the sampling frequency as an attribute speeds things up, as the sound file does not need to be opened to check its sampling rate. However, beware that if fs does not match that of the file, you will get an error.

The other attributes are as follows:

analysis_filters

analysis_filters is a dictionary defining the filterbank used to analyse the input signal. It defines both the cutoff frequencies f and the filtering method.

f Filterbank frequencies

These can either be specified as an array of values, using a predefined setting, or by using a regular method.

If f is a numerical array, the values are used as frequencies in Hertz.

If f is a string, it refers to a predefined setting. The predefined values are ci24 and hr90k, referring to the default maps of the cochlear implant manufacturers Cochlear and Advanced Bionics, respectively.

Otherwise f is a dictionary with the following items:

fmin

The starting frequency of the filterbank.

fmax

The end frequency of the filterbank.

n

The number of channels.

scale

[optional] The scale on which the frequencies are divided into channels. Default is log. Possible values are greenwood, log and linear.

shift

[optional] A shift in millimeters, towards the base. Note that the shift is applied after all other calculations, so the fmin and fmax boundaries will no longer be respected.
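
As an illustration of how fmin, fmax, n, scale and shift could interact, the sketch below computes n + 1 band edges equally spaced on the chosen scale. It is not the module's code: band_edges is a hypothetical name, and the Greenwood constants are the commonly quoted human values (position in millimeters from the apex), which may differ from those used by the module.

```python
import math

# Commonly quoted human Greenwood constants (assumption), x in mm from the apex.
A, ALPHA, K = 165.4, 0.06, 0.88


def hz_to_mm(f):
    return math.log10(f / A + K) / ALPHA


def mm_to_hz(x):
    return A * (10 ** (ALPHA * x) - K)


def band_edges(fmin, fmax, n, scale="log", shift=0.0):
    """Return the n + 1 cutoff frequencies of an n-channel filterbank."""
    if scale == "linear":
        fwd, inv = float, float
    elif scale == "log":
        fwd, inv = math.log, math.exp
    elif scale == "greenwood":
        fwd, inv = hz_to_mm, mm_to_hz
    else:
        raise ValueError(f"unknown scale: {scale!r}")
    lo, hi = fwd(fmin), fwd(fmax)
    edges = [inv(lo + (hi - lo) * i / n) for i in range(n + 1)]
    if shift:
        # Applied last, in mm towards the base, so fmin/fmax are not preserved.
        edges = [mm_to_hz(hz_to_mm(f) + shift) for f in edges]
    return edges


edges = band_edges(100, 8000, 8, scale="greenwood")
shifted = band_edges(100, 8000, 8, scale="greenwood", shift=3.0)
```

Because the shift moves every edge towards the base (higher frequencies), the shifted filterbank no longer starts at fmin or ends at fmax.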

Filtering method

A dictionary with the following elements:

family

The type of filter. At the moment only butterworth is implemented.

For butterworth, the following parameters have to be provided:

order

The actual order of the filter. Note that this is the order that is effectively achieved, so if zero-phase is true, only even numbers can be provided.

zero-phase

Whether a zero-phase filter is being used. If true, then filtfilt() is used instead of filt().

Unlike in the MATLAB version, this is implemented with second-order section filters (sosfiltfilt() and sosfilt()).

synthesis_filters

It can be the string “analysis_filters” to make them identical to the analysis filters. This is also what happens if the element is omitted or null.

Otherwise it can be a dictionary similar to analysis_filters. The number of channels has to be the same. If it differs, an error will be returned.

envelope

Specifies how the envelope is extracted.

method

Can be low-pass or hilbert.

For low-pass, the envelope is extracted with rectification and low-pass filtering. The following parameters are required:

rectify

The wave rectification method: half-wave or full-wave.

order

The order of the filter used for envelope extraction. Again, this is the effective order, so only even numbers are accepted because the envelope is extracted with a zero-phase filter.

fc

The cutoff of the envelope extraction in Hertz. Can be a single value or a value per band. If fewer values than bands are provided, the array is recycled as necessary.

modifiers

[optional] A modifier function name, or a list of them, applied to the envelope matrix. At the moment, only “spread” is implemented. With this modifier, the synthesis filters are used to simulate a spread of excitation on the envelope levels themselves. This is useful when the carrier is a sinewave (see Crew et al., 2012, JASA).
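
Two of the low-pass parameters above lend themselves to a short sketch: the rectify options and the recycling of fc across bands. The helper names are hypothetical, and the low-pass filtering step itself is left out.

```python
from itertools import cycle, islice


def rectify(x, method="half-wave"):
    """Half-wave keeps positive samples only; full-wave takes absolute values."""
    if method == "half-wave":
        return [max(s, 0.0) for s in x]
    if method == "full-wave":
        return [abs(s) for s in x]
    raise ValueError(f"unknown rectification method: {method!r}")


def fc_per_band(fc, n_bands):
    """Expand a single cutoff, or a short list of cutoffs, to one value per band."""
    values = [fc] if isinstance(fc, (int, float)) else list(fc)
    return list(islice(cycle(values), n_bands))


band = [0.5, -0.5, 1.0, -1.0]
half = rectify(band, "half-wave")
full = rectify(band, "full-wave")
cutoffs = fc_per_band([50, 200], 5)
```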

synthesis

The synthesis field describes how the resynthesis should be performed.

carrier

Can be noise or sin (low-noise and pshc are not implemented).

filter_before

If true, the carrier is filtered before multiplication with the envelope (default is false).

filter_after

If true, the modulated carrier is refiltered in the band to suppress sidebands (default is true). Keep in mind that if you filter broadband carriers both before and after modulation you may alter the spectral shape of your signal.

random_seed

[optional] For noise carriers only.

If the carrier is noise, then a random seed can be provided in random_seed to have frozen noise. If not, the random number generator will be initialized with the current clock. Note that for multi-channel audio files, the seed is used for each channel. If no seed is given, the various bands will have different noises as carriers. To have correlated noise across bands, pass in a (random) seed. Also note that, because of the cache system, once an output file is generated it will be served as is rather than regenerated. To generate truly random files, provide a random seed on each request.

If the carrier is sin, the center frequency of each band will be determined based on the scale that is used. If cutoffs are manually provided, the geometric mean is used as center frequency.
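
For manually provided cutoffs, the geometric mean of each pair of adjacent cutoffs can be computed as in this short sketch (center_frequencies is a hypothetical name):

```python
import math


def center_frequencies(cutoffs):
    """Geometric mean of each pair of adjacent cutoff frequencies, in Hz."""
    return [math.sqrt(lo * hi) for lo, hi in zip(cutoffs, cutoffs[1:])]


centers = center_frequencies([100, 400, 1600])
```

For n + 1 cutoffs this gives n carrier frequencies, one per band.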


world

This module defines the world processor based on pyworld, a module wrapping Morise’s WORLD vocoder.

Here are some examples of module instructions:

{
    "module": "world",
    "f0":     "*2",
    "vtl":    "-3.8st"
}

If a key is missing (here, duration) it is considered as None, which means this part is left unchanged.

f0 can take the following forms:

  • * followed by a number, in which case it is a multiplicative ratio applied to the whole f0 contour. For instance *2.

  • a positive or negative number followed by a unit (Hz or st). This will behave like an offset, adding so many Hertz or so many semitones to the f0 contour.

  • ~ followed by a number, followed by a unit (only Hz). This will set the average f0 to the defined value.
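
The three f0 forms above can be sketched as a small parser applied to an f0 contour. This is an illustration, not the module's implementation: apply_f0 is a hypothetical name, unvoiced frames are assumed to be marked with 0 (as WORLD does) and are left untouched, and the ~ form is assumed to rescale the contour so that the mean of the voiced frames hits the target.

```python
import re

# Optional operator (* or ~), a signed number, and an optional unit.
_SPEC = re.compile(r"^([*~]?)\s*([+-]?\d+(?:\.\d+)?)\s*(Hz|st)?$")


def apply_f0(contour, spec):
    """Apply an f0 specification string to a list of f0 values in Hz."""
    if spec is None:
        return list(contour)             # missing key: leave f0 unchanged
    match = _SPEC.match(spec.strip())
    if not match:
        raise ValueError(f"cannot parse f0 specification: {spec!r}")
    op, value, unit = match.group(1), float(match.group(2)), match.group(3)
    if op == "*" and unit is None:       # ratio, e.g. "*2"
        return [f * value for f in contour]
    if op == "" and unit == "st":        # offset in semitones, e.g. "-3st"
        return [f * 2 ** (value / 12) for f in contour]
    if op == "" and unit == "Hz":        # offset in Hertz, voiced frames only
        return [f + value if f > 0 else 0.0 for f in contour]
    if op == "~" and unit == "Hz":       # set the average f0 (by scaling: assumption)
        voiced = [f for f in contour if f > 0]
        if not voiced:
            return list(contour)
        ratio = value / (sum(voiced) / len(voiced))
        return [f * ratio for f in contour]
    raise ValueError(f"unsupported f0 specification: {spec!r}")


contour = [100.0, 0.0, 200.0]            # 0.0 marks an unvoiced frame
doubled = apply_f0(contour, "*2")
octave_up = apply_f0(contour, "+12st")
lowered = apply_f0(contour, "-20Hz")
retargeted = apply_f0(contour, "~300Hz")
```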

vtl is defined similarly:

  • * represents a multiplier for the vocal-tract length. Beware, this is not a multiplier for the spectral envelope, but its inverse.

  • offsets are defined using the unit st only.
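
The warning about the inverse relationship can be made concrete with a sketch that converts a vtl specification to a vocal-tract length ratio and to the corresponding spectral envelope ratio. The function names are hypothetical, and how the module actually applies the ratio to the WORLD spectral envelope is not covered here.

```python
def vtl_ratio(spec):
    """Ratio applied to the vocal-tract length for a vtl specification string."""
    if spec.startswith("*"):
        return float(spec[1:])                   # e.g. "*1.2"
    if spec.endswith("st"):
        return 2 ** (float(spec[:-2]) / 12)      # e.g. "-3.8st"
    raise ValueError(f"unsupported vtl specification: {spec!r}")


def spectral_envelope_ratio(spec):
    """Frequency scaling of the spectral envelope: the inverse of the VTL ratio."""
    return 1 / vtl_ratio(spec)


shorter = vtl_ratio("-3.8st")                    # vocal tract about 20% shorter
formant_scale = spectral_envelope_ratio("-3.8st")
```

A shorter vocal tract (ratio below 1) therefore moves the spectral envelope up in frequency (ratio above 1).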

duration:

  • the * multiplier can also be used.

  • an offset can be defined in seconds (using unit s).

  • the absolute duration can be set using ~ followed by a value and the s unit.

Note that in v0.2.8, WORLD makes the sounds 1 frame (5 ms) too long if no duration is specified. If you specify the duration, the sound is generated with the accurate duration.