- Conceptor Python Module
- Speaker Recognition
- Gender Identification
- Emotion Detection
- Tone Characterisation
- Speaker Recognition Program for RedHen Pipeline
All source code can be found in my GitHub repository.
Conceptor Python Module
Theory
Detailed documentation of conceptor theory can be found in this technical report by Prof. Herbert Jäger. The basic computations are based on Section 4 and the recognition functions on Section 3.12.
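For reference, the core computation from Section 4 is compact: given reservoir states collected in a matrix X, the conceptor C is obtained from the state correlation matrix R and an aperture parameter α via C = R(R + α⁻²I)⁻¹. A minimal NumPy sketch (the function name and the default aperture are illustrative, not part of the module's API):

```python
import numpy as np

def compute_conceptor(X, aperture=10.0):
    """Compute a conceptor from collected reservoir states.

    X: (N, n_samples) array of reservoir states.
    aperture: the aperture parameter alpha controlling how strongly
              the conceptor focuses on the driving signal.
    """
    N, n_samples = X.shape
    R = X @ X.T / n_samples                                   # state correlation matrix
    C = R @ np.linalg.inv(R + aperture ** (-2) * np.eye(N))   # C = R (R + a^-2 I)^-1
    return C
```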
Implementation
You can find the module in the folder called conceptor. Detailed documentation of each file can be found in my previous posts: basic module and recognition.
Test
This module is tested in the following IPython notebooks: basic computations and classification. To run the classification notebook, please download the training and testing data used: ae.train and ae.test.
Speaker Recognition
Theory
Gaussian Mixture Models. Some classic papers can be found here:
http://web.cs.swarthmore.edu/~turnbull/cs97/f09/paper/reynolds00.pdf
http://www.cs.toronto.edu/~frank/csc401/readings/ReynoldsRose.pdf
Implementation
silence.py
An energy-based voice activity detection and silence removal function.
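The idea is to split the signal into short frames, compute each frame's energy, and keep only frames whose energy exceeds a threshold. A rough sketch of that logic (the frame length, threshold, and function name are illustrative, not the actual silence.py interface):

```python
import numpy as np

def remove_silence(signal, frame_len=400, threshold_ratio=0.01):
    """Energy-based VAD: drop frames whose energy falls below a
    fraction of the maximum frame energy and return the rest."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energies = np.sum(frames.astype(float) ** 2, axis=1)   # per-frame energy
    keep = energies > threshold_ratio * energies.max()     # voiced frames
    return frames[keep].reshape(-1)
```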
skgmm.py
A set of GMMs provided by scikit-learn.
GmmSpeakerRec.py
A speaker recognition interface, which includes the following functions (a usage sketch follows the list):
enroll(): enroll new training data
train(): train a GMM for each class
recognize(): read an audio signal and output the recognition results
dump(): save a trained model
load(): load an existing model
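A typical calling sequence might look like the following. The class name, file names, and exact argument signatures are assumptions for illustration; see GmmSpeakerRec.py and the notebooks for the real usage.

```python
from scipy.io import wavfile
from GmmSpeakerRec import GMMRec   # class name assumed; check GmmSpeakerRec.py

fs, obama = wavfile.read("obama_train.wav")   # illustrative file names
fs, simon = wavfile.read("simon_train.wav")
fs, test = wavfile.read("unknown.wav")

rec = GMMRec()
rec.enroll("Obama", obama)      # enroll training data for each speaker
rec.enroll("Simon", simon)
rec.train()                     # fit one GMM per enrolled class
print(rec.recognize(test))      # recognition result for an unseen signal

rec.dump("speakers.model")      # save the trained model ...
rec2 = GMMRec()
rec2.load("speakers.model")     # ... and load it back later
```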
Test
The usage and performance of this interface are demonstrated in the following IPython notebooks:
Obama: a speaker recogniser trained with 7 minutes of speech for Obama and 40 seconds for David Simon
Gender Identification
Theory
Exactly the same as Speaker Recognition, except that the training voice for each class is a concatenation of many voices of the same gender.
Implementation
See the implementation part of "Speaker Recognition".
Test
The usage and performance of this interface are demonstrated in the following IPython notebooks:
Gender: a gender identifier trained with about 5 minutes of speech for each gender
Emotion Detection
Theory
This method was proposed by Microsoft Research at last year's Interspeech conference; it uses a deep neural network (DNN) followed by an extreme learning machine (ELM).
Implementation
For details, please refer to my last post.
energy.py
Takes a speech signal and returns the indices of frames with top 10% energy.
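A compact sketch of that selection (the frame length and the exact return format are assumptions, not the actual energy.py interface):

```python
import numpy as np

def top_energy_frames(signal, frame_len=400, fraction=0.1):
    """Return the indices of the frames with the highest energy
    (the top `fraction` of all frames)."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energies = np.sum(frames.astype(float) ** 2, axis=1)   # per-frame energy
    k = max(1, int(fraction * n_frames))
    return np.argsort(energies)[-k:]                        # indices of the top-k frames
```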
Given two audio folders (training and validation; see "folder structure" for how these folders are organised), extracts the segment-level features from the audio files in these folders for DNN training.
Given one (testing) audio folder, extracts the segment-level features from audio files in the folder for DNN feature extraction.
Trains the ELM with the probability features extracted by the DNN (a minimal sketch of the ELM step follows these descriptions).
Annotates the recognition results of the test files into Results.txt.
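The ELM step follows the standard single-hidden-layer recipe: draw the input weights at random, compute the hidden activations, and solve the output weights by regularised least squares. A minimal sketch under those assumptions (function names, hidden-layer size, and regularisation value are illustrative):

```python
import numpy as np

def train_elm(X, Y, n_hidden=120, reg=1e-3, seed=0):
    """Single-hidden-layer ELM: random input weights, sigmoid hidden
    layer, ridge-regression readout.

    X: (n_samples, n_features) utterance-level features from the DNN.
    Y: (n_samples, n_classes) one-hot emotion labels.
    """
    rng = np.random.RandomState(seed)
    W = rng.randn(X.shape[1], n_hidden)            # random input weights (never trained)
    b = rng.randn(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))         # hidden-layer activations
    # ridge-regression solution for the output weights
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)             # predicted class indices
```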
Test
Recognition results on one section of the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database can be found here.
Tone Characterisation
The same as Emotion Detection, except that the training data for each class should be a collection of utterances with the same tone.
Speaker Recognition Program for RedHen Pipeline
The pipeline version has the following updated features:
- Shifted from Python 3 to Python 2
- Replaced the scikit-learn GMM with the GMM from PyCASP, so that training is much faster
- Added functions to recognize features directly, so that it is ready for the shared features from the pipeline.
- Returns the log likelihood of each prediction, so that one can reject untrained classes and filter out unreliable predictions (see the sketch after this list). You can also use it to search for speakers by looking for predicted speakers with high likelihood.
- Karan's speaker diarization results are now incorporated.
- Output file has a format consistent with other RedHen output files.
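To illustrate the rejection idea: each trained GMM scores the input features with a log likelihood, and a prediction whose best likelihood falls below a threshold can be marked as an unknown speaker. A hedged sketch (the threshold value, dictionary structure, and the assumption that score() returns per-frame log likelihoods are illustrative):

```python
import numpy as np

REJECT_THRESHOLD = -50.0   # illustrative value; tune on held-out data

def recognize_with_rejection(models, features):
    """models: dict mapping speaker name -> trained GMM whose score(features)
    is assumed to return (per-frame) log likelihoods.
    Returns (speaker, log_likelihood), with "unknown" when the best
    likelihood is below the rejection threshold."""
    scores = {name: float(np.mean(gmm.score(features))) for name, gmm in models.items()}
    best = max(scores, key=scores.get)
    if scores[best] < REJECT_THRESHOLD:
        return "unknown", scores[best]
    return best, scores[best]
```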
Implementation
Python Speaker Identification Module written for the RedHen Audio Analysis Pipeline
A pipeline program that makes use of the speaker ID module and the speaker diarisation results to output a .spk file whose format is consistent with other RedHen output files.
Test
Here you can find an example output file produced on 2015-08-07_0050_US_FOX-News_US_Presidential_Politics.