Sunday, August 30, 2015


This post is a summary of my work for GSoC 2015, which includes the following subtasks:

  1. Conceptor Python Module 
  2. Speaker Recognition
  3. Gender Identification
  4. Emotion Detection
  5. Tone Characterisation
  6. Speaker Recognition Program for RedHen Pipeline

All source code can be found in my GitHub repository.

Conceptor Python Module 


Detailed documentation of the conceptor theory can be found in this technical report by Prof. Herbert Jäger. The basic computations are based on Section 4 and the recognition functions are based on Section 3.12.
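
As a rough illustration of the basic computations (a minimal sketch, not the code of the conceptor module itself), a conceptor can be computed from a matrix of reservoir states with the standard formula C = R (R + aperture^-2 I)^-1, where R is the state correlation matrix. The function names and the random stand-in states below are assumptions made for this example.

```python
import numpy as np


def compute_conceptor(states, aperture):
    """Compute a conceptor from reservoir states.

    states: array of shape (n_neurons, n_samples), one state per column.
    aperture: positive scalar controlling how much signal energy is kept.
    """
    n_neurons, n_samples = states.shape
    # State correlation matrix R = X X^T / L
    R = states @ states.T / n_samples
    # C = R (R + aperture^-2 I)^-1
    reg = float(aperture) ** -2 * np.eye(n_neurons)
    return R @ np.linalg.inv(R + reg)


def conceptor_not(C):
    """Boolean NOT of a conceptor: I - C."""
    return np.eye(C.shape[0]) - C


# Random states stand in for a driven reservoir (illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))
C = compute_conceptor(X, aperture=10.0)
print(np.round(np.linalg.svd(C, compute_uv=False), 3))
```

The singular values of C lie between 0 and 1, and a larger aperture pushes them closer to 1.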


You can find the module in the folder called conceptor. Detailed documentation of each file can be found in my previous posts: basic module and recognition.


This module is tested in the following IPython notebooks: basic computations and classification. To run the classification notebook, please download the training and testing data used: ae.train and ae.test.

Speaker Recognition


The method is based on Gaussian Mixture Models (GMMs); some classic papers can be found here.

The implementation consists of:

  • An energy-based voice activity detection and silence removal function.
  • A set of GMMs provided by scikit-learn.
  • A speaker recognition interface, which includes the following functions:
      • enroll(): enroll new training data
      • train(): train a GMM for each class
      • recognize(): read an audio signal and output the recognition results
      • dump(): save a trained model
      • load(): load an existing model
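
The interface above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the module's actual code: it uses the current scikit-learn class `GaussianMixture`, it takes feature matrices (for example MFCC frames) instead of raw audio, it skips the voice activity detection step, and the class name and synthetic data are made up for the example.

```python
import pickle

import numpy as np
from sklearn.mixture import GaussianMixture


class SpeakerRecognizer:
    """One GMM per speaker; the best-scoring GMM gives the prediction."""

    def __init__(self, n_components=4):
        self.n_components = n_components
        self.features = {}  # label -> list of (n_frames, n_dims) arrays
        self.models = {}    # label -> fitted GaussianMixture

    def enroll(self, label, feats):
        """Add training features for a speaker."""
        self.features.setdefault(label, []).append(np.asarray(feats))

    def train(self):
        """Fit one diagonal-covariance GMM per enrolled speaker."""
        for label, chunks in self.features.items():
            gmm = GaussianMixture(self.n_components,
                                  covariance_type="diag", random_state=0)
            self.models[label] = gmm.fit(np.vstack(chunks))

    def recognize(self, feats):
        """Return (best label, its average log likelihood per frame)."""
        scores = {lab: gmm.score(feats) for lab, gmm in self.models.items()}
        best = max(scores, key=scores.get)
        return best, scores[best]

    def dump(self, path):
        """Save the trained models."""
        with open(path, "wb") as f:
            pickle.dump((self.n_components, self.models), f)

    @classmethod
    def load(cls, path):
        """Load models saved with dump()."""
        with open(path, "rb") as f:
            n_components, models = pickle.load(f)
        rec = cls(n_components)
        rec.models = models
        return rec


# Synthetic 13-dimensional "features" for two speakers (illustration only).
rng = np.random.default_rng(0)
rec = SpeakerRecognizer(n_components=2)
rec.enroll("speaker_a", rng.normal(0.0, 1.0, size=(300, 13)))
rec.enroll("speaker_b", rng.normal(5.0, 1.0, size=(300, 13)))
rec.train()
label, score = rec.recognize(rng.normal(5.0, 1.0, size=(50, 13)))
print(label, score)
```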


The usage and performance of this interface are demonstrated in the following IPython notebook:
Obama: a speaker recogniser trained on 7 minutes of speech from Obama and 40 seconds from David Simon

Gender Identification


The method is exactly the same as for Speaker Recognition, except that the training signal for each class is a concatenation of many voices of the same gender.


See the implementation part of "Speaker Recognition".


The usage and performance of this interface are demonstrated in the following IPython notebook:
Gender: a gender identifier trained on about 5 minutes of speech for each gender

Emotion Detection


This method was proposed by Microsoft Research at the Interspeech conference last year (2014). It combines a deep neural network (DNN) with an extreme learning machine (ELM).
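
To give an idea of what the ELM part does, here is a minimal sketch of a generic extreme learning machine: a hidden layer with fixed random weights, followed by output weights fitted in one step by least squares. It is not the code used in this project, and the class name, layer size and synthetic inputs (which stand in for the DNN-derived features) are assumptions made for the example.

```python
import numpy as np


class ELM:
    """Single-hidden-layer network with random, untrained hidden weights."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the random hidden layer
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        targets = np.eye(n_classes)[y]  # one-hot class targets
        # Output weights: least-squares solution via the pseudo-inverse
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


# Two well-separated synthetic classes (illustration only).
rng = np.random.default_rng(1)


def make_data(n):
    y = rng.integers(0, 2, size=n)
    X = rng.standard_normal((n, 4)) + 3.0 * y[:, None]
    return X, y


X_train, y_train = make_data(400)
X_test, y_test = make_data(200)
elm = ELM(n_hidden=20).fit(X_train, y_train)
accuracy = np.mean(elm.predict(X_test) == y_test)
print(accuracy)
```

Because only the output weights are fitted, training an ELM takes a single linear solve, which makes it cheap to use on top of features produced by a DNN.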


For details, please refer to my last post.

The implementation consists of modules that do the following:

  • Takes a speech signal and returns the indices of the frames with the top 10% energy.
  • Given two audio folders (training and validation, see "folder structure" for the structures of these folders), extracts the segment-level features from the audio files in these folders for DNN training.
  • Given one (testing) audio folder, extracts the segment-level features from the audio files in the folder for DNN feature extraction.
  • Trains an ELM with the probability features extracted by the DNN.
  • Annotates the recognition results of the test files in Results.txt.
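
The first step above, picking the frames with the top 10% energy, could look roughly like the sketch below. The function name, the frame length and hop size (25 ms and 10 ms at an assumed 16 kHz sampling rate) and the synthetic signal are assumptions made for the example, not values taken from the project code.

```python
import numpy as np


def top_energy_frames(signal, frame_len=400, hop=160, fraction=0.1):
    """Return the sorted indices of the frames with the highest energy."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    starts = hop * np.arange(n_frames)
    # Short-time energy: sum of squared samples in each frame
    energy = np.array([np.sum(signal[s:s + frame_len] ** 2) for s in starts])
    n_keep = max(1, int(np.ceil(fraction * n_frames)))
    return np.sort(np.argsort(energy)[-n_keep:])


# One second of quiet noise with a loud burst in the middle (illustration only).
rng = np.random.default_rng(0)
signal = 0.01 * rng.standard_normal(16000)
signal[8000:11200] = rng.standard_normal(3200)
frames = top_energy_frames(signal)
print(frames)
```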


The recognition results on one section of the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database can be found here.

Tone Characterisation

The same as Emotion Detection, except that the training data for each class should be a collection of utterances with the same tone.

Speaker Recognition Program for RedHen Pipeline

The pipeline version has the following updated features:

  • Shifted from Python 3 to Python 2
  • Replaced the scikit-learn GMM with the GMM from PyCASP, so that training is much faster!
  • Added functions to recognize features directly, so that it is ready for the shared features from the pipeline.
  • Returns the log likelihood of each prediction, so that one can reject untrained classes and filter out unreliable predictions. You can also use it to search for speakers, by looking for predicted speakers with high likelihood.
  • Karan's speaker diarization results are now incorporated.
  • Output file has a format consistent with other RedHen output files.
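
The log likelihood returned with each prediction can be used along these lines. This is only a sketch: the function names, the example scores and the threshold are hypothetical, and a real threshold would have to be tuned on held-out data.

```python
def filter_predictions(labels, log_likelihoods, threshold):
    """Replace predictions whose log likelihood is below threshold by None."""
    return [lab if ll >= threshold else None
            for lab, ll in zip(labels, log_likelihoods)]


def search_speaker(labels, log_likelihoods, target, threshold):
    """Return indices of segments confidently predicted as the target speaker."""
    return [i for i, (lab, ll) in enumerate(zip(labels, log_likelihoods))
            if lab == target and ll >= threshold]


# Hypothetical per-segment predictions and scores (illustration only).
labels = ["obama", "simon", "obama", "simon"]
scores = [-41.2, -88.5, -39.7, -45.0]

kept = filter_predictions(labels, scores, threshold=-50.0)
hits = search_speaker(labels, scores, target="obama", threshold=-50.0)
print(kept)
print(hits)
```

A segment from a speaker who was never enrolled tends to score poorly under every model, so it falls below the threshold and is rejected instead of being assigned to the closest trained speaker.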


Python Speaker Identification Module written for the RedHen Audio Analysis Pipeline

A pipeline program that makes use of the speaker ID module and the speaker diarisation results to output a .spk file whose format is consistent with other RedHen output files.


Here you can find an example output file produced on 2015-08-07_0050_US_FOX-News_US_Presidential_Politics.

