In this post, I explain my work on gender identification and speaker recognition.

Toolkits used:

Librosa: A Python package for Music and Audio Analysis

SciKit-Learn: Machine Learning in Python

I use librosa to load audio files and extract features from audio signals. I chose it because it is a lightweight open-source library with a nice Python interface and IPython functionality, and it integrates with SciKit-Learn to form a feature-extraction pipeline for machine learning. This is sufficient for moderately complex tasks such as speaker recognition.

SciKit-Learn is used for training a UBM/GMM on MFCC features.

Data preprocessing:

I trained a UBM with 32 Gaussian components on a dataset of standardised MFCC vectors extracted from speech signals by multiple female and male speakers.

For every standardised MFCC vector, its probability under each Gaussian component is evaluated, and these probabilities are assembled into a feature vector for conceptor classification. The reason for this is to refine the subspace using the Gaussian components; and since the probabilities already lie in [0, 1], no further normalisation is needed. These feature vectors are then fed into the generic Conceptor recognition framework.
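The preprocessing described above can be sketched with scikit-learn's modern `GaussianMixture` API (synthetic data stands in for the real MFCC frames; the diagonal covariance type is my assumption):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in for MFCC vectors (frames x coefficients);
# in practice these come from librosa.feature.mfcc on the speech data.
mfcc = rng.normal(size=(1000, 13))

# Standardise each MFCC dimension to zero mean and unit variance.
mfcc_std = StandardScaler().fit_transform(mfcc)

# UBM: a 32-component GMM fitted on pooled speech from all speakers.
ubm = GaussianMixture(n_components=32, covariance_type='diag', random_state=0)
ubm.fit(mfcc_std)

# Per-component posterior probabilities, one 32-dim vector per frame;
# each row sums to 1 and every entry lies in [0, 1].
features = ubm.predict_proba(mfcc_std)
```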

This method makes a decision for every MFCC vector (one every 512 ms). One example result (gender detection on a short female-male conversation, where 0 indicates female and 1 indicates male) looks like this:

The output is noisy and will not be very useful in practice, as we usually want recognition decisions over longer periods (multiple seconds) and with less noise. A simple frequency count does not solve this problem, since the noisy decisions will very often overwhelm the correct ones. One way to cope with this is to compute mid-term statistics (mean, std, median, min, max) on the short-term features, as I did in the demo code for Gender Identification I submitted before. This works, but it is not ideal, since much information is lost when the statistics are computed.
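The mid-term statistics idea can be sketched as follows (function name and the 100-frame window / 50-frame step are illustrative assumptions, not the values used in the demo code):

```python
import numpy as np

def mid_term_stats(frames, win=100, step=50):
    """Aggregate short-term feature frames (n_frames x dim) into
    mid-term statistics (mean, std, median, min, max) over sliding windows."""
    stats = []
    for start in range(0, len(frames) - win + 1, step):
        seg = frames[start:start + win]
        stats.append(np.concatenate([
            seg.mean(axis=0), seg.std(axis=0), np.median(seg, axis=0),
            seg.min(axis=0), seg.max(axis=0)]))
    return np.array(stats)
```

Each window of short-term features collapses into a single 5 * dim statistics vector, which smooths the decisions at the cost of discarding the within-window dynamics.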

In the next step, I will try recognitions on spectrogram segments and use convolutional neural networks (CNN) to extract features from these segments and feed them to Conceptors.

To catch up with my planned schedule and provide a working solution for now, I implemented a GMM speaker recognition system with state-of-the-art performance. This system consists of the following parts:

silence.py

An energy-based voice activity detection and silence removal function.

skgmm.py

A set of GMMs provided by SciKit-Learn.

GmmSpeakerRec.py

A speaker recognition interface, which includes the following functions:

enroll(): enroll new training data

train(): train a GMM for each class

recognize(): read an audio signal and output the recognition results

dump(): save a trained model

load(): load an existing model
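A hypothetical skeleton of this interface might look like the following (the real GmmSpeakerRec.py also performs MFCC extraction and silence removal, which are omitted here; all internals are my assumptions):

```python
import pickle
import numpy as np
from sklearn.mixture import GaussianMixture

class GmmSpeakerRec:
    """Sketch of the speaker-recognition interface: one GMM per class,
    classification by average log-likelihood."""

    def __init__(self, n_components=8):
        self.n_components = n_components
        self.data = {}     # class name -> list of enrolled feature arrays
        self.models = {}   # class name -> fitted GMM

    def enroll(self, name, features):
        # Accumulate (n_frames x dim) training features for one class.
        self.data.setdefault(name, []).append(features)

    def train(self):
        # Fit one GMM per enrolled class on its pooled features.
        for name, chunks in self.data.items():
            gmm = GaussianMixture(n_components=self.n_components, random_state=0)
            gmm.fit(np.vstack(chunks))
            self.models[name] = gmm

    def recognize(self, features):
        # Return the class whose GMM gives the highest average log-likelihood.
        scores = {name: gmm.score(features) for name, gmm in self.models.items()}
        return max(scores, key=scores.get)

    def dump(self, path):
        with open(path, 'wb') as f:
            pickle.dump(self.models, f)

    def load(self, path):
        with open(path, 'rb') as f:
            self.models = pickle.load(f)
```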

The usage and performance of this interface are demonstrated in the following two IPython notebooks:

Gender: a gender identifier trained with about 5 minutes of speech for each gender

Obama: a speaker recogniser trained with 7 minutes of speech for Obama and 40 seconds for David Simon

## Monday, June 22, 2015

## Sunday, June 14, 2015

### Week 3: Generic Conceptor Framework for Speaker Recognition

Based on the description in "Section 3.12 Example: Dynamical Pattern Recognition" of the tech report, a generic framework for pattern recognition has been implemented and added to the Python module.

Edit:

All conceptor-based functions related to recognition tasks are now put into conceptor.recognition of the Python module.

A usage example:

```python
import conceptor.recognition as recog

new_recognizer = recog.Recognizer()
new_recognizer.train(training_data)
results = new_recognizer.predict(test_data)
```

Here training_data is a list of feature_size * sample_size NumPy arrays, each corresponding to the training dataset of one class; test_data is a feature_size * sample_size NumPy array to be recognised; results is a sample_size-dimensional vector whose elements are integer class indices.

This framework repeats the results shown in the tech report:

http://nbviewer.ipython.org/github/littleowen/Conceptor/blob/master/ClassifyTest.ipynb


## Monday, June 1, 2015

### Week 1: Python Module for Conceptors

A Python module for conceptor computation has been implemented based on Section 4 of the technical report and this GitHub repository.

The module consists of the following files:

reservoir:

set up the reservoir network, drive the reservoir with dynamic patterns, train output weights to read out the original pattern signals, train internal weights to reconstruct the original reservoir dynamics, compute correlation matrices from the reservoir states, and compute conceptor matrices from those correlation matrices.

logic:

apply logic operations to conceptors; in particular, the AND, OR, NOT, and PHI functions from the original MATLAB implementation.

util:

useful utility functions that are used repeatedly within the module, for example randomly initialising the weights of a reservoir network.

An IPython notebook was also written to test the above-mentioned module; the results match those in the technical report and can be viewed here: http://nbviewer.ipython.org/github/littleowen/Conceptor/blob/master/ConceptorTest.ipynb
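For illustration, the core conceptor computation and two of the logic operations can be sketched in a few lines of NumPy. This follows the formulas in the tech report (C = R(R + a^-2 I)^-1 for aperture a; NOT C = I - C; C AND B = (C^-1 + B^-1 - I)^-1); the function names and the aperture value are my own choices, not necessarily those in the module:

```python
import numpy as np

def conceptor(states, aperture=10.0):
    """Conceptor matrix from reservoir states (n_neurons x n_steps):
    C = R (R + aperture^-2 I)^-1, with R the state correlation matrix."""
    n, t = states.shape
    R = states @ states.T / t
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(n))

def NOT(C):
    # Logical negation: I - C.
    return np.eye(len(C)) - C

def AND(C, B):
    # Logical conjunction: (C^-1 + B^-1 - I)^-1 (assumes C and B invertible).
    I = np.eye(len(C))
    return np.linalg.inv(np.linalg.inv(C) + np.linalg.inv(B) - I)

def OR(C, B):
    # De Morgan's rule: NOT(AND(NOT(C), NOT(B))).
    return NOT(AND(NOT(C), NOT(B)))
```

A conceptor's eigenvalues always lie in [0, 1), which is what makes the Boolean-style operations above well behaved.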

