MFCC stands for Mel-frequency cepstral coefficients, a powerful feature representation for sound. Although there are many MFCC implementations in different programming languages, they can give noticeably different results for the same audio input.
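One likely source of such differences (among others, such as windowing, liftering and DCT normalization) is the mel scale itself. The sketch below is only an illustration, not taken from any particular library's code: it compares the HTK-style formula with the Slaney-style piecewise formula and shows that the filter center frequencies they produce do not line up. The function names, the 26 filters and the 0 to 8000 Hz range are my own assumptions.

```python
import math

# HTK-style mel scale: purely logarithmic.
def hz_to_mel_htk(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz_htk(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Slaney-style mel scale: linear below 1 kHz, logarithmic above.
F_SP = 200.0 / 3.0                 # Hz per mel in the linear region
MIN_LOG_HZ = 1000.0                # start of the logarithmic region
MIN_LOG_MEL = MIN_LOG_HZ / F_SP    # mel value at 1 kHz
LOGSTEP = math.log(6.4) / 27.0     # step size in the logarithmic region

def hz_to_mel_slaney(f):
    if f < MIN_LOG_HZ:
        return f / F_SP
    return MIN_LOG_MEL + math.log(f / MIN_LOG_HZ) / LOGSTEP

def mel_to_hz_slaney(m):
    if m < MIN_LOG_MEL:
        return m * F_SP
    return MIN_LOG_HZ * math.exp(LOGSTEP * (m - MIN_LOG_MEL))

def center_frequencies(to_mel, to_hz, n_filters=26, f_min=0.0, f_max=8000.0):
    """Filter centers in Hz, equally spaced on the given mel scale."""
    m_min, m_max = to_mel(f_min), to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [to_hz(m_min + step * (i + 1)) for i in range(n_filters)]

htk_centers = center_frequencies(hz_to_mel_htk, mel_to_hz_htk)
slaney_centers = center_frequencies(hz_to_mel_slaney, mel_to_hz_slaney)

if __name__ == "__main__":
    for h, s in zip(htk_centers[:5], slaney_centers[:5]):
        print(f"HTK: {h:8.1f} Hz   Slaney: {s:8.1f} Hz")
```

Since the filters sit at different frequencies, the filterbank energies and therefore the final coefficients differ even when everything else is identical.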

To solve this problem, I took an open-source C++ implementation of MFCC and built a Python module around it. Using SWIG made this work much less painful.

The function takes `sample_rate` and a one-dimensional array as input and produces a two-dimensional array as output, so the C++ header file looks like:

```cpp
void mfcc(int sample_rate, short* in_array, int size_in,
          double** out_array, int* dim1, int* dim2);
```

We also need to use `numpy`, so the interface file for SWIG is:

```
%module mfcc

%{
#define SWIG_FILE_WITH_INIT
#include "mfcc.hpp"
%}

%include "numpy.i"

%init %{
import_array();
%}

%apply (short* IN_ARRAY1, int DIM1) {(short* in_array, int size_in)}
%apply (double** ARGOUTVIEW_ARRAY2, int* DIM1, int* DIM2) {(double** out_array, int* dim1, int* dim2)}

%rename (mfcc) my_mfcc;

%inline %{
void my_mfcc(int sample_rate, short* in_array, int size_in,
             double** out_array, int* dim1, int* dim2) {
    mfcc(sample_rate, in_array, size_in, out_array, dim1, dim2);
}
%}
```
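Building the module takes two steps: SWIG generates a C++ wrapper plus `mfcc.py`, and a compiler turns the wrapper and the MFCC sources into the `_mfcc` extension that `mfcc.py` imports. The following is a rough sketch of those two commands for a Linux-style `g++` toolchain. The file names `mfcc.i`, `mfcc.cpp` and `mfcc_wrap.cpp`, the helper names, and the compiler flags are assumptions of mine, and `numpy.i` needs to be somewhere SWIG can find it.

```python
import subprocess
import sysconfig

def swig_command(interface="mfcc.i", wrapper="mfcc_wrap.cpp"):
    """SWIG invocation that writes the C++ wrapper and mfcc.py."""
    return ["swig", "-c++", "-python", "-o", wrapper, interface]

def compile_command(sources, numpy_include):
    """Compiler invocation for the _mfcc extension (assumed g++ on Linux)."""
    # Platform-specific extension suffix, falling back to plain ".so".
    suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
    python_include = sysconfig.get_paths()["include"]
    return [
        "g++", "-O2", "-shared", "-fPIC",
        f"-I{python_include}",
        f"-I{numpy_include}",   # typically numpy.get_include()
        *sources,
        "-o", "_mfcc" + suffix,
    ]

def build(numpy_include):
    """Run both steps; raises CalledProcessError if either one fails."""
    subprocess.run(swig_command(), check=True)
    subprocess.run(
        compile_command(["mfcc_wrap.cpp", "mfcc.cpp"], numpy_include),
        check=True,
    )
```

Calling `build(numpy.get_include())` in the source directory should leave `mfcc.py` and the compiled `_mfcc` extension side by side. A `setup.py` with a setuptools `Extension` is the more usual way to package this, but the explicit commands make it easier to see what happens.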

Here is an example of using the module from Python:

```python
import mfcc
import numpy as np
from scipy.io import wavfile

sr, audio = wavfile.read("mono.wav")
output = mfcc.mfcc(sr, audio)
print(output.shape, output)
```
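The `short* IN_ARRAY1` typemap expects a one-dimensional array of 16-bit samples, which is why the example reads a mono file. For stereo or floating-point WAV files, a small helper along the following lines could prepare the data first. This is a sketch with a hypothetical name, `to_mono_int16`, and it assumes that float samples lie in the range -1 to 1; other integer widths are rejected rather than guessed at.

```python
import numpy as np

def to_mono_int16(audio):
    """Return a contiguous 1-D int16 array suitable for the wrapper."""
    audio = np.asarray(audio)
    is_float = np.issubdtype(audio.dtype, np.floating)
    if not is_float and audio.dtype != np.int16:
        raise ValueError(f"unsupported sample type: {audio.dtype}")
    if audio.ndim == 2:
        # scipy returns (n_samples, n_channels); average the channels.
        audio = audio.mean(axis=1)
    elif audio.ndim != 1:
        raise ValueError("expected a 1-D or 2-D sample array")
    if is_float:
        # Assumed range [-1, 1]; scale to the int16 range.
        audio = np.clip(audio, -1.0, 1.0) * 32767.0
    return np.ascontiguousarray(np.round(audio).astype(np.int16))
```

With this in place, the call becomes `mfcc.mfcc(sr, to_mono_int16(audio))`.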

All the code is in my repository.