MFCC stands for Mel-frequency cepstral coefficients, a powerful feature representation for sound. Although there are many MFCC implementations in different programming languages, they give markedly different results for the same audio input.
To solve this problem, I took an open-source C++ implementation of MFCC and built a Python module for it. Using SWIG made this work much less painful.
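As an illustration of how two MFCC implementations can disagree, here is a minimal sketch of two mel-scale conventions that are both in common use: the HTK-style formula and the Slaney-style formula. This is only one well-known source of variation. The post does not say which conventions the compared implementations actually use, and the function names below are made up for this example.

```python
import math


def hz_to_mel_htk(freq_hz):
    """HTK-style mel scale: one logarithmic formula for all frequencies."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def hz_to_mel_slaney(freq_hz):
    """Slaney-style mel scale: linear below 1 kHz, logarithmic above."""
    f_sp = 200.0 / 3.0                 # Hz per mel in the linear region
    min_log_hz = 1000.0                # where the logarithmic region starts
    min_log_mel = min_log_hz / f_sp    # mel value at that boundary
    logstep = math.log(6.4) / 27.0     # step size in the logarithmic region
    if freq_hz < min_log_hz:
        return freq_hz / f_sp
    return min_log_mel + math.log(freq_hz / min_log_hz) / logstep


# Print both conversions side by side for a few frequencies.
for freq in (500.0, 1000.0, 4000.0):
    print(freq, hz_to_mel_htk(freq), hz_to_mel_slaney(freq))
```

The two scales put the filter bank centers at different frequencies, so every coefficient computed downstream changes as well. Window shape, FFT size, and DCT normalization are other parameters where implementations may differ.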
The function takes sample_rate and a one-dimensional array as input and produces a two-dimensional array as output, so the C++ header file looks like:
void mfcc(int sample_rate,
          short* in_array, int size_in,
          double** out_array, int* dim1, int* dim2
);
We also need NumPy support, so the SWIG interface file is:
%module mfcc
%{
#define SWIG_FILE_WITH_INIT
#include "mfcc.hpp"
%}
%include "numpy.i"
%init %{
import_array();
%}
// Map a 1-D NumPy array of 16-bit samples onto the (pointer, length) input pair.
%apply (short* IN_ARRAY1, int DIM1) {(short* in_array, int size_in)}
// Return the output buffer and its two dimensions as a 2-D NumPy array view.
%apply (double** ARGOUTVIEW_ARRAY2, int* DIM1, int* DIM2) {(double** out_array, int* dim1, int* dim2)}
// Expose the wrapper below to Python under the name mfcc.
%rename (mfcc) my_mfcc;
%inline %{
void my_mfcc(int sample_rate, short* in_array, int size_in, double** out_array, int* dim1, int* dim2) {
    mfcc(sample_rate, in_array, size_in, out_array, dim1, dim2);
}
%}
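One consequence of the ARGOUTVIEW_ARRAY2 typemap is worth spelling out. The numpy.i documentation describes ARGOUTVIEW arrays as views onto memory that the wrapped code owns, so NumPy does not manage that buffer. The sketch below imitates the situation in pure NumPy. The name `cpp_buffer` is a stand-in invented for this example. Whether the C++ implementation ever reuses or frees its output buffer is not stated in this post, so treat the copy as a precaution under that assumption.

```python
import numpy as np

# Stand-in for the buffer that the C++ code would own (hypothetical values).
cpp_buffer = np.arange(6, dtype=np.float64)

# A 2-D view over that buffer, similar in spirit to what the typemap returns.
view = cpp_buffer.reshape(2, 3)

# An independent copy that NumPy owns and manages itself.
snapshot = view.copy()

# If the underlying buffer changes, the view changes with it; the copy does not.
cpp_buffer[0] = 99.0
print(view[0, 0], snapshot[0, 0])
print(view.flags.owndata, snapshot.flags.owndata)
```

In practice this means calling `.copy()` on the returned array is a cheap way to get data that stays valid regardless of what the C++ side does later.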
Here is an example of using the module from Python:
import mfcc
import numpy as np
from scipy.io import wavfile
sr, audio = wavfile.read("mono.wav")
output = mfcc.mfcc(sr, audio)
print(output.shape, output)
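The wrapper is declared for a one-dimensional array of 16-bit samples, while `wavfile.read` can also return stereo data with shape (n_samples, n_channels) or floating-point samples, depending on the file. Here is a minimal sketch of a helper that normalizes the input first. `to_mono_int16` is a hypothetical name and not part of the module, and it assumes floating-point samples lie in the range [-1, 1].

```python
import numpy as np


def to_mono_int16(audio):
    """Return a contiguous 1-D int16 array (hypothetical helper, not in the module)."""
    samples = np.asarray(audio)
    is_float = np.issubdtype(samples.dtype, np.floating)
    if samples.dtype != np.int16 and not is_float:
        # Other integer widths would need rescaling, which this sketch omits.
        raise ValueError("expected int16 or floating-point samples")
    samples = samples.astype(np.float64)
    if samples.ndim == 2:
        # Stereo or multichannel: average the channels of each sample.
        samples = samples.mean(axis=1)
    if is_float:
        # Assumption: float samples are in [-1, 1]; scale to the int16 range.
        samples = samples * 32767.0
    samples = np.clip(np.round(samples), -32768, 32767)
    return np.ascontiguousarray(samples.astype(np.int16))


# Small self-check with synthetic data instead of a real WAV file.
stereo = np.array([[100, 300], [-200, 0]], dtype=np.int16)
print(to_mono_int16(stereo))
```

With this helper the call in the example above would become `mfcc.mfcc(sr, to_mono_int16(audio))`.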
All the code is in my repository.