LinearSVC versus SVC in scikit-learn

In competition ‘Quora Insincere Questions Classification’, I want to use simple TF-IDF statistics as a baseline.

The result is not bad:

But after I change LinearSVC to SVC(kernel=’linear’), the program couldn’t work out any result even after 12 hours!
Am I doing anything wrong? In the page of sklearn.svm.LinearSVC, there is a note:

Similar to SVC with parameter kernel=’linear’, but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.

Also in the page of sklearn.svm.SVC, it’s another note:

The implementation is based on libsvm. The fit time complexity is more than quadratic with the number of samples which makes it hard to scale to dataset with more than a couple of 10000 samples.

That’s the answer: LinearSVC is the right choice to process a large number of samples.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.