Integrating Solr and the Mahout Classifier
Solr has been a revolution in the search world and is used in many major deployments. Mahout is an exciting tool for machine learning. In this article I am going to cover the integration of Solr and Mahout for classification.
Classification here is the process of assigning a piece of content to one of a predefined set of categories. The classification process depends on a model created from training sets. I will cover Mahout classification itself in my next blog post.
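To make "a model created from training sets" concrete, here is a toy multinomial Naive Bayes text classifier in plain Java. It only illustrates the idea (count words per category from labelled examples, then pick the most probable category for new text) and is not Mahout's API; the class name, the simple tokenizer and the training sentences are all made up for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy multinomial Naive Bayes classifier. Illustrative stand-in, NOT Mahout's API. */
public class ToyNaiveBayes {
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>(); // category -> term -> count
    private final Map<String, Integer> totalTerms = new HashMap<>();              // category -> number of terms seen
    private final Map<String, Integer> docCounts = new HashMap<>();               // category -> training documents
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Lower-cases the text and splits it on non-word characters. */
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Adds one labelled training document to the model. */
    public void train(String category, String text) {
        totalDocs++;
        docCounts.merge(category, 1, Integer::sum);
        Map<String, Integer> counts = termCounts.computeIfAbsent(category, c -> new HashMap<>());
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
            totalTerms.merge(category, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /** Returns the category with the highest log-posterior score for the text. */
    public String classify(String text) {
        List<String> tokens = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            // log prior: fraction of training documents in this category
            double score = Math.log(docCounts.get(category) / (double) totalDocs);
            Map<String, Integer> counts = termCounts.get(category);
            int total = totalTerms.getOrDefault(category, 0);
            for (String token : tokens) {
                int c = counts.getOrDefault(token, 0);
                // Laplace (add-one) smoothing so an unseen word does not zero out the score
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ToyNaiveBayes nb = new ToyNaiveBayes();
        nb.train("sports", "the team won the football match");
        nb.train("sports", "a great goal in the final match");
        nb.train("tech", "the new java release improves the compiler");
        nb.train("tech", "search engine indexing with solr and lucene");
        System.out.println(nb.classify("football final goal"));      // sports
        System.out.println(nb.classify("lucene indexing release"));  // tech
    }
}
```

A real model is trained on far more data and stored on disk, but the shape is the same: training produces statistics, and classification is a cheap lookup over them.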
I am going to hook into the Solr update process, call the Mahout classifier, and add a category field based on the classifier's result. That way, every document that gets indexed has its category assigned automatically. Add the following configuration to solrconfig.xml.
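The hook itself can be sketched without any Solr dependencies. Everything in the class below is hypothetical (the name ClassifyingUpdateHook, the content and category field names, and the keyword rule standing in for the Mahout classifier); it only shows the shape of the logic: read the text, ask the classifier for a label, store it in a category field, and pass the document on. In Solr this logic would sit in processAdd(AddUpdateCommand) of a custom UpdateRequestProcessor, created by an UpdateRequestProcessorFactory and registered in an updateRequestProcessorChain in solrconfig.xml.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical indexing hook: classify a text field, then hand the document on. */
public class ClassifyingUpdateHook {
    private final Function<String, String> classifier; // text -> category label
    private final String sourceField;                  // field whose text is classified
    private final String categoryField;                // field that receives the label
    private final Consumer<Map<String, Object>> next;  // stands in for the rest of the update chain

    public ClassifyingUpdateHook(Function<String, String> classifier, String sourceField,
                                 String categoryField, Consumer<Map<String, Object>> next) {
        this.classifier = classifier;
        this.sourceField = sourceField;
        this.categoryField = categoryField;
        this.next = next;
    }

    /** Called once per added document, before it is indexed. */
    public void processAdd(Map<String, Object> doc) {
        Object text = doc.get(sourceField);
        // Only classify when there is text and no category was supplied explicitly.
        if (text != null && !doc.containsKey(categoryField)) {
            doc.put(categoryField, classifier.apply(text.toString()));
        }
        next.accept(doc); // always pass the document on, classified or not
    }

    public static void main(String[] args) {
        List<Map<String, Object>> indexed = new ArrayList<>();
        // Trivial keyword rule standing in for the real (Mahout) classifier.
        Function<String, String> classifier =
                text -> text.toLowerCase().contains("solr") ? "tech" : "other";
        ClassifyingUpdateHook hook =
                new ClassifyingUpdateHook(classifier, "content", "category", indexed::add);

        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "1");
        doc.put("content", "Indexing documents with Solr");
        hook.processAdd(doc);
        System.out.println(indexed.get(0).get("category")); // tech
    }
}
```

Skipping documents that already carry a category is a design choice in this sketch, so that labels supplied by the client are not overwritten; remove the containsKey check if every document should be reclassified.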
When you start Solr, it may take a little longer than usual because the classification model is loaded into memory. Don't worry: the model is loaded only once and then kept in memory, so the classification itself will be lightning fast :)
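The "load once, reuse for every document" behaviour comes from where the model is loaded. In this hypothetical sketch (Java 9+ for Map.of), the expensive load runs in the factory's constructor, which stands in for Solr startup, and each per-request classifier only holds a reference to that shared model. The keyword map is a placeholder for a real Mahout model; the class names are made up.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical factory: loads the model once and shares it with every classifier it creates. */
public class ClassifierFactory {
    private final Map<String, String> model; // placeholder model: keyword -> category

    public ClassifierFactory(Supplier<Map<String, String>> modelLoader) {
        this.model = modelLoader.get(); // the slow part, paid once at startup
    }

    /** Cheap per-request call: no reloading, just a reference to the shared model. */
    public Classifier newClassifier() {
        return new Classifier(model);
    }

    public static final class Classifier {
        private final Map<String, String> model;

        Classifier(Map<String, String> model) {
            this.model = model;
        }

        /** Returns the category of the first known keyword, or "unknown". */
        public String classify(String text) {
            for (String word : text.toLowerCase().split("\\W+")) {
                String category = model.get(word);
                if (category != null) return category;
            }
            return "unknown";
        }
    }

    public static void main(String[] args) {
        int[] loads = {0};
        ClassifierFactory factory = new ClassifierFactory(() -> {
            loads[0]++; // count how often the "expensive" load runs
            return Map.of("solr", "tech", "football", "sports");
        });
        for (int i = 0; i < 1000; i++) {
            factory.newClassifier().classify("indexing with Solr");
        }
        System.out.println(loads[0]);                                         // 1
        System.out.println(factory.newClassifier().classify("Football results")); // sports
    }
}
```

Because one model object is shared by concurrent update requests, it should be read-only; the sketch's loader returns an immutable Map.of for that reason.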
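If your model is too big, you might get a Java heap error (java.lang.OutOfMemoryError: Java heap space). In that case, start Solr with a larger maximum heap via -Xmx, putting the JVM options before -jar (2g is only an example; size it to your model):
java -Xmx2g -XX:+UseConcMarkSweepGC -jar start.jar
Note that -XX:+UseConcMarkSweepGC only selects the garbage collector and does not raise the heap limit; it was also removed in JDK 14, so leave it out on newer JVMs.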
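A heap error means the JVM's maximum heap (set with -Xmx) is smaller than what the loaded model needs. As a quick sanity check, this small, hypothetical helper program prints the maximum heap a JVM reports for a given set of flags, for example `java -Xmx2g HeapCheck.java` on Java 11+ (single-file launch). The reported figure is usually close to, and can be slightly below, the -Xmx value.

```java
/** Prints the maximum heap this JVM will try to use, in megabytes. */
public class HeapCheck {
    /** Maximum heap in MB as reported by the running JVM. */
    static long maxHeapMb() {
        // Runtime.maxMemory() returns Long.MAX_VALUE if there is no inherent limit.
        long maxBytes = Runtime.getRuntime().maxMemory();
        return maxBytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```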