Selvam S
January 19, 2013
KnackForge has done many customizations of the Drupal and Solr integration. Earlier I wrote about Drupal-7-filtering-solr-results, which covers limiting Solr results. This post covers another piece of work, where we had to alter the default apachesolr_autocomplete module to limit suggestions to node titles only. An additional challenge was to provide case-insensitive, phrase-based search while returning results in their original case. For example, when the node title is "Hello world", searching for 'hello' should return "Hello world", not the two separate suggestions 'hello' and 'world'.
Autocomplete module:
The apachesolr_autocomplete module builds its suggestions from keywords that are already broken into words (space delimited). So we had to write custom Solr configuration that stores untokenized titles. I am going to detail the whole process here.
Prerequisites:
I assume you have already installed the apachesolr and apachesolr_autocomplete modules and have a properly set up Solr instance.
Solr schema:
In schema.xml, add a new field type under <types> that controls how our new field is indexed and queried,
<fieldType class="solr.TextField" name="text_auto">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
Note that I am not using a LowerCaseFilterFactory here. The KeywordTokenizerFactory simply stores the full phrase as a single token, without splitting it into words.
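To make the difference concrete, here is a minimal sketch in plain PHP. The two helper functions are hypothetical and only mimic the tokenizers' behaviour for illustration; they are not part of Solr or Drupal.

```php
<?php
// Illustration only: these helpers imitate, in plain PHP, what the Solr
// tokenizers do. They are hypothetical and not part of Solr or Drupal.

/** Imitates a whitespace-style tokenizer: one token per word. */
function sketch_whitespace_tokens($text) {
  return preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Imitates solr.KeywordTokenizerFactory: the whole input is one token. */
function sketch_keyword_tokens($text) {
  return array($text);
}

$title = 'Hello world';
print_r(sketch_whitespace_tokens($title)); // two tokens: "Hello" and "world"
print_r(sketch_keyword_tokens($title));    // one token: "Hello world"
```

With a single token per title, a suggestion can only ever be the whole title, which is the phrase-based behaviour we are after.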
Now define a new field under <fields> that uses this type,
<field name="label_autocomplete" type="text_auto" indexed="true" stored="true"/>
Solr config:
In solrconfig.xml, add a search component and a request handler that build suggestions from the new field,
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_auto</str>
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="buildOnOptimize">true</str>
    <str name="buildOnCommit">true</str>
    <str name="field">label_autocomplete</str>
  </lst>
</searchComponent>
<requestHandler name="suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
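To check the handler by hand before touching Drupal, you can build a request URL like the one sketched here. This is only a sketch under assumptions: the host, port and /solr path are placeholders, the helper name is hypothetical, and whether /select honours the qt parameter depends on your Solr version and request dispatcher settings.

```php
<?php
// Assumptions for illustration: Solr listens on localhost:8983 under /solr
// and dispatches on the qt parameter via /select. Adjust for your setup.
function sketch_suggest_url($typed, $base = 'http://localhost:8983/solr') {
  $params = array(
    'qt' => 'suggest',          // name of the request handler defined above
    'q' => strtolower($typed),  // lowercased, as the field type has no lowercase filter
    'wt' => 'json',
  );
  // Pass the separator explicitly so the result does not depend on php.ini.
  return $base . '/select?' . http_build_query($params, '', '&');
}

echo sketch_suggest_url('Hello w'), "\n";
```

Opening the printed URL in a browser should show a spellcheck section in the response once the index contains documents and the suggester has been built.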
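Indexing:
Populate the new field at index time by implementing hook_apachesolr_index_document_build() in a custom module (replace "mymodule" with your module's name),
/**
 * Implements hook_apachesolr_index_document_build().
 */
function mymodule_apachesolr_index_document_build(ApacheSolrDocument $document, $entity, $entity_type, $env_id) {
  if (!empty($entity->title)) {
    // Store the lowercased title and the original title, separated by '~~',
    // so lookups match case-insensitively and can still show the original.
    $label_auto = strtolower($entity->title) . '~~' . $entity->title;
    $document->setMultiValue('label_autocomplete', $label_auto);
  }
  else {
    $document->setMultiValue('label_autocomplete', '');
  }
}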
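The '~~' trick is easier to see in isolation. The sketch below uses hypothetical helper names and a plain PHP array as a stand-in for the Solr suggester; only the '~~' convention itself comes from the hook above.

```php
<?php
// Hypothetical helpers for illustration; only the '~~' convention comes
// from the indexing hook. The array stands in for the Solr suggester.

/** Builds the indexed value: lowercased title, separator, original title. */
function sketch_encode_label($title) {
  return strtolower($title) . '~~' . $title;
}

/** Recovers the original title from an indexed value. */
function sketch_decode_label($label) {
  $parts = explode('~~', $label);
  return count($parts) == 2 ? $parts[1] : $label;
}

/** Stand-in for the suggester: case-insensitive prefix match on stored values. */
function sketch_prefix_lookup(array $labels, $typed) {
  $prefix = strtolower($typed);
  $titles = array();
  foreach ($labels as $label) {
    if ($prefix !== '' && strpos($label, $prefix) === 0) {
      $titles[] = sketch_decode_label($label);
    }
  }
  return $titles;
}

$index = array_map('sketch_encode_label', array('Hello world', 'Help Center', 'World News'));
print_r(sketch_prefix_lookup($index, 'HEL'));
```

Two caveats with this scheme: a title that itself contains '~~' will not decode cleanly, and strtolower() only handles single-byte characters, so drupal_strtolower() may be a better fit for multibyte titles.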
Autocomplete callback:
Finally, alter apachesolr_autocomplete_suggest() in the apachesolr_autocomplete module so that it queries the new suggest handler and returns the original titles,
function apachesolr_autocomplete_suggest($keys, $params, $theme_callback, $orig_keys, $suggestions_to_return = 5) {
  $matches = array();
  $suggestions = array();
  $keys = trim($keys);

  // We need the keys array to make sure we don't suggest words that are already
  // in the search terms.
  $keys_array = explode(' ', $keys);
  $keys_array = array_filter($keys_array);

  // Query Solr for $keys so that suggestions will always return results.
  $query = apachesolr_drupal_query($keys);
  if (!$query) {
    return array();
  }

  // Route the request to our custom request handler.
  $query->replaceParam('qt', 'suggest');

  // Use the typed prefix as the query string for the suggester.
  $new_params = array('q' => $params['facet.prefix']);
  foreach ($new_params as $param => $paramValue) {
    $query->addParam($param, $paramValue);
  }

  // Query Solr.
  $response = $query->search($keys);
  $q = $new_params['q'];

  // The suggestions entry is missing when Solr has nothing to suggest.
  if (isset($response->spellcheck->suggestions->{$q}->suggestion)) {
    foreach ($response->spellcheck->suggestions->{$q}->suggestion as $term) {
      // Split the label in two and keep the second part, which holds the
      // original title.
      $labels = explode('~~', $term);
      if (count($labels) == 2) {
        $term = $labels[1];
      }
      if (isset($matches[$term])) {
        $matches[$term] += 1;
      }
      else {
        $matches[$term] = 1;
      }
    }
  }

  if (count($matches) > 0) {
    // Eliminate suggestions that are too short, are stopwords, or are already
    // in the query.
    $matches_clone = $matches;
    $stopwords = apachesolr_autocomplete_get_stopwords();
    foreach ($matches_clone as $term => $count) {
      // in_array() is used for both checks: negating array_search() would
      // miss a match at index 0.
      if ((strlen($term) <= 3) || in_array($term, $stopwords) || in_array($term, $keys_array)) {
        unset($matches_clone[$term]);
        unset($matches[$term]);
      }
    }

    // The $count in this array is actually a score. We want the highest ones first.
    arsort($matches_clone);

    // Shorten the array to the right ones.
    $matches_clone = array_slice($matches_clone, 0, $suggestions_to_return, TRUE);

    // Build suggestions using the returned titles.
    foreach ($matches_clone as $match => $count) {
      if ($keys != $match) {
        $suggestion = trim($keys . ' ' . $match);
        // On cases where there are more than 3 keywords, omit displaying
        // the count because of the mm settings in solrconfig.xml.
        if (substr_count($suggestion, ' ') >= 2) {
          $count = 0;
        }
        if ($suggestion != '') {
          // Add * to array element key to force into a string, else PHP will
          // renumber keys that look like numbers on the returned array.
          $suggestions['*' . $suggestion] = theme('apachesolr_autocomplete_highlight', array('keys' => $orig_keys, 'suggestion' => $suggestion, 'count' => $count));
        }
      }
    }
  }

  return array(
    'suggestions' => $suggestions,
    'response' => &$response,
  );
}
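The response handling can be tried outside Drupal with a hand-built object. In this sketch the helper name is hypothetical and the JSON is a made-up stand-in, not real Solr output; the property path mirrors how the function above reads the response.

```php
<?php
// Hypothetical helper; the property path mirrors how the function above reads
// the response: $response->spellcheck->suggestions->{$q}->suggestion.
function sketch_titles_from_response($response, $q) {
  $titles = array();
  // Solr omits the entry for $q when it has nothing to suggest.
  if (!isset($response->spellcheck->suggestions->{$q}->suggestion)) {
    return $titles;
  }
  foreach ($response->spellcheck->suggestions->{$q}->suggestion as $term) {
    $parts = explode('~~', $term);
    $titles[] = count($parts) == 2 ? $parts[1] : $term;
  }
  return $titles;
}

// Hand-built stand-in for a decoded response (not real Solr output).
$response = json_decode('{"spellcheck":{"suggestions":{"hel":{"suggestion":'
  . '["hello world~~Hello world","help center~~Help Center"]}}}}');

print_r(sketch_titles_from_response($response, 'hel'));
print_r(sketch_titles_from_response($response, 'xyz'));
```

The isset() check matters in practice, because reading a missing property would raise PHP notices on every keystroke that has no suggestions.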