
Drupal 7 and SOLR - Auto Completing Full Node Title

  • Autocomplete
  • Drupal
  • SOLR
  • SUGGESTION

KnackForge has done a lot of customization work on Drupal and SOLR integration. Earlier I wrote about Drupal-7-filtering-solr-results, which covers limiting SOLR results. This post covers another project, where we had to alter the default Apache Solr Autocomplete module to limit suggestions to node titles only. Another interesting challenge was to provide phrase-based, case-insensitive search while still returning results with their original case. This means that when the node title is "Hello world", searching for 'hello' should return "Hello world", not the two separate words 'hello' and 'world'.

Autocomplete module:

The Apache Solr Autocomplete module suggests keywords that are already broken into individual words (space delimited). So we had to write a custom SOLR configuration that stores untokenized titles. I am going to detail the whole process here.
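To see why word-level suggestions fall short, here is a minimal plain-PHP sketch (not Solr code) that builds two dictionaries from the same titles: one split into lowercase words, roughly how a whitespace-tokenized field behaves, and one that keeps each title whole. The helpers word_dictionary(), phrase_dictionary() and prefix_matches() are hypothetical, and lowercasing both sides inside prefix_matches() is only there to keep the example short.

```php
<?php
// Hypothetical helper: one lowercase entry per word, roughly what a
// whitespace-tokenized, lowercased field contributes to a dictionary.
function word_dictionary(array $titles) {
  $words = array();
  foreach ($titles as $title) {
    $parts = preg_split('/\s+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);
    $words = array_merge($words, $parts);
  }
  return array_values(array_unique($words));
}

// Hypothetical helper: keep every title whole, like an untokenized field.
function phrase_dictionary(array $titles) {
  return array_values(array_unique($titles));
}

// Return the entries whose lowercased form starts with the lowercased prefix.
function prefix_matches(array $dictionary, $prefix) {
  $prefix = strtolower($prefix);
  $out = array();
  foreach ($dictionary as $entry) {
    if (strncmp(strtolower($entry), $prefix, strlen($prefix)) === 0) {
      $out[] = $entry;
    }
  }
  return $out;
}

$titles = array('Hello world', 'Help desk');
echo implode(', ', prefix_matches(word_dictionary($titles), 'hel')), "\n";   // hello, help
echo implode(', ', prefix_matches(phrase_dictionary($titles), 'hel')), "\n"; // Hello world, Help desk
```

The word-level dictionary can only ever offer single words, while the phrase dictionary returns full titles, which is what we want the autocomplete box to show.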

Prerequisites:

I assume you have already installed the apachesolr and apachesolr_autocomplete modules and have a properly set up SOLR instance.

Solr schema:

In schema.xml, add a new field type under <types> that controls how our new field is indexed and queried:

<fieldType class="solr.TextField" name="text_auto">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

Note that I am not using a LowerCaseFilterFactory here. The KeywordTokenizerFactory emits the entire input as a single token, so the full phrase is stored without being split on spaces.
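A rough PHP emulation of what the two tokenizer styles emit, assuming nothing else runs in the analyzer chain: keyword_tokens() (hypothetical) returns the whole input as one token, as KeywordTokenizerFactory does, and whitespace_tokens() (also hypothetical) splits on spaces for comparison. Because no lowercase filter runs, the stored token keeps its original case, so a case-sensitive prefix check with 'hello' misses 'Hello world'.

```php
<?php
// Emulates KeywordTokenizerFactory: the whole input is a single token.
function keyword_tokens($text) {
  return array($text);
}

// For comparison: split on whitespace, as a word-level tokenizer would.
function whitespace_tokens($text) {
  return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Case-sensitive prefix test, like a lookup on tokens that were never lowercased.
function starts_with($haystack, $prefix) {
  return strncmp($haystack, $prefix, strlen($prefix)) === 0;
}

echo implode(' | ', keyword_tokens('Hello world')), "\n";    // Hello world
echo implode(' | ', whitespace_tokens('Hello world')), "\n"; // Hello | world

$token = keyword_tokens('Hello world')[0];
var_dump(starts_with($token, 'Hello')); // bool(true)
var_dump(starts_with($token, 'hello')); // bool(false)
```

This case sensitivity is why the Drupal side of this setup indexes a lowercased copy of the title next to the original one.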

Now define a new field under the <fields> section:

<field name="label_autocomplete" type="text_auto" indexed="true" stored="true"/>
 
Solr config:
 
In solrconfig.xml:

1) Add a new search component:
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_auto</str>
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="buildOnOptimize">true</str>
    <str name="buildOnCommit">true</str>
    <str name="field">label_autocomplete</str>
  </lst>
</searchComponent>
2) Add a request handler:
<requestHandler name="suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
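Conceptually, the Suggester configured above answers a prefix query: given what the user typed, return up to spellcheck.count dictionary entries that start with it. TSTLookup does this with a ternary search tree; the sketch below swaps in a sorted PHP array plus binary search, which gives the same prefix semantics with simpler code. The function name suggest() is hypothetical, and returning matches in byte order is a choice of this sketch, not a statement about how Solr orders suggestions.

```php
<?php
// Hypothetical suggest(): up to $count entries of a sorted dictionary that
// start with $prefix. A sorted array + binary search stands in for the
// ternary search tree used by Solr's TSTLookup.
function suggest(array $sorted, $prefix, $count = 10) {
  $lo = 0;
  $hi = count($sorted);
  // Binary search for the first entry that is >= $prefix (byte-wise).
  while ($lo < $hi) {
    $mid = intdiv($lo + $hi, 2);
    if (strcmp($sorted[$mid], $prefix) < 0) {
      $lo = $mid + 1;
    }
    else {
      $hi = $mid;
    }
  }
  // Entries sharing the prefix are contiguous from $lo onwards.
  $out = array();
  $n = count($sorted);
  for ($i = $lo; $i < $n && count($out) < $count; $i++) {
    if (strncmp($sorted[$i], $prefix, strlen($prefix)) !== 0) {
      break;
    }
    $out[] = $sorted[$i];
  }
  return $out;
}

$dictionary = array('world news', 'help desk', 'hello world');
sort($dictionary, SORT_STRING); // Reindexes 0..n-1 in byte order.
echo implode(' | ', suggest($dictionary, 'hel')), "\n"; // hello world | help desk
```

The $count argument plays the role of spellcheck.count in the request handler defaults.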
 
 
Drupal side changes:
 
In your custom module, implement this hook (rename the mymodule_ prefix to your module's machine name):
/**
 * Implements hook_apachesolr_index_document_build().
 */
function mymodule_apachesolr_index_document_build(ApacheSolrDocument $document, $entity, $entity_type, $env_id) {
  if (!empty($entity->title)) {
    // Store "<lowercased title>~~<original title>": the suggester matches on
    // the lowercased part, and the original part is what we display.
    // drupal_strtolower() is multibyte-safe, unlike strtolower().
    $label_auto = drupal_strtolower($entity->title) . '~~' . $entity->title;
    $document->setMultiValue('label_autocomplete', $label_auto);
  }
  else {
    $document->setMultiValue('label_autocomplete', '');
  }
}
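The code above indexes the node title into the label_autocomplete field, storing the lowercased title and the original title joined by "~~" (for a node titled "Hello world", the indexed value is "hello world~~Hello world"). As long as the typed prefix is lowercase, it matches the lowercased part, which gives a case-insensitive search, while the part after "~~" lets us return the original title. Doing this on the SOLR side would require writing a custom filter. Next, we need to change the apachesolr_autocomplete.module file itself; I have not found another way to do this at the moment. Only one function needs to change; here is the updated version: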
 
function apachesolr_autocomplete_suggest($keys, $params, $theme_callback, $orig_keys, $suggestions_to_return = 5) {
  $matches = array();
  $suggestions = array();
  $keys = trim($keys);

  // We need the keys array to make sure we don't suggest words that are already
  // in the search terms.
  $keys_array = explode(' ', $keys);
  $keys_array = array_filter($keys_array);

  // Query Solr for $keys so that suggestions will always return results.
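  $query = apachesolr_drupal_query($keys);
  if (!$query) {
    return array();
  }
  // Route the request to our custom "suggest" request handler.
  $query->replaceParam('qt', 'suggest');
  // Lowercase the typed prefix so it matches the lowercased half of the
  // indexed "<lowercased title>~~<original title>" values.
  $new_params = array('q' => drupal_strtolower($params['facet.prefix']));
  foreach ($new_params as $param => $paramValue) {
    $query->addParam($param, $paramValue);
  }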

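  // Query Solr
  $response = $query->search($keys);
  $q = $new_params['q'];
  // If Solr has no suggestions for $q, the property is missing entirely.
  $terms = array();
  if (isset($response->spellcheck->suggestions->{$q}->suggestion)) {
    $terms = $response->spellcheck->suggestions->{$q}->suggestion;
  }
  foreach ($terms as $term) {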
    // Split the label on "~~" and keep the second part, which holds the original title.
    $labels = explode('~~', $term);
    if ($labels && count($labels) == 2) {
      $term = $labels[1];
    }
    if (isset($matches[$term])) {
      $matches[$term] += 1;
    }
    else {
      $matches[$term] = 1;
    }
  }

  if (sizeof($matches) > 0) {
    // Eliminate suggestions that are stopwords or are already in the query.
    $matches_clone = $matches;
    $stopwords = apachesolr_autocomplete_get_stopwords();
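    foreach ($matches_clone as $term => $count) {
      // in_array() instead of array_search(): array_search() returns key 0 for
      // a match on the first word, which the negation would treat as "not found".
      if ((strlen($term) > 3) && !in_array($term, $stopwords) && !in_array($term, $keys_array)) {
        // Keep the term. The original module boosted longer strings here
        // ($matches_clone[$term] += strlen($term)); that boost is disabled.
      }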
      else {
        unset($matches_clone[$term]);
        unset($matches[$term]);
      }
    }

    // The $count in this array is actually a score. We want the highest ones first.
    arsort($matches_clone);

    // Shorten the array to the right ones.
    $matches_clone = array_slice($matches_clone, 0, $suggestions_to_return, TRUE);

    // Build suggestions using returned facets
    foreach ($matches_clone as $match => $count) {
      if ($keys != $match) {
        $suggestion = trim($keys . ' ' . $match);
        // When the suggestion has three or more words, omit displaying
        // the count because of the mm settings in solrconfig.xml.
        if (substr_count($suggestion, ' ') >= 2) {
          $count = 0;
        }
        if ($suggestion != '') {
          // Add * to array element key to force into a string, else PHP will
          // renumber keys that look like numbers on the returned array.
          $suggestions['*' . $suggestion] = theme('apachesolr_autocomplete_highlight', array('keys' => $orig_keys, 'suggestion' => $suggestion, 'count' => $count));
        }
      }
    }
  }

  return array(
    'suggestions' => $suggestions,
    'response' => &$response
  );
}
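The encode/decode round trip behind the "~~" trick can be tried outside Drupal. This sketch mirrors the hook and the parsing loop above with hypothetical helpers encode_label(), decode_label() and count_matches(). It uses plain strtolower(), as the original hook did, so it only lowercases ASCII letters reliably.

```php
<?php
// Mirrors the hook: "<lowercased title>~~<original title>".
function encode_label($title) {
  return strtolower($title) . '~~' . $title;
}

// Mirrors the loop in apachesolr_autocomplete_suggest(): keep the part
// after "~~", or the raw term if it does not split into exactly two parts.
function decode_label($term) {
  $labels = explode('~~', $term);
  return count($labels) == 2 ? $labels[1] : $term;
}

// Count how often each original title comes back, like the $matches array.
function count_matches(array $terms) {
  $matches = array();
  foreach ($terms as $term) {
    $title = decode_label($term);
    $matches[$title] = isset($matches[$title]) ? $matches[$title] + 1 : 1;
  }
  return $matches;
}

$terms = array(encode_label('Hello world'), encode_label('Help desk'));
echo $terms[0], "\n"; // hello world~~Hello world
foreach (count_matches($terms) as $title => $count) {
  echo "$title ($count)\n";
}
```

One caveat of this scheme: a title that itself contains "~~" splits into more than two parts, so decode_label(), like the loop in the module, falls back to the raw indexed term.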
Now you need to reindex all the data and rebuild the spellchecker. Since buildOnCommit is set to true, the suggester dictionary is also rebuilt on every commit. To rebuild the spellchecker manually:
http://localhost:8983/solr/select?spellcheck.build=true&qt=suggest
To check that the suggester is working:
http://localhost:8983/solr/select?q=tes&qt=suggest
This should return some suggestions (replace the query string to match your data). I hope this post helps anyone scratching their head over full-phrase autocomplete! Do let me know if you face any issues.
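If you would rather script these checks than paste URLs into a browser, here is a small sketch. It assumes the default single-core URL used above (http://localhost:8983/solr/select), builds the requests with http_build_query(), and uses spellcheck.build, the SpellCheckComponent parameter that builds the suggester dictionary. suggest_url() and suggestions_from_json() are hypothetical helpers, and the JSON string is a hand-written sample of the wt=json&json.nl=map response shape (the same shape the module code reads through $response->spellcheck->suggestions), not captured server output.

```php
<?php
// Assumption: the default single-core Solr URL used in this post.
const SOLR_SELECT = 'http://localhost:8983/solr/select';

// Hypothetical helper: route a request to the "suggest" request handler.
function suggest_url(array $params) {
  return SOLR_SELECT . '?' . http_build_query($params + array('qt' => 'suggest'));
}

// Hypothetical helper: read the suggestion list for $q from a
// wt=json&json.nl=map response body; empty array if there is none.
function suggestions_from_json($body, $q) {
  $data = json_decode($body);
  if (isset($data->spellcheck->suggestions->{$q}->suggestion)) {
    return $data->spellcheck->suggestions->{$q}->suggestion;
  }
  return array();
}

// Build the suggester dictionary.
echo suggest_url(array('spellcheck.build' => 'true')), "\n";
// Ask for suggestions for "tes".
echo suggest_url(array('q' => 'tes', 'wt' => 'json', 'json.nl' => 'map')), "\n";

// Hand-written sample of the response shape, not real server output.
$sample = '{"spellcheck":{"suggestions":{"tes":{"numFound":1,"startOffset":0,"endOffset":3,"suggestion":["test page~~Test page"]}}}}';
echo implode(', ', suggestions_from_json($sample, 'tes')), "\n"; // test page~~Test page
```

To query a live server, pass the URL from suggest_url() to file_get_contents() and hand the body to suggestions_from_json(). The raw suggestions still carry the "~~" encoding; splitting them is the module's job.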
 