SOLR 5: Reading Indexed Data at Lightning Speed

  • Lucene
  • SOLR
  • Solr Cloud
  • SOLR Java

This blog is about reading SOLR data by directly accessing the Lucene index folder. If you are looking for the normal query method, you should look at a different SOLR tutorial; this post deals with some low-level details.

Coming to our scenario: we had to fetch all unique IDs from SOLR 5.5 running in cloud mode and write them to a file, so that we could run a comparison to find missing IDs. We had a total of 350 million records, so we needed to get 350 million IDs. Normally, you can query SOLR from, say, Java, PHP or Python and write the results to a file. But because we had such a huge set of documents, the query led to memory errors even with 80GB of RAM.

Looking for a faster solution, we decided to go low level and use Lucene directly. We used lucene-core-5.5.1.jar with a simple Java program that reads the document IDs and writes them to a file. See the code below:

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Bits;

public class ReadIdSolr {
 // Writes one ID per line to <file>.txt
 private static void createFile(String file, List<String> arrData)
 throws IOException {
  try (BufferedWriter writer = Files.newBufferedWriter(Paths.get(file + ".txt"), StandardCharsets.UTF_8)) {
   for (String str : arrData) {
    writer.write(str);
    writer.newLine();
   }
  }
 }


 public static void main(String[] args) throws IOException {
  Path path = Paths.get("index"); // replace with the path to the index folder
  List<String> docIds = new ArrayList<String>();
  // try-with-resources closes the reader and the directory, even on errors
  try (Directory dirIndex = FSDirectory.open(path);
       IndexReader indexReader = DirectoryReader.open(dirIndex)) {
   System.out.println("In--total--" + indexReader.numDocs());
   // liveDocs is null when the index contains no deleted documents
   Bits liveDocs = MultiFields.getLiveDocs(indexReader);
   int cnt = 0;
   // Internal document numbers run from 0 to maxDoc() - 1 and include deleted
   // documents, so numDocs() is not a safe upper bound for this loop
   for (int i = 0; i < indexReader.maxDoc(); i++) {
    if (liveDocs != null && !liveDocs.get(i)) {
     continue; // skip deleted documents
    }
    Document doc = indexReader.document(i);
    String id = doc.get("id"); // null if the document has no stored "id" field
    if (id != null) {
     docIds.add(id);
    }
    cnt += 1;
    if (cnt % 10000 == 0) {
     System.out.println("Current cnt " + cnt);
    }
   }
  }
  createFile("MyDataFile", docIds);
 }
}

You can replace "id" with whatever field you wish to fetch, as long as it is a stored field.
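
Once the IDs are in a file, the comparison step mentioned earlier (finding missing IDs) could look roughly like the sketch below. It is only an illustration, not the code we actually ran: the class name MissingIds, the method writeMissing and the three file arguments are hypothetical, and it assumes both files hold one ID per line. It keeps one of the two ID lists in a HashSet, so for hundreds of millions of IDs you would need a large heap or a different approach, such as comparing two sorted files.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: lists IDs that are expected but were not found in the index dump.
public class MissingIds {

    // Writes every ID that appears in expectedFile but not in indexedFile to outFile,
    // one per line, and returns how many such IDs were written.
    static long writeMissing(Path expectedFile, Path indexedFile, Path outFile) throws IOException {
        // Load the IDs extracted from the index into memory (assumption: they fit in the heap).
        Set<String> indexed = new HashSet<>();
        try (BufferedReader in = Files.newBufferedReader(indexedFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.isEmpty()) {
                    indexed.add(line);
                }
            }
        }

        // Stream the expected IDs and keep only those missing from the index.
        long missing = 0;
        try (BufferedReader in = Files.newBufferedReader(expectedFile, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(outFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.isEmpty() && !indexed.contains(line)) {
                    out.write(line);
                    out.newLine();
                    missing++;
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MissingIds <expected-ids-file> <indexed-ids-file> <output-file>
        long n = writeMissing(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
        System.out.println("Missing IDs: " + n);
    }
}
```

Here the second argument would be the MyDataFile.txt produced by the program above, and the first a list of IDs from your source system.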