SOLR 5: reading indexed data at lightning speed

This blog is about reading SOLR data by directly accessing the Lucene index folder. If you are looking for the normal query method, you should look at a different SOLR tutorial; this post deals with some low-level details.

Coming to our scenario: we had to fetch all unique IDs from SOLR 5.5 running in cloud mode and write them to a file, so that we could run a comparison to find missing IDs. We had 350 million records in total, so we needed all 350 million IDs. Normally you would query SOLR from, say, Java, PHP, or Python and write the results to a file. But with such a huge set of documents, the query led to memory errors even though we had 80GB of RAM.

Looking for a faster solution, we decided to go low level and use Lucene directly. We used lucene-core-5.5.1.jar with a simple Java program that reads the document IDs and writes them to a file. See the code below:

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Set;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Bits;

public class readIdSolr {

 public static void main(String[] args) throws IOException {
  Path path = Paths.get("index"); // replace with the path to the index folder
  String idField = "id"; // stored field to read from each document
  Set<String> fieldsToLoad = Collections.singleton(idField);

  // try-with-resources closes the writer, the reader and the directory
  try (Directory dirIndex = FSDirectory.open(path);
    IndexReader indexReader = DirectoryReader.open(dirIndex);
    BufferedWriter writer = Files.newBufferedWriter(
      Paths.get("MyDataFile.txt"), StandardCharsets.UTF_8)) {

   System.out.println("In--total--" + indexReader.numDocs());

   // liveDocs is null when the index has no deleted documents
   Bits liveDocs = MultiFields.getLiveDocs(indexReader);
   int cnt = 0;

   // Document numbers run from 0 to maxDoc() - 1; numDocs() counts only
   // live documents, so it is not a safe upper bound once there are deletions
   for (int i = 0; i < indexReader.maxDoc(); i++) {
    if (liveDocs != null && !liveDocs.get(i)) {
     continue; // skip deleted documents
    }
    // load only the id field instead of every stored field
    Document doc = indexReader.document(i, fieldsToLoad);
    String id = doc.get(idField);
    if (id == null) {
     continue; // field is missing or not stored for this document
    }
    // write each ID on its own line as we go, instead of holding
    // all IDs in memory before writing
    writer.write(id);
    writer.newLine();
    cnt += 1;
    if (cnt % 10000 == 0) {
     System.out.println("Current cnt " + cnt);
    }
   }
  }
 }
}

You can replace "id" with whatever stored fields you wish to fetch.
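The ID file was meant for a comparison to find missing IDs, so here is a minimal sketch of how that comparison could be done without loading 350 million IDs into memory. It is not the code we actually ran. It assumes both files hold one ID per line and are already sorted in plain String order (for ASCII IDs, for example, with `LC_ALL=C sort`). The class name MissingIds, the method writeMissing and the file arguments are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MissingIds {

    // Streams two sorted ID files side by side and writes every ID that is
    // in `expected` but not in `indexed` to `out`. Returns how many were written.
    // Assumption: both inputs are sorted in String.compareTo order.
    public static long writeMissing(Path expected, Path indexed, Path out) throws IOException {
        long missing = 0;
        try (BufferedReader exp = Files.newBufferedReader(expected, StandardCharsets.UTF_8);
             BufferedReader idx = Files.newBufferedReader(indexed, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String e = exp.readLine();
            String i = idx.readLine();
            while (e != null) {
                // once the indexed file is exhausted, every remaining expected ID is missing
                int cmp = (i == null) ? -1 : e.compareTo(i);
                if (cmp < 0) {
                    // the indexed file has already moved past e, so e is missing
                    writer.write(e);
                    writer.newLine();
                    missing++;
                    e = exp.readLine();
                } else if (cmp == 0) {
                    // found in both; advance only the expected side so that
                    // repeated lines on either side are still handled
                    e = exp.readLine();
                } else {
                    // indexed ID is smaller than e; move the indexed side forward
                    i = idx.readLine();
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.out.println("Usage: java MissingIds <expected.txt> <indexed.txt> <missing.txt>");
            return;
        }
        long n = writeMissing(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
        System.out.println("Missing IDs: " + n);
    }
}
```

Because only one line from each file is held at a time, memory use stays flat regardless of how many IDs there are. The cost is that both files have to be sorted first.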

 
