SOLR 5 Reading Indexed Data at a Lightning Speed

Lucene
SOLR
Solr Cloud
SOLR Java

Solr Lucene

This blog is about reading SOLR data by directly accessing the Lucene index folder. If you are looking for the normal query method, you should look at a different SOLR tutorial; this one deals with a few low-level details.

Coming to our scenario, we had to fetch and write all unique IDs from SOLR 5.5 running in cloud mode, so that we could run a comparison to find missing IDs. We had a total of 350 million records, so we needed to get 350 million IDs. Normally, you can query SOLR from Java, PHP or Python and write the results to a file. But since we had such a huge set of documents, the query was leading to memory errors even with 80GB of RAM.
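The comparison step itself is not part of this blog. As a minimal sketch, assuming both ID lists (the IDs from the source system and the IDs dumped from SOLR) are text files with one ASCII ID per line, already sorted in byte order (for example with LC_ALL=C sort), a streaming merge can find the missing IDs while holding only one line from each file in memory. The class name FindMissingIds and the output file MissingIds.txt are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical helper, not part of the original program.
public class FindMissingIds {

  // Walks two iterators of IDs, both sorted ascending in String.compareTo order,
  // and passes every ID present in 'expected' but absent from 'actual' to 'sink'.
  // Only the current element of each side is held in memory.
  static void findMissing(Iterator<String> expected, Iterator<String> actual, Consumer<String> sink) {
    String a = actual.hasNext() ? actual.next() : null;
    while (expected.hasNext()) {
      String e = expected.next();
      // Skip every ID in 'actual' that sorts before the current expected ID
      while (a != null && a.compareTo(e) < 0) {
        a = actual.hasNext() ? actual.next() : null;
      }
      if (a == null || !a.equals(e)) {
        sink.accept(e); // e does not occur in 'actual'
      }
    }
  }

  // Assumed usage: java FindMissingIds <sorted source IDs> <sorted SOLR IDs>
  public static void main(String[] args) throws IOException {
    try (BufferedReader expected = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
         BufferedReader actual = Files.newBufferedReader(Paths.get(args[1]), StandardCharsets.UTF_8);
         BufferedWriter out = Files.newBufferedWriter(Paths.get("MissingIds.txt"), StandardCharsets.UTF_8)) {
      findMissing(expected.lines().iterator(), actual.lines().iterator(), id -> {
        try {
          out.write(id);
          out.newLine();
        } catch (IOException ex) {
          throw new UncheckedIOException(ex);
        }
      });
    }
  }
}
```

Sorting both files first is what keeps memory flat: a HashSet of 350 million IDs would run into the same memory wall as the original query.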

Thinking of a faster solution, we decided to go low level using Lucene. We used lucene-core-5.5.1.jar with a simple Java program that reads the document IDs and writes them to a file. See the code below:

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Set;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Bits;

public class readIdSolr {

 public static void main(String[] args) throws IOException {
  Path path = Paths.get("index"); // replace with the path to the Lucene index folder
  // Load only the "id" stored field instead of every stored field of each document
  Set<String> fieldsToLoad = Collections.singleton("id");
  try (Directory dirIndex = FSDirectory.open(path);
       IndexReader indexReader = DirectoryReader.open(dirIndex);
       // Stream the IDs straight to disk instead of collecting them all in a list first
       BufferedWriter writer = Files.newBufferedWriter(Paths.get("MyDataFile.txt"), StandardCharsets.UTF_8)) {
   System.out.println("In--total--" + indexReader.numDocs());
   // Deleted documents keep their doc numbers until their segment is merged away,
   // so iterate up to maxDoc() and skip every document that is not live
   Bits liveDocs = MultiFields.getLiveDocs(indexReader); // null when the index has no deletions
   int cnt = 0;
   for (int i = 0; i < indexReader.maxDoc(); i++) {
    if (liveDocs != null && !liveDocs.get(i)) {
     continue;
    }
    Document doc = indexReader.document(i, fieldsToLoad);
    String id = doc.get("id"); // the id field must be stored, otherwise this is null
    writer.write(id);
    writer.newLine(); // one ID per line
    cnt++;
    if (cnt % 10000 == 0) {
     System.out.println("Current cnt " + cnt);
    }
   }
  }
 }
}

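You can replace "id" with whichever stored fields you wish to fetch. Note that in cloud mode every shard keeps its own index folder (usually under the core's data/index directory), so the program has to be run once against one replica of each shard.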


Similar Blog

How To Protect Apache Solr Admin Console

knackforge

March 19, 2012

Apache Solr Admin If you are using the default start.jar that comes along with Apache Solr, to ...
