Facing issues while using a phrase query with slop in Lucene


Posted on 16 Aug 2022 in Technology & Software, General Tech.

manpreet 2 years ago

I am facing some issues with phrase queries, so I wrote a small piece of code to understand exactly how a phrase query works with slop.

I have the string "abc institute engineering technology", and I indexed different combinations of it (much like shingles) as follows:

// `writer` is an IndexWriter opened elsewhere.
Document doc = new Document();
ArrayList<String> sh = new ArrayList<String>();
sh.add("abc institute engineering technology");
sh.add("abc institute engineering");
sh.add("abc institute");
sh.add("abc");
sh.add("institute engineering technology");
sh.add("institute engineering");
sh.add("institute");
sh.add("engineering technology");
sh.add("engineering");
sh.add("technology");
// Each shingle is added as a separate value of the same field.
for (String s : sh) {
    doc.add(new Field("insti_shingles", s.toLowerCase(), Field.Store.YES,
            Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
}
writer.addDocument(doc);

When I read all the tokens back from the index directory, I get this set of tokens:

engineering technology
abc
institute
abc institute engineering technology
technology
abc institute
abc institute engineering
institute engineering technology
engineering
institute engineering

Then I search for the term "abc institute technology":

IndexSearcher searcher = new IndexSearcher(dir);
BooleanQuery booleanQuery = new BooleanQuery();
PhraseQuery query = new PhraseQuery();
query.add(new Term("insti_shingles", "abc institute technology"));
query.setSlop(4);
booleanQuery.add(query, BooleanClause.Occur.SHOULD);
TopDocs hits = searcher.search(booleanQuery, 30);

According to the documentation for phrase queries with slop, I should get some results, but I get an empty result set. I do get results when I search for a term that is exactly the same as an indexed token.

I think the term "abc institute technology" should be matched by the token "abc institute engineering technology" when a phrase query is used.

Am I doing anything wrong? Any help is appreciated.

manpreet 2 years ago

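You don't need a special tokenizer to use phrase queries with slop - indeed, it causes these queries to fail, as you have noticed. A PhraseQuery matches individual terms by their positions, so the single term "abc institute technology" can only match a token that is exactly that string, and your index contains no such token.

Just tokenize using a StandardAnalyzer and add each word to the PhraseQuery as its own term; there is no need for the custom shingle stuff.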
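To make the point about positions concrete, here is a minimal sketch in plain Java (no Lucene dependency) of how slop behaves on word-level tokens. It is a simplified model, not Lucene's actual scorer: the names SloppyPhraseSketch, matches and words are made up for illustration, and whitespace splitting only stands in for a word-level analyzer.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of sloppy phrase matching; NOT Lucene's actual implementation. */
final class SloppyPhraseSketch {

    /** Returns true if the query terms occur in docTokens within the given slop. */
    static boolean matches(List<String> docTokens, List<String> queryTerms, int slop) {
        return search(docTokens, queryTerms, slop, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    // Tries every occurrence of each query term. For query term i found at
    // document position p, the shifted position is p - i. An exact phrase has
    // all shifted positions equal; slop bounds how far they may spread apart.
    private static boolean search(List<String> docTokens, List<String> queryTerms,
                                  int slop, int i, int min, int max) {
        if (i == queryTerms.size()) {
            return i > 0 && max - min <= slop;
        }
        for (int p = 0; p < docTokens.size(); p++) {
            if (docTokens.get(p).equals(queryTerms.get(i))) {
                int shifted = p - i;
                if (search(docTokens, queryTerms, slop, i + 1,
                        Math.min(min, shifted), Math.max(max, shifted))) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Splits on whitespace, standing in for a word-level analyzer (assumption). */
    static List<String> words(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> doc = words("abc institute engineering technology");
        // One term per word: "technology" is one position late, so any slop >= 1 matches.
        System.out.println(matches(doc, words("abc institute technology"), 4)); // prints true
        System.out.println(matches(doc, words("abc institute technology"), 0)); // prints false

        // Whole shingles as tokens and the whole phrase as one term, as in the
        // question: no token equals "abc institute technology", so slop cannot help.
        List<String> shingles = Arrays.asList(
                "abc institute engineering technology", "abc institute", "abc");
        System.out.println(matches(shingles, Arrays.asList("abc institute technology"), 4)); // prints false
    }
}
```

In Lucene itself, the corresponding change to the query in the question would be to call query.add(new Term("insti_shingles", word)) once per word instead of once for the whole phrase, against a field indexed word by word.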

