Build a Custom Solr Filter to Handle Unit Conversions

Recently, I came across a use case that required handling units of weight in the index. For instance, a search for 2kg and a search for 2000g should return the same set of results.

To achieve this, I wrote a custom Solr filter that works with KeywordTokenizer and converts every weight, in indexed documents as well as in incoming requests, to a single unit (g). Each measurement is therefore indexed as a plain number, while the stored value keeps its original unit (kg/g/mg) and is returned unchanged with the docs.
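Before bringing Lucene into the picture, the idea can be sketched in plain Java. The class and method names below (WeightNormalizerSketch, toGramsTerm) are hypothetical and only illustrate the conversion; the sketch assumes lowercase whole-number values such as 2kg or 2000g.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WeightNormalizerSketch {

    // Assumed input format: digits followed directly by kg, g or mg (lowercase).
    private static final Pattern WEIGHT = Pattern.compile("(\\d+)(kg|mg|g)");

    /** Returns the weight in grams, formatted as the term that would be indexed. */
    public static String toGramsTerm(String value) {
        Matcher m = WEIGHT.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported weight: " + value);
        }
        float amount = Float.parseFloat(m.group(1));
        switch (m.group(2)) {
            case "kg":
                amount *= 1000f;
                break;
            case "mg":
                amount /= 1000f;
                break;
            default:
                break; // "g" is already the base unit
        }
        return Float.toString(amount);
    }

    public static void main(String[] args) {
        // Both inputs normalize to the same term, so they match each other.
        System.out.println(toGramsTerm("2kg"));
        System.out.println(toGramsTerm("2000g"));
    }
}
```

Because 2kg and 2000g end up as the same term, a query for one finds documents indexed with the other.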

First, we need to write a custom TokenFilter and a TokenFilterFactory.

UnitConversionFilter.java


package com.solr.custom.filter.test;

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Rewrites a weight token such as "2kg" or "500mg" to its value in grams.
 *
 * @author SumeetS
 */
public final class UnitConversionFilter extends TokenFilter {

    // Expected token format: a number followed directly by kg, g or mg (lowercase).
    private static final Pattern WEIGHT = Pattern.compile("(\\d+(?:\\.\\d+)?)(kg|mg|g)");

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    /**
     * @param input the token stream to wrap
     */
    public UnitConversionFilter(TokenStream input) {
        super(input);
    }

    /* (non-Javadoc)
     * @see org.apache.lucene.analysis.TokenStream#incrementToken()
     */
    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        // Replace the token text with the weight in grams, e.g. "1kg" -> "1000.0".
        String storeFormat = Float.toString(convertUnit(termAtt.toString()));
        termAtt.setEmpty().append(storeFormat);
        return true;
    }

    /** Parses a value such as "2kg" and returns the weight in grams. */
    private float convertUnit(String field) {
        Matcher m = WEIGHT.matcher(field);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported weight value: " + field);
        }
        float weight = Float.parseFloat(m.group(1));
        switch (m.group(2)) {
            case "kg":
                return weight * 1000;
            case "mg":
                return weight / 1000;
            default: // "g" is already the target unit
                return weight;
        }
    }
}

UnitConversionTokenFilterFactory.java


package com.solr.custom.filter.test;

import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.util.TokenFilterFactory;

/**
 * @author SumeetS
 *
 */
public class UnitConversionTokenFilterFactory extends TokenFilterFactory {

    /**
     * @param args
     */
    public UnitConversionTokenFilterFactory(Map<String, String> args) {
        super(args);
        if (!args.isEmpty()) {
            throw new IllegalArgumentException("Unknown parameters: " + args);
        }
    }

    /* (non-Javadoc)
     * @see org.apache.lucene.analysis.util.TokenFilterFactory#create(org.apache.lucene.analysis.TokenStream)
     */
    @Override
    public TokenStream create(TokenStream input) {
        return new UnitConversionFilter(input);
    }

}

NOTE: When you extend TokenFilter and TokenFilterFactory, make sure the constructors are declared public rather than protected; otherwise Solr throws NoSuchMethodException during plugin init.
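A likely reason for this, assuming the plugin loader looks the constructor up reflectively with Class.getConstructor, is that this method only returns public constructors. The two nested classes in the sketch below are hypothetical stand-ins, not Solr classes.

```java
import java.util.Map;

public class ConstructorVisibilitySketch {

    // Hypothetical stand-in for a factory whose constructor was left protected.
    public static class ProtectedCtorFactory {
        protected ProtectedCtorFactory(Map<String, String> args) { }
    }

    // Hypothetical stand-in for a factory with the constructor made public.
    public static class PublicCtorFactory {
        public PublicCtorFactory(Map<String, String> args) { }
    }

    /** Returns true if a public constructor taking a Map can be found reflectively. */
    public static boolean hasPublicMapConstructor(Class<?> clazz) {
        try {
            clazz.getConstructor(Map.class); // only considers public constructors
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasPublicMapConstructor(ProtectedCtorFactory.class));
        System.out.println(hasPublicMapConstructor(PublicCtorFactory.class));
    }
}
```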

Now compile the classes above and export them into a jar, say customUnitConversionFilterFactory.jar.

Steps to Deploy Your Jar Into Solr

1. Place your jar file under <solr installation>/lib

2. Add an entry to solrconfig.xml so that Solr can find your custom jar.


	<lib dir="../../../lib/" regex=".*\.jar" />

3. Add a custom fieldType and field to your schema.xml

 

<field name="unitConversion" type="unitConversion" indexed="true" stored="true"/>
<fieldType name="unitConversion" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="com.solr.custom.filter.test.UnitConversionTokenFilterFactory" />
  </analyzer>
</fieldType>

4. Now restart Solr and open the Documents screen for your core in the Solr admin console (<core>/documents).

5. Add documents to your index like the ones below:

{"id":"tmp1","unitConversion":"1000g"}
{"id":"tmp2","unitConversion":"2kg"}
{"id":"tmp3","unitConversion":"1kg"}

6. Query your index.

Query1: querying for documents with 1kg

http://localhost:8983/solr/core1/select?q=*%3A*&fq=unitConversion%3A1kg&wt=json&indent=true

Result:

{
 "responseHeader":{
 "status":0,
 "QTime":0,
 "params":{
 "q":"*:*",
 "indent":"true",
 "fq":"unitConversion:1kg",
 "wt":"json"}},
 "response":{"numFound":2,"start":0,"docs":[
 {
 "id":"tmp1",
 "unitConversion":"1000g",
 "_version_":1524411029806645248},
 {
 "id":"tmp3",
 "unitConversion":"1kg",
 "_version_":1524411081738420224}]
 }}

Query2: querying for documents with 2kg

http://localhost:8983/solr/core1/select?q=*%3A*&fq=unitConversion%3A2kg&wt=json&indent=true

Result:

{
 "responseHeader":{
 "status":0,
 "QTime":0,
 "params":{
 "q":"*:*",
 "indent":"true",
 "fq":"unitConversion:2kg",
 "wt":"json"}},
 "response":{"numFound":1,"start":0,"docs":[
 {
 "id":"tmp2",
 "unitConversion":"2kg",
 "_version_":1524411089834475520}]
 }}
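The two request URLs above differ only in the fq value, so they can also be built in code. The sketch below is one way to do it with the JDK's URLEncoder; the class and method names (SelectUrlSketch, selectUrl) are hypothetical, and the base URL and core name are simply the ones used in this post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SelectUrlSketch {

    /** Builds a select URL that matches all documents and applies one filter query. */
    public static String selectUrl(String baseUrl, String core, String filterQuery) {
        return baseUrl + "/" + core + "/select"
                + "?q=" + encode("*:*")
                + "&fq=" + encode(filterQuery)
                + "&wt=json&indent=true";
    }

    // URL-encodes a parameter value, e.g. ':' becomes %3A.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://localhost:8983/solr", "core1", "unitConversion:2kg"));
    }
}
```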

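Query3: let’s try faceting. Facet values come from the indexed terms, so they show the converted weights in grams rather than the stored values.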

http://localhost:8983/solr/core1/select?q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=unitConversion

{
 "responseHeader":{
 "status":0,
 "QTime":1,
 "params":{
 "q":"*:*",
 "facet.field":"unitConversion",
 "indent":"true",
 "rows":"0",
 "wt":"json",
 "facet":"true"}},
 "response":{"numFound":335,"start":0,"docs":[]
 },
 "facet_counts":{
 "facet_queries":{},
 "facet_fields":{
 "unitConversion":[
 "1000.0",2,
 "2000.0",1]},
 "facet_dates":{},
 "facet_ranges":{},
 "facet_intervals":{},
 "facet_heatmaps":{}}}

This is just a basic implementation. One could add further fields that identify the type of unit and then choose the conversion based on that.
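One way to read this idea is a lookup table that records, for each unit suffix, which kind of quantity it measures and how to convert it to a base unit. The sketch below is only an illustration under that assumption: the class name, the "length" entries and the dimension:value output format are all hypothetical, and it uses BigDecimal, so its number formatting differs from the float output of the filter above.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnitTableSketch {

    /** Kind of quantity plus the factor that converts the unit to that kind's base unit. */
    static final class Unit {
        final String dimension;
        final BigDecimal toBase;

        Unit(String dimension, String toBase) {
            this.dimension = dimension;
            this.toBase = new BigDecimal(toBase);
        }
    }

    // Hypothetical unit table; the dimensions and base units are illustrative choices.
    private static final Map<String, Unit> UNITS = new HashMap<>();
    static {
        UNITS.put("mg", new Unit("weight", "0.001")); // base unit: g
        UNITS.put("g", new Unit("weight", "1"));
        UNITS.put("kg", new Unit("weight", "1000"));
        UNITS.put("mm", new Unit("length", "0.001")); // base unit: m
        UNITS.put("cm", new Unit("length", "0.01"));
        UNITS.put("m", new Unit("length", "1"));
    }

    private static final Pattern VALUE = Pattern.compile("(\\d+(?:\\.\\d+)?)([a-z]+)");

    /** Returns "dimension:valueInBaseUnit" for inputs such as "2kg" or "150cm". */
    public static String normalize(String value) {
        Matcher m = VALUE.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported value: " + value);
        }
        Unit unit = UNITS.get(m.group(2));
        if (unit == null) {
            throw new IllegalArgumentException("Unknown unit: " + m.group(2));
        }
        BigDecimal base = new BigDecimal(m.group(1)).multiply(unit.toBase);
        return unit.dimension + ":" + base.stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("2kg"));
        System.out.println(normalize("150cm"));
    }
}
```

Keeping the dimension in the term prevents 2kg from ever matching 2000mm, even though both convert to numbers in their own base units.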

A further improvement would be to support range queries on values that carry units.
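One obstacle to range queries is that terms in a text field are compared as strings, so "999.0" sorts after "1000.0". A possible direction, sketched below under the assumption of whole-number inputs, is to index a fixed-width, zero-padded count of milligrams so that string order matches numeric order. The class name, the 15-digit width and the choice of milligrams are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SortableWeightSketch {

    // Assumed input format: digits followed directly by kg, g or mg (lowercase).
    private static final Pattern WEIGHT = Pattern.compile("(\\d+)(kg|mg|g)");

    /** Returns the weight in milligrams, zero-padded to 15 digits. */
    public static String toSortableTerm(String value) {
        Matcher m = WEIGHT.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported weight: " + value);
        }
        long amount = Long.parseLong(m.group(1));
        long milligrams;
        switch (m.group(2)) {
            case "kg":
                milligrams = amount * 1_000_000L;
                break;
            case "g":
                milligrams = amount * 1_000L;
                break;
            default: // "mg"
                milligrams = amount;
                break;
        }
        return String.format("%015d", milligrams);
    }

    public static void main(String[] args) {
        // Plain decimal strings compare in the wrong order; padded terms do not.
        System.out.println("999.0".compareTo("1000.0") > 0);
        System.out.println(toSortableTerm("999g").compareTo(toSortableTerm("1kg")) < 0);
    }
}
```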

One thought on “Build a Custom Solr Filter to Handle Unit Conversions”

  1. tasoss October 21, 2016 / 11:07 am

    Hello and thank you for this guide. I can’t find detailed tutorials on how to make custom filters in solr. Let me ask something:

    In the Java code there are dependencies on lucene-core, lucene-analyzers-common and resources (java). I have to include these jars, for example in NetBeans or Eclipse, to build my jar successfully. So I have this structure: customUnitConversionFilterFactory.jar with the two classes and the manifest, and inside the lib folder 3 jars (dependencies).

    Then I have my Solr structure, for example in xampp for a localhost test: xampp/solr/server/solr/test

    The start.jar as well as the lib folder is in the server folder (I think it is the installation folder). The server/solr folder contains configsets and my core (test). Test contains the core’s conf and data. So the path looks right for me (3 levels up from the config file and then lib). There I put my jar. I have also tried putting the lib folder of the build, which contains the 3 dependency jars, next to it, but I get "JVM Error creating core [mylab]: null" when I add the filter line to schema.xml.

    What am I doing wrong?

    Thank you.
