Build a Custom Solr Filter to Handle Unit Conversions

Recently, I came across a use case that required handling units of weight in the index. For instance, searching for 2kg or for 2000g should return the same set of results.

To achieve this, I wrote a custom Solr filter that works with KeywordTokenizer and converts every weight in the incoming text to a single unit (g), so each measurement is indexed as a plain number. The stored value still keeps the original unit (kg/g/mg), so documents are returned as they were submitted.

First, we need to write a custom TokenFilter and TokenFilterFactory.

package com.solr.custom.filter.test;

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Rewrites a weight token such as "2kg", "2000g" or "500mg" to its value in grams.
 *
 * @author SumeetS
 */
// Lucene expects concrete TokenStream subclasses (or their incrementToken()) to be final.
public final class UnitConversionFilter extends TokenFilter {

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    /**
     * @param input
     */
    public UnitConversionFilter(TokenStream input) {
        super(input);
    }

    /* (non-Javadoc)
     * @see org.apache.lucene.analysis.TokenStream#incrementToken()
     */
    @Override
    public boolean incrementToken() throws IOException {
        if (input.incrementToken()) {
            String inputWt = termAtt.toString(); // assuming format to be 1kg/mg
            float valInGrams = convertUnit(inputWt);
            String storeFormat = valInGrams + "";
            // Replace the token text with the value in grams, e.g. "2kg" -> "2000.0".
            termAtt.copyBuffer(storeFormat.toCharArray(), 0, storeFormat.length());
            return true;
        } else {
            return false;
        }
    }

    private float convertUnit(String field) {
        // Everything before the unit suffix is the numeric part.
        String[] tmp = field.split("(k|m)?g");
        float weight = Float.parseFloat(tmp[0]);
        // Whatever follows the numeric part is the unit.
        String unit = field.substring(tmp[0].length());
        float convWt = 0; // stays 0 if the unit is not recognised
        switch (unit) {
            case "kg":
                convWt = weight * 1000;
                break;
            case "mg":
                convWt = weight / 1000;
                break;
            case "g":
                convWt = weight;
                break;
        }
        return convWt;
    }
}

package com.solr.custom.filter.test;

import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.util.TokenFilterFactory;

/**
 * @author SumeetS
 */
public class UnitConversionTokenFilterFactory extends TokenFilterFactory {

    /**
     * @param args
     */
    public UnitConversionTokenFilterFactory(Map<String, String> args) {
        super(args);
        // This filter takes no configuration, so any leftover argument is an error.
        if (!args.isEmpty()) {
            throw new IllegalArgumentException("Unknown parameters: " + args);
        }
    }

    /* (non-Javadoc)
     * @see org.apache.lucene.analysis.util.TokenFilterFactory#create(org.apache.lucene.analysis.TokenStream)
     */
    @Override
    public TokenStream create(TokenStream input) {
        return new UnitConversionFilter(input);
    }
}

NOTE: When you extend TokenFilter and TokenFilterFactory, declare the constructors public rather than protected; otherwise Solr throws a NoSuchMethodException during plugin initialization.

Now compile the classes above and package them into a jar, for example customUnitConversionFilterFactory.jar.

Steps to Deploy Your Jar Into Solr

1. Place your jar file under <solr installation>/lib

2. Add an entry to your solrconfig.xml file so that Solr can find your custom jar.

	<lib dir="../../../lib/" regex=".*\.jar" />

3. Add a custom fieldType and field to your schema.xml

<field name="unitConversion" type="unitConversion" indexed="true" stored="true"/>
<fieldType name="unitConversion" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="com.solr.custom.filter.test.UnitConversionTokenFilterFactory" />
  </analyzer>
</fieldType>

4. Now restart Solr and browse to the Solr console/<core>/documents

5. Add documents in your index like below:


6. Query your index.

Query1: querying for documents with 1kg

Query2: querying for documents with 2kg

Query3: let’s try faceting

This is just a basic implementation. One can add additional fields to identify the type of unit and then based on that decide the conversion.

Further improvements include handling of range queries along with the units.
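As a rough sketch of how the conversion could be made easier to extend, the unit handling can be driven by a lookup table instead of a switch. The class name WeightUnits, the method toGrams and the accepted input pattern are assumptions made for this illustration; they are not part of the filter above, and the sketch has no Lucene dependency.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: table-driven conversion of weight strings to grams. */
final class WeightUnits {

    // Conversion factors to grams; supporting a new unit means adding one entry.
    private static final Map<String, Double> GRAMS_PER_UNIT = Map.of(
            "kg", 1000.0,
            "g", 1.0,
            "mg", 0.001);

    // A number (optionally with a decimal part) followed by a unit, e.g. "1.5kg".
    private static final Pattern WEIGHT =
            Pattern.compile("(\\d+(?:\\.\\d+)?)\\s*([a-z]+)");

    /** Parses text such as "2kg" and returns the weight in grams. */
    static double toGrams(String text) {
        Matcher m = WEIGHT.matcher(text.trim().toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a weight: " + text);
        }
        Double factor = GRAMS_PER_UNIT.get(m.group(2));
        if (factor == null) {
            throw new IllegalArgumentException("Unknown unit: " + m.group(2));
        }
        return Double.parseDouble(m.group(1)) * factor;
    }

    public static void main(String[] args) {
        // Both inputs describe the same weight.
        System.out.println(toGrams("2kg"));
        System.out.println(toGrams("2000g"));
    }
}
```

Unlike the switch in the filter, this version rejects unknown units instead of silently returning 0, which may or may not be what you want at index time.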

Multi-tenancy in Cloud Application through Meta Data Driven Architecture

A multi-tenant architecture is designed to allow tenant-specific configuration at the UI, business rules, business process and data model layers. This is achieved without changing the code, which turns complex customization into software configuration. It drives a clear need for “metadata driven everything”: a metadata driven database, metadata driven SOA, a metadata driven business layer, metadata driven AOP and metadata driven user interfaces.

Metadata Driven Database

To develop a Multi-tenanted database, one of the following architecture approaches applies:

  • Shared Tables among Tenants
  • Flexible Schema, Shared Tables
  • Multi-Schema, Private Tables
  • Single Schema, Private Tables for Tenants
  • Multi-Instance

As the service grows, building a cloud database service that manages a vast, ever-changing set of actual databases becomes difficult. Rules about who, where, how and so on may become an overhead as the application and the number of clients grow.

A metadata driven approach collects all these answers in tables so that they can be reused. It means recording information about all tables, columns, indexes, constraints, partitions, stored procedures (SPs), parameters and functions, as well as the rules defined in the business and transaction steps of an SP.

In a true metadata driven database, no rule or procedure refers to tables directly; even the rules themselves are abstracted and used through metadata.
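To make the idea concrete, here is a minimal sketch in which callers name only a tenant, a logical entity and logical fields, and the physical table and column names come from a metadata catalog. TenantCatalog, its methods and the sample table and column names are all hypothetical; a real system would load this metadata from tables rather than from code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical catalog mapping a tenant's logical entity to physical storage. */
final class TenantCatalog {

    /** Physical table plus the logical-field-to-column mapping for one entity. */
    private static final class EntityMeta {
        final String table;
        final Map<String, String> columns;

        EntityMeta(String table, Map<String, String> columns) {
            this.table = table;
            this.columns = columns;
        }
    }

    private final Map<String, EntityMeta> entities = new HashMap<>();

    /** Records where a tenant's entity lives; normally this would be read from metadata tables. */
    void register(String tenant, String entity, String table, Map<String, String> columns) {
        entities.put(tenant + "/" + entity, new EntityMeta(table, columns));
    }

    /** Builds a SELECT from metadata only; the caller never names a table or column. */
    String selectSql(String tenant, String entity, String... logicalFields) {
        EntityMeta meta = entities.get(tenant + "/" + entity);
        if (meta == null) {
            throw new IllegalArgumentException("No metadata for " + tenant + "/" + entity);
        }
        StringJoiner cols = new StringJoiner(", ");
        for (String field : logicalFields) {
            String column = meta.columns.get(field);
            if (column == null) {
                throw new IllegalArgumentException("Unknown field: " + field);
            }
            cols.add(column + " AS " + field);
        }
        return "SELECT " + cols + " FROM " + meta.table;
    }

    public static void main(String[] args) {
        TenantCatalog catalog = new TenantCatalog();
        catalog.register("acme", "customer", "t_acme_cust", Map.of("name", "col_a"));
        System.out.println(catalog.selectSql("acme", "customer", "name"));
    }
}
```

Moving a tenant to a private table or renaming a column then becomes a metadata change; the calling code stays the same.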

Metadata Driven SOA

To be a true service-oriented application the fractal model must be applicable from the system boundary to the database, with service interfaces defined for each component or sub-system and each service treated as a black box by the caller.

The metadata-driven nature of the services of application leads the solution to a dead-end if a pure technical ‘code it’ approach is taken. In such a metadata-driven application exposing functions is replaced by exposing metadata.

Exposing the metadata itself is not the true intent of a metadata-driven application. Driving the propagation of services [functions] over the system boundary is a more accurate way of phrasing the approach that needs to be employed.

A metadata-driven application is capable of providing a bridging approach to propagate its services into many technologies via code generation. This is a direct result of all services being regular and of all service descriptions being available in a meta-format at both build time and runtime.

Metadata Driven Business Layer

In the past, business logic and workflows were written as if-else conditions. When a business model or workflow is designed for a multi-tenant environment, the first step has to be preparing the metadata configuration. It should include the data source, extraction steps, transformation routing, loading, and the source from which rules and execution logic are derived. The next step is choosing the tools and language that can generate code and workflows from that configuration. The final and most challenging step is changing the mindset of developers: “do not create workflows and business objects by hand, but write code that can generate them”.
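The contrast with hand-written if-else logic can be sketched as follows: the rules are plain data, as they might be loaded from a per-tenant configuration table, and one generic engine evaluates them. RuleEngine, Rule, the operator set and the sample field and action names are assumptions made for this illustration only.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical engine that evaluates business rules stored as metadata. */
final class RuleEngine {

    /** One rule row: when `field op threshold` holds, `action` applies. */
    static final class Rule {
        final String field;
        final String op;
        final double threshold;
        final String action;

        Rule(String field, String op, double threshold, String action) {
            this.field = field;
            this.op = op;
            this.threshold = threshold;
            this.action = action;
        }
    }

    /** Returns the action of the first matching rule, or "none" if no rule matches. */
    static String decide(List<Rule> rules, Map<String, Double> facts) {
        for (Rule rule : rules) {
            Double value = facts.get(rule.field);
            if (value != null && matches(rule.op, value, rule.threshold)) {
                return rule.action;
            }
        }
        return "none";
    }

    private static boolean matches(String op, double value, double threshold) {
        switch (op) {
            case ">": return value > threshold;
            case "<": return value < threshold;
            case "=": return value == threshold;
            default: throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        // A tenant-specific rule set; another tenant could supply different rows.
        List<Rule> tenantRules =
                List.of(new Rule("orderTotal", ">", 1000.0, "manager-approval"));
        System.out.println(decide(tenantRules, Map.of("orderTotal", 1500.0)));
    }
}
```

Changing a tenant's approval threshold is then a configuration change rather than a code change.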

Metadata Driven AOP

Metadata and the Join Point Model

A join point is an identifiable point in the execution of a system. The model defines which join points in a system are exposed and how they are captured. To implement crosscutting functionality using aspects, you need to capture the required join points using a programming construct called a pointcut.

Pointcuts select join points and collect the context at selected join points. All AOP systems provide a language to define pointcuts. The sophistication of the pointcut language is a differentiating factor among the various AOP systems. The more mature the pointcut language, the easier it is to write robust pointcuts.

Capturing Join Points with Metadata

Signature-based pointcuts cannot capture the join points needed to implement certain crosscutting concerns. For example, how would you capture join points requiring transaction management or authorization? Nothing inherent in an element’s name or signature suggests transactionality or authorization characteristics. The pointcut required in these situations can get unwieldy. The example is in AspectJ but pointcuts in other systems are conceptually identical.

pointcut transactedOps()
    : execution(public void
      || execution(public void Account.debit(..))

Situations like these invite the use of metadata to capture the required join points. For example, you could write a pointcut as shown below to capture the execution of all the methods carrying the @Transactional annotation.

pointcut transactedOps() : execution(@Transactional * *.*(..));

AOP systems and their join point models can be augmented by consuming metadata annotations. By piggybacking on code generation support it’s possible to consume metadata even when the core AOP system doesn’t directly support it.
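The same idea can be imitated in plain Java, without AspectJ, to show what "selecting by annotation" means at runtime. The sketch below uses a JDK dynamic proxy as a stand-in for an AOP weaver; the @Transactional annotation, the Account interface and the log messages are all defined here for illustration and are not taken from any framework.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo: advice applied only to methods carrying an annotation. */
final class TransactionalProxyDemo {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Transactional {}

    interface Account {
        @Transactional
        void debit(int amount);

        int balance();
    }

    static final class SimpleAccount implements Account {
        private int balance = 100;

        public void debit(int amount) { balance -= amount; }

        public int balance() { return balance; }
    }

    /** Wraps target so that calls to @Transactional methods are bracketed by log entries. */
    static Account withTransactions(Account target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            // The "pointcut": select the call only if the method carries the annotation.
            boolean transacted = method.isAnnotationPresent(Transactional.class);
            if (transacted) {
                log.add("begin " + method.getName());
            }
            Object result = method.invoke(target, args);
            if (transacted) {
                log.add("commit " + method.getName()); // rollback handling omitted
            }
            return result;
        };
        return (Account) Proxy.newProxyInstance(
                Account.class.getClassLoader(), new Class<?>[] {Account.class}, handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Account account = withTransactions(new SimpleAccount(), log);
        account.debit(30);
        System.out.println(account.balance() + " " + log);
    }
}
```

Nothing in the name or signature of debit marks it as transactional; only the annotation does, which is exactly the situation signature-based pointcuts cannot express.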

Metadata Support in AOP Systems

To support metadata-based crosscutting, an AOP system needs to provide a way to consume and supply annotations. An AOP system that supports consuming annotations will let you select join points based on annotations associated with program elements. The current AOP systems that offer such support extend the definition for various signature patterns to allow annotation types and properties to be specified. For example, a pointcut could select all the methods carrying an annotation of type Timing. Further, it could subselect only methods with the value property exceeding, say, 25. To implement advice dependent on both annotation type and properties, the system could include pointcut syntax capturing the annotation instances associated with the join points. Lastly, the system could also allow advice to access annotation instances through reflective APIs.
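Selecting on an annotation property, and reading annotation instances reflectively, can be sketched in plain Java as follows. The @Timing annotation and the ReportService class are defined here purely for illustration; a real AOP system would express the same selection in its pointcut language rather than in a loop.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical demo: select methods by annotation type and property value. */
final class TimingSelector {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Timing {
        int value();
    }

    static final class ReportService {
        @Timing(10)
        void quickLookup() {}

        @Timing(50)
        void nightlyExport() {}

        void untimed() {}
    }

    /** Names of the methods whose @Timing value exceeds the threshold, sorted by name. */
    static List<String> timedAbove(Class<?> type, int threshold) {
        List<String> names = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            // Access the annotation instance through the reflection API.
            Timing timing = method.getAnnotation(Timing.class);
            if (timing != null && timing.value() > threshold) {
                names.add(method.getName());
            }
        }
        // getDeclaredMethods() returns methods in no particular order.
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(timedAbove(ReportService.class, 25));
    }
}
```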

Metadata Driven User Interfaces

Many business applications require the user interface (UI) to be extensible because requirements vary from one customer to another. Client-side business logic for the UI may also need customization based on individual user needs. One user's screen layout might differ from another's, for example in control position, control visibility, or the UIs for various mobile devices. Business logic customization also includes customizing validation rules, changing control properties, and other modifications. For example, a manager may have different options for deleting and moving files than a subordinate.

There are many techniques for enabling business applications to be extensible or customizable. Most applications solve this problem by storing customizable items such as UI layout and client-side business logic as metadata in a repository. This metadata can then be interpreted by a run-time engine to display the screen to users and to execute the client-side business logic when the user performs an action on the screen.

The advantages of this approach are:

  • Redeployment of components on the presentation layer is not required as the customization is done in a central repository.
  • A very light client installation is required. One only needs to deploy the run-time engine to the client machine.
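A minimal sketch of such a run-time engine is shown below: the screen layout is plain metadata, and the engine decides which controls a given role sees. ScreenRenderer, ControlMeta and the role and control names are hypothetical and stand in for whatever the metadata repository actually stores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical run-time engine that interprets UI layout metadata. */
final class ScreenRenderer {

    /** One control as it might be stored in the metadata repository. */
    static final class ControlMeta {
        final String label;
        final Set<String> visibleTo;

        ControlMeta(String label, Set<String> visibleTo) {
            this.label = label;
            this.visibleTo = visibleTo;
        }
    }

    /** Returns the labels of the controls the role may see, in layout order. */
    static List<String> render(List<ControlMeta> layout, String role) {
        List<String> visible = new ArrayList<>();
        for (ControlMeta control : layout) {
            if (control.visibleTo.contains(role)) {
                visible.add(control.label);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<ControlMeta> fileScreen = List.of(
                new ControlMeta("Open", Set.of("manager", "subordinate")),
                new ControlMeta("Move", Set.of("manager")),
                new ControlMeta("Delete", Set.of("manager")));
        System.out.println(render(fileScreen, "manager"));
        System.out.println(render(fileScreen, "subordinate"));
    }
}
```

Giving a role access to another control means editing the stored layout, not redeploying the presentation layer.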

While designing a Metadata driven UI, the following components are taken into account:

  1. Metadata Service: an ordinary service layer that delivers metadata for the UI
  2. Login/Role Controller
  3. Action Controller
  4. Widget Controller
  5. MetaTree
  6. TreeService

Multi-tenancy in cloud applications can have a huge impact on the application delivery and productivity of an IT company. Yet most people who use the cloud and its services tend to ignore it because it works “behind the scenes”. Many old applications have been written in a multi-tenant manner, but moving them to SaaS or converting legacy systems to SOA might become a challenge. Metadata driven programming is indeed a different paradigm. However, it can solve numerous challenges associated not only with multi-tenancy but with other cloud issues as well.

10 things to do while migrating an ASP.NET App to Azure

Here I will list a few points that you should take care of while migrating an ASP.NET application to Azure. The reader should have a basic understanding of the Azure platform.

  1. Make sure that your application is 64-bit compatible, since Windows Azure is a 64-bit environment.
  2. Convert your website project to a web application project to associate with a web role since the Visual Studio tools for Azure do not support ASP.NET website projects.
  3. Ensure that web roles listen for HTTP requests on different ports if there is more than one web role in a single service deployment. Web services in the application can be associated with a web role.
  4. Migrate Windows services to a worker role. The OnStart() method of the Windows service class is equivalent to the Run() method of the RoleEntryPoint class.
  5. Consider moving out some or all of the settings in config files to service setting files that do not require redeployment after every change, though app.config and web.config files work without any change.
  6. Make sure that your web application has no issues running on IIS 7, as Windows Azure web roles run in IIS 7 Integrated mode. In my case the application was using the WCSF framework, which has a known issue with IIS 7. Fortunately there is a workaround to make WCSF work on IIS 7. Please note that in IIS 7 Integrated mode the HttpRequest context is not available in the Application_Start event.
  7. Ensure that your application is not caching anything in the ASP.NET Cache or Session. There is a sample on CodePlex that illustrates how to host Memcached in a worker role. This sample also provides a .NET client for Memcached and a sample web role project for performance monitoring of your Memcached instance. Wherever you need to cache anything, you can use Memcached or Azure storage.
  8. If your web application allows users to upload files, upload and save them to a blob in Windows Azure storage, which I think is easiest, or save them to a simulated XDrive. All data in Windows Azure Storage can be accessed with HTTP requests that follow REST conventions.
  9. Set the default page in web.config under the system.webServer node (Supported in IIS7, since you cannot configure IIS in Azure).
  10. Get a SQL Server 2008 R2 client to work with SQL Azure. You can migrate the Database Schema to SQL Azure database by using the generate scripts wizard and use SSIS to move data into and out of SQL Azure. Once you migrate the database to SQL Azure successfully you just have to change the connection string in your application.
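For item 9, the default page is typically declared with the defaultDocument element of IIS 7. The fragment below is a minimal sketch of the relevant part of web.config; Default.aspx is a placeholder page name and not one taken from the application discussed here.

```xml
<system.webServer>
  <defaultDocument>
    <files>
      <!-- Drop any inherited defaults, then name the page to serve for "/" -->
      <clear />
      <add value="Default.aspx" />
    </files>
  </defaultDocument>
</system.webServer>
```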

Here’s a blog that discusses some more migration issues.