Transporter & Elasticsearch Mapping

Now that Compose has Elasticsearch and Transporter, we've been looking at the process of moving data around and the particular obstacles one can meet along the way. One of the smart things that Elasticsearch does when data is being put into one of its indexes is to create a dynamic type mapping of the data as it's inserted. So where it sees a field with a date, it records in the dynamic type mapping that the field is a date and carries on.

When you come to query the data in the index, Elasticsearch can use that type and immediately apply search semantics suitable for dates.

This is very useful, but there's a problem that can catch users out when using this dynamic type mapping, and that's numbers. Say the first record being imported has a field "myval" and it is set to 100. Elasticsearch will look at that and quite reasonably decide that it is an integer and set the mapping type to integer. But say in the second record, "myval" is now 100.123. Elasticsearch will look at the type defined and try to parse the value as an integer, which in turn will cause an error in Elasticsearch.

Ah, you say, we'll just make sure that the numbers are all consistently formatted as floating point or integer as appropriate in the database. Which would be a good idea, but when you are dealing with MongoDB, you are dealing with JSON and JavaScript... and there's something you need to know about how they handle numbers.

JavaScript, and by extension JSON, don't really have types like Integer, Double or Float for numbers; they just have Number, an all-purpose holder for numeric values which doesn't distinguish between floating point and integer numbers. Instead, Number actually stores the value as a floating point number. The format of the number is decided when the value is extracted from the Number. It's meant to be flexible in a dynamic scripting language where type is implied.
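To see this in practice, here is a small sketch that can be run in any JavaScript console. The field name "myval" is borrowed from the scenario above and the variable names are just for illustration; the point is that a whole-number value serialises without a decimal point even when it was written as 100.0.

```javascript
// Both values have the same JavaScript type: Number.
console.log(typeof 100 === typeof 100.123); // true

// A whole-number value is rendered without a decimal point...
var whole = JSON.stringify({ myval: 100.0 });
console.log(whole); // {"myval":100}

// ...while a fractional value keeps its decimal part.
var fractional = JSON.stringify({ myval: 100.123 });
console.log(fractional); // {"myval":100.123}
```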

But it also means that where you have an integer in a floating point variable, it will be rendered as an integer, and when that number is read by a Java application like Elasticsearch, it'll see an integer and set the field type to integer in the scenario we outlined above. If you are using the Compose Transporter to push data into Elasticsearch, you are most probably not going to want to try and create mappings to pre-empt this behavior, especially if your data is in records with very variable structure.

So what to do? Well, there is a little trick you can do with the Transformer part of the Transporter and that trick is to make all integers into strings. Let's show you the code, then we'll explain. Here's the norm function...

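function norm(obj) {
  // Walk every field of the object.
  Object.keys(obj).forEach(function(k) {
    if (typeof obj[k] === "number") {
      // Whole numbers become strings such as "100.0".
      if (Number.isInteger(obj[k])) {
        obj[k] = obj[k] + ".0";
      }
    } else if (obj[k] !== null && typeof obj[k] === "object") {
      // Recurse into nested objects and arrays. typeof null is also
      // "object", so null values are skipped here.
      norm(obj[k]);
    }
  });
}

module.exports = function(doc) {
  // Keep only the fields we want to pass on.
  var o = _.pick(doc, ["_id", "val"]);
  norm(o);
  return o;
}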

When the norm function is given any JavaScript object, it recursively steps through it. Where it finds fields are of type Number which also have a value that is an integer, the function swaps the number for a string made up of the integer's value with ".0" appended.

Now, with the number in the form of a string, Elasticsearch will see a string and, upon parsing it, see a floating point value which it will type as a double. Later records with floating point values won't trigger an error.
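As a rough illustration of the effect, here is the same kind of recursive function applied to a made-up document. The sample document and its field names ("nested", "count", "ratio") are assumptions for the example only, and the sketch assumes it runs somewhere Number.isInteger is available, such as a recent browser console.

```javascript
// Sketch of the normalising function, with a guard for null values.
function norm(obj) {
  Object.keys(obj).forEach(function (k) {
    if (typeof obj[k] === "number") {
      if (Number.isInteger(obj[k])) {
        obj[k] = obj[k] + ".0"; // 100 becomes the string "100.0"
      }
    } else if (obj[k] !== null && typeof obj[k] === "object") {
      norm(obj[k]); // recurse into nested objects
    }
  });
}

// Hypothetical sample document.
var sample = { _id: "abc", val: 100, nested: { count: 3, ratio: 1.5 } };
norm(sample);

var result = JSON.stringify(sample);
console.log(result);
// {"_id":"abc","val":"100.0","nested":{"count":"3.0","ratio":1.5}}
```

Whole numbers at any depth are now strings ending in ".0", while values that already had a fractional part are left alone.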

isInteger?

There's one small JavaScript issue though. Number.isInteger is supported in many modern browsers as part of ECMAScript 6, the upcoming new version of the standard, but it isn't supported by the Transporter's JavaScript engine, which supports the current version of the standard as is.

The code above will run in the Transporter's Transformer creation page because that page uses the browser's JavaScript implementation to test the code, but it will throw an error when the Transporter job is submitted.

That's not as big a problem as it may sound though. The same issue exists for older browsers and the solution is called a polyfill. That's some code which "fills" the functionality gap where the function isn't present. Here's a polyfill for Number.isInteger:

if (!Number.isInteger) {
  Number.isInteger = function isInteger (nVal) {
    return typeof nVal === "number" && isFinite(nVal) &&
      nVal > -9007199254740992 && nVal < 9007199254740992 &&
      Math.floor(nVal) === nVal;
  };
}

Pop that snippet before the preceding code and whenever Number.isInteger is not defined, it'll add a definition and you can progress.
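If you want to convince yourself that the polyfill's test behaves sensibly, one option is to copy the same expression into a standalone function and try a few values. The name isIntegerFallback is made up for this sketch; it is not part of any library.

```javascript
// Same check as the polyfill, as a standalone (hypothetical) function so it
// can be exercised even where Number.isInteger already exists.
function isIntegerFallback(nVal) {
  return typeof nVal === "number" && isFinite(nVal) &&
    nVal > -9007199254740992 && nVal < 9007199254740992 &&
    Math.floor(nVal) === nVal;
}

console.log(isIntegerFallback(100));      // true
console.log(isIntegerFallback(-5));       // true
console.log(isIntegerFallback(100.123));  // false
console.log(isIntegerFallback("100"));    // false, strings are rejected
console.log(isIntegerFallback(NaN));      // false
console.log(isIntegerFallback(Infinity)); // false
```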

More mapping

The function we've offered here is designed to address one particular mapping problem, but that doesn't mean it's all you can do. You could modify the function to correct any mapping deficiency you may encounter, reformat fields or perform more complex modifications (though we suggest that the built-in Underscore library may be a good place to start if you don't want to recurse through the entire document object). This should, though, show the power of having JavaScript in the Transporter... and this is only the beginning.
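As one possible starting point for that kind of modification, the recursion can be separated from the rule being applied. This is only a sketch: mapLeaves is a hypothetical helper name, and the whitespace-trimming rule and sample data are invented purely to show the shape of the idea.

```javascript
// Hypothetical helper: walk an object and replace each non-object value
// with whatever the supplied callback returns for it.
function mapLeaves(obj, fn) {
  Object.keys(obj).forEach(function (k) {
    var v = obj[k];
    if (v !== null && typeof v === "object") {
      mapLeaves(v, fn); // nested objects and arrays
    } else {
      obj[k] = fn(v, k);
    }
  });
  return obj;
}

// Example rule: trim stray whitespace from every string field.
var cleaned = mapLeaves(
  { name: "  Ada ", tags: ["  x", "y "], stats: { score: 7 } },
  function (v) { return typeof v === "string" ? v.trim() : v; }
);

console.log(JSON.stringify(cleaned));
// {"name":"Ada","tags":["x","y"],"stats":{"score":7}}
```

Swapping in a different callback, such as the integer-to-string rule from earlier, reuses the same walk without rewriting it.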