Importing Graphs into JanusGraph


JanusGraph on Compose has recently gained the ability to import graph data easily. We'd like to show you how and, in the process, how you can make use of an incredibly useful graph database book.

Let's set the scene by looking at the book in question, Kelvin Lawrence's Graph Databases, Gremlin and TinkerPop: A Tutorial. It's a great introduction to the practical application of graph databases and to navigating and querying them effectively. Central to its examples is a sample graph, air-routes, which contains around 3,600 vertices (mostly representing airports) and nearly 50,000 edges (representing the routes between the airports). The book shows how to load that file into TinkerGraph or JanusGraph with the following command, but only when the database is running locally on your machine:

// Load the air-routes GraphML file from the local filesystem into the graph
graph.io(graphml()).readGraph('air-routes.graphml')

If you are running on a remote server or cluster, like JanusGraph on Compose, you generally can't push that file onto the server's local filesystem or run a console directly on the server. And if the file isn't on the server, you can't tell the system to import it. So another approach has to be found.

That approach is to read the data from a web-accessible resource. As an aside, if your data isn't already a graph, you'll still need to write an application to transform it into one, picking out what should become vertices and what should become edges.
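To make that transformation step concrete, here is a minimal sketch in Java (the JVM language underneath the Groovy-based Gremlin console). It assumes your source data is a list of (origin, destination) airport-code pairs, perhaps parsed from a CSV file, and turns it into a bare-bones GraphML document with one node per distinct airport and one edge per route. The class and method names are hypothetical, and a real export would usually also declare `<key>` elements for labels and properties.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: converts (origin, destination) airport-code pairs
// into a minimal GraphML document. Airports become <node> elements and
// routes become <edge> elements. No labels or properties are emitted.
class RoutesToGraphML {

    // Escape characters that are special inside XML attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    static String toGraphML(List<String[]> routes) {
        // Collect each distinct airport once, keeping first-seen order.
        Set<String> airports = new LinkedHashSet<>();
        for (String[] route : routes) {
            airports.add(route[0]);
            airports.add(route[1]);
        }

        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">\n");
        xml.append("  <graph id=\"G\" edgedefault=\"directed\">\n");

        // One vertex per airport.
        for (String code : airports) {
            xml.append("    <node id=\"").append(escape(code)).append("\"/>\n");
        }

        // One directed edge per route, with a generated edge id.
        int edgeId = 0;
        for (String[] route : routes) {
            xml.append("    <edge id=\"e").append(edgeId++)
               .append("\" source=\"").append(escape(route[0]))
               .append("\" target=\"").append(escape(route[1]))
               .append("\"/>\n");
        }

        xml.append("  </graph>\n</graphml>\n");
        return xml.toString();
    }
}
```

The resulting string could be saved to a file and hosted somewhere web-accessible, ready for the kind of URL-based import described below.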

Going remote

What we are talking about here with Compose for JanusGraph is importing existing graphs that are already encoded as vertices and edges. The functionality comes from the Gremlin I/O packages. These packages let you pass InputStreams or OutputStreams to specialized readers and writers: a reader parses a stream and creates the vertices and edges it describes, while a writer serializes a graph out to a stream.
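To illustrate the "stream in, vertices and edges out" idea, here is a toy reader in Java. It is not TinkerPop's GraphML reader; it is a hypothetical class that uses the JDK's built-in DOM parser to pull vertex ids and (source, target) edge pairs out of a GraphML InputStream. That is roughly the parsing half of what a real reader does before it creates vertices and edges in the graph.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Toy GraphML reader (hypothetical, NOT TinkerPop's implementation).
// It parses a GraphML InputStream and records what it finds instead of
// creating vertices and edges in a database.
class ToyGraphMLReader {
    final List<String> vertices = new ArrayList<>(); // ids from <node id="...">
    final List<String[]> edges = new ArrayList<>();  // {source, target} from <edge>

    void read(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);

            // Every <node> element describes a vertex.
            NodeList nodes = doc.getElementsByTagName("node");
            for (int i = 0; i < nodes.getLength(); i++) {
                vertices.add(((Element) nodes.item(i)).getAttribute("id"));
            }

            // Every <edge> element describes an edge between two vertex ids.
            NodeList edgeElems = doc.getElementsByTagName("edge");
            for (int i = 0; i < edgeElems.getLength(); i++) {
                Element e = (Element) edgeElems.item(i);
                edges.add(new String[] { e.getAttribute("source"), e.getAttribute("target") });
            }
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalStateException("Could not parse GraphML stream", ex);
        }
    }
}
```

Because the reader only depends on an InputStream, it doesn't care whether the bytes came from a local file, a string in memory or an HTTP response.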

Let's show you the code and break it down.

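// Create a new graph named "airroutes" via the ConfiguredGraphFactory
def graph=ConfiguredGraphFactory.create("airroutes");

// Build a GraphML reader and feed it an InputStream read from the URL
graph.io(IoCore.graphml()).reader().create().readGraph(ToInputStream.from("url", "https://github.com/krlawrence/graph/raw/master/sample-data/air-routes.graphml"), graph);

// Commit the transaction so the imported vertices and edges are persisted
graph.tx().commit()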

Note: we took the hyphen out of "air-routes" because, currently, the Scylla database underlying Compose for JanusGraph isn't keen on hyphens in graph names.

All of this is run from a Gremlin console (gremlin.sh) connected to the remote server. The def graph=... line creates a graph database for our data, and the graph variable now gives us access to all the capabilities of JanusGraph.

graph.io(IoCore.graphml()) hands the graph a GraphML builder from IoCore and gets back the matching Io object for GraphML. There are three of these format-specific readers/writers: one for GraphML (graphml()), one for GraphSON (graphson()) and one for Gryo (gryo()). There is a fourth, but that's for migrating data from TinkerPop 2 to 3. As their names suggest, GraphML is an XML-based format and GraphSON is a JSON-based format. Gryo is a binary format built on the Kryo serialization library for JVM objects.

By requesting the appropriate IOBuilder, the rest of this command can be structured in the same way for whatever format we are working with.

The next part, .reader().create(), asks the Io object for a reader builder and then creates a reader instance from it. Our next step is to call readGraph() on that reader. We need to do that rather than use Io's own readGraph() method because that version only takes a filename to read from, and we want to read from somewhere else.

Reading over the net

We've pulled out the URL in the next part to make it clearer: .readGraph(ToInputStream.from("url", URL), graph). This calls on the GraphReader to read the graph; the method takes an InputStream and a graph as parameters. To get our InputStream we call on a new utility class, ToInputStream, which we created and added to Compose. Its from() method takes two strings, where the first says how to interpret the second: "url" means treat the second parameter as a URL and read from it to produce an InputStream. readGraph() then consumes that InputStream and feeds the resulting graph data into the graph database.
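The dispatch described above, where the first argument chooses how to interpret the second, can be sketched in Java. This is a hypothetical stand-in named ToInputStreamSketch, not Compose's actual ToInputStream source, which isn't shown in this article. It covers the "url" mode and also the "string" mode described under More than just a URL.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a ToInputStream-style helper. It mirrors only the
// behavior described in the text; the real class may differ.
final class ToInputStreamSketch {
    private ToInputStreamSketch() {}

    static InputStream from(String kind, String value) throws IOException {
        switch (kind) {
            case "url":
                // Open the URL and return a stream over its content.
                return URI.create(value).toURL().openStream();
            case "string":
                // Treat the value itself as the data to be read.
                return new ByteArrayInputStream(value.getBytes(StandardCharsets.UTF_8));
            default:
                throw new IllegalArgumentException("Unknown source kind: " + kind);
        }
    }
}
```

Returning a plain InputStream is what lets the result slot straight into a reader's readGraph(InputStream, Graph) call, whichever mode produced it.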

For the example we use the URL https://github.com/krlawrence/graph/raw/master/sample-data/air-routes.graphml from the sample-data directory in Lawrence's Graph book repository. We literally point ToInputStream.from() at that URL, hit Return, and the graph is quickly imported, with all the parsing done on the server side. There is one last step, though: making sure everything is committed to the database. That's where graph.tx().commit() comes in. You'll now have a graph database rich with airports and routes, ready for you to dive into Lawrence's tutorials.

More than just a URL

You may wonder what else ToInputStream can handle. Currently, the from() method accepts "url" or "string" as its first parameter. We've seen what "url" does. The "string" option treats the second parameter as a literal string and creates the InputStream from that. If you're importing your own data, you may not want it publicly available on the internet. Worry not: there's a third form of from() that takes three parameters, a URL, a username and a password. This lets you read from URLs protected by HTTP basic authentication.
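A three-parameter, basic-authentication variant could look something like the following Java sketch. The class name BasicAuthStreamSketch is hypothetical and the real ToInputStream may be implemented differently; what is standard is the header itself, which HTTP basic authentication defines as "Basic " followed by the Base64 encoding of username:password.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of a from(url, username, password) form that reads
// from a URL protected by HTTP basic authentication.
final class BasicAuthStreamSketch {
    private BasicAuthStreamSketch() {}

    // Basic auth header value: "Basic " + base64("username:password").
    static String basicAuthHeader(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Assumes an http:// or https:// URL, since the cast below requires an HTTP connection.
    static InputStream from(String url, String username, String password) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Authorization", basicAuthHeader(username, password));

        // Fail loudly on anything other than 200 OK (e.g. 401 for bad credentials).
        int status = conn.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            conn.disconnect();
            throw new IOException("HTTP " + status + " fetching " + url);
        }
        return conn.getInputStream();
    }
}
```

Because Base64 is an encoding rather than encryption, credentials sent this way are only protected in transit when the URL uses HTTPS.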

That covers nearly all of the newly enabled import capabilities of Compose for JanusGraph. Remember that you can also import GraphSON by switching to IoCore.graphson(). Happy importing and traversing your graphs!


Read more articles about Compose databases - use our Curated Collections Guide for articles on each database type. If you have any feedback about this or any other Compose article, drop the Compose Articles team a line at articles@compose.com. We're happy to hear from you.


Dj Walker-Morgan
Dj Walker-Morgan is Compose's resident Content Curator, and has been both a developer and writer since Apples came in II flavors and Commodores had Pets.
