JanusGraph on Compose has recently acquired the ability to simply import data into the graph database. We'd like to show you how and, in the process, show how you can also make use of an incredibly useful graph database book.
Let's set the scene by looking at the book in question, Kelvin Lawrence's Graph Databases, Gremlin and Tinkerpop: A Tutorial. It's a great introduction to the practical application of graph databases and how to navigate and effectively query them. Core to its examples is an example graph, air-routes, which contains around 3600 vertices (representing airports mostly) and nearly 50000 edges (representing the routes between the airports). The book shows you how to import that file into TinkerGraph or JanusGraph, but only if it's running locally on your machine, using this command:
// Create graph
graph.io(graphml()).readGraph('air-routes.graphml')
If you are running on a remote server or cluster, like JanusGraph on Compose, it's not really possible to push that file into the server's local filesystem or run a console directly on the server. And, by extension, you can't tell the system to import that file you didn't push up. So another approach has to be found...
And that approach is to read the data from a web accessible resource. As an aside, if your data isn't a graph, you'll still need to write an application to transform it into a graph, picking out what you want to be vertices and what you want to be edges.
What we are talking about here with Compose for JanusGraph is importing existing graphs that are already encoded as vertices and edges. The functionality comes from the Gremlin I/O packages. These I/O packages let you take InputStreams or OutputStreams and pass them to specialized readers and writers. The readers turn an incoming stream into the vertices and edges of a graph, and the writers turn a graph back into an outgoing stream.
Let's show you the code and break it down.
def graph=ConfiguredGraphFactory.create("airroutes"); graph.io(IoCore.graphml()).reader().create().readGraph(ToInputStream.from("url", "https://github.com/krlawrence/graph/raw/master/sample-data/air-routes.graphml"), graph); graph.tx().commit()
We took the - out of "air-routes" as, currently, the Scylla underlying Compose for JanusGraph isn't keen on them.
All of this would be executed in a Gremlin.sh remote console. The def graph=... creates a graph database for our data. The graph variable now gives us access to all the capabilities of JanusGraph.
graph.io(IoCore.graphml()) asks the graph to retrieve for us an IOBuilder for GraphML. There are three of these specialized reader/writers: one for GraphML (graphml), one for GraphSON (graphson) and one for the Kryo-based Gryo format (gryo). There is a fourth but that's for migrating data from Tinkerpop 2 to 3. As their names suggest, GraphML is an XML-based format and GraphSON is a JSON-based format. Kryo works in terms of JVM object graphs.
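To make "XML-based" concrete, here is a minimal sketch in plain Java (not console Groovy) that counts the node and edge elements in a tiny GraphML document. The class name GraphMlPeek and the two-airport sample are made up for illustration and are not part of TinkerPop, Compose or the book's data. It only peeks at the XML and is not how the GraphML reader itself works.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper for illustration only; not part of TinkerPop or Compose.
final class GraphMlPeek {
    // A tiny hand-written GraphML document: two airports joined by one route.
    static final String SAMPLE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">"
        + "<key id=\"code\" for=\"node\" attr.name=\"code\" attr.type=\"string\"/>"
        + "<graph id=\"routes\" edgedefault=\"directed\">"
        + "<node id=\"1\"><data key=\"code\">AUS</data></node>"
        + "<node id=\"2\"><data key=\"code\">DFW</data></node>"
        + "<edge id=\"3\" source=\"1\" target=\"2\"/>"
        + "</graph></graphml>";

    // Returns {nodeCount, edgeCount} for the GraphML read from the stream.
    static int[] count(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            int nodes = doc.getElementsByTagName("node").getLength();
            int edges = doc.getElementsByTagName("edge").getLength();
            return new int[] { nodes, edges };
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need no checked handling.
            throw new IllegalStateException("Could not parse GraphML", e);
        }
    }

    // Convenience wrapper that runs count() over the built-in sample.
    static int[] countSample() {
        return count(new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Each node element becomes a vertex and each edge element becomes an edge when a GraphML reader imports the file, which is why the format maps so directly onto a graph.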
By requesting the appropriate IOBuilder, the rest of this command can be structured in the same way for whatever format we are working with.
The next part, .reader().create(), asks our IOBuilder to give us a reader and create a new instance of it. Our next step will be to call readGraph() on that reader. We need to do that rather than use IO's own readGraph() method because the IO version of the method only takes a filename to read from and we want to do something else.
Reading over the net
We've pulled out the URL in the next part to make it clearer:
.readGraph(ToInputStream.from("url", URL), graph); calls on the GraphReader to read the graph. This method takes an InputStream and a graph as parameters. To get our InputStream we call on a new utility class, ToInputStream, which we created and added to Compose. It has a from() method which takes two strings. The first one hints at how to use the second one, so "url" means use the second parameter as a URL and read from it to make an InputStream. That InputStream can then be consumed by the readGraph() method, which feeds the resulting graph data into the graph.
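The source of Compose's ToInputStream isn't shown here, so the following is only a sketch, in plain Java, of how a hint-plus-value method of this shape might turn a URL into an InputStream. The class name UrlStreamSketch and its error handling are assumptions for illustration, not Compose's implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

// Hypothetical stand-in for illustration only; this is not Compose's ToInputStream.
final class UrlStreamSketch {
    // The first argument hints at how to interpret the second one.
    static InputStream from(String kind, String value) throws IOException {
        if ("url".equals(kind)) {
            // Treat the value as a URL and stream whatever that URL serves.
            return URI.create(value).toURL().openStream();
        }
        // Any other hint is rejected in this sketch.
        throw new IllegalArgumentException("Unsupported source kind: " + kind);
    }
}
```

Because the result is an ordinary InputStream, anything that accepts one, such as a reader's readGraph(), can consume it without knowing where the bytes came from.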
For the example, we use the URL https://github.com/krlawrence/graph/raw/master/sample-data/air-routes.graphml from Lawrence's sample-data directory in the Graph book repository. We quite literally point ToInputStream.from() at that URL, hit Return and the graph will be rapidly imported, with all the parsing being done on the server side. There is one last step though: making sure everything is committed to the database. That's where graph.tx().commit() comes in. You'll now have a graph database rich with airports and routes and you can dive into Lawrence's tutorials.
More than just a URL
You may wonder what else can be handled by ToInputStream. Well, currently the from() method can take "url" or "string" as a first parameter. We've seen what "url" does. The "string" option takes the second parameter as a literal string and creates the InputStream from that. You may worry that you don't want your graph data available over the internet if you are importing your own data. Worry not, as there's a third form of from() which takes three parameters: a URL, a username and a password. This allows you to access URLs protected by basic authentication.
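As a rough picture of what those two options involve, here is a sketch in plain Java of a string-backed stream and a basic-auth-protected URL stream. The class ProtectedSourceSketch and its method names are hypothetical and are not Compose's code. The only fixed part is the HTTP Basic scheme itself, which sends "Basic " followed by the Base64 encoding of username:password in the Authorization header.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical stand-in for illustration only; this is not Compose's ToInputStream.
final class ProtectedSourceSketch {
    // "string" style: the text itself becomes the content of the stream.
    static InputStream fromString(String data) {
        return new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8));
    }

    // Builds the value of an HTTP Basic "Authorization" header.
    static String basicAuthHeader(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Three-parameter style: open the URL with basic-auth credentials attached.
    static InputStream fromUrl(String url, String username, String password)
            throws IOException {
        URLConnection connection = URI.create(url).toURL().openConnection();
        connection.setRequestProperty("Authorization", basicAuthHeader(username, password));
        return connection.getInputStream();
    }
}
```

Basic authentication only encodes the credentials and does not encrypt them, so a protected URL of this kind is best served over https.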
That covers nearly all the newly enabled importing capabilities of Compose for JanusGraph. Remember that you can import GraphSON by switching to using IoCore.graphson() too. Happy importing and traversing your graphs!
attribution Bramserud Photography