Graph 101: Traversing and Querying JanusGraph using Gremlin

The Graph 101 series is an introduction to graph databases for developers. Whether you're a seasoned expert looking to expand your knowledge, or a newcomer just dipping your toe into the data layer, the Graph 101 series will help you find your bearings. If this is your first encounter with graph databases, check out our intro to graph databases and our JanusGraph concepts page.

JanusGraph on Compose is a great way to get started with graph databases. Using JanusGraph and its query language Gremlin, you can perform complex queries and traversals without having to worry about the details of implementing them.

In this article, we'll look at querying and traversing a graph database using Gremlin, and cover some of the basic queries you'll need to be effective with JanusGraph.

Vertices and Edges in Gremlin

Gremlin is the name of the language used to query JanusGraph, as well as the environment we'll use to execute those queries. If you haven't already done so, you can find instructions to install and connect to Gremlin in the Compose documentation.

Before we get too far, let's load up JanusGraph with some data. You can either choose and load a data set like we did in our getting started with graphs article, or you can load an example data set provided by JanusGraph. For this article, let's load up JanusGraph with the example data:

gremlin> :> def graph = ConfiguredGraphFactory.create("example")  
==>standardjanusgraph[astyanax:[10.65.196.132, 10.65.196.131, 10.65.196.130]]

gremlin> :> GraphOfTheGodsFactory.loadWithoutMixedIndex(graph,true);  
==>null

You can read more about this toy graph example on the JanusGraph getting started guide.

Now that we have some data to work with, we can start traversing the graph. All traversals start with a graph traversal object, which is created from a database by calling the traversal() method on it.

gremlin> :> def g = graph.traversal();  
==>graphtraversalsource[standardjanusgraph[astyanax:[10.65.196.132, 10.65.196.131, 10.65.196.130]], standard]

You can see a visual depiction of your graph on the Compose JanusGraph Browser, which is found on the dashboard of your Compose JanusGraph deployment.

Vertices

Once we have our traversal object, which is called g above, we can now start building up our query. The simplest query just lists all the vertices in the graph using the V() method of the traversal.

gremlin> :> g.V();  
==>v[20656]
==>v[8376]
==>v[12472]
==>v[8216]
==>v[4120]
==>v[4272]
==>v[4144]
==>v[8368]
==>v[4200]
==>v[12464]
==>v[4280]
==>v[16560]

The vertices are represented here by the letter v and then the ID of the vertex in brackets. We can see the specific values of the vertices by using the valueMap() method:

gremlin> :> g.V().valueMap()  
==>{name=[tartarus]}
==>{name=[neptune], age=[4500]}
==>{name=[alcmene], age=[45]}
==>{name=[hydra]}
==>{name=[sea]}
==>{name=[jupiter], age=[5000]}
==>{name=[sky]}
==>{name=[pluto], age=[4000]}
==>{name=[hercules], age=[30]}
==>{name=[nemean]}
==>{name=[saturn], age=[10000]}
==>{name=[cerberus]}

We can use the JanusGraph data browser in Compose to get a visual representation of these:

Visual of vertices in graph

Once we know how to query the vertices of a graph database, we can now start filtering those vertices based on various criteria. We can filter on a specific vertex property by using the traversal's has() method:

gremlin> :> g.V().has("name", "neptune");  
==>v[8376]

This keeps only the vertices that have a property called name with a value of neptune, and filters out all the others. We can inspect that result using the valueMap() method:

gremlin> :> g.V().has("name", "neptune").valueMap();  
==>{name=[neptune], age=[4500]}
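
To make the filtering behaviour concrete, here is a minimal plain-Python sketch of what has() does. It is not the Gremlin or JanusGraph API: the VERTICES dictionary and the has helper are hypothetical names, and the IDs are matched to names by the order of the g.V() and valueMap() listings above, which is an assumption.

```python
# Vertex properties copied from the valueMap() output above (subset).
# ASSUMPTION: ids are paired with names by listing order.
VERTICES = {
    8376: {"name": "neptune", "age": 4500},
    4272: {"name": "jupiter", "age": 5000},
    8368: {"name": "pluto", "age": 4000},
    4120: {"name": "sea"},
}

def has(key, value):
    """Mimic has(key, value): keep vertices whose property equals value."""
    return [vid for vid, props in VERTICES.items() if props.get(key) == value]

print(has("name", "neptune"))  # [8376]
print(has("age", 5000))        # [4272]; the sea vertex has no age, so it is dropped
```

A vertex that lacks the property entirely never matches, which is why the sea vertex disappears from any filter on age.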

Edges

We can also query all of the edges in the graph using the E() method on our traversal. Edges represent relationships between vertices in the graph, and those relationships will come into play shortly:

gremlin> :> g.E()  
==>e[55j-6go-9hx-36g][8376-lives->4120]
==>e[5jr-6go-b2t-3ao][8376-brother->4272]
==>e[5xz-6go-b2t-6gg][8376-brother->8368]
==>e[74m-3ao-6c5-3aw][4272-father->4280]
==>e[7iu-3ao-9hx-374][4272-lives->4144]
==>e[8ba-3ao-b2t-6gg][4272-brother->8368]
==>e[7x2-3ao-b2t-6go][4272-brother->8376]
==>e[9hy-6gg-9hx-fxs][8368-lives->20656]
==>e[9w6-6gg-aad-cs0][8368-pet->16560]
==>e[8pi-6gg-b2t-3ao][8368-brother->4272]
==>e[93q-6gg-b2t-6go][8368-brother->8376]
==>e[1zh-38o-6c5-3ao][4200-father->4272]
==>e[2dp-38o-74l-9mg][4200-mother->12472]
==>e[365-38o-7x1-6c8][4200-battled->8216]
==>e[2rx-38o-7x1-9m8][4200-battled->12464]
==>e[3kd-38o-7x1-cs0][4200-battled->16560]
==>e[aae-cs0-9hx-fxs][16560-lives->20656]

Each edge in this output has two parts:

The first part, which takes the form e[ID], represents the edge object itself along with a unique identifier for the edge.

The second part, [ID-relationship->ID], represents the two vertices that are related to each other, along with the name of the relationship between them. The first ID is the outgoing tail vertex, where the edge starts, and the second is the incoming head vertex, where the edge ends. Edges have a direction, represented by the -> arrow, going from the outgoing vertex through the relationship name to the incoming vertex.

We can also find the vertices at each end of an edge, either the incoming head or the outgoing tail, by using the inV() and outV() methods, respectively:

gremlin> :> g.E().inV().valueMap();  
==>{name=[sea]}
==>{name=[jupiter], age=[5000]}
==>{name=[jupiter], age=[5000]}
==>{name=[jupiter], age=[5000]}
==>{name=[pluto], age=[4000]}
==>{name=[pluto], age=[4000]}
==>{name=[saturn], age=[10000]}

gremlin> :> g.E().outV().valueMap();  
==>{name=[neptune], age=[4500]}
==>{name=[neptune], age=[4500]}
==>{name=[neptune], age=[4500]}
==>{name=[jupiter], age=[5000]}
==>{name=[jupiter], age=[5000]}
==>{name=[jupiter], age=[5000]}
==>{name=[pluto], age=[4000]}
==>{name=[hercules], age=[30]}
==>{name=[hercules], age=[30]}
==>{name=[cerberus]}
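
Because edge direction is easy to mix up, here is a minimal plain-Python sketch of it. It is not the Gremlin or JanusGraph API: the Edge tuple and the in_v and out_v helpers are hypothetical names that only mimic inV() and outV(), and the three edges are copied from the g.E() output above.

```python
from collections import namedtuple

# Hypothetical model of an edge: out_v -label-> in_v, as in [8376-lives->4120].
Edge = namedtuple("Edge", ["out_v", "label", "in_v"])

# Three edges copied from the g.E() output above.
edges = [
    Edge(8376, "lives", 4120),
    Edge(8376, "brother", 4272),
    Edge(4200, "father", 4272),
]

def out_v(edge):
    """Mimic outV(): the tail vertex the edge leaves from."""
    return edge.out_v

def in_v(edge):
    """Mimic inV(): the head vertex the edge points to."""
    return edge.in_v

print([out_v(e) for e in edges])  # [8376, 8376, 4200]
print([in_v(e) for e in edges])   # [4120, 4272, 4272]
```

In the printed form [8376-lives->4120], the ID on the left of the arrow is what outV() returns and the ID on the right is what inV() returns.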

Traversing Using Edges

Traversing a graph simply means going from one vertex to another through the edges that connect them together. That means there needs to be an edge between two vertices in order for them to connect to each other. This is an important property of graphs, and something that can sometimes trip up developers new to graph databases.

To traverse from one vertex to the next, we'll first need to pick a vertex as the starting point. We'll do that with the V() method, and will ask for the neptune vertex.

gremlin> :> g.V().has("name", "neptune")  
==>v[8376]

Once we have that vertex, we can see all of the other vertices related to it by using the out() and in() methods on that vertex. These methods both return an array of vertices, so keep in mind that using in() and out() will give you vertices rather than edges.

The out() method will show all of the other vertices that are connected to this one via an edge where this vertex is the output or tail of the connection. The in() method will show every vertex connected with an edge where this vertex is the input or head of the connection.

gremlin> :> g.V().has("name", "neptune").out()  
==>v[4120]
==>v[4272]
==>v[8368]

gremlin> :> g.V().has("name", "neptune").in()  
==>v[4272]
==>v[8368]

Chaining the valueMap() method onto the end of the traversal makes this more human-readable:

gremlin> :> g.V().has("name", "neptune").out().valueMap();  
==>{name=[sea]}
==>{name=[jupiter], age=[5000]}
==>{name=[pluto], age=[4000]}
gremlin> :> g.V().has("name", "neptune").in().valueMap();  
==>{name=[jupiter], age=[5000]}
==>{name=[pluto], age=[4000]}

Using the out() and in() methods, we can see that the vertices are related to each other but not how they are related. That information is contained in the edges, and we can access those using the outE() and inE() methods. They're the same as the in() and out() methods, except they return the edges instead of the vertices.

gremlin> :> g.V().has("name", "neptune").outE();  
==>e[55j-6go-9hx-36g][8376-lives->4120]
==>e[5jr-6go-b2t-3ao][8376-brother->4272]
==>e[5xz-6go-b2t-6gg][8376-brother->8368]
gremlin> :> g.V().has("name", "neptune").inE();  
==>e[7x2-3ao-b2t-6go][4272-brother->8376]
==>e[93q-6gg-b2t-6go][8368-brother->8376]
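
The four steps differ only in which direction they look and whether they return edges or vertices. A minimal plain-Python sketch, assuming edges are stored as (out vertex, label, in vertex) tuples copied from the g.E() output above; the helper names are hypothetical stand-ins for outE(), inE(), out() and in(), not the Gremlin API:

```python
# Edges touching neptune (8376), copied from the g.E() output above.
EDGES = [
    (8376, "lives", 4120),
    (8376, "brother", 4272),
    (8376, "brother", 8368),
    (4272, "brother", 8376),
    (8368, "brother", 8376),
]

def out_e(v):
    """Mimic outE(): edges leaving v."""
    return [e for e in EDGES if e[0] == v]

def in_e(v):
    """Mimic inE(): edges arriving at v."""
    return [e for e in EDGES if e[2] == v]

def out(v):
    """Mimic out(): vertices reached by following outgoing edges."""
    return [e[2] for e in out_e(v)]

def in_(v):
    """Mimic in(): vertices reached by following incoming edges backwards."""
    return [e[0] for e in in_e(v)]

NEPTUNE = 8376
print(out(NEPTUNE))   # [4120, 4272, 8368]
print(in_(NEPTUNE))   # [4272, 8368]
print(in_e(NEPTUNE))  # [(4272, 'brother', 8376), (8368, 'brother', 8376)]
```

In this sketch out() is simply outE() followed by taking the vertex at the far end of each edge, which is why the vertex-returning steps lose the relationship name.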

This now shows us the relationships between the vertices, along with the name of each relationship. We now have enough information to see all of the vertices connected to ours, and how they connect to each other. Let's use the path() method to lay out the entire path, starting from our vertex, following through the edge, and ending with the vertex at the other end:

gremlin> :> g.V().has("name", "neptune").inE().outV().path();  
==>[v[8376], e[7x2-3ao-b2t-6go][4272-brother->8376], v[4272]]
==>[v[8376], e[93q-6gg-b2t-6go][8368-brother->8376], v[8368]]

gremlin> :> g.V().has("name", "neptune").outE().inV().path();  
==>[v[8376], e[55j-6go-9hx-36g][8376-lives->4120], v[4120]]
==>[v[8376], e[5jr-6go-b2t-3ao][8376-brother->4272], v[4272]]
==>[v[8376], e[5xz-6go-b2t-6gg][8376-brother->8368], v[8368]]

Queries and Paths Using Edges

So far, we've seen how to query using the .has() method on vertices and traverse a graph along edges. Now, let's see how we can query the graph based on the relationships represented by an edge.

Let's start by asking a question: Where do our gods in the graph live? We can answer this by finding all of the edges that represent a lives relationship and getting their path like we did above.

We can see all of the possible locations that a god can live in by querying for the incoming head vertex of every edge that has a lives label.

gremlin> :> g.V().out("lives");  
==>v[4120]
==>v[4144]
==>v[20656]
==>v[20656]

While this is an interesting result, it doesn't tell us much about who lives in these locations. Using our edge queries from above, we can get the edge as well as the vertex at its other end by calling the inE() method with the label for our edge and then outV(). Let's create another path with the location vertex at the beginning, the lives edge in the middle, and the vertex that lives there at the end.

gremlin> :> g.V().inE('lives').outV().path();  
==>[v[20656], e[9hy-6gg-9hx-fxs][8368-lives->20656], v[8368]]
==>[v[20656], e[aae-cs0-9hx-fxs][16560-lives->20656], v[16560]]
==>[v[4120], e[55j-6go-9hx-36g][8376-lives->4120], v[8376]]
==>[v[4144], e[7iu-3ao-9hx-374][4272-lives->4144], v[4272]]
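
The raw IDs in these paths can be turned into readable sentences. Below is a minimal plain-Python sketch of the same label-filtered walk; it is not the Gremlin API. The function name is hypothetical, the edges are copied from the g.E() output, and the ID-to-name mapping assumes the g.V() and valueMap() listings above are in the same order. The sketch iterates over edges rather than vertices, so its result order is not meant to match the console output.

```python
# ASSUMPTION: ids are paired with names by the order of the earlier listings.
NAMES = {8376: "neptune", 4272: "jupiter", 8368: "pluto", 16560: "cerberus",
         4120: "sea", 4144: "sky", 20656: "tartarus"}

# Edges as (out vertex, label, in vertex), copied from the g.E() output.
EDGES = [
    (8376, "lives", 4120),
    (4272, "lives", 4144),
    (8368, "lives", 20656),
    (16560, "lives", 20656),
    (8368, "pet", 16560),
]

def in_e_out_v_paths(label):
    """Mimic g.V().inE(label).outV().path(): [head vertex, edge, tail vertex]."""
    return [[e[2], e, e[0]] for e in EDGES if e[1] == label]

for place, _edge, resident in in_e_out_v_paths("lives"):
    print(f"{NAMES[resident]} lives in {NAMES[place]}")
```

Each path starts at the location because inE() is evaluated from the vertex the edge points to, and outV() then steps back to the vertex the edge came from.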

As we can see, building up paths is a matter of chaining together the right functions to gather the data you need. Let's see who has battled whom by building a path with all of the battled edges:

gremlin> :> g.V().inE('battled').outV().path();  
==>[v[8216], e[365-38o-7x1-6c8][4200-battled->8216], v[4200]]
==>[v[12464], e[2rx-38o-7x1-9m8][4200-battled->12464], v[4200]]
==>[v[16560], e[3kd-38o-7x1-cs0][4200-battled->16560], v[4200]]

Traversing Between Vertices

Traversing the graph via edges is useful when you already know the relationship between two vertices. However, in some cases, such as a social network, the relationships between two vertices may be unclear. Let's take a look at how we can find all the relationships between two vertices in the graph.

We'll do this by querying all of the edges connected to a specific vertex and using the where() method to filter them by a condition. Let's look at how Hercules and Jupiter are related to each other.

gremlin> :> g.V().has("name", "hercules").bothE().where(otherV().has("name", "jupiter"))  
==>e[1zh-38o-6c5-3ao][4200-father->4272]

We can see that Hercules and Jupiter share an edge with the father relationship. We can update this query to show the path between Hercules and Jupiter by using the inV() and path() methods we used in the previous section:

gremlin> :> g.V().has("name", "hercules").bothE().where(otherV().has("name", "jupiter")).inV().path();  
==>[v[4200], e[1zh-38o-6c5-3ao][4200-father->4272], v[4272]]

Let's break this query down. Here, we used the bothE() method to get the edges where our vertex is either the incoming head or the outgoing tail of the edge. We then chained in the where() method, which allows us to define a condition for our edge. The otherV() method is used with the bothE() method since we don't know if our starting vertex is the incoming or the outgoing vertex of the edge. Using otherV() will get the incoming vertex if our starting point is the outgoing vertex, and vice versa.
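
The same breakdown can be sketched in plain Python. This is not the Gremlin API: both_e, other_v and edges_between are hypothetical helpers that mimic bothE(), otherV() and the where() filter, and the edges are a subset copied from the g.E() output above.

```python
# Edges as (out vertex, label, in vertex), copied from the g.E() output.
EDGES = [
    (4200, "father", 4272),
    (4200, "mother", 12472),
    (4272, "brother", 8376),
    (8376, "brother", 4272),
]

def both_e(v):
    """Mimic bothE(): edges touching v in either direction."""
    return [e for e in EDGES if v in (e[0], e[2])]

def other_v(edge, v):
    """Mimic otherV(): the endpoint of the edge that is not v."""
    out_vertex, _label, in_vertex = edge
    return in_vertex if out_vertex == v else out_vertex

def edges_between(a, b):
    """Mimic V(a).bothE().where(otherV() is b)."""
    return [e for e in both_e(a) if other_v(e, a) == b]

HERCULES, JUPITER, NEPTUNE = 4200, 4272, 8376
print(edges_between(HERCULES, JUPITER))  # [(4200, 'father', 4272)]
print(edges_between(JUPITER, NEPTUNE))   # [(4272, 'brother', 8376), (8376, 'brother', 4272)]
```

The second call shows why the direction-agnostic pair matters: Jupiter and Neptune are connected by one edge in each direction, and a filter using only outgoing or only incoming edges would find just one of them.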

These methods are great for exploring the relationships between vertices, and in future articles, we'll look at some of the more complex queries you can perform using Gremlin.

Wrapping Things Up

Now that you have a feel for how to do basic traversals in JanusGraph, we can start to explore some of the algorithms that are made possible using these methods. In our next article, we'll take a look at Centrality and discover just how many degrees away from Kevin Bacon we are.


attribution Guiseppe Volpini

John O'Connor
John O'Connor is a code junky, educator, and amateur dad that loves letting the smoke out of gadgets, turning caffeine into code, and writing about it all. Love this article? Head over to John O'Connor’s author page to keep reading.
