Graph 101 is an article series on graph databases that explores graph algorithms from the ground up. If you’ve ever wondered whether or not a Graph database is a good approach for a problem you’re trying to solve, Graph 101 is the series for you.
Almost all graph database algorithms can trace their mathematical underpinnings to a construct called Markov Chains. In this article, we’re going to explore Markov chains from a practical standpoint and demonstrate how they can be constructed and used in JanusGraph.
Understanding Markov Chains
To understand Markov Chains, you must first understand traveling.
Imagine you're exploring a new city. You leave the train station with shops on every side of you and your hotel in the distance. Once you leave the train station the probability of going back there is really low - we’ll say 2% - compared to your probability of going to one of the other shops. You are given the probability that you'll enter each shop, and that once you have entered the shop, your chance of going to another shop adjacent to it. You also know that if you go into the hotel, you will probably stay there for the evening but there’s a 20% chance that you’ll go to one of the 2 surrounding shops (ie: you may stay in for the night, or you may decide to go back out shopping).
background by Vector Open Stock
We can use a Markov chain to generate ALL of the possible different paths that we might traverse the town. All we have to know at each step of the way is where we came from (ie: the town square) and the probability that we'll go a certain direction (ie: to the hotel, or back to the square). We can represent these as a probability matrix showing the probability that we’ll go from one place to another.
Now, if you think back to your university computer science classes, you might recognize these as Finite State Machines. Markov chains are a form of state machine, with each state represented as a vertex and the probability of a transition happening represented as an edge (this is also called the “transition probability” in mathematical graph speak).
Of course, Markov Chains aren't just for modeling your travels. They find application in many different fields of science and engineering, from making massive-scale predictions of climate models to making stock market predictions. One of the more interesting features of Markov chains is that they’re parallelizable, meaning you can compute the probability of a transition happening between two states regardless of what happened before that state, and you can compute many of these in parallel. Thus, you can compute multiple links across multiple machines all at once and then bring them together quickly.
This is one of many reasons that Markov chains were used in the PageRank algorithm, which we’ll explore in a future article. For now, we’re more interested in understanding the effect of the rules governing a Markov chain.
Modeling our Travels with JanusGraph
For our first example, we won’t need to model everything in the travel scenario above. Let’s start with a simpler model - a pair of nodes we’ll call “A” and “B”. In each step of the model, we’ll be equally likely to transition between the states as we are to stay in the same state.
We’ll start out by spinning up a Compose JanusGraph instance. We’ll be using the Gremlin console for this section, so make sure you have that. You can also follow along for instructions on how to connect to your Compose JanusGraph instance
Once we have the gremlin console up-and-running, we’ll set up our graph with Vertices and Edges connecting them. Let’s create a new graph using the
ConfiguredGraphFactory built into Compose JanusGraph. We’ll call this one
gremlin> :> def graph = ConfiguredGraphFactory.create("markov")
:> before the command; this is important as it sends the command over to Composes’ JanusGraph instance. If you’re getting errors at this step, it’s probably because you forgot that part.
We can access our newly created graph object through the
graph variable we just created using the Groovy
def keyword. We’ll now create a transaction to that graph using the
tx() method, and use the
commit() method to commit that transaction to our remote JanusGraph.
gremlin> :> graph.tx().commit()
If at any time you lose connectivity, or the
graph variable seems to disappear, you can always re-open it by calling the following:
gremlin> :> def graph = ConfiguredGraphFactory.open("markov")
Now, we’ll create our two vertices representing our two states; we’ll call them
gremlin> :> def stateA = graph.addVertex(T.label, "state", "name", "stateA") ==>v gremlin> :> def stateB = graph.addVertex(T.label, "state", "name", "stateB") ==>v gremlin> :> graph.tx().commit() ==>null
Note that we haven’t put any probabilities in here yet; the vertices are only the states within our chain. Edges are what represent our state transitions and are where the probabilities of each transition occurring are located.
In particular, we’re going to use a weighted edge with the weight representing our probability of making the state transition.
gremlin> :> stateA.addEdge("self", stateA, "weight", 0.5d) ==>e[2ru-380-2dx-380][4176-self->4176] gremlin> :> stateA.addEdge("aToB", stateB, "weight", 0.5d) ==>e[362-380-3yt-6ds][4176-aToB->8272] gremlin> :> stateB.addEdge("self", stateB, "weight", 0.5d) ==>e[3ka-6ds-2dx-6ds][8272-self->8272] gremlin> :> stateB.addEdge("bToA", stateA, "weight", 0.5d) ==>e[3yi-6ds-4r9-380][8272-bToA->4176] gremlin> :> graph.tx().commit() ==>null
That’s it! We’ve now input our graph into the graph database. The only thing left to do now is to traverse the graph. First, we’ll need to define a traversal object. Then, we can see which states are connected to each other:
gremlin> :> def g=graph.traversal() ==>graphtraversalsource[standardjanusgraph[astyanax:[10.189.87.4, 10.189.87.3, 10.189.87.2]], standard] gremlin> :> g.V().bothE() ==>e[3ka-6ds-2dx-6ds][8272-self->8272] ==>e[3ka-6ds-2dx-6ds][8272-self->8272] ==>e[362-380-3yt-6ds][4176-aToB->8272] ==>e[3yi-6ds-4r9-380][8272-bToA->4176] ==>e[2ru-380-2dx-380][4176-self->4176] ==>e[2ru-380-2dx-380][4176-self->4176] ==>e[362-380-3yt-6ds][4176-aToB->8272] ==>e[3yi-6ds-4r9-380][8272-bToA->4176]
Using Graph Algorithms
Now that we have a simple graph, it’s time for us to traverse our graph. Traversing is done in steps and there are many different step types we can use. For a comprehensive list, check out the gremlin algorithm documentation. We’ll also be exploring more of these in future articles.
For this simple example, what we’d like to do is simply trace a path from one vertex to another along the edges we defined above, and we’d like those to happen according to the probabilities we laid out in our weights. Since this is a cyclic graph, we’ll use the Cyclic graph step algorithm to generate a path between our nodes.
gremlin> :> g.V().has('name', 'stateA').both().both().cyclicPath().path() ==>[v, v, v] ==>[v, v, v] ==>[v, v, v] ==>[v, v, v] ==>[v, v, v] ==>[v, v, v] ==>[v, v, v] ==>[v, v, v] ==>[v, v, v] ==>[v, v, v] ==>[v, v, v] ==>[v, v, v] ==>[v, v, v] ==>[v, v, v] ==>[v, v, v] ==>[v, v, v]
Using this cyclic path algorithm we can now see all the different ways we can go from stateA to stateB, and how those would be distributed based on the weights we provided.
Onward and Upward
Hopefully you’ll now have an understanding of not only Markov chains, but how they can be implemented in a graph database such as JanusGraph. Markov Chains lay the foundations for many other graph algorithms, and we’ll build off of this knowledge going forward. Spend some time exploring the resources presented throughout, and attempt to build the more complex travel example at the beginning of this article so we have a solid jumping off point for next time.
If you have any feedback about this or any other Compose article, drop the Compose Articles team a line at firstname.lastname@example.org. We're happy to hear from you.
attribution Scott Webb