How To Persist User Sessions Without Much Effort
Would you consider tracking known user sessions in your current datastore if it wouldn't blow up your app server or overwhelm your database? We are going to show you a simple session tracking solution that uses Redis plus just a few lines of code. For some of you, it might deliver most of the value that you need. For others, it may be a nice add-on to your app's data model.
We want you to consider a simpler point of view on how you model and store the data you need to track sessions. Large analytics solutions view the web logs and application logs as the most basic record of events. In this type of modelling, these records are called the lowest grain. They are the most flexible data model available. If you capture events at this lowest level you can aggregate them, or roll them up, as you see fit. In essence, you can query and process them in multiple ways at a later time. You pay for that flexibility with later processing and more storage. Logs, like the ones that follow, are denormalized and can appear unrelated, but they aren't. The problem is that relating them requires processing.
75.189.143.121 - - [12/Feb/2016:16:44:14 -0500] "GET /page1 HTTP/1.1" 200 1103 0.8600
75.189.143.121 - - [12/Feb/2016:16:44:50 -0500] "GET /page2 HTTP/1.1" 200 1589 0.4018
75.189.143.121 - - [12/Feb/2016:16:46:20 -0500] "GET /page3 HTTP/1.1" 200 2833 0.3137
75.189.143.121 - - [12/Feb/2016:16:52:25 -0500] "GET /page2 HTTP/1.1" 200 1103 0.3275
There are many sophisticated solutions based on extracting information from logs and events. From the ELK stack (Elasticsearch, Logstash, and Kibana) to segment.io to Splunk, they all are pretty amazing at what they do. But what if we took a step back and considered not capturing the lowest level events? What if we also took the stance that we should store some of this data as part of our User model? What if we traded off all of that flexibility, processed the data upfront, and normalized the most important parts? What would it look like? A one-to-many relation of User to sessions? Might be nice.
"user1" -> [
{
_id: "user1:session1",
userId: "user1",
sessionId: "session1",
ip: "75.189.143.121",
date: "2016-02-12",
start: "16:44:14",
offset: "-0500",
estimatedDuration: "00:08:11",
timeOfDay: "Afternoon",
weather: "Mostly Cloudy",
tempF: "33",
location: "Elizabeth City, NC",
paths: [["/page1", 36], ["/page2", 90], ["/page3", 365], ["/page2", 0]]
}
]
While the dedicated analytics solutions of the world could and do provide this type of information, it is almost orthogonal to the application stacks which originally generate this data. Why? Don't these sessions seem to be valuable relations of your User entities?
What If?
What if we just built these sessions up as we went along? This could be very different than capturing detailed event data, pushing it around networks, and transforming it into queryable data structures. What if we tried something simple without venturing too far from our application and current data stores like Mongo or Postgres? Let's look.
What Does It Take To Make A Session?
To get a basic session, there are a few things we will need to accomplish:
Define the boundaries of a session. This can be rather arbitrary. An activity is part of a current session if it is within a timeout. The "rule of thumb" in the web analytics space is if an activity is within 30 minutes of the previous activity then it is part of that session. Otherwise, the activity starts a new session.
Gather and process related events. We will do this much more simply than the full blown analytics solutions. They typically have to gather, transform, store, and then process the data to build up a session. We'll take advantage of the app that already has the session information and build the history up in Redis, treating Redis as a buffer, as we go along. This will save us a bunch of storage and processing costs.
Add some context data if you have it. This is the place where outsourcing sessionization can "fall short". While the outsourced providers certainly have the basics down like geo ip address encoding and they have made professional choices for you in regards to session boundaries, they are by definition outside. With session data already in your application store, integrating with your business processes and annotating your user accounts with new information "gleaned" from these sessions isn't a full blown integration project anymore. It's already there.
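The boundary rule from the first step can be sketched in a few lines. This is only an illustration, assuming events arrive in order as ISO 8601 timestamp strings. The function name splitIntoSessions and the 30 minute default are hypothetical choices and are not part of the solution that follows, which lets Redis key expiry do this work instead.

```javascript
// Hypothetical helper: groups ordered timestamps into sessions using an
// inactivity timeout (30 minutes by default, per the rule of thumb above).
function splitIntoSessions(timestamps, timeoutMs = 30 * 60 * 1000) {
  const sessions = [];
  let current = [];
  let previous = null;

  for (const ts of timestamps) {
    const time = Date.parse(ts);
    // A gap longer than the timeout closes the current session.
    if (previous !== null && time - previous > timeoutMs) {
      sessions.push(current);
      current = [];
    }
    current.push(ts);
    previous = time;
  }
  if (current.length > 0) sessions.push(current);
  return sessions;
}

const example = splitIntoSessions([
  "2016-02-12T16:44:14-05:00",
  "2016-02-12T16:52:25-05:00", // about 8 minutes later: same session
  "2016-02-12T17:40:00-05:00", // about 47 minutes later: new session
]);
console.log(example.length); // 2
```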
Use Redis As Your "Gather" Buffer
This is really the crux of the matter: use Redis. By putting Redis "out in front" of our datastore, we protect it from all of the churn and chattiness of building up the sessions' data plus we offload any memory use which could hurt our app servers. This way all we'll need is three Redis data structures, an instrumented version of our app, and a small cron style processor to implement keyed user sessions in our normal app store.
Three Redis Data Structures
First, we need a Redis LIST to accrete the events. In essence this accumulates the line items, or log style events, under a combined user/session key.
user1:session1 -> [
'{"ip": "75.189.143.121","path":"/page1","at":"2016-02-12T16:30:08-06:00"}',
'{"ip": "75.189.143.121","path":"/page2","at":"2016-02-12T16:30:44-06:00"}',
'{"ip": "75.189.143.121","path":"/page3","at":"2016-02-12T16:32:14-06:00"}',
'{"ip": "75.189.143.121","path":"/page2","at":"2016-02-12T16:38:19-06:00"}',
]
Second, we need a Redis SET of all the known sessions. It's mostly a bookkeeping set of userSessionKeys for both active sessions and ready to be processed sessions. The ready to process userSessionKeys will be removed in the post processing step.
known_sessions -> #{
user1:session1,
user2:session3,
user4:session56
}
Third, we need a volatile Redis KEY which disappears after a certain amount of time. This can be done in a straightforward manner by calling EXPIRE on a sentinel key. This sentinel key is just the userSessionKey, "user1:session1" in this example, plus ":timeout". If this sentinel key doesn't exist and the userSessionKey is in the known_sessions set then we can deduce that this session has timed out. Once it has timed out, the userSessionKey LIST is then ready to be processed and eventually flushed from Redis.
user1:session1:timeout -> "gobbledygook"
EXPIRE user1:session1:timeout 3600
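The deduction described above boils down to a small decision rule. The sketch below uses plain JavaScript sets in place of Redis so that it can run on its own. The name classifySessions and its arguments are hypothetical and exist only for illustration.

```javascript
// Hypothetical, Redis-free illustration of the sentinel rule:
//   known session + live sentinel key    -> still active
//   known session + missing sentinel key -> timed out, ready to process
function classifySessions(knownSessions, liveKeys) {
  const result = { active: [], ready: [] };
  for (const userSessionKey of knownSessions) {
    const sentinel = userSessionKey + ":timeout";
    if (liveKeys.has(sentinel)) {
      result.active.push(userSessionKey);
    } else {
      result.ready.push(userSessionKey);
    }
  }
  return result;
}

const known = new Set(["user1:session1", "user2:session3"]);
// Only user1's sentinel key has not expired yet.
const live = new Set(["user1:session1:timeout"]);
const classified = classifySessions(known, live);
console.log(classified.active); // user1:session1 is still active
console.log(classified.ready);  // user2:session3 is ready to be processed
```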
Application Instrumentation
From within a typical web or RESTful style app, it is pretty straightforward to instrument an http call to your web server with a push to Redis. It is simple enough and doesn't make much of a performance difference at all.
In the ExpressJS example that follows, we have abstracted the group of calls necessary to populate the above three Redis data structures into one function. Plus, there is an example route which is instrumented by calling said function with the relevant info.
function recordEvent(userKey, sessionKey, ip, path) {
var userSessionKey = userKey + ":" + sessionKey;
var timeoutKey = userSessionKey + ":timeout";
var payload = {
ip: ip,
path: path,
at: moment().format(),
};
var multi = redisClient.multi();
multi.sadd("known_sessions", userSessionKey);
multi.rpush(userSessionKey, JSON.stringify(payload));
multi.set(timeoutKey, "gobbledygook");
multi.expire(timeoutKey, process.env.TIMEOUT);
multi.exec();
}
router.get('/somewhere', function(req, res) {
res.render('someTemplate', {title: 'Some Title'});
recordEvent(req.session.userID, req.sessionID, req.ip, req.path);
});
So to review the above, we'll break it down part by part. First, inside the recordEvent() function, we build up the keys. For performance reasons, it is typical for key-value stores like Redis to have a single, flat namespace for keys. This forces us to build up composite keys to avoid naming collisions, which is what we do here:
var userSessionKey = userKey + ":" + sessionKey;
var timeoutKey = userSessionKey + ":timeout";
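Because the processing step later splits the combined key on ":" to recover its parts, a colon inside either ID would silently corrupt the result. Here is a minimal sketch of one way to guard against that, assuming both IDs are strings. The helpers buildKeys and parseUserSessionKey are hypothetical and not part of the original code.

```javascript
// Hypothetical helpers for building and parsing the composite keys.
function buildKeys(userKey, sessionKey) {
  for (const part of [userKey, sessionKey]) {
    // Reject parts that would make the key ambiguous when split on ":".
    if (typeof part !== "string" || part === "" || part.includes(":")) {
      throw new Error("key parts must be non-empty strings without ':'");
    }
  }
  const userSessionKey = userKey + ":" + sessionKey;
  return { userSessionKey, timeoutKey: userSessionKey + ":timeout" };
}

function parseUserSessionKey(userSessionKey) {
  const [userID, sessionID] = userSessionKey.split(":");
  return { userID, sessionID };
}

const builtKeys = buildKeys("user1", "session1");
console.log(builtKeys.timeoutKey); // user1:session1:timeout
console.log(parseUserSessionKey(builtKeys.userSessionKey).sessionID); // session1
```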
Next, we build up an object which will be serialized later. This object in some ways corresponds to a log entry. The interesting point is though that we won't give up the fact that this object belongs to a user and a session when we store it. Other info could easily be added here too.
var payload = {
ip: ip,
path: path,
at: moment().format(),
};
After this comes the big Redis buildup: the multi/exec call. Bundling the commands cuts down on the number of network calls, and Redis runs the queued commands together without interleaving commands from other clients. We execute it "fire and forget" and ignore the reply. It is not a transaction in the relational database sense, since nothing is rolled back if one of the commands fails. It's more like sending over a little "filled out" script for Redis to execute for us. The interesting bits are the commands we send. They correspond directly to the three data structures above. The sadd() ensures the userSessionKey is in the "bookkeeping" set which will be iterated over each time we process sessions. The rpush() adds the serialized payload to the end of our userSessionKey list structure. The set() and expire() create the sentinel key and mark it as volatile. This key will disappear after the TIMEOUT, which is given in seconds.
var multi = redisClient.multi();
multi.sadd("known_sessions", userSessionKey);
multi.rpush(userSessionKey, JSON.stringify(payload));
multi.set(timeoutKey, "gobbledygook");
multi.expire(timeoutKey, process.env.TIMEOUT);
multi.exec();
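The snippet above ignores the reply from exec(). A hedged variant that surfaces failures is sketched below. It assumes a callback-style client whose multi() object exposes chainable sadd(), rpush(), set(), and expire() methods plus exec(callback), matching the calls used above. The createStubClient stand-in is purely hypothetical and exists only so the sketch runs without a Redis server.

```javascript
// Hypothetical stand-in for a Redis client: it only records the queued
// commands so the example can run without a server.
function createStubClient(log) {
  return {
    multi() {
      const queued = [];
      const chain = {
        exec(callback) {
          log.push(...queued);
          callback(null, queued.map(() => "OK"));
        },
      };
      for (const name of ["sadd", "rpush", "set", "expire"]) {
        chain[name] = (...args) => {
          queued.push([name, ...args]);
          return chain;
        };
      }
      return chain;
    },
  };
}

// Same command sequence as above, but failures are reported, not dropped.
function recordEventChecked(client, userSessionKey, payload, timeoutSecs, done) {
  client
    .multi()
    .sadd("known_sessions", userSessionKey)
    .rpush(userSessionKey, JSON.stringify(payload))
    .set(userSessionKey + ":timeout", "1")
    .expire(userSessionKey + ":timeout", timeoutSecs)
    .exec((err, replies) => done(err, replies));
}

const commandLog = [];
recordEventChecked(
  createStubClient(commandLog),
  "user1:session1",
  { ip: "75.189.143.121", path: "/page1", at: "2016-02-12T16:44:14-05:00" },
  1800,
  (err) => {
    if (err) console.error("session event not recorded:", err);
  }
);
console.log(commandLog.map((cmd) => cmd[0])); // sadd, rpush, set, expire
```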
So that's it for recordEvent(). The only thing left is to call it, which we do in the following route handler after all of the application's work is done.
router.get('/somewhere', function(req, res) {
res.render('someTemplate', {title: 'Some Title'});
recordEvent(req.session.userID, req.sessionID, req.ip, req.path);
});
All of the above assumes a lot of things. None of them too difficult. It does make an assumption though that you would only instrument actions that have a known user. Otherwise, you might find yourself re-implementing a full web analytics stack which isn't really the point.
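One way to enforce the known-user assumption is to route the instrumentation through a small middleware that skips anonymous requests. This is a sketch only: trackKnownUsers is a hypothetical name, it assumes req.session.userID and req.sessionID are populated as in the route above, and it relies on the response emitting the standard Node.js "finish" event. The fake request and response objects exist only to make the example runnable by itself.

```javascript
const { EventEmitter } = require("events");

// Hypothetical middleware factory: records an event only for known users,
// and only after the response has been sent.
function trackKnownUsers(recordEvent) {
  return function (req, res, next) {
    res.on("finish", function () {
      const userID = req.session && req.session.userID;
      if (userID) {
        recordEvent(userID, req.sessionID, req.ip, req.path);
      }
    });
    next();
  };
}

// Stand-alone demonstration with fake request/response objects.
const recorded = [];
const middleware = trackKnownUsers((...args) => recorded.push(args));

function simulate(req) {
  const res = new EventEmitter();
  middleware(req, res, () => {});
  res.emit("finish");
}

simulate({ session: { userID: "user1" }, sessionID: "session1", ip: "75.189.143.121", path: "/page1" });
simulate({ session: {}, sessionID: "session9", ip: "75.189.143.121", path: "/page2" }); // anonymous

console.log(recorded.length); // 1
```

In an Express app, something like router.use(trackKnownUsers(recordEvent)) ahead of the instrumented routes would replace the per-route calls.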
A Small Process, Called At An Interval, Maybe With Cron
Now that we have built up sessions, it is time to process them into the session structure we want to actually store and then to clean up Redis to reclaim those resources. This is the largest amount of code for this use case but it still fits pretty easily on this page. You could easily deploy it in a setInterval() on a Node.js server. You could turn it into a cmdline package via npm and schedule it with cron. Or, you could even set it up as an AWS Lambda function that gets called every five minutes via a Scheduled Event Source. We'll provide the entire function for context and then break it down afterwards. So, here it is, the whole "process the session" task:
function handler() {
redisClient.smembers("known_sessions", function(err, sessionKeys) {
sessionKeys.forEach(function(sessionKey) {
redisClient.exists(sessionKey + ":timeout", function(err, activeFlag) {
if(activeFlag === 1) {
//Session is still active
return;
} else {
//Session complete
var keys = sessionKey.split( ':' );
var sessionToStore = {
_id: sessionKey,
userID: keys[0],
sessionID: keys[1],
paths: []
};
redisClient.lrange(sessionKey, 0, -1, function(err, theSessionList) {
var parsedEvents = _.map(theSessionList, function(item) {
return JSON.parse(item);
});
var firstEvent = parsedEvents[0];
var lastEvent = _.last(parsedEvents);
var start = moment(firstEvent.at);
var end = moment(lastEvent.at);
var delta = moment.duration(end.diff(start));
//normalize
sessionToStore.ip = firstEvent.ip;
sessionToStore.date = start.format("YYYY-MM-DD");
sessionToStore.start = start.format("HH:mm:ss"); //mm = minutes, ss = seconds (MM would be the month)
sessionToStore.offset = start.format("ZZ");
sessionToStore.estimatedDuration = numeral(delta.asSeconds()).
format("00:00:00");
parsedEvents.forEach(function(event) {
//you should do time deltas between each before adding
sessionToStore.paths.push(event.path);
});
//you could add business data, calls to forecast.io, and geo encode ip
MongoClient.connect(process.env.MONGO_URL,
{mongos: {sslValidate: false, ssl: true}},
function(err, db) {
var sessions = db.collection('sessions');
sessions.insert(sessionToStore, function(err, result) {
console.log("stored: ", result);
});
});
});
//Cleanup redis
redisClient.del(sessionKey);
redisClient.srem("known_sessions", sessionKey);
}
});
});
});
}
The handler starts by pulling all of the keys in the known_sessions set. This set contains keys to the LIST structures for both active sessions and completed sessions. It iterates through these keys, builds them up into sentinel keys by appending the ":timeout" portion, and then checks for sentinel key existence. If it exists, it's active. If it doesn't, it's complete.
redisClient.smembers("known_sessions", function(err, sessionKeys) {
sessionKeys.forEach(function(sessionKey) {
redisClient.exists(sessionKey + ":timeout", function(err, activeFlag) {
if(activeFlag === 1) {
//Session still active
return;
} else {
//Session complete
...
Once we know the session is complete, we split the keys and create a new object to hold the things we'll store.
var keys = sessionKey.split( ':' );
var sessionToStore = {
_id: sessionKey,
userID: keys[0],
sessionID: keys[1],
paths: []
};
Then we pull the entire LIST of serialized events with the lrange command. We iterate over each to deserialize them back into JavaScript objects.
redisClient.lrange(sessionKey, 0, -1, function(err, theSessionList) {
var parsedEvents = _.map(theSessionList, function(item) {
return JSON.parse(item);
});
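JSON.parse() throws on malformed input, so a single bad entry would abort the processing of the whole session. A defensive variant, sketched here with a hypothetical parseEvents helper, skips entries that fail to parse:

```javascript
// Hypothetical helper: parse each serialized event, dropping malformed ones.
function parseEvents(serializedEvents) {
  const events = [];
  for (const item of serializedEvents) {
    try {
      events.push(JSON.parse(item));
    } catch (err) {
      // Skip the bad entry; a real app might log or count these instead.
    }
  }
  return events;
}

const parsed = parseEvents([
  '{"ip":"75.189.143.121","path":"/page1","at":"2016-02-12T16:44:14-05:00"}',
  "not json",
]);
console.log(parsed.length); // 1
```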
In this next step, we normalize the data. We take things like ip which we know are the same on all of the events and just store them once. Also, we compute a lot of time attributes at the overall session level using both momentjs and numeraljs packages. In this example, there is no accounting for the last event's time duration. We'll leave that up to you. (Socket based apps like Meteor can report when the actual socket is disconnected. Since http servers are theoretically not session based it is very difficult to know exactly when that session is done without something else.)
var firstEvent = parsedEvents[0];
var lastEvent = _.last(parsedEvents);
var start = moment(firstEvent.at);
var end = moment(lastEvent.at);
var delta = moment.duration(end.diff(start));
//normalize
sessionToStore.ip = firstEvent.ip;
sessionToStore.date = start.format("YYYY-MM-DD");
sessionToStore.start = start.format("HH:mm:ss"); //mm = minutes, ss = seconds (MM would be the month)
sessionToStore.offset = start.format("ZZ");
sessionToStore.estimatedDuration = numeral(delta.asSeconds()).
format("00:00:00");
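For readers who want to see what the time arithmetic amounts to, here is a dependency-free sketch of the duration calculation. It uses the built-in Date object rather than momentjs and numeraljs, and formatDuration is a hypothetical helper. The timestamps are taken from the first and last log lines shown earlier.

```javascript
// Hypothetical helper: formats a number of seconds as HH:mm:ss.
function formatDuration(totalSeconds) {
  const pad = (n) => String(n).padStart(2, "0");
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = Math.floor(totalSeconds % 60);
  return pad(hours) + ":" + pad(minutes) + ":" + pad(seconds);
}

const firstAt = Date.parse("2016-02-12T16:44:14-05:00");
const lastAt = Date.parse("2016-02-12T16:52:25-05:00");
const elapsedSeconds = (lastAt - firstAt) / 1000;

console.log(elapsedSeconds); // 491
console.log(formatDuration(elapsedSeconds)); // 00:08:11
```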
The next part joins the paths. We left it as an "exercise" to compute things like time per path but it could be done with some variations to the code that computes time for the entire session.
parsedEvents.forEach(function(event) {
//TODO Should do time deltas between each before adding
sessionToStore.paths.push(event.path);
});
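One possible take on that exercise is sketched below. It assumes each event carries the path and at fields from the payload built earlier, and it uses the built-in Date.parse() rather than momentjs. The helper name pathsWithDurations is hypothetical.

```javascript
// Hypothetical helper: pairs each path with the seconds spent before the
// next event. The last event gets 0 because its end time is unknown.
function pathsWithDurations(events) {
  return events.map((event, index) => {
    const next = events[index + 1];
    const seconds = next
      ? Math.round((Date.parse(next.at) - Date.parse(event.at)) / 1000)
      : 0;
    return [event.path, seconds];
  });
}

const sampleEvents = [
  { path: "/page1", at: "2016-02-12T16:44:14-05:00" },
  { path: "/page2", at: "2016-02-12T16:44:50-05:00" },
  { path: "/page3", at: "2016-02-12T16:46:20-05:00" },
  { path: "/page2", at: "2016-02-12T16:52:25-05:00" },
];
console.log(JSON.stringify(pathsWithDurations(sampleEvents)));
// [["/page1",36],["/page2",90],["/page3",365],["/page2",0]]
```

The result has the same shape as the paths array in the example session document near the top of the article.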
Finally, the last two parts store the reconstructed session in MongoDB in a sessions collection. Each document has its corresponding userID. Then we do a little cleanup by deleting the Redis LIST and removing the key from the known_sessions set. Done.
MongoClient.connect(process.env.MONGO_URL,
{mongos: {sslValidate: false, ssl: true}},
function(err, db) {
var sessions = db.collection('sessions');
sessions.insert(sessionToStore, function(err, result) {
console.log("stored: ", result);
});
});
});
//Cleanup redis
redisClient.del(sessionKey);
redisClient.srem("known_sessions", sessionKey);
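As written, the cleanup commands are issued without waiting for the MongoDB insert to finish, so a failed insert would lose the session. One possible reordering, sketched below under the assumption that losing a session is worse than reprocessing it on the next run, deletes from Redis only after the insert succeeds. The persistThenCleanup helper and the stub objects are hypothetical and only mimic the callback shapes used above.

```javascript
// Hypothetical helper: store first, clean up Redis only on success.
function persistThenCleanup(sessions, redisClient, sessionKey, sessionToStore, done) {
  sessions.insert(sessionToStore, function (err) {
    if (err) {
      // Leave the Redis data in place so the next run can retry.
      return done(err);
    }
    redisClient.del(sessionKey);
    redisClient.srem("known_sessions", sessionKey);
    done(null);
  });
}

// Stubs standing in for the MongoDB collection and the Redis client.
const cleanupCalls = [];
const stubRedis = {
  del: (key) => cleanupCalls.push(["del", key]),
  srem: (set, key) => cleanupCalls.push(["srem", set, key]),
};
const failingStore = { insert: (doc, cb) => cb(new Error("insert failed")) };
const workingStore = { insert: (doc, cb) => cb(null, { insertedCount: 1 }) };

persistThenCleanup(failingStore, stubRedis, "user1:session1", {}, () => {});
const callsAfterFailure = cleanupCalls.length;
console.log(callsAfterFailure); // 0, nothing deleted after a failed insert

persistThenCleanup(workingStore, stubRedis, "user1:session1", {}, () => {});
console.log(cleanupCalls.length); // 2, del and srem issued after success
```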
Your Sessions Collection
Your users' sessions reconstructed and annotated with total time, possible business context, and a small history of interactions. Already in your data store. All from a few Redis data structures, a function call, and a simple process.
The Many Uses of Redis
As we've seen here, Redis can make some complex things simpler. It can protect our datastores from "thrashing" with many small, less important updates. It can relieve pressure on other parts of our application stacks by sharing data structures and memory across multiple processes. It is a well made tool. And whether or not this particular use case is for you, we do hope it expanded your notions of when to use Redis. Cheers.