In this Write Stuff article, Kirk Morales looks at how SQL database techniques can translate over to NoSQL databases like Scylla, starting with user and session management.
The modern advantages of NoSQL databases like ScyllaDB often come with few learning curves and a change in mentality. The basic, widely-used concepts of product architecture remain the same but need to be approached with a new conceptual mindset.
User and session management is the cornerstone of every product. Without it, the application has no idea who's performing what tasks, what roles are to be enforced, and what actions are permitted as a result. Products must be secured and proper user management is the guardian.
Here, we'll look at the comparisons between how basic user management in Scylla differs from the traditional approach in SQL.
Start with the schema
Let's start with the user's table first. This is the simplest as the base tables in both SQL and Scylla are very similar:
CREATE TABLE users ( user_id MEDIUMINT NOT NULL AUTO_INCREMENT, email varchar(255), password_hash varchar(255), PRIMARY KEY (user_id) );
We create a table that stores a user's ID, email address (which they'll use when logging in), and a hash of their password. The user ID is auto-generated when we do a new insert and is the primary key of the table:
INSERT INTO users (email, password_hash) VALUES ('email@example.com', '5f4dcc3b5aa765d61d8327deb882cf99');
CREATE TABLE users ( user_id uuid, email text, password_hash text, PRIMARY KEY (user_id) );
You'll notice this looks largely similar to the SQL version, except for the different data types. Also, ScyllaDB doesn't support auto-increment columns. Instead, we store the
user_id as a
uuid, which we supply when we create the entry:
INSERT INTO users (user_id, email, password_hash) VALUES (74d6b8a0-fc78-4815-acda-7a0eb076b617, 'firstname.lastname@example.org', '5f4dcc3b5aa765d61d8327deb882cf99');
Having our primary key on the
user_id column means we must provide it in every query to the table. We'll address how to handle email lookups in a bit.
When a user logs in with their email and password, we first need to do a simple lookup to make sure the password provided matches the email address. If it does, the user has successfully authenticated. This works fine in SQL, where we can do a single query to validate:
SELECT user_id, password_hash FROM users WHERE email = 'email@example.com';
If we get a result, we can test the password hash and, if it matches, we now have the ID of the user that successfully authenticated.
Validate a user with ScyllaDB
With ScyllaDB, this lookup gets a bit more complex. Since our primary key is on the
user_id, we can't do a lookup on that table without it. We need a way to look up users by their email address.
Why not make email the primary key?
This may be your knee-jerk thought - why not just lose the
user_id altogether, key off of the email, and use that to do the lookup? Sure, this would work fine, however it compromises the long-term maintainability of our data. As we begin to reference this user across other tables, we would then rely on mutable data (the email) as keys. With Scylla, we can not change the primary key of a row once created. So, if the email changed, we'd have to re-write any rows across the entire database where it's used. Add in the
user_id, a static value representing the same user, even if their email changes, solves this.
Create email lookup table
To solve this email lookup issue, we simply create a separate table that allows us to look up a
user_id by an email:
CREATE TABLE emails ( email text PRIMARY KEY, user_id uuid );
When we create a new user, we now make two inserts:
INSERT INTO users (user_id, email, password_hash) VALUES (74d6b8a0-fc78-4815-acda-7a0eb076b617, 'firstname.lastname@example.org', '5f4dcc3b5aa765d61d8327deb882cf99'); INSERT INTO emails (email, user_id) VALUES ('email@example.com', 74d6b8a0-fc78-4815-acda-7a0eb076b617);
To validate a user, we make two reads. The first lookups up the user_id by email:
SELECT user_id FROM users where email = 'firstname.lastname@example.org';
Once we have that, we can get the password and validate it:
SELECT password_hash from users where user_id = 74d6b8a0-fc78-4815-acda-7a0eb076b617;
Now, you may be thinking that this is terribly inefficient, however with high availability and low latency, Scylla is designed to handle these types of well-indexed requests quickly.
An alternative Scylla schema for email lookups
If you'd prefer not to make two queries for authentication, we can easily reduce this to one by storing the password redundantly in the
emails table. In this case, we would simply update both the
emails table each time either a user's email or password changes and our
emails table would look like this:
CREATE TABLE emails ( email text PRIMARY KEY, user_id uuid, password_hash text );
Tracking user's sessions
Once a user is validated, we may need to track information about their session, such as the expiration time, browser details, etc. To do this, let's define a basic
sessions table to store this data in.
CREATE TABLE sessions ( session_id uuid, user_id uuid, expires timestamp, PRIMARY KEY (session_id, user_id) );
Here, we create a compound primary key on both
user_id, requiring both as an extra measure of protection so that would-be assassins can't try to get session data without knowing the if the associated user and vice versa.
After validating a user, we assign a
session_id, create a record in the
sessions table, and return both the
session_id to the client, likely using some form of signed value that can be validated itself later. In subsequent requests, after validating that the
session_id are trusted, we can make a request from the
sessions table to get the session data:
SELECT * FROM sessions WHERE session_id = SESSION_ID and user_id = USER_ID;
Contrasting with SQL
The method for session management in this way is not so different with SQL. We'd still need a separate
sessions table, would need to have that session ID and the user ID pass between the client and server, and would make a single read with each request to get session data. With Scylla, however, the average latency of our reads are likely to be faster with higher availability.
When designing a schema in Scylla, often a good place to start is with how the schema would be designed for SQL. From there, you can determine what data types need to be changed and what lookups may need to be performed on non-primary key data.
Although there are some trade-offs with using Scylla for basic programming concepts like user management, there are many pros to help outweigh the possible inefficiencies:
- Low latency - on average, request times are shorter than with SQL
- Unlimited, linear scalability
- Use of common, well-supported CQL syntax, inherited from Cassandra
This article has been published as part of Compose's Write Stuff program - if you want to write about databases or share your experiences with database technology, we invite you to participate.
attribution Joe Hernandez