How Hyver uses ScyllaDB for API Key Management
PublishedBuilding a platform means building an API to go with it and building an API means managing the keys to that API efficiently. Kirk Morales, CEO at Hyver, has used ScyllaDB to solve this tricky problem and in this Write Stuff article, he shows how you can too.
When building a platform, clients will need to interface with it in various ways. At first, basic user authentication may be all that's necessary, but as you start allowing for integrations and open your platform up to developers, you'll need to develop an API key management solution that scales.
At Hyver, we consider what we've built a platform more than just a product. As such, we realize more and more types of clients and integration points for pushing and pulling data to/from us. We made the mistake early on of building highly-specific authentication mechanisms for our immediate clients: a web user and Zapier. Instead of a generic key management system, we authenticated Zapier requests uniquely and even had a custom table and columns for keys in our database. As our need for additional integration points surfaced, we realized we had to re-build how we authenticate requests and the underlying architecture for doing so.
ScyllaDB is great for storing, managing, and performing quick lookups of API keys when needed for authentication. Due to its high throughput and fast reads, request latency isn't increased drastically as it could be with other databases. Also, the schema and read/write operations are incredibly simple to maintain.
Defining the schema
At a high level, we need to consider the following entities:
Permissions A set of permissions for an API key, defining what its capabilities are. Each API key should have a unique set of permissions so as to control how it can be used. If extra or fewer permissions are needed by another client, a brand new API key should be generated for that specific client.
For the sake of simplicity, we'll define permissions by using just a string, consisting of one of the following values:
management - Access to management functionality, minus billing, e.g. updating account settings or getting a list of objects within an account.
billing - Ability to manage an account's billing settings, including the subscribed plan and card on file.
full - Full access to the account.
Accounts This is our parent object, to which API keys will belong and the object being authenticated. Whatever a key's permissions will be allowed only on the account it's associated with.
At a minimum, we'll need a table to store accounts with a map
containing its API keys.
CREATE TABLE accounts (
account_id uuid PRIMARY KEY,
api_keys map<uuid, text>
);
We use a map
type here for a few reasons. First, storing the keys directly in the accounts
table prevents us from having to make subsequent SELECT
statements to read all keys for an account. Additionally, we can update multiple keys at once with a single UPDATE
command. Lastly, since we're storing permissions as a basic string, retrieving API keys as a map
allows us to do a quick permissions lookup in code.
API Key A key that is provided in the request, belonging to a single account. An account may have multiple API keys for different purposes. When creating the table to store keys, we want to make sure each key is linked to the account it belongs to as well as the appropriate permissions for a quick look up.
CREATE TABLE api_keys (
api_key uuid PRIMARY KEY,
account_id uuid,
permissions text
);
Since a client will only ever be sending an API key with a request, we need API keys to be unique to a single account (therefore, unique universally), so we place the PRIMARY KEY
only on api_key
. If we made it across (api_key, account_id)
, we could have duplicate API keys in the wild, each associated with a different account...very bad.
Now, you may be wondering why we split these out. Having redundant data in both the accounts
and api_keys
tables seems like a waste of space. Extra storage? Yes. A waste? Absolutely not.
We could put an index on account.api_keys
to use when authenticating requests, but due to how ScyllaDB handles indexes, this is wildly inefficient and much slower. It is recommended not to use indexes on high-cardinality values and instead store them in a separate table. Read this Datastax article for more insight as to why.
So, by creating another table, yes, we're increasing our storage footprint, but we're ensuring a fast and efficient lookup.
Creating API Keys
Let's start to fill up our database. First, we'll create an account.
INSERT INTO accounts (account_id) VALUES (uuid());
SELECT account_id FROM accounts;
From this, we'll get a new Account ID, let's say 4e45f0e8-ade5-48fd-862d-d657f299828b
.
Next, we'll define three API keys, one for each permission type of "full", "management", and "billing".
UPDATE accounts
SET api_keys = {uuid(): 'full', uuid(): 'management', uuid(): 'billing'}
WHERE account_id = 4e45f0e8-ade5-48fd-862d-d657f299828b;
In this example, we're letting CQL create new UUIDs for us, however, if we already have IDs we want to use (such as ones generated in code), we can set them in a similar fashion:
UPDATE accounts
SET api_keys = {
4f6f1cf5-ba62-4b9c-be39-ee73a02e6dab: 'full',
462d3e55-4903-405f-aeec-cb46b327b025: 'management',
282a3b8b-f20c-4b5c-b38b-ec2e7d943c32: 'billing'
}
WHERE account_id = 4e45f0e8-ade5-48fd-862d-d657f299828b;
We now have an account with three active API keys. Last thing to do is add them to our api_keys
table for use in our API when authenticating requests. We'll add each key as a separate record, associating them with the account and the appropriate permissions:
INSERT INTO api_keys (api_key, account_id, permissions) VALUES (4f6f1cf5-ba62-4b9c-be39-ee73a02e6dab, 4e45f0e8-ade5-48fd-862d-d657f299828b, 'full');
INSERT INTO api_keys (api_key, account_id, permissions) VALUES (462d3e55-4903-405f-aeec-cb46b327b025, 4e45f0e8-ade5-48fd-862d-d657f299828b, 'management');
INSERT INTO api_keys (api_key, account_id, permissions) VALUES (282a3b8b-f20c-4b5c-b38b-ec2e7d943c32, 4e45f0e8-ade5-48fd-862d-d657f299828b, 'billing');
Authenticating Requests
Now for the easy part -- when a client makes a request, to check the validity of the supplied API key and get permissions, we make one request:
SELECT * from api_keys WHERE api_key = SOME_KEY;
We'll get the Account ID (which we'll likely need to execute the requested operation) and will get the text-based permissions for the key, telling us if we can authorize the request. From here, there are a few possibilities:
- The API key doesn't exist - we'll throw a
401 Unauthorized
- The API key exists, but its permissions don't match the request. We can throw a
401 Unauthorized
or403 Forbidden
.
Unless we need to see an existing account value, we can process the requested action without having to do another SELECT
since we already have the account ID.
Managing Existing API Keys
The trade-off for efficiency is seen when we have to update existing keys by making multiple writes. Not only do we need to make multiple writes, but we'll want to batch the requests to ensure they all happen at once. We're okay with this, though, because we can afford the extra write cost now more than a larger read cost when authorizing a request.
Update permissions
Updating the permissions for an existing key requires us to update the text
value for the map
in the accounts
table as well as the permissions
column in the api_keys
table.
Let's update the permissions for the API key 4f6f1cf5-ba62-4b9c-be39-ee73a02e6dab
to be "management".
BEGIN BATCH
UPDATE accounts SET api_keys[4f6f1cf5-ba62-4b9c-be39-ee73a02e6dab] = 'management' WHERE account_id = 4e45f0e8-ade5-48fd-862d-d657f299828b;
UPDATE api_keys SET permissions = 'management' WHERE api_key = 4f6f1cf5-ba62-4b9c-be39-ee73a02e6dab;
APPLY BATCH;
Add a key
To add a new key, we simply add it to the map
in accounts
and create a new row in api_keys
. Let's issue a new "full" API key 1939b3d5-34ae-436f-a7f0-02ac1368194b
:
BEGIN BATCH
UPDATE accounts SET api_keys = api_keys + {1939b3d5-34ae-436f-a7f0-02ac1368194b: 'full'} WHERE account_id = 4e45f0e8-ade5-48fd-862d-d657f299828b;
INSERT INTO api_keys (api_key, account_id, permissions) VALUES (1939b3d5-34ae-436f-a7f0-02ac1368194b, 4e45f0e8-ade5-48fd-862d-d657f299828b, 'full');
APPLY BATCH;
Remove a key
If you need to deactivate or remove a key, we simply remove all references to it. Let's delete the key we just created:
BEGIN BATCH
DELETE api_keys[1939b3d5-34ae-436f-a7f0-02ac1368194b] FROM accounts WHERE account_id = 4e45f0e8-ade5-48fd-862d-d657f299828b;
DELETE FROM api_keys WHERE api_key = 1939b3d5-34ae-436f-a7f0-02ac1368194b;
APPLY BATCH;
Wrap up
Although it may seem trivial, proper and efficient management of API keys on your platform is crucial to the security of your data and speed of fulfilling requests.
It's tempting to simplify your schema into a single table -- sure, you only have to make a single request to update keys and permissions and don't have to worry about data consistency across tables. Those benefits, however, are highly outweighed by the inefficiencies and subsequent speed loss due to performing an indexed lookup rather than a PRIMARY KEY
lookup on a separate table.
However you define your API key and permissions scheme, be sure to keep a few points in mind:
- Determine how your data will be looked up most frequently and determine if it makes sense to abstract it out to its own table rather than relying on an
INDEX
. - Your permissions scheme may be more complex than the
text
types we used in this example. In that case, consider linking each API key to auuid
that represents another object defining the permissions OR simply JSON-encode the permission data into the sametext
value. - Monitor common requests and their subsequent performance against your database. Look for long-running requests and other areas you can improve your schema to limit the number of reads from your database.
attribution Rubén Bagüés