Getting Started with Compose's ScyllaDB

Getting started with ScyllaDB is easy since it is a drop-in replacement for Apache's Cassandra database. For all intents and purposes, Scylla looks just like Cassandra to your code, so much so that Scylla even uses Cassandra's drivers. The main difference is in the implementation: Scylla is written in C++ while Cassandra is written in Java. Compose's ScyllaDB is the latest version, Scylla 1.3, which corresponds to Cassandra 2.1.8; a detailed compatibility matrix is available here.

One of the benefits of mimicking Cassandra is that the tool chain, drivers, and built-in query language, CQL, are already mature, having evolved through multiple iterations and a great deal of use. The number of drivers on Planet Cassandra, all of which are compatible, is far beyond that of a typical 1.x project. CQL, a SQL-like language, has grown into the de facto way to interact with Scylla/Cassandra, and it even has its own shell, cqlsh, similar to the SQL shells of many RDBMSs.

What follows is a brief run-through of the highlights of connecting to ScyllaDB on Compose. After creating a deployment, we will get connected with cqlsh, then review connecting from the JVM, Python, and NodeJS runtimes to cover the basics of getting started.

Connect with cqlsh

Assuming you already have a Compose account (if not, you can get a 30-day free trial here), creating a deployment of ScyllaDB takes little more than hitting "Create Deployment" and choosing "ScyllaDB". After a couple of minutes, a three-node cluster will have been created for you:

[Screenshot: creds]

The "Overview" page has all of the information needed to connect to your new Scylla cluster. The easiest way to verify your deployment and your tools is to connect directly with cqlsh. Depending on your platform, there are multiple ways to get this tool onto your local device, whether that be your laptop, a cloud VM, or even your own dedicated hardware. The easiest is to install the latest Cassandra release (the latest versions still support version 2.1.8, the Cassandra version Scylla corresponds to) and use the built-in cqlsh. On a Mac with Homebrew, it is nothing more than brew install cassandra. For other platforms there are myriad options, from package managers to straight downloads. Use whatever suits your platform best.

From the "Overview" page it is easy to copy the command (any one of them will work):

[Screenshot: sh_cmd]

and then just paste it into your shell to execute it:

[Screenshot: cqlsh]

If you type HELP you can see that the shell has a lot of capability. What's even nicer is that all of those commands have TAB completion too. Let's try it. Type CREATE KEYSPACE my_new_keyspace <TAB><TAB><TAB> and you should see the choices for the replication class. Go ahead and choose SimpleStrategy since the cluster won't be spanning multiple data centers. Hit <TAB><TAB> again and enter 3 for the replication_factor. Then close the brace with } and finish the statement with ;<enter>.

You just created your first KEYSPACE and set it to replicate your data to all three nodes in your cluster.
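For reference, the statement that the tab completion assembles should look roughly like the one printed by the sketch below. It is written in Python only to keep the added examples in one language; the helper name create_keyspace_cql is made up for illustration and is not part of any driver. The printed line is plain CQL that could be typed into cqlsh directly.

```python
def create_keyspace_cql(name, replication_factor):
    """Build a CREATE KEYSPACE statement that uses SimpleStrategy.

    Hypothetical helper for illustration only; not part of any driver.
    """
    return (
        "CREATE KEYSPACE {name} WITH replication = "
        "{{'class': 'SimpleStrategy', 'replication_factor': {rf}}};"
    ).format(name=name, rf=int(replication_factor))


# Same keyspace name and replication factor as in the walkthrough above.
statement = create_keyspace_cql("my_new_keyspace", 3)
print(statement)
```

A replication factor of 3 on a three-node cluster means every node holds a copy of every row.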

Now that you have a keyspace let's use it:

USE my_new_keyspace;

Your shell will show that your command prompt is using your keyspace by default:

[Screenshot: keyspace]

Every table has to belong to a keyspace, and any table we create in the shell here will default to my_new_keyspace.

Although Scylla/Cassandra has evolved a schema language that looks very similar to SQL, the data model is not really relational. Unlike in an RDBMS, a row here is much more like a key-value lookup. It just so happens that the value has a flexible schema, which we are about to define:

CREATE TABLE my_new_table (  
  my_table_id uuid,
  last_name text,
  first_name text,
  PRIMARY KEY(my_table_id)
);

Type that CREATE TABLE command into your cqlsh to give us a table to populate in the following examples.
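The key-value nature of a row can be pictured with a toy model. The sketch below is plain Python with no database involved: a dict stands in for my_new_table, keyed by my_table_id. The function names are invented for illustration, and the model deliberately ignores partitioning, replication, and write timestamps. It does show one behaviour worth knowing early: a CQL INSERT on an existing primary key overwrites the row rather than failing.

```python
import uuid

# Toy in-memory stand-in for my_new_table: primary key -> remaining columns.
my_new_table = {}


def insert(table, my_table_id, last_name, first_name):
    """Mimic a CQL INSERT: writing to an existing key overwrites it."""
    table[my_table_id] = {"last_name": last_name, "first_name": first_name}


def select_by_id(table, my_table_id):
    """Mimic SELECT ... WHERE my_table_id = ?: a direct lookup by key."""
    return table.get(my_table_id)


row_id = uuid.uuid4()
insert(my_new_table, row_id, "Hutton", "Hays")
insert(my_new_table, row_id, "Hutton", "H.")  # same key: replaces, no error
print(select_by_id(my_new_table, row_id))
```

Looking a row up by its key is the cheap operation in this model, which is why data modelling for Scylla/Cassandra tends to start from the queries.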

Connect from the JVM

One of the most advanced drivers for Cassandra is the Java driver, which makes sense considering Cassandra is written in Java. What follows is a Groovy script. For those who use just about any JVM language, translating from Groovy to your language of choice should be relatively straightforward:

@Grab('com.datastax.cassandra:cassandra-driver-core:3.1.0')
@Grab('org.slf4j:slf4j-log4j12')

import com.datastax.driver.core.BoundStatement  
import com.datastax.driver.core.Cluster  
import com.datastax.driver.core.Host  
import com.datastax.driver.core.PreparedStatement  
import com.datastax.driver.core.Row  
import com.datastax.driver.core.Session

import static java.util.UUID.randomUUID

Cluster cluster = Cluster.builder()  
    .addContactPointsWithPorts(
        new InetSocketAddress("aws-us-east-1-portal9.dblayer.com", 15399 ),
        new InetSocketAddress("aws-us-east-1-portal9.dblayer.com", 15401 ),
        new InetSocketAddress("aws-us-east-1-portal6.dblayer.com", 15400 )
    )
    .withCredentials("scylla", "XOEDTTBPZGYAZIQD")
    .build()

Session session = cluster.connect("my_new_keyspace")

PreparedStatement myPreparedInsert = session.prepare(  
  """INSERT INTO my_new_table(my_table_id, last_name, first_name)
     VALUES (?,?,?)""")

BoundStatement myInsert = myPreparedInsert  
    .bind(randomUUID(), "Hutton", "Hays")

session.execute(myInsert)

session.close()  
cluster.close()

To get started we pull in the latest Cassandra driver:

@Grab('com.datastax.cassandra:cassandra-driver-core:3.1.0')

After all of the imports we use Cluster.builder() to build up the configuration. Only one of the contact points is used for the initial connection; from that connection the other nodes in the cluster are discovered. If that contact point is unreachable on connect, another is tried, which is why we add all three.
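The fallback across contact points can be sketched without a database. The Python below is a simplification, not the driver's actual logic: first_reachable and the can_connect callable are hypothetical names that stand in for the driver's real connection attempt, and the outage is simulated.

```python
def first_reachable(contact_points, can_connect):
    """Return the first contact point that accepts a connection.

    can_connect is a hypothetical callable standing in for a real
    connection attempt; it takes (host, port) and returns True or False.
    """
    for host, port in contact_points:
        if can_connect(host, port):
            return (host, port)
    raise ConnectionError("no contact point was reachable")


contact_points = [
    ("aws-us-east-1-portal9.dblayer.com", 15399),
    ("aws-us-east-1-portal9.dblayer.com", 15401),
    ("aws-us-east-1-portal6.dblayer.com", 15400),
]

# Simulate the first portal being down; the second one is chosen instead.
down = {("aws-us-east-1-portal9.dblayer.com", 15399)}
chosen = first_reachable(contact_points, lambda h, p: (h, p) not in down)
print(chosen)
```

With a single contact point in the list, the same outage would leave the client unable to connect at all.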

PreparedStatements may be familiar since they are analogous to the feature of the same name in other databases. The statement is parsed and held at the server, ready to be used over and over again. The subsequent calls to bind and execute populate the statement and send the data to the server for actual execution. While there are simpler methods for one-off execution, it is worth highlighting such a useful feature.

To prove that the script works, go back to your cqlsh and query the table:

[Screenshot: verifyQuery]

Connect from Python

Support for languages other than Java is very solid too, and Python is a great example: cqlsh itself is written in Python. So make no mistake, the support here is more than up to date:

pip install cassandra-driver

The above pulls in the driver with pip, the Python package manager. The following does much the same as the JVM code, preparing a statement and executing an insert:

from cassandra.cluster import Cluster  
from cassandra.auth import PlainTextAuthProvider  
import uuid

auth_provider = PlainTextAuthProvider(  
                  username='scylla', 
                  password='XOEDTTBPZGYAZIQD')

cluster = Cluster(  
            contact_points = ["aws-us-east-1-portal9.dblayer.com"],
            port = 15401,
            auth_provider = auth_provider)

session = cluster.connect('my_new_keyspace')

my_prepared_insert = session.prepare("""  
    INSERT INTO my_new_table(my_table_id, first_name, last_name)
    VALUES (?, ?, ?)""")

session.execute(my_prepared_insert, [uuid.uuid4(), 'Snake', 'Hutton'])

# Release the driver's connections and background threads when done.
cluster.shutdown()
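The payoff of preparing a statement comes when it is reused. The following is a minimal sketch of that pattern, assuming a session object that behaves like the driver's cassandra.cluster.Session (it only needs prepare and execute). The function name insert_people is invented for illustration.

```python
import uuid

INSERT_CQL = """
    INSERT INTO my_new_table(my_table_id, first_name, last_name)
    VALUES (?, ?, ?)"""


def insert_people(session, people):
    """Prepare the INSERT once, then execute it for each (first, last) pair.

    session is assumed to offer prepare(query) and
    execute(statement, parameters), as cassandra.cluster.Session does.
    Returns the generated ids so callers can look the rows up later.
    """
    prepared = session.prepare(INSERT_CQL)  # parsed by the server once
    ids = []
    for first_name, last_name in people:
        row_id = uuid.uuid4()
        session.execute(prepared, [row_id, first_name, last_name])
        ids.append(row_id)
    return ids
```

Called as insert_people(session, [("Snake", "Hutton"), ("Monty", "Hutton")]) on an open session, this would send two inserts but only one prepare request.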

To verify, we'll again run the same SELECT:

[Screenshot: verifyQuery2]
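The same check can be made from Python instead of cqlsh. This is a sketch under two assumptions: the query text is a guess at a simple SELECT over the three columns (the original query is only shown as a screenshot), and the session behaves like cassandra.cluster.Session with its default row type, which allows attribute access by column name. The function name fetch_people is invented for illustration.

```python
def fetch_people(session):
    """Return every row of my_new_table as (id, first_name, last_name) tuples.

    session is assumed to behave like cassandra.cluster.Session: iterating
    the result of execute() yields rows with one attribute per column.
    """
    rows = session.execute(
        "SELECT my_table_id, first_name, last_name FROM my_new_table")
    return [(row.my_table_id, row.first_name, row.last_name) for row in rows]
```

A full-table SELECT like this is fine for a handful of test rows, but it is not how a large table would normally be queried.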

Connect from NodeJS

Last but not least: JavaScript.

npm install cassandra-driver  
npm install uuid  

We use the ubiquitous node package manager (npm) to install the driver and the uuid library the script needs. The code that follows is very similar to the examples above:

var cassandra = require('cassandra-driver')  
var authProvider = new cassandra.auth.PlainTextAuthProvider('scylla', 'XOEDTTBPZGYAZIQD')  
var uuid = require('uuid')

var client = new cassandra.Client({
                        contactPoints: [ 
                          "aws-us-east-1-portal9.dblayer.com:15399",
                          "aws-us-east-1-portal9.dblayer.com:15401",
                          "aws-us-east-1-portal6.dblayer.com:15400"
                        ],
                        keyspace: 'my_new_keyspace',
                        authProvider: authProvider});

client.execute("INSERT INTO my_new_table(my_table_id, first_name, last_name) VALUES(?,?,?)",  
               [uuid.v4(), "V8", "Hutton"],
               { prepare: true },
               function(err, result) {
                 if (err) {
                   console.error(err);
                 } else {
                   console.log("success");
                 }
                 // Close the connection pool so the script can exit.
                 client.shutdown();
               });

Once again we connect, prepare, and execute an insert statement. And finally we verify:

[Screenshot: verifyQuery3]

More

There is so much more to ScyllaDB. Modelling data from queries first. User defined data types. Tunable consistency. Building databases without joins. Timestamps. Architecting an app with eventual consistency. CAP theorem. PACELC theorem. Dynamo and BigTable. On and on...

The flexible availability guarantees of ScyllaDB/Cassandra really are a great tool, and plumbing the depths of how to make them work well can take some time. We at Compose, though, are excited about ScyllaDB and look forward to seeing what you can do with such a great new database.