Securing Notebooks on IBM's Data Science Experience


Exposing sensitive information in Jupyter notebooks is a serious concern if you intend to share them. In this article, we'll show you two ways to keep that information hidden in notebooks on IBM's Data Science Experience.

One of the most common uses for Jupyter notebooks is to share data. However, sharing can come at a cost if you're not careful with sensitive information. When you want to share a notebook, information like your database connection string, usernames, or passwords needs to be hidden from view. We'll show you how using IBM's Data Science Experience (DSX).

Starting a Jupyter Notebook on DSX

Let’s assume we have a Compose PostgreSQL database set up and loaded with White House staff salary data. We’ll also assume you are already set up with a DSX project after following our Better Decision Making with Watson Machine Learning and Compose article. We’ve made a project called Compose Project.

At the top of the project overview page, next to the "Notebooks" section, click New notebook on the right.

You'll have to give the notebook a name; we'll call ours "Salaries". Select Python 3.5 as the language, then click Create Notebook at the bottom of the page.

A new notebook will open. At the top right, you'll see the version of Python and Spark the notebook uses and, next to that, "Not Trusted". Don't worry about this. It's a Jupyter security precaution that prevents untrusted code in a notebook's output from being executed automatically when a user opens it. Once you've updated any cell or saved the notebook, it will become "Trusted".

So we have our salaries data in PostgreSQL, but we don’t want to give out our connection string to others when we share the notebook ...

Share Less

The simplest way to share a notebook and keep information hidden is to use the # @hidden_cell comment, which is only supported on DSX notebooks. When sharing a DSX notebook, any cell that contains the comment is removed from the shared notebook and is only visible to collaborators who have editing access.

Say we have a Compose PostgreSQL database we want to get data from. For that, we have a connection string that includes a password. The first thing we want to do is isolate that data into its own cell. Then we can add the # @hidden_cell comment at the top of it like:

# @hidden_cell
conn_string = "postgres://admin:xxxxxxx@aws-us-west-2-portal.2.dblayer.com:99999,aws-us-west-2-portal.1.dblayer.com:99999/compose"  

We've just pasted the connection string in here by hand. But if you use the Data Assets connections in DSX to connect to your Compose PostgreSQL database, the "# @hidden_cell" comment is added automatically when you insert the connection into the notebook by clicking Insert to code.
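Hiding the cell only protects the source of that cell. If a later, visible cell prints conn_string, the password can still end up in the cell's output. As a minimal sketch, a small helper can mask the password before anything is printed. The name mask_password is hypothetical, not part of DSX or psycopg2, and the pattern assumes the password contains no unescaped "@" character.

```python
import re


def mask_password(conn_string):
    """Return conn_string with the password part replaced by ***."""
    # Match "://user:" and then everything up to the first "@".
    # Only the first match is rewritten.
    return re.sub(r"(://[^:/@]+:)[^@]*@", r"\1***@", conn_string, count=1)


# Example with a made-up connection string:
print(mask_password("postgres://admin:secret@example.com:5432/compose"))
# prints postgres://admin:***@example.com:5432/compose
```

Printing mask_password(conn_string) instead of conn_string keeps the host and database name visible for debugging while leaving the password out of any saved output.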

After adding the connection string, you can then create a connection to your database and run your queries in other cells like:

import psycopg2 as pg

conn = pg.connect(conn_string)  
cursor = conn.cursor()  
cursor.execute("SELECT * FROM salaries WHERE salary > '175000'")  
results = cursor.fetchall()  
print(results)  
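The query above can also be wrapped in a small function that passes the threshold as a parameter rather than writing it into the SQL text. This is only a sketch: fetch_salaries_above is a hypothetical helper name, and it assumes the salary column is numeric and that conn is the open psycopg2 connection created above.

```python
def fetch_salaries_above(conn, threshold):
    """Return all rows from salaries whose salary exceeds threshold.

    conn is an open DB-API connection, such as the one returned by
    pg.connect(conn_string).
    """
    cursor = conn.cursor()
    try:
        # %s is psycopg2's placeholder; the driver quotes the value itself.
        cursor.execute(
            "SELECT * FROM salaries WHERE salary > %s",
            (threshold,),
        )
        return cursor.fetchall()
    finally:
        # Release the cursor even if the query raises an error.
        cursor.close()


# Usage (needs a live database, so it is left commented out):
# results = fetch_salaries_above(conn, 175000)
# print(results)
```

Keeping the value out of the SQL string means the same function can be reused for other thresholds without editing the query.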

Then, when you want to share the notebook, click on the share button in the top menu; it will say "Share" when you hover over it.

A window will appear giving you several options to choose from. Select "Share with anyone who has this link" at the top, and under "Cell content" click "All content excluding sensitive code cells". This means that all the code will appear, except those cells where # @hidden_cell appears.

The link at the bottom is what you send to others to share the notebook. To test what others can see, copy and paste that link into a new browser window.

In the shared notebook, you'll see that DSX has replaced the contents of the hidden cell with the comment # The code was removed by DSX for sharing., while the other cells are still visible.

Once again, be careful if you add collaborators as editors to the notebook. They'll be able to view the hidden cells and share their notebook with others, which could expose your credentials.

Ask More

As another solution, if you're just looking to hide sensitive information like usernames and passwords in a notebook, you can do that using the portable password input library getpass. This library is part of Python's standard library, so it's supported on local instances of Jupyter notebooks as well as on DSX.

Since getpass comes with Python, we just have to import it into the notebook. Then create a variable to store our database connection URI. Instead of adding the URI as a string in the cell, we'll use the getpass() method that comes with the library.

import getpass  
pg_conn = getpass.getpass()  

When running the cell, it will bring up an input box where we will paste in the PostgreSQL connection string. You'll notice that getpass hides the string as dots in the input box. Now, hit return.

This will store the string in the pg_conn variable for the running notebook session. On DSX, once you've saved the notebook, you'll have access to the pg_conn value until you restart the kernel. On a local Jupyter notebook instance, once you shut down the notebook, the saved value is gone.
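The prompt step can be made a little friendlier with a custom prompt and a check for empty input. The following is a minimal sketch: prompt_for_conn_string and its reader parameter are hypothetical names introduced here for illustration, and only getpass.getpass() itself comes from the standard library.

```python
import getpass


def prompt_for_conn_string(reader=None):
    """Ask for the connection string without echoing it on screen."""
    # Look up getpass.getpass at call time so the notebook's own
    # input box is used; a different reader can be passed in for testing.
    if reader is None:
        reader = getpass.getpass
    value = reader("PostgreSQL connection string: ").strip()
    if not value:
        raise ValueError("No connection string was entered")
    return value


# Usage in a notebook cell (interactive, so it is left commented out):
# pg_conn = prompt_for_conn_string()
```

Stripping whitespace guards against a stray newline or space picked up when pasting, and the empty check stops a blank value from reaching pg.connect() later on.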

Now, when we want to query our database again, we simply use the same setup as we did above. However, this time, we'll use the pg_conn variable as the connection string:

con = pg.connect(pg_conn)  
cur = con.cursor()  
cur.execute("SELECT * FROM salaries WHERE salary < '40000'")  
data = cur.fetchall()  
print(data)  

Be aware that collaborators with editing access will have access to the connection string stored in pg_conn.

Summing up

In this article, we've looked at two ways you can secure your database credentials within a shared version of a notebook on DSX: # @hidden_cell and getpass. With # @hidden_cell, you can hide entire cells that contain sensitive information or anything else you don't want to share with viewers. With getpass, you can hide sensitive values like passwords or even connection strings, but it's not designed for hiding entire cells. Both methods have their use cases, and both are available for you to use now on DSX.



attribution James Sutton

Abdullah Alger
Abdullah Alger is a former University lecturer who likes to dig into code, show people how to use and abuse technology, talk about GIS, and fish when the conditions are right. Coffee is in his DNA.
