"3","CSV and other databases"

In this final part, we'll look at how to import and export CSV (comma-separated values) files with the other databases available on Compose, including PostgreSQL and RethinkDB, with tips for other databases. In the first part of this series we looked at exporting CSV from MongoDB, so we start this round-up by covering how to import it.

MongoDB importing

It will come as no surprise, for those who read part 1, to find that the mongoimport command is remarkably similar to the mongoexport command. Here's an example of it:

mongoimport --host=c343.lamppost.3.mongolayer.com:10343 --db=samples --collection=newfilms -u example -p password --type csv --headerline films.csv  

Most of the command line is setting up the connection to the database. It's the last part we're interested in – --type csv --headerline films.csv. The --type csv option sets the import to take in CSV files (it can also take TSV files). While CSV files are generally agreed to carry their field definitions on the first line, mongoimport will, by default, assume that you are going to define the fields separately; --headerline tells it to read them from that first line instead.
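If you skip --headerline, you can name the fields inline instead. As a quick sketch (reusing the same hypothetical connection details and field names), it might look like this:

mongoimport --host=c343.lamppost.3.mongolayer.com:10343 --db=samples --collection=newfilms -u example -p password --type csv --fields film_id,title,description films.csv

Bear in mind that, as we'll see shortly, defining the fields yourself means the file's header line gets imported as a document unless you strip it out first.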

Why define the fields yourself? Well, the --fields (or --fieldFile) option lets you apply some structure to your documents. In the same way that mongoexport let you extract particular fields of a document to a CSV, you can have a fields file that says:

film_id  
about.title  
about.description  
about.release_year  
language_id  
rent.rental_duration  
rent.rental_rate  
rent.length  
rent.replacement_cost  
rating  
last_update  
special_features  
fulltext  

This takes the title, description and release_year fields and puts them as children of an about value, and the rental_duration, rental_rate, length and replacement_cost fields as children of rent. If we use this with mongoimport:

mongoimport --host=c343.lamppost.3.mongolayer.com:10343 --db=samples --collection=morefilms -u example -p password --type csv --fieldFile film_fields.txt films.csv  

And if we look at the database, we'll find records like this:

{
  _id: ObjectId("56a8bb3a414f0983b63a8b1b"),
  film_id: 6,
  about: {
    title: "Agent Truman",
    description: "A Intrepid Panorama of a Robot And a Boy who must Escape a Sumo Wrestler in Ancient China",
    release_year: 2006
  },
  language_id: 1,
  rent: {
    rental_duration: 3,
    rental_rate: 2.99,
    length: 169,
    replacement_cost: 17.99
  },
  rating: "PG",
  last_update: "2013-05-26 14:50:58.951000",
  "special_features ": "['Deleted Scenes']",
  fulltext: "'agent':1 'ancient':19 'boy':11 'china':20 'escap':14 'intrepid':4 'must':13 'panorama':5 'robot':8 'sumo':16 'truman':2 'wrestler':17"
}

There's one fly in the ointment: this was a properly formatted CSV file with a header line, and because we defined the fields ourselves, mongoimport copied that header line in as a document. Just remember to run tail -n +2 to skip the first line and feed the rest to mongoimport like so:

tail -n +2 films.csv |mongoimport --drop --host=c343.lamppost.3.mongolayer.com:10343 --db=samples --collection=morefilms -u example -p password --type csv --fieldFile film_fields.txt  

The sharp-eyed will notice we added the --drop option in there to drop the collection before importing. It's one of the many other options for mongoimport.
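A couple of the other options worth knowing about are --ignoreBlanks, which skips empty CSV fields rather than creating empty values in your documents, and --stopOnError, which aborts the import at the first bad record. A sketch combining them (with the same hypothetical connection details as before):

tail -n +2 films.csv | mongoimport --drop --ignoreBlanks --stopOnError --host=c343.lamppost.3.mongolayer.com:10343 --db=samples --collection=morefilms -u example -p password --type csv --fieldFile film_fields.txt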

RethinkDB

So you want to import or export CSV data with your RethinkDB database. Well, RethinkDB has you covered. You'll need to install rethinkdb locally to make use of the rethinkdb import and rethinkdb export commands, but there's currently another catch: those commands don't support the SSL connections Compose uses by default. They do, however, work if you enable an SSH portal and configure a connection that way. If you go to the Security tab on your Compose RethinkDB deployment, turn the SSH portal on and then follow the instructions in this article, you'll have an SSH tunnel between your system and your Compose RethinkDB running in no time. Once you've done that, it takes just one command:

rethinkdb import -c 127.0.0.1:28015 -a authkey -f films.csv --table example.films --format csv  

This will import a CSV file called films.csv into a RethinkDB database called example and a table called films; if they don't exist, they will be created. The import uses the first line of the CSV file to determine field names. And that's almost all there is to it. If you let the import create the table, it will be a single replica on a single shard. Worry not, and head over to [this article] where we explain what that means and how to fix it.
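If you'd rather get the configuration right at import time, rethinkdb import also accepts --shards and --replicas options for the table it creates. A sketch, assuming your deployment has enough servers to host the replicas:

rethinkdb import -c 127.0.0.1:28015 -a authkey -f films.csv --table example.films --format csv --shards 2 --replicas 2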

Setting up SSH also allows you to use the export, dump and restore commands. To learn more about the import command, check out the RethinkDB documentation. For dump and restore there's also a page in the documentation, but don't forget that RethinkDB backups happen automatically on Compose.
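As a sketch of the export side over the same tunnel, note that CSV export makes you name the fields you want with --fields (the field names here are from our hypothetical films table):

rethinkdb export -c 127.0.0.1:28015 -a authkey -e example.films --format csv --fields film_id,title,rating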

PostgreSQL

In the previous part of this series, we talked about CSVkit and how it can extract or upload CSV files as part of its SQL support. There is also a built-in way in PostgreSQL to import and export CSV files. The SQL COPY command is designed to work locally on the server, so we can't use that on Compose; there's nowhere for the files to go. But the command line client for PostgreSQL, psql, has a \copy command that can do everything COPY can do, remotely. This is great for exporting data from PostgreSQL; assuming we have a films table and we want to export all of it, at the psql prompt we can do:

\copy films to 'filmsout.csv' with csv header

And the entire table can be written to the file as CSV. You can be more selective and use a select to pick what data you want exported too:

\copy (select * from films where replacement_cost > 20) to 'filmsout.csv' with csv header

Simple. There are some caveats to using \copy, though. If you want your import to create tables, you'll be out of luck; it only imports into existing tables. If you have more extensive CSV uploading needs than either csvsql or \copy can offer, then your next port of call could be pgloader, which is dedicated to loading data into PostgreSQL from CSV, SQLite, MySQL and other sources.
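For completeness, the import direction of \copy is the mirror image of the export. A sketch, assuming a newfilms table with columns matching the CSV already exists:

\copy newfilms from 'films.csv' with csv header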

The Full Stop

We'll stop here, as it would be easy to disappear down the rabbit-hole of the many other formatting options. The rule of thumb is that there's probably already a CSV import or export utility built into your database, and if there isn't, you can use CSVkit to turn your CSV into JSON and import that, or take JSON output and use CSVkit to flatten it into a comma-separated values file.
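As a sketch of that last suggestion, CSVkit's csvjson and in2csv commands cover both directions (the file names here are hypothetical):

csvjson films.csv > films.json
in2csv -f json films.json > films.csv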