1,"Exporting CSV from MongoDB"

A look at MongoDB CSV exporting makes up the first part of our Comma Values series, all about the rarely loved CSV files and the databases that import/export them.

CSV formatted files are like the uncle that that no one really likes to mention in polite company. Yet they turn up all over the place, especially when getting data for spreadsheet analysis or pulling data from legacy systems for import.

So, over the next few weeks we'll be spreading a bit of goodwill to the CSV format and showing you how you can import and export it from the databases of Compose and how you can add some handy tools that make manipulating CSV files easy. In this first article, we're going to look at exporting CSV from MongoDB but first let's quickly refresh on what CSV format is.

CSV: Things to remember

Your CSV file is a tricky chap. Grew up in the field with no standards and goes all the way back to 1967 according to some. The idea is simple enough: a file which has lines which are delimited by a carriage return+line feed. Each line has values and each value is delimited by a comma.

That's about as far as you can go without adding "maybe" or "except for" to every attribute as the authors of 2005's RFC 4180 found. That is the current CSV file "stanadard" though it is more a gathering of "things we know about CSV files". There may be double-quotes around some values such as strings. The first row may be a set of headers describing the fields in subsequent rows. We mention this so you don't assume that a program that takes CSV as an input will accept any and all CSV variants. We'll try and flag where there may be oddities in any exported CSV files.

The most important thing to know about CSV files is that they are no use if you want to represent a hierachy like a JSON document. The data in structures like that will need flattening. Let's see an example of that in our look at exporting data from MongoDB...

MongoDB and CSV

Mongoexport is the go-to tool for exporting data from MongoDB. It can export data in CSV or JSON format, but if you want to export CSV you have to specify all the fields you want to export and the order they come in. So let's say you have a collection with records that look like this:

{
  _id: ObjectId("56685b53414f0983b63a8630"),
  title: "The Godfather",
  num_votes: 1072605,
  rating: 9.2,
  year: "1972",
  type: "feature",
  image: {
    url: "http://ia.media-imdb.com/images/M/MV5BMjEyMjcyNDI4MF5BMl5BanBnXkFtZTcwMDA5Mzg3OA@@._V1_.jpg",
    width: 333,
    height: 500
  },
  viewed: [
    {
      name: "Brian",
      rating: 10
    },
    {
      name: "Tim",
      rating: 8
    }
  ]
}

Now, the simplest export we might want to do is a CSV export of title, year and rating. Here's a mongoexport command to do just that:

[~] mongoexport --host=host.example.com:port --db=samples --collection=movies -u user -p password --type csv --fields title,year,rating
title,year,rating  
The Godfather,1972,9.2  
The Dark Knight,2008,9  
The Shawshank Redemption,1994,9.3  
The Lord of the Rings: The Return of the King,2003,8.9  
...

So, running through the parameters, we give the host and port using --host, database with --db, which collection to export with --collection, and the user and password are parameters for -u and -p.

And in the output we see the wonder of CSV as you wonder why there's no quotes around the titles or headings. Double-quotes are added only when needed here so further down the exported titles we see:

"Lock, Stock and Two Smoking Barrels",1998,8.2

That's escaped with double quotes because it contains a comma. Now, fields in CSV don't need to be quoted but then people have implemented importers that expect quotes because the CSV import may actually be tuned for the CSV export of some other application. And if you are wondering, if there's a double quotes in a string delimited by double quotes... then CSV should turn it into double double-quotes. Let's move back to getting some more data out of MongoDB.

You will notice we listed the fields to export as a comma separated list in the command line. You can move them to a new-line delimited file and use that instead. So let's create a fields.txt file:

title  
year  
rating  

And run our mongoexport command, switching --fields out for --fieldFile like so:

$ mongoexport --host=host.example.com:port --db=samples --collection=movies -u user -p password --type csv --fieldFile fields.txt
title,year,rating  
The Godfather,1972,9.2  
The Dark Knight,2008,9  
...

Of course, we really want to save the results in a file. Add a --out with a destination filename for that. You'll probably notice the log output then when the database connection is opened and when it's complete. Add a --quiet to silence that.

All easy enough, but there's the JSON elephant in the room. How do you refer to fields which aren't in the top level of the JSON. Here's an example of how:

title  
year  
rating  
image.url  
viewed.0.name  

This fields.txt file has added access to the url field within the JSON object image and to the name field of the object at index position 0 in the JSON list viewed. As with all these specifiers of fields, watch out; if it doesn't match a field, it doesn't throw an error, it just puts out an empty value.

There's a couple of other useful options on mongoexport worth mentioning. The --query option lets you use a query to select what records are exported. So adding --query '{rating:{$gt:8.5}}' would, on our movie collection, only give us the movies rated over 8.5. Note the use of the single quotes to supress the expansion of the "$gt" as if it were an environment variable. The more complex your command line, the more likely you are to trip over the shell trying to be helpful by expanding or replacing things like that.

You can also sort the export with --sort and a JSON sort specification so adding --sort '{rating:1}' will sort with an ascending rating. There's also a --skip and --limit, working the same way as their MongoDB equivalent commands, to reduce the number of items exported.

Finally, if you need to connect to an SSL enabled MongoDB, there's the various SSL options to mongoexport, including – most importantly – --ssl which will enable SSL on the connection.

So now we are exporting data as CSV from MongoDB. In the next part, we'll be having a look at tools that can manipulate those CSV files and send them on to other databases. Stay tuned.