Using Ruby & Resque With MongoDB? Improve Your Performance With a Single Gem.

Who should win: your web visitors or your background workers? Hopefully, you will not have to answer this question … or if you are planning for exponential growth, hopefully you will.

We have seen a pattern recently with MongoDB, Ruby, and Resque (the background processor). Resque's behavior combined with the Ruby driver's behavior prevents background worker performance from scaling linearly. When Resque forks, the Ruby driver creates a new connection to the database. This translates into one new connection per processed job. If you are running thousands of jobs per second, MongoDB will perform less than optimally.

The quickest solution is the ‘resque-jobs-per-fork’ gem. As the name implies, it makes Resque run multiple jobs per fork. Instead of a 1-to-1 job-to-connection pattern, you will have a 500-to-1 or 1000-to-1 job-to-connection pattern.
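To make that ratio concrete, here is a minimal sketch of the arithmetic. The method name `connections_opened` is made up for illustration, and it assumes exactly one new connection per fork, as described above.

```ruby
# Hypothetical helper: estimates how many MongoDB connections a worker opens,
# assuming each fork opens exactly one new connection.
def connections_opened(total_jobs, jobs_per_fork)
  raise ArgumentError, "jobs_per_fork must be positive" unless jobs_per_fork > 0
  # Integer ceiling division: a partially filled batch still needs a fork.
  (total_jobs + jobs_per_fork - 1) / jobs_per_fork
end

puts connections_opened(100_000, 1)     # one fork per job  => 100000
puts connections_opened(100_000, 500)   # 500 jobs per fork => 200
puts connections_opened(100_000, 1000)  # 1000 jobs per fork => 100
```

Under that assumption, 100,000 jobs drop from 100,000 new connections to a few hundred or fewer.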

The Cause

As simple as it sounds, your connection pattern to MongoDB affects your customers’ experience with your product. Good connection patterns consist of long-running, persistent connections. When looking at the logs, you should only see a “connection accepted from” line every 10 to 15 seconds, even on large-scale deployments. Most of the time, poor connection patterns exist from the beginning of an application’s development, but they only become evident once performance issues appear. These performance issues arise as databases grow in size and the application grows in usage.

With MongoDB, poor connection behavior in an application is exposed by two constraints: 1) per connection memory overhead and 2) write locks causing slow authentication.

Per Connection Memory Overhead

As of the 2.0 branch, each connection in MongoDB allocates 1 MB of RAM. Before 2.0, the amount depended on the system ‘stack size’ setting. The following code in MongoDB shows how the per connection memory usage is determined:

static const size_t STACK_SIZE = 1024*1024; // if we change this we need to update the warning

struct rlimit limits;
verify(getrlimit(RLIMIT_STACK, &limits) == 0);
if (limits.rlim_cur > STACK_SIZE) {
  pthread_attr_setstacksize(&attrs, (DEBUG_BUILD
    ? (STACK_SIZE / 2)
    : STACK_SIZE));
} else if (limits.rlim_cur < 1024*1024) {
  warning() << "Stack size set to " << (limits.rlim_cur/1024) << "KB. We suggest 1MB" << endl;
}

mongo/util/net/messageserverport.cpp#L78

As with most everything in life, one of anything is practically nothing (think cars in traffic). 1 MB is a rounding error relative to modern RAM sizes. However, a modern large-scale application consists of many components making many requests. If each of those thousands of operations per second opens its own connection, the overhead can turn into major RAM usage.

Given MongoDB’s reliance on good RAM usage, deficient RAM usage can quickly ruin performance.
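As a rough back-of-the-envelope sketch, the overhead can be estimated by multiplying open connections by the per connection allocation. This takes the 1 MB figure above at face value; the helper name and the connection counts are invented for illustration.

```ruby
# Hypothetical helper: RAM (in GB) set aside for connections, assuming each
# open connection costs a fixed number of megabytes (1 MB as of MongoDB 2.0).
def connection_overhead_gb(open_connections, mb_per_connection = 1)
  (open_connections * mb_per_connection) / 1024.0
end

# Example connection counts, chosen only to show how the total scales.
[10, 1_000, 5_000].each do |count|
  puts "#{count} connections -> #{connection_overhead_gb(count).round(2)} GB"
end
```

Ten connections are negligible, while several thousand simultaneously open connections claim gigabytes that MongoDB could otherwise use for data.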

Write Lock & Slow Authentication

When using authentication, each connection and authentication action is a database query. If a database is under heavy write load, authentication will be as slow as the rest of your queries. Thus, when you rapidly connect to a database in a high-write environment, every authentication has to wait behind the other locks on your database.

As with Memory Overhead, these deficiencies are not typically noticed until the application grows in data and usage.

Resque’s Forking Code & Ruby Driver’s Reconnect

Resque uses process forking to spawn a new worker process for each job (see resque/blob/master/lib/resque/worker.rb#L137):

if Kernel.respond_to?(:fork)  
  Kernel.fork &block if will_fork?
else  

The Ruby driver reconnects whenever a socket belongs to a different process (see mongo-ruby-driver/blob/master/lib/mongo/util/pool.rb#L240):

if socket.pid != Process.pid  
  @sockets.delete(socket)
  if socket
    socket.close unless socket.closed?
  end
  checkout_new_socket
else  

The Ruby Mongo Driver does this because managing connections between parent and child processes in Ruby is a beast. The fool-proof method is to re-initialize connections on each fork.
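The pid check can be illustrated without a database. The sketch below uses a hypothetical `FakePool` class that opens no real sockets and only counts how often it would connect; it is a stand-in for the idea in the excerpt, not the driver's actual implementation.

```ruby
# Hypothetical stand-in for a driver connection pool; it opens no real sockets.
class FakePool
  attr_reader :connects

  def initialize
    @connects = 0
    connect
  end

  # Mirrors the idea in the driver excerpt: a "socket" created by another
  # process (a different pid) is discarded and replaced.
  def checkout
    connect if @owner_pid != Process.pid
    @owner_pid
  end

  private

  def connect
    @owner_pid = Process.pid
    @connects += 1
  end
end

pool = FakePool.new
pool.checkout # same process: reuses the existing "connection"

reader, writer = IO.pipe
child = fork do
  reader.close
  pool.checkout # forked child: pid differs, so the pool reconnects
  writer.puts pool.connects
  writer.close
end
writer.close
Process.wait(child)
child_connects = reader.read.to_i
reader.close

puts "parent connects: #{pool.connects}"  # 1
puts "child connects:  #{child_connects}" # 2
```

The parent keeps its single connection, while every forked child pays for a fresh one on its first checkout.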

Resque & Ruby : The Effect

require 'rubygems'  
require 'mongo'

@conn = Mongo::Connection.new("localhost", 27017, :pool_size => 10,
:pool_timeout => 5)
@db   = @conn['resque_connection_test']
@coll = @db['users']

puts @db.command({getLastError: 1})

1.upto(10) do  
  client = fork do
    puts @db.command({getLastError: 1})
  end

  Process.wait(client)
end  

The code above mimics the effects of a standard Resque worker. Each “puts” prints a different “connectionId”, thus each fork establishes a new MongoDB connection. If you are watching the MongoDB logs, you will see 11 lines containing “connection accepted from.”

Debugging In MongoHQ

Mongostat hides most performance issues caused by poor connection patterns. With the example above, mongostat would show the same number of connections as you have Resque workers. In the logs, however, you will see a new line containing “connection accepted from” each time Ruby forks the process.

Debugging decreased performance due to connectivity issues requires access to the Mongo logs. If you see more than a few connection attempts every few seconds, please consider a method to use longer persistent connections. You will achieve better resource usage, and better product delivery for your clients.
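One way to apply that rule of thumb is to count the “connection accepted from” lines per time window. The sketch below assumes each log line carries an HH:MM:SS timestamp; the sample lines are made up for illustration, and the helper name `accepted_per_bucket` is hypothetical.

```ruby
# Counts "connection accepted from" lines per 10-second bucket.
# Assumes each log line contains an HH:MM:SS timestamp.
def accepted_per_bucket(lines)
  counts = Hash.new(0)
  lines.each do |line|
    next unless line.include?("connection accepted from")
    match = line.match(/(\d{2}):(\d{2}):(\d{2})/)
    next unless match
    hour, minute, second = match.captures
    # Round the seconds down to the start of their 10-second window.
    bucket = format("%s:%s:%02d", hour, minute, (second.to_i / 10) * 10)
    counts[bucket] += 1
  end
  counts
end

# Made-up sample lines, for illustration only.
sample = [
  "Tue Jul 10 14:12:01 [initandlisten] connection accepted from 127.0.0.1:51001 #1",
  "Tue Jul 10 14:12:03 [conn1] end connection 127.0.0.1:51001",
  "Tue Jul 10 14:12:04 [initandlisten] connection accepted from 127.0.0.1:51002 #2",
  "Tue Jul 10 14:12:15 [initandlisten] connection accepted from 127.0.0.1:51003 #3",
]

accepted_per_bucket(sample).each do |bucket, count|
  puts "#{bucket} #{count}"
end
```

Buckets that regularly show more than a handful of accepted connections point to a worker or web process that reconnects instead of reusing a persistent connection.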

MongoHQ shared plans have access to real-time logs from the Mongo server. Using these real-time logs, you can see your connection patterns. For assistance, please email support@mongohq.com.
- Resque & Ruby are not the only offenders with poor connection patterns. The stock Node.js driver tests for Replica Set status every second for each process, issuing reconnections. PHP and Apache are evil when not configured properly: the continuous building up and tearing down of workers triggers new connections.