Long-lived background Ruby processes

I needed a way to run a process alongside a Rails application for an always-running live data ingestion task. Sidekiq is only meant for short jobs, but Sidekiq itself is a long-lived process. That gave me the idea to look at the source code of Sidekiq, Resque and Puma for inspiration.

Update: Since this was written, Sidekiq 6.0 has removed its built-in daemonization, logfile and pidfile options and relies on a supervising process like systemd instead.

In order to have a long-running process, we need:

A way to daemonize the process
A logger, so that we know what's going on
A PID file, so that we can stop the process
A signal handler to handle shutdowns gracefully (look at Sidekiq for a more flexible take on this)
A Rake task to easily launch the process with a Rails context

Start

The easiest way to start the daemon is with a Rake task:

task ingest: :environment do
  Ingestion.new(
    logfile: Rails.root.join('log', 'ingestion.log'),
    pidfile: Rails.root.join('tmp', 'pids', 'ingestion.pid')
  ).ingest
end

The dependency on the environment task is essential: it is what makes sure Rails is loaded before the task body runs.

The task delegates to the Ingestion class, which orchestrates the work:

def ingest
  IngestionDaemon.new(logger: logger, pidfile: pidfile).work do
    # ingest data
  end
end

Shutdown

Since we want to keep track of the background process, we write its ID to a file.

File.open(pidfile, 'w') { |f| f << Process.pid }
at_exit { FileUtils.rm_f pidfile }

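The at_exit hook removes the pidfile just before the program exits normally. If the process is killed with SIGKILL, the hook never runs and a stale pidfile is left behind.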

We can terminate the daemon with the kill command:

kill $(cat tmp/pids/ingestion.pid)
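
Before sending a signal, it can be useful to check whether the PID in the file still belongs to a live process. The following is a minimal sketch, not part of the original daemon: daemon_running? is a hypothetical helper name, and it relies on Process.kill with signal 0, which only checks that the process exists and that we may signal it.

```ruby
# Hypothetical helper (not in the original code): returns true if the PID
# stored in the pidfile belongs to a process that currently exists.
def daemon_running?(pidfile)
  pid = Integer(File.read(pidfile).strip)
  Process.kill(0, pid) # signal 0 sends nothing, it only performs the checks
  true
rescue Errno::ENOENT, Errno::ESRCH, ArgumentError
  # No pidfile, no such process, or the file does not contain a number.
  false
rescue Errno::EPERM
  # The process exists but belongs to another user.
  true
end
```

Note that a PID can be reused by an unrelated process after the daemon dies, so this check is a hint rather than a guarantee.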

Signal handling

We can politely ask the program to stop by sending it a signal.

Before putting the program in the background, we use Signal.trap to define which signals to handle and how:

trap('INT') { interrupt }
trap('TERM') { interrupt }

The interrupt method above raises Interrupt, which is also the exception Ruby raises when you press Control-C to stop a program. Here we explicitly handle that case (INT) as well as the termination signal (TERM), which is the one the kill command sends by default (see the kill(1) and signal(7) manual pages).

Then it's a matter of rescuing the Interrupt exception at the appropriate place in the program and cleaning up.
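
To see the flow from signal to rescue in isolation, here is a small sketch that runs in the foreground. The method name run_until_interrupted and the events array are made up for illustration, and the process signals itself instead of waiting for an external kill. It assumes it is called from the main thread, which is where Ruby runs trap handlers.

```ruby
# Hypothetical demo: returns the sequence of events observed while a TERM
# signal interrupts a stand-in for real work.
def run_until_interrupted
  events = []
  # Same idea as above: turn TERM into the Interrupt exception.
  previous = trap('TERM') { raise Interrupt }
  begin
    events << :working
    Process.kill('TERM', Process.pid) # simulates `kill <pid>` from a shell
    sleep 1                           # stand-in for work, cut short by the signal
    events << :finished               # not reached
  rescue Interrupt
    events << :interrupted            # shutdown clean-up would go here
  ensure
    trap('TERM', previous)            # restore the earlier handler
  end
  events
end
```

Calling run_until_interrupted returns [:working, :interrupted]: the work is abandoned and control lands in the rescue clause.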

Read up on the self-pipe trick for more robust signal handling. I did not need it for this program.
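
For reference, the core of the self-pipe trick can be sketched in a few lines. This is an illustration under my own assumptions rather than code from Sidekiq or Puma: wait_for_signal is a hypothetical name, and a real daemon would keep the pipe open and select on it inside its main loop. The point is that the handler only writes to a pipe, and the decision about what to do happens in normal code outside the handler.

```ruby
# Hypothetical sketch of the self-pipe trick: the trap handlers only write the
# signal name to a pipe, and ordinary code reads it back and decides what to do.
# Returns the name of the received signal, or nil if none arrived in time.
def wait_for_signal(signals = %w[INT TERM], timeout: 5)
  reader, writer = IO.pipe
  previous = signals.to_h do |name|
    [name, trap(name) { writer.write_nonblock("#{name}\n") }]
  end
  yield if block_given? # optional hook, e.g. to kick off some work
  ready = IO.select([reader], nil, nil, timeout)
  ready ? reader.gets.chomp : nil
ensure
  previous&.each { |name, handler| trap(name, handler) } # restore old handlers
  [reader, writer].each { |io| io&.close }
end
```

For example, wait_for_signal(%w[TERM]) { Process.kill('TERM', Process.pid) } returns "TERM" without any exception being raised in the middle of the work.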

Detach the process from the controlling terminal

The Ruby API for this is a bit confusing:

Process.daemon(true, false)

Both arguments are negated flags (the Ruby documentation calls them nochdir and noclose), which is what makes the call hard to read. The first argument controls whether or not to change the current working directory to root (/): true tells Ruby to keep the current directory. The second argument controls whether to keep the standard input, output and error streams or redirect them to /dev/null: false tells Ruby explicitly to redirect them to /dev/null. If you don't "close" the streams, the process will continue writing to the terminal even after it's detached.
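
The effect of the first argument is easy to observe by daemonizing a throwaway child process and asking the resulting daemon where it is. The sketch below is only a probe for experimenting, assuming a platform that supports fork. The method name daemon_working_directory and the report file are hypothetical.

```ruby
require 'tmpdir'

# Hypothetical probe: daemonizes a throwaway child process and returns the
# working directory reported by the resulting daemon.
def daemon_working_directory(nochdir)
  Dir.mktmpdir do |dir|
    report = File.join(dir, 'cwd') # absolute path, still valid after chdir('/')
    child = fork do
      Process.daemon(nochdir, false) # false: redirect std streams to /dev/null
      File.write("#{report}.tmp", Dir.pwd)
      File.rename("#{report}.tmp", report) # publish the report atomically
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    Process.wait(child) # the forked child exits once the daemon has split off
    100.times do        # wait up to about five seconds for the report
      break if File.exist?(report)
      sleep 0.05
    end
    File.read(report)
  end
end
```

With true, the daemon reports the directory the program was started from. With false, it reports /.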

Wrap-up

Here is the entire class:

require 'fileutils' # provides FileUtils.rm_f when the class is used outside Rails

class IngestionDaemon
  def initialize(logger:, pidfile:)
    @logger = logger
    @pidfile = pidfile
  end

  def work(&block)
    register_signal_handlers
    daemonize
    write_pid
    ingest(&block)
  rescue Interrupt
    shutdown
  end

  private

  attr_reader :logger, :pidfile

  def register_signal_handlers
    trap('INT') { interrupt }
    trap('TERM') { interrupt }
  end

  def daemonize
    Process.daemon(true, false)
  end

  def write_pid
    File.open(pidfile, 'w') { |f| f << Process.pid }
    at_exit { delete_pidfile }
  end

  def delete_pidfile
    FileUtils.rm_f pidfile
  end

  def ingest
    logger.info 'Starting ingestion daemon'
    yield
    logger.info 'Ingestion daemon ran out of work'
  end

  def interrupt
    raise Interrupt
  end

  def shutdown
    logger.info 'Shutting down ingestion daemon'
    exit(0)
  end
end