I have previously written about some techniques we are using at Fin to combat software entropy in our Ruby on Rails app. One of the most common problems that we deal with has to do with performance degradation due to database queries — unnecessary queries, inefficient queries, queries that should have been batched etc.

In future posts I will discuss approaches to actually optimize database queries, but a good first step is to understand where database accesses are happening, and how to control them. In this post I will describe how we can measure, annotate, and control database access using the ActiveRecord ORM.

Logging database access in Rails console

The simplest tool in our kit is enables logging database queries for human perusal, as they happen in the Rails console. We use the wonderful [pry](http://pryrepl.org/) for our Rails console (instructions to set this up are here). In our .pryrc, we have defined a custom command called !debug :

# Don't mess with vanilla pry
if defined?(Rails)
  # ,-----------------------------------------------------------
  # | !debug : ActiveRecord query logging + set log level :debug
  # `-----------------------------------------------------------
  Pry::Commands.block_command('!debug', 'Enable debug logging. Pass in an optional number of backtrace lines to print out.') do |*args|
    num_debug_lines = args.length > 0 ? args[0].to_i : 1
    
    # Turn on Rails debug logging
    Rails.logger.level = :debug

    # Point the activerecord logger at STDOUT
    ActiveRecord::Base.logger = Logger.new(STDOUT)

    # initialize a counter
    ActiveRecordQueryCounter.instance.reset
    ActiveRecordQueryCounter.instance.set_trace_lines(num_debug_lines)
    
    # subscribe to query events and print out the trace each time
    ActiveSupport::Notifications.subscribe('sql.active_record') do |_name, _start, _finish, _id, payload|
      if payload[:name] != 'SCHEMA'
        ActiveRecordQueryCounter.instance.trace.each { |line| ActiveRecord::Base.logger.debug(line) }
      end
    end
  end

view raw.pryrc.rb hosted with ❤ by GitHub

We can now type !debug in the Rails console to set the Rails logger log-level to DEBUG, and also print out every ActiveRecord query as it executes.

[1] development (main)> !debug
[2] development (main)> User.find(1)
D, [2017-12-24T09:10:26.513297 #466] DEBUG -- : [Shard: master]  User Load (3.5ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
D, [2017-12-24T09:10:26.521601 #466] DEBUG -- : [1] ↳ (pry):1:in `<main>'

view rawdebug-session.rb hosted with ❤ by GitHub

How does this work? It takes advantage of ActiveSupport’s notifications mechanism, which lets us subscribe to an event stream of every ActiveRecord query. We’ve abstracted some of the logic for this into the ActiveRecordQueryCounter class:

# frozen_string_literal: true

class ActiveRecordQueryCounter
  # We want only one of these active
  include Singleton

  attr_reader :query_count

  def initialize
    reset
    set_trace_lines(1)
    _subscribe_for_count
    @cleaner = _backtrace_cleaner
  end

  private def _backtrace_cleaner
    # Remove unnecessary lines from the stack trace
    skip_re = %r{(initializers)|(lib/active_record_query_counter)|(lib/core_ext/active_record)}
    _cleaner = Rails.backtrace_cleaner.clone
    _cleaner.add_silencer { |line| skip_re.match(line) }
    _cleaner.add_filter { |line| line.sub("#{Rails.root}/", '') }
    _cleaner
  end

  public def reset
    @query_count = 0
  end

  public def set_trace_lines(num)
    @num_trace_lines = num
  end

  public def trace(num=nil)
    # provide a cleaned stack trace
    @cleaner.clean(caller).first(num || @num_trace_lines).map { |line| "[#{@query_count}] ↳ #{line}" }
  end

  private def _subscribe_for_count
    # Subscribe to query events so we can count them
    ActiveSupport::Notifications.unsubscribe(@count_subscriber) if @count_subscriber
    @count_subscriber = ActiveSupport::Notifications.subscribe('sql.active_record') do |_name, _start, _finish, _id, payload|
      if payload[:name] != 'SCHEMA'
        @query_count += 1
      end
    end
  end
end

view rawactive_record_query_counter.rb hosted with ❤ by GitHub

ActiveRecordQueryCounter is a Singleton counter that when initialized, subscribes to sql.active_record events and tracks their count. It also lets us print a curated (and optionally truncated) stack trace of the code the query originates from; this is crucial to finding the culprit when you notice something off! 🕵

Logging database access programmatically

It is easy to see how we can extend this pattern to define a log_query_count utility that can be used to wrap arbitrary code blocks and add query logging to them:

    Contract Integer, Func[Any => Any] => Any
    public def log_query_count(num_trace_lines=1)
      # Initialize our counter
      ActiveRecordQueryCounter.instance.reset

      # Define a callback to be called each time we encounter a query
      # event. The callback will print out the trace
      callback = lambda do |_name, _start, _finish, _id, payload|
        if payload[:name] != 'SCHEMA'
          Rails.logger.info(tag: 'log_query_count', message: "Query # #{ActiveRecordQueryCounter.instance.query_count}: #{payload[:name]}")
          ActiveRecordQueryCounter.instance.trace(num_trace_lines).each do |line|
            Rails.logger.info(tag: 'log_query_count', message: line)
          end
        end
      end
      
      Rails.logger.info(tag: 'log_query_count', message: "Starting to count queries, #{ActiveRecordQueryCounter.instance.query_count} so far")
      
      # Subscribe the block to query events, and hook up the events to
      # the callback we defined above
      result = ActiveSupport::Notifications.subscribed(callback, 'sql.active_record') { yield }

      Rails.logger.info(tag: 'log_query_count', message: "Total query count: #{ActiveRecordQueryCounter.instance.query_count}")

      # Remember to return the result from our block
      result
    end

view rawfin_active_record_extensions.rb hosted with ❤ by GitHub

For example:

[1] development (main)> User.log_query_count(2) do
[1] development (main)* User.find(1)
[1] development (main)* end
* 2017-12-24T09:31:29 [INFO] [] <- Starting to count queries, 0 so far
* 2017-12-24T09:31:29 [INFO] [] <- Query # 1: User Load
* 2017-12-24T09:31:29 [INFO] [] <- [1] ↳ app/models/concerns/fin_active_record_extensions.rb:36:in `block in log_query_count’
* 2017-12-24T09:31:30 [INFO] [] <- [1] ↳ (pry):2:in `block in <main>’
* 2017-12-24T09:31:30 [INFO] [] <- Total query count: 1

view rawpry.rb hosted with ❤ by GitHub

Controlling database access

Now that we can watch and count database queries, we would like to control where they happen. Database queries are typically where most of the time in spent in a request. The n+1 query problem is well documented.

A particularly insidious kind of performance regression happens when you make a change to a method that was previously optimized to preloads all the data it needs, and inadvertently introduce new queries. It is non-trivial to predict where database queries will occur, as a consequence of ActiveRecord’s features that preload or cache associations under the hood. The logging extensions that I described above can help you identify what qeries are happening when you decide to investigate performance, but it is valuable to be able to raise an alarm when regressions are introduced, with respect to where database queries are expected.

This can be achieved by an extension of our log_query_countblock above; we can define a with_no_db block which will raise exceptions if any queries occur within it. Here’s how this works:

We begin by monkey-patching ActiveRecord.connection.execute to test a global db_access_mode flag. This logic is encapsulated in our ExecutionControl module below, and included into the ActiveRecord connection adapter:

# frozen_string_literal: true

module CoreExt
  module ActiveRecord
    module ConnectionAdapters
      module ExecutionControl

        # Declare an exception class
        class DBExecutionDisallowed < StandardError; end

        def execute(*args)
          # Monkey-patch execute to verify the db access mode flag
          # before calling super
          _verify_db_access_mode
          super
        end

        def exec_query(*args)
          # Monkey-patch exec_query to verify the db access mode flag
          # before calling super
          _verify_db_access_mode(*args)
          super
        end

        private def _verify_db_access_mode(*args)
          if [:prevent_strict, :prevent_allow_override].include?(Thread.current[:db_access_mode])
            if Thread.current[:db_access_report_only]
              Rollbar.error("Database queries are disallowed here: #{args}")
            else
              raise DBExecutionDisallowed.new("Database queries are disallowed here: #{args}")
            end
          end
        end
      end
    end
  end
end

# Include into our PostgreSQLAdapter
ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.include CoreExt::ActiveRecord::ConnectionAdapters::ExecutionControl

view rawactive_record.rb hosted with ❤ by GitHub

We then define the with_no_db and with_db blocks that will manage the db_access_mode flag:

    Contract KeywordArgs[strict: Optional[Bool], report_only: Optional[Bool]], Func[Any => Any] => Any
    public def with_no_db(strict: true, report_only: false)
      # No queries permitted in this block
      mode = (strict ? :prevent_strict : :prevent_permit_override)
      _with_db_access_mode(mode, report_only: report_only) { yield }
    end

    public def with_db
      # Queries are permitted in this block when it is nested inside a
      # non-strict with_no_db block
      _with_db_access_mode(:allow) { yield }
    end

    Contract Enum[:prevent_strict, :prevent_permit_override, :allow], KeywordArgs[report_only: Optional[Bool]], Func[Any => Any] => Any
    private def _with_db_access_mode(mode, report_only: false)
      prev_mode = Thread.current[:db_access_mode]
      prev_report_only = Thread.current[:db_access_report_only]

      # We don't allow nested with_no_db blocks to downgrade
      # strictness
      if prev_mode == :prevent_strict && mode != :prevent_strict
        raise 'Cannot downgrade strictness in a strict with_no_db context'
      end

      begin
        Thread.current[:db_access_report_only] = report_only
        Thread.current[:db_access_mode] = mode
        result = yield
      ensure
        Thread.current[:db_access_report_only] = prev_report_only
        Thread.current[:db_access_mode] = prev_mode
      end
      result
    end

view rawfin_active_record_extensions.rb hosted with ❤ by GitHub

with_no_db will throw exceptions on querying, like so:

[2] development (main)> Entry.with_no_db do
[2] development (main)*   Entry.first
[2] development (main)* end
CoreExt::ActiveRecord::ConnectionAdapters::ExecutionControl::DBExecutionDisallowed: Database queries are disallowed here: ["SELECT  \"entries\".* FROM \"entries\" WHERE \"entries\".\"deleted_at\" IS NULL ORDER BY \"entries\".\"id\" ASC LIMIT $1", "Entry Load", [#<ActiveRecord::Attribute::WithCastValue:0x00562f9063e410 @name="LIMIT", @value_before_type_cast=1, @type=#<ActiveModel::Type::Value:0x00562f852874f8 @precision=nil, @scale=nil, @limit=nil>, @original_attribute=nil>], {:prepare=>true}]

view rawwith_no_db.rb hosted with ❤ by GitHub

By default, we use a strict version of with_no_db which cannot be overridden by a nested with_db block. This constrains all code that is invoked within the block to act without accessing the database. In some situations, we can choose to allow nested code to access the database, as long as it is annotated as with_db.

We also have a report_only version, which logs warnings to Rollbar instead of throwing exceptions. This can be useful for some critical codepaths where we prefer taking the performance hit to failing entirely.

We have only recently begun to use these extensions; but it’s already clear that they aren’t trivial to put into production in a mature app. Assumptions about on-demand querying are probably deeply ingrained in any significant Rails app, and it is also likely that a normal test suite isn’t sufficient to exercise these assumptions completely. It’s easy to ship a with_no_db block and have it unexpectedly throw exceptions. The report_only flag is a good way to start logging all such potential exceptions, and then switch to strict mode after gaining sufficient confidence that all inbound codepaths have been adjusted to preload data.

Conclusion

Query logging and database access control can be valuable tools to debug the performance of a Rails app. What are your favorite techniques to manage these? Let us know in the comments.

Thanks for reading!

— Venky Iyer