Common Table Expressions in Ruby on Rails

A Common Table Expression (CTE) is a temporary set of results created from a dataset in a database that gets used like any other table. They allow you to write complex queries by breaking them into more manageable (and reusable) blocks.

For Rails versions <=7.0, we must use Common Table Expressions as raw SQL. Here’s an elementary example:

sql = <<-SQL.squish
  WITH not_in_team AS (
    SELECT users.id FROM users
    WHERE team_id IS NULL
  )
  SELECT * FROM users
  WHERE id NOT IN(SELECT * FROM not_in_team)
SQL

result = User.find_by_sql(sql)
# User Load (4.6ms)  WITH not_in_team AS ( SELECT users.id FROM users WHERE team_id IS NULL ) SELECT * FROM users WHERE id NOT IN(SELECT * FROM not_in_team)

The SQL isn’t important here…what is essential is that we have to construct a query using raw SQL. Writing a query in SQL is a problem for a couple of reasons:

  • If we were using values provided by a user, we’d have to escape them manually.
  • We leave ourselves open to N+1 queries.

Let’s look into the N+1 issue. If we wanted to iterate over the users and print out all of the names of the teams that they’re in, we might do something like this:

result.map { _1.team.name }.join(', ')
# Team Load (1.1ms)  SELECT "teams".* FROM "teams" WHERE "teams"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
# Team Load (0.4ms)  SELECT "teams".* FROM "teams" WHERE "teams"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
# Team Load (0.3ms)  SELECT "teams".* FROM "teams" WHERE "teams"."id" = $1 LIMIT $2  [["id", 2], ["LIMIT", 1]]
# Team Load (0.3ms)  SELECT "teams".* FROM "teams" WHERE "teams"."id" = $1 LIMIT $2  [["id", 2], ["LIMIT", 1]]
# => "A Team, A Team, B Team, B Team"

The code above would give the desired result, but for each user, it would perform an SQL query which could be more efficient. Usually, when writing an ActiveRecord query, we’d use preload, which would preload the teams association, thereby reducing the number of database calls from N+1 to 2. We can’t do that with raw SQL.

There is a workaround — we can call the preloader directly:

ActiveRecord::Associations::Preloader.new(records: result, associations: [:team]).call
# Team Load (0.8ms)  SELECT "teams".* FROM "teams" WHERE "teams"."id" IN ($1, $2)  [["id", 1], ["id", 2]]

result.map { _1.team.name }.join(', ')
# => "A Team, A Team, B Team, B Team"

Manually calling the preloader solves the N+1 issue. It feels a little hacky, but it works.

Luckily, Rails 7.1 will make this easier — in it, we can use ActiveRecord to define Common Table Expressions, which means we can also use preload.

result = User.with(not_in_team: User.where(team_id: nil).select(:id))
             .where('users.id NOT IN(SELECT * FROM not_in_team)')
             .preload(:team)

# User Load (0.7ms)  WITH "not_in_team" AS (SELECT "users"."id" FROM "users" WHERE "users"."team_id" IS NULL) SELECT "users".* FROM "users" WHERE (users.id NOT IN(SELECT * FROM not_in_team))
# Team Load (0.5ms)  SELECT "teams".* FROM "teams" WHERE "teams"."id" IN ($1, $2)  [["id", 1], ["id", 2]]

result.map { _1.team.name }.join(', ')
# => "A Team, A Team, B Team, B Team"

We define the CTE using the with method; we can then use it in the where condition as we did with the raw SQL.

Notice that we can then use preload. Much cleaner, and no N+1!

This is a really simplified example to give a bare bone demonstration of what’s coming. There is a much more detailed discussion in the PR that added the functionality with more realistic examples if you’d like to dig a little deeper.