Home > Articles

  • Print
  • + Share This
This chapter is from the book

Disjointed City Pairs

Airlines are very good at resource optimization. An airplane goes from airport to airport, picking up and delivering passengers at each stop. I wondered, though, how often does an airplane fly an “empty leg” route? By that I mean, how often is the destination airport different from the next origination airport for a given airplane? We have the data that we need to answer that question!

Using the Lead Window Function to Find Empty Leg Flights

The idea is simple enough. We have the airplanes’ tail numbers, so we can track each plane’s movements. We can ask Postgres to tell us the next origin airport using the LEAD window function and compare that to that record’s destination airport. The query to do all of those things requires a few steps, though. I typically use Common Table Expression (CTE) when I have queries with multiple parts that build on each other.

If you wanted to just write out the query and see the data, it would look like this:

WITH base_query AS (
  SELECT id,
    unique_carrier,
    flight_num,
    tail_num,
    origin,
    dest,
    to_date((((year || '-'::text) || month) || '-'::text) || day_of_month, 'YYYY-MM-DD'::text) AS dep_date,
    dep_time,
    arr_time,
    actual_elapsed_time,
    dep_delay,
    arr_delay,
    diverted
   FROM departures
   WHERE unique_carrier = 'AA'
), departures_with_lead AS (
  SELECT *,
    LEAD(origin) OVER (PARTITION BY tail_num ORDER BY tail_num, dep_date, dep_time) AS next_origin
  FROM base_query
)
SELECT * FROM departures_with_lead
WHERE unique_carrier = 'AA'
  AND dest <> next_origin;

Go ahead and open your favorite SQL editor, or go into rails dbconsole, and run that query.

It’s not crazy slow, but it’s also not fast. For that reason, I will optimize and put the base query in a materialized view.

Optimizing Slow Queries with the Materialized View

A view is just a query that is stored in the database. A materialized view is where the query AND the results are stored in the database. You can index and update the data in a materialized view. We won’t need to update this data, but indexes will be handy.

We can use Rails’s migrations to do this for us. Create a migration (rails g migration create_aa_departures). Put the following code in the new migration and run the migration:

class CreateAaDepartures < ActiveRecord::Migration
  def up
    sql = <<-SQL.strip_heredoc
      CREATE MATERIALIZED VIEW aa_departures AS (
        WITH base_query AS (
          SELECT id,
            unique_carrier,
            flight_num,
            tail_num,
            origin,
            dest,
            to_date((((year || '-'::text) || month) || '-'::text) || day_of_month, 'YYYY-MM-DD'::text) AS dep_date,
            dep_time,
            arr_time,
            actual_elapsed_time,
            dep_delay,
            arr_delay,
            diverted
           FROM departures
           WHERE unique_carrier = 'AA'
        )
        SELECT *,
          LEAD(origin) OVER (PARTITION BY tail_num ORDER BY tail_num, dep_date, dep_time) AS next_origin
        FROM base_query
      );
      CREATE INDEX ON aa_departures USING btree (origin);
      CREATE INDEX ON aa_departures USING btree (dest);
      CREATE INDEX ON aa_departures USING btree (next_origin);
    SQL
  end

  def down
    execute("DROP MATERIALIZED VIEW aa_departures;")
  end
end

Draw the Disjointed City Pairs Chord Diagram

The query was the hard part. Now we just need to add the plumbing in the DeparturesController to query the data and serve it to the JavaScript.

The new controller methods should look very familiar:

def disjointed; end
def disjointed_matrix
  airports, matrix = Departure.disjointed_matrix
  render :json => {
    :airports => airports,
    :matrix   => matrix
  }
end

Don’t forget to add routes for the disjointed and disjointed_matrix actions. The view app/views/departures/disjointed.html.erb is simply a copy of index.html.erb with the route changed for the data call:

makeChordChart('/departures/disjointed_matrix.json');

Now for the Departure model. You’ve seen the full query, and you saw how we broke it up to create the materialized view. We put the pieces back together in Departure#disjointed_matrix.

def self.disjointed_matrix
  sql = <<-SQL.strip_heredoc
    SELECT dest, next_origin, count(*)
    FROM aa_departures
    WHERE dest <> next_origin
      AND next_origin IS NOT NULL
    GROUP BY 1, 2
    ORDER BY 1, 2
  SQL
  counts = connection.execute(sql)
  airports_matrix!(:counts => counts, :field1 => "dest", :field2 => "next_origin")
end

You may have noticed that we now have three parameters in the DepartureMatrix#airports_matrix! call. We need to update that method. The modified code is below with the updated lines in bold.

def airports_matrix!(counts:, field1: "origin", field2: "dest")
  h_matrix = counts.each_with_object({}) do |record, hash|
    hash[record[field1]] ||= Hash.new(0)
    hash[record[field1]][record[field2]] = Integer(record["count"])
  end
  airports = h_matrix.keys.sort
  total    = Float(h_matrix.values.flat_map(&:values).sum)
  matrix   = Matrix.build(airports.count) do |row, column|
    origin = airports[row]
    dest   = airports[column]
    h_matrix.fetch(origin, {}).fetch(dest, 0) / total
  end
  [airports, matrix]
end

The field names in the queries are different, but the matrix creation routine is the same. We can simply pass in the data’s hash key attributes, and everything still works.

You’ll need to restart the Rails app if you still had it running because you’ve added new files. Go to http://localhost:3000/departures/disjointed and you should see a chord chart that looks like Figure 6.3.

Figure 6.3

Figure 6.3 Disjointed City Pair Chord Chart

  • + Share This
  • 🔖 Save To Your Account