
Transforming the Data

Now that we have our data in place, we can start asking questions of it and working through the matrix transformation.

Fetching the Data

Looking at all the departures for all of 1999 would not be very interesting. Carriers have different routes and different hubs, and there are a LOT of flights. The chord diagram would be a mess. Looking at departures for a given airline makes more sense.

Let’s take a look at the origin/destination city pairs for American Airlines. The IATA code (abbreviation) for American Airlines is ‘AA’, Delta is ‘DL’, Southwest Airlines is ‘WN’, and so on. As a fun exercise later on, go back through and generate the chord diagrams for the other carriers in the data.

The following class method goes in the Departure class and begins the process of creating the matrix we need to feed the chord diagram:

def self.departure_matrix
  sql = <<-SQL.strip_heredoc
    SELECT origin, dest, count(*)
    FROM departures
    WHERE unique_carrier = 'AA'
    GROUP BY 1, 2
    ORDER BY 1, 2
  SQL
  counts = connection.execute(sql)
end

This query just gives us the counts. It does not generate a matrix for us. We need to take those counts and turn them into a matrix.
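To make the shape of counts concrete, here is a minimal sketch with made-up rows. It assumes the PostgreSQL adapter, where each row behaves like a hash with string keys and the count(*) column is named count. The airport pairs and numbers are invented for illustration and are not taken from the 1999 data.

```ruby
# Hypothetical rows, shaped like what connection.execute might hand back.
# Values are strings here because some adapters return raw text columns.
counts = [
  { "origin" => "DFW", "dest" => "LAX", "count" => "120" },
  { "origin" => "DFW", "dest" => "ORD", "count" => "95" },
  { "origin" => "LAX", "dest" => "DFW", "count" => "118" }
]

# One line per origin/destination pair; Integer() converts the text count.
lines = counts.map do |record|
  "#{record['origin']} -> #{record['dest']}: #{Integer(record['count'])}"
end

puts lines
```

Each row is a flat origin/destination/count triple, which is why a separate step is needed to arrange the counts into rows and columns.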

Generating the Matrix

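Fortunately for us, Ruby has a Matrix class in the standard library. (Since Ruby 3.1 it ships as a bundled gem, so an application managed by Bundler may need matrix listed in its Gemfile.) I’m not going to write this code directly in the Departure class, though. Create a module called DepartureMatrix (app/models/departure_matrix.rb).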

require 'matrix'

module DepartureMatrix
  def airports_matrix!(counts:)
    h_matrix = counts.each_with_object({}) do |record, hash|
      hash[record["origin"]] ||= Hash.new(0)
      hash[record["origin"]][record["dest"]] = Integer(record["count"])
    end
    airports = h_matrix.keys.sort
    total    = Float(h_matrix.values.flat_map(&:values).sum)
    matrix   = Matrix.build(airports.count) do |row, column|
      origin = airports[row]
      dest   = airports[column]
      h_matrix.fetch(origin, {}).fetch(dest, 0) / total
    end
    [airports, matrix]
  end
end

The airports_matrix! method takes a single keyword argument, counts, which is the result set we just generated, and returns a tuple (an array with two items) containing the list of airports and the matrix. We need both of these to tell D3 how to draw the chords and the labels.

There is a lot happening in this method, so let’s walk through it.

    h_matrix = counts.each_with_object({}) do |record, hash|
      hash[record["origin"]] ||= Hash.new(0)
      hash[record["origin"]][record["dest"]] = Integer(record["count"])
    end

Enumerable is pretty amazing. Let’s take a closer look at Enumerable#each_with_object. Here is a simplified example from the documentation:

evens = (1..10).each_with_object([]) { |item, array| array << item * 2 }
#=> [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

The each_with_object iterator takes a block that has two parameters. The first parameter is the item as we iterate through the collection. The second parameter is the object that we pass in the parentheses. In this case it is an array. In my code it is a hash.
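The same pattern works with a hash as the accumulator. The sketch below uses invented rows (the airport codes and counts are placeholders) to show how a nested origin/destination hash takes shape, one record at a time.

```ruby
# Hypothetical input rows; real rows would come from the SQL query.
rows = [
  { "origin" => "DFW", "dest" => "LAX", "count" => "3" },
  { "origin" => "DFW", "dest" => "ORD", "count" => "2" },
  { "origin" => "LAX", "dest" => "DFW", "count" => "4" }
]

# The {} passed to each_with_object is the same hash on every iteration,
# and it is also the return value of the whole expression.
nested = rows.each_with_object({}) do |record, hash|
  hash[record["origin"]] ||= {}
  hash[record["origin"]][record["dest"]] = Integer(record["count"])
end

p nested
```

The outer keys are origins, and each inner hash maps a destination to its count.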

The other tricky thing happening here is that I set a default value for each new origin hash. If there are missing values, a plain lookup returns the default value, which is zero in this case. (Hash#fetch ignores that default, which is why the lookup further down passes its own fallback of 0.) The next thing we do is get a sorted list of airports. We can ask a hash for its keys, and what we get back is an array of keys. Pretty cool!

    airports = h_matrix.keys.sort
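The default value and the sorted key list can both be tried in isolation. This is a small sketch with placeholder airports and counts, not the article's data.

```ruby
# Toy nested hash built by hand; each inner hash defaults to 0.
h_matrix = {}
h_matrix["ORD"] = Hash.new(0)
h_matrix["DFW"] = Hash.new(0)
h_matrix["ORD"]["DFW"] = 2
h_matrix["DFW"]["LAX"] = 3

via_default = h_matrix["DFW"]["ORD"]          # missing pair, default applies
via_fetch   = h_matrix["DFW"].fetch("ORD", 0) # missing pair, explicit fallback
stored      = h_matrix["DFW"].key?("ORD")     # reading a default stores nothing

# fetch without a fallback raises, even though the hash has a default.
strict =
  begin
    h_matrix["DFW"].fetch("ORD")
  rescue KeyError
    :key_error
  end

# Only the outer keys (origins) are listed; LAX is a destination only
# in this toy data, so it does not appear.
airports = h_matrix.keys.sort

p via_default, via_fetch, stored, strict, airports
```

One thing this makes visible: the airport list is built from origins only, so it relies on every destination also appearing as an origin somewhere in the data.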

Next we need to calculate the grand total of all the things. This is how we will know what share of the total each individual count represents. We just asked the hash for its keys, and now we are asking for its values.

    total    = Float(h_matrix.values.flat_map(&:values).sum)

We have a multi-dimensional hash, so its values are themselves hashes, and asking each of those for its values gives us an array of arrays. We could map over those and flatten the resulting array, but flat_map gives us a nice shortcut for that combination of actions.
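Here is a short sketch comparing the two spellings on a hand-built nested hash. The counts are placeholders.

```ruby
h_matrix = {
  "DFW" => { "LAX" => 3, "ORD" => 1 },
  "LAX" => { "DFW" => 4 }
}

inner_hashes = h_matrix.values                  # array of the inner hashes
long_form    = inner_hashes.map(&:values).flatten
short_form   = inner_hashes.flat_map(&:values)  # same result in one step

total = Float(short_form.sum)

p long_form, short_form, total
```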

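Do you remember integer division from school? In Ruby, dividing one integer by another discards the fractional part, so every count divided by an integer total would come out as 0. We need the total to be a float so that the downstream calculations keep their decimal places.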
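The difference is easy to see with two small numbers. The values below are arbitrary.

```ruby
count     = 3
int_total = 8

truncated = count / int_total         # Integer / Integer drops the fraction
fraction  = count / Float(int_total)  # Integer / Float keeps it
also_ok   = count.fdiv(int_total)     # fdiv always does float division

p truncated, fraction, also_ok
```

Converting the total once, up front, means every division inside the matrix block gets the float behavior without any extra conversion.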

    matrix   = Matrix.build(airports.count) do |row, column|
      origin = airports[row]
      dest   = airports[column]
      h_matrix.fetch(origin, {}).fetch(dest, 0) / total
    end

The final thing we need to do is actually generate the matrix. Here is a simple example of Matrix.build from the documentation:

m = Matrix.build(2, 4) {|row, col| col - row }
#=> Matrix[[0, 1, 2, 3], [-1, 0, 1, 2]]

The example builds a matrix with two rows and four columns. Matrix.build takes up to two parameters for the row and column counts. If you omit the second parameter, the column count will be set to the row count. I rely on that behavior in my code to generate a square matrix.
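A quick way to confirm the single-argument behavior is to build the same matrix both ways and compare. This sketch uses an identity matrix only because it is easy to check by eye.

```ruby
require 'matrix'

# One argument: the column count falls back to the row count.
square   = Matrix.build(3) { |row, column| row == column ? 1 : 0 }

# Two arguments: the same shape, spelled out.
explicit = Matrix.build(3, 3) { |row, column| row == column ? 1 : 0 }

p square
p square == explicit
p square.square?
```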

Finalizing the Matrix

Now that we have our matrix generator, we need to put it to work. Go back to the Departure class and have it extend DepartureMatrix. Now DepartureMatrix#airports_matrix! becomes a class method in Departure. We just need to call it, like so:

def self.departure_matrix
  sql = <<-SQL.strip_heredoc
    SELECT origin, dest, count(*)
    FROM departures
    WHERE unique_carrier = 'AA'
    GROUP BY 1, 2
    ORDER BY 1, 2
  SQL
  counts = connection.execute(sql)
  airports_matrix!(counts: counts)
end

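You can run this from the console to see what your matrix looks like. The method returns a two-item array: the sorted list of airport codes and a Matrix object, which prints as a large array of arrays (call to_a on it if you want plain Ruby arrays).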
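If you want to try the module without a database, the following sketch feeds it hand-written rows. FakeDeparture is a hypothetical stand-in for the Departure model, and the rows are invented. The module body is copied from above so the snippet runs on its own.

```ruby
require 'matrix'

module DepartureMatrix
  def airports_matrix!(counts:)
    h_matrix = counts.each_with_object({}) do |record, hash|
      hash[record["origin"]] ||= Hash.new(0)
      hash[record["origin"]][record["dest"]] = Integer(record["count"])
    end
    airports = h_matrix.keys.sort
    total    = Float(h_matrix.values.flat_map(&:values).sum)
    matrix   = Matrix.build(airports.count) do |row, column|
      origin = airports[row]
      dest   = airports[column]
      h_matrix.fetch(origin, {}).fetch(dest, 0) / total
    end
    [airports, matrix]
  end
end

# Hypothetical stand-in for the ActiveRecord model.
class FakeDeparture
  extend DepartureMatrix
end

# Invented rows; the counts add up to 8 so the fractions are easy to read.
sample_counts = [
  { "origin" => "DFW", "dest" => "LAX", "count" => "3" },
  { "origin" => "DFW", "dest" => "ORD", "count" => "1" },
  { "origin" => "LAX", "dest" => "DFW", "count" => "2" },
  { "origin" => "ORD", "dest" => "DFW", "count" => "2" }
]

airports, matrix = FakeDeparture.airports_matrix!(counts: sample_counts)

p airports
matrix.to_a.each { |row| p row }
```

Each row of the result belongs to an origin and each column to a destination, in the same order as airports, and all the entries together add up to 1.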
