2.7 Performing Specialized String Comparisons

Ruby has built-in ideas about comparing strings; comparisons are done lexicographically, as we have come to expect (that is, based on character set order). But if we want, we can introduce rules of our own for string comparisons, and these can be of arbitrary complexity.

For example, suppose that we want to ignore the English articles a, an, and the at the front of a string, and we also want to ignore most common punctuation marks. We can do this by overriding the built-in method <=> (which is called for <, <=, >, and >=). Listing 2.1 shows how we do this.

Listing 2.1 Specialized String Comparisons

class String

  alias old_compare <=>

  def <=>(other)
    a = self.dup
    b = other.dup
    # Remove punctuation
    a.gsub!(/[\,\.\?\!\:\;]/, "")
    b.gsub!(/[\,\.\?\!\:\;]/, "")
    # Remove initial articles
    a.sub!(/\A(a |an |the )/i, "")
    b.sub!(/\A(a |an |the )/i, "")
    # Remove leading/trailing whitespace
    a.strip!
    b.strip!
    # Use the old <=>
    a.old_compare(b)
  end

end


title1 = "Calling All Cars"
title2 = "The Call of the Wild"

# Ordinarily this would print "yes"

if title1 < title2
  puts "yes"
else
  puts "no"         # But now it prints "no"
end

Note that we “save” the old <=> with an alias and then call it at the end. This is because if we tried to use the < method, it would call the new <=> rather than the old one, resulting in infinite recursion and a program crash.
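Because reopening String changes the ordering of every string in the program, we may prefer to confine the new rules to a class of our own. The following is only a sketch under that assumption: Title and its sort_key helper are hypothetical names, not part of Ruby or of Listing 2.1. Since the keys handed to <=> are plain String objects, the built-in comparison does the final work and no alias is needed.

```ruby
# Hypothetical Title class (an assumption for illustration): a String
# subclass whose ordering ignores punctuation and a leading article.
# The built-in String class is left untouched.
class Title < String
  # Build a plain String key: drop punctuation, drop one leading
  # article, then trim surrounding whitespace.
  def self.sort_key(text)
    String.new(text).gsub(/[,.?!:;]/, "").sub(/\A(a |an |the )/i, "").strip
  end

  def <=>(other)
    return nil unless other.is_a?(String)
    # Both keys are plain Strings, so this is the built-in String#<=>.
    Title.sort_key(self) <=> Title.sort_key(other)
  end
end

titles = ["The Call of the Wild", "Calling All Cars", "A Bend in the River"]
sorted = titles.map { |t| Title.new(t) }.sort
puts sorted
# A Bend in the River
# The Call of the Wild
# Calling All Cars
```

The operators <, <=, >, and >= on Title objects come from Comparable and therefore pick up the new <=> as well.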

Note also that the == operator does not call the <=> method. Comparable supplies an == that is based on <=>, but String defines its own ==, which takes precedence. This means that if we need to check equality in some specialized way, we will have to override the == method separately. But in this case, == works as we want it to anyhow.
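A short experiment makes the distinction visible. This is a sketch only; LooseString is a hypothetical subclass invented for the demonstration, so that the built-in String is not modified.

```ruby
# Hypothetical subclass used only for this demonstration.
class LooseString < String
  # Case-insensitive ordering; == is deliberately left alone.
  def <=>(other)
    casecmp(other)
  end
end

a = LooseString.new("Ruby")
b = LooseString.new("RUBY")

p(a <=> b)   # 0
p(a <= b)    # true  (Comparable#<= goes through our <=>)
p(a == b)    # false (String#== compares the contents directly)
```

The ordering operators follow the new <=>, but equality does not, which is why == has to be handled on its own.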

Suppose that we wanted to do case-insensitive string comparisons. The built-in method casecmp will do this; we just have to make sure that it is used instead of the usual comparison.

Here is one way:

class String
  def <=>(other)
    casecmp(other)
  end
end

But there is a slightly easier way:

class String
  alias <=> casecmp
end

However, we haven’t finished. We need to redefine == so that it will behave in the same way:

class String
  def ==(other)
    casecmp(other) == 0
  end
end

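Now all string comparisons made through <=> and == will be case insensitive. Any sorting operation that depends on <=> will likewise be case insensitive. Bear in mind that casecmp folds only the ASCII letters A-Z and a-z, so accented and other non-ASCII letters are still compared exactly.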
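When only one sort needs to ignore case, a block passed to sort gives the same ordering without redefining anything. A minimal sketch, assuming a hypothetical helper named sort_ignoring_case:

```ruby
# Hypothetical helper (an assumption for illustration): sort strings
# without regard to ASCII case, leaving String#<=> itself untouched.
def sort_ignoring_case(words)
  words.sort { |a, b| a.casecmp(b) }
end

words = ["banana", "apple", "Cherry"]
p words.sort                  # ["Cherry", "apple", "banana"]
p sort_ignoring_case(words)   # ["apple", "banana", "Cherry"]
```

The default sort places "Cherry" first because uppercase letters precede lowercase ones in character set order; the block removes that distinction for this one call.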

