2.8 Tokenizing a String

The split method parses a string and returns an array of tokens. It accepts two optional parameters: a delimiter (a string or a regular expression) and a field limit (an integer).

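The delimiter defaults to whitespace. More precisely, when no delimiter is given, split uses the value of $; (or $FIELD_SEPARATOR, its equivalent after require 'English'); when that variable is nil, as it is by default, the string is split on runs of whitespace. If the delimiter is a string, the literal value of that string is used as the token separator (a single space is a special case that also splits on whitespace). If it is a regular expression, each match of the pattern acts as a separator.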

s1 = "It was a dark and stormy night."
words = s1.split          # ["It", "was", "a", "dark", "and",
                          #  "stormy", "night."]
s2 = "apples, pears, and peaches"
list = s2.split(", ")     # ["apples", "pears", "and peaches"]

s3 = "lions and tigers and bears"
zoo = s3.split(/ and /)   # ["lions", "tigers", "bears"]
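
The difference between the default whitespace behavior and an explicit regex delimiter is easy to miss, so here is a small sketch of it. The sample strings and variable names are made up for illustration; only split itself comes from the text above.

```ruby
text = "  It was\ta dark \n night  "

# No argument: split on runs of whitespace, ignoring leading whitespace
default_split = text.split

# A single-space string is treated the same way as the default
space_split = text.split(" ")

# A regex is matched literally, so every individual space separates fields
regex_split = " a  b".split(/ /)

p default_split   # ["It", "was", "a", "dark", "night"]
p space_split     # ["It", "was", "a", "dark", "night"]
p regex_split     # ["", "a", "", "b"]
```

The empty strings in the last result come from the leading space and the doubled space; the default form avoids them.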

The limit parameter controls how many fields are returned and whether trailing empty entries are kept, according to these rules:

  1. If it is omitted, there is no limit to the number of fields, and trailing empty entries are suppressed.
  2. If it is a positive number, the number of entries will be limited to that number (stuffing the rest of the string into the last field as needed). Trailing empty entries are retained.
  3. If it is a negative number, there is no limit to the number of fields, and trailing empty entries are retained.

These three rules are illustrated here:

str = "alpha,beta,gamma,,"
list1 = str.split(",")     # ["alpha","beta","gamma"]
list2 = str.split(",",2)   # ["alpha", "beta,gamma,,"]
list3 = str.split(",",4)   # ["alpha", "beta", "gamma", ","]
list4 = str.split(",",8)   # ["alpha", "beta", "gamma", "", ""]
list5 = str.split(",",-1)  # ["alpha", "beta", "gamma", "", ""]
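
One common use of a positive limit is splitting only at the first delimiter. The following is a minimal sketch under that assumption; parse_setting is a hypothetical helper written for this illustration, not part of Ruby or of the original examples.

```ruby
# Hypothetical helper: split a "key=value" line at the first "=" only,
# so that any later "=" characters stay inside the value.
def parse_setting(line)
  key, value = line.split("=", 2)
  [key, value]
end

p parse_setting("PATH=/usr/bin:/bin")   # ["PATH", "/usr/bin:/bin"]
p parse_setting("query=a=1&b=2")        # ["query", "a=1&b=2"]
p parse_setting("empty=")               # ["empty", ""]
p parse_setting("novalue")              # ["novalue", nil]
```

Because the limit is positive, the trailing empty value in "empty=" is retained rather than dropped, which lets a caller tell an empty value apart from a missing one.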

The scan method can be used to match regular expressions or strings against a target string:

str = "I am a leaf on the wind..."

# A string is interpreted literally, not as a regex
arr = str.scan("a")        # ["a","a","a"]

# A regex will return all matches
arr = str.scan(/\w+/)      # ["I", "am", "a", "leaf", "on", "the",
                           #  "wind"]

# A block can be specified
str.scan(/\w+/) {|x| puts x }
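
One detail the examples above do not show: when the regex contains capture groups, scan returns an array of arrays, one inner array of group values per match. The sketch below assumes a made-up config string and variable names purely for illustration.

```ruby
config = "width=640, height=480, depth=24"

# Each match contributes one [name, value] pair built from the two groups
pairs = config.scan(/(\w+)=(\d+)/)
p pairs   # [["width", "640"], ["height", "480"], ["depth", "24"]]

# Convert the pairs to a hash with integer values
settings = pairs.map { |name, value| [name, value.to_i] }.to_h
p settings
```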

The StringScanner class, from the standard library, is different in that it maintains state for the scan rather than doing it all at once:

require 'strscan'
str = "Watch how I soar!"
ss = StringScanner.new(str)
loop do
  word = ss.scan(/\w+/)    # Grab a word at a time
  break if word.nil?
  puts word
  sep = ss.scan(/\W+/)     # Grab next non-word piece
  break if sep.nil?
end
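
Because StringScanner keeps its position between calls, it lends itself to hand-written tokenizers that try several patterns in turn. What follows is a rough sketch of that idea, not a definitive design; the tokenize method, its token labels, and the tiny arithmetic grammar are all assumptions made up for this illustration. It relies only on the standard StringScanner methods eos?, skip, scan, and pos.

```ruby
require 'strscan'

# Hypothetical helper: break a simple arithmetic expression into
# [type, text] pairs. The grammar here is invented for the example.
def tokenize(src)
  ss = StringScanner.new(src)
  tokens = []
  until ss.eos?
    if ss.skip(/\s+/)
      next                                  # whitespace is discarded
    elsif (num = ss.scan(/\d+/))
      tokens << [:number, num]
    elsif (op = ss.scan(/[-+*\/()]/))
      tokens << [:op, op]
    else
      raise ArgumentError, "unexpected character at position #{ss.pos}"
    end
  end
  tokens
end

p tokenize("3 + 42*(7)")
# [[:number, "3"], [:op, "+"], [:number, "42"], [:op, "*"],
#  [:op, "("], [:number, "7"], [:op, ")"]]
```

Each pattern is tried at the current scan position only, so an unrecognized character is reported where it occurs instead of being silently skipped.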