This chapter is from the book

Strings

A string holds characters and nothing else. It is Ruby's only single-purpose container.

As the name suggests, a string has a beginning and an end; it is an ordered container. It might just as accurately have been called a "chain." If the organizing principle were a "bag" instead, then the characters could be scrambled; we would be able to distinguish list from still, but not fare from fear.

Two strings are considered equal only if they have the exact same characters in the exact same sequence.

 "fare" == "fear"  #–> false
 "Night" == "night" #–> false
 "nine " == " nine " #–> false
 "rabbit" == "rabbit" #–> true
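
To make the chain-versus-bag distinction concrete, here is a minimal sketch. The helper names same_sequence? and same_bag? are our own inventions, not part of Ruby, and the chars method assumes Ruby 2.0 or later.

```ruby
# Strings are ordered: equality compares characters position by position.
def same_sequence?(a, b)
  a == b
end

# A hypothetical "bag" comparison: sort the characters first, so order is ignored.
def same_bag?(a, b)
  a.chars.sort == b.chars.sort
end

puts same_sequence?("fare", "fear")  # false
puts same_bag?("fare", "fear")       # true  (same letters, different order)
puts same_bag?("list", "still")      # false (still has an extra "l")
```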

Specifying Substrings by Position

Each position in a string has a numerical index. In good computer science tradition, Ruby's indices are counted starting with zero instead of one. Segments of a string, or substrings, are referred to by their starting positions and lengths. Putting this pair of numbers in square brackets, we can examine or modify individual characters or longer substrings.

 foo = "wishbone"
 foo[0,1] #–> "w"
 foo[2,5] #–> "shbon"
 foo[0,1] = "f"  # foo == "fishbone"
 foo[5,1] = "a"  # foo == "fishbane"
 foo[0,4] = "wolf" # foo == "wolfbane"

A replacement string does not need to be the same length as the segment it is replacing. The affected string expands or contracts as needed.

 foo[1,5] = "i" # foo == "wine"
 foo[1,2] = "edg" # foo == "wedge"
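
One consequence worth seeing: []= changes the string object itself, so every variable that refers to that object sees the change. The sketch below illustrates this; with_prefix_replaced is a hypothetical helper that works on a copy instead, and dup is used so the example also runs where string literals are frozen.

```ruby
# Hypothetical helper: replace the first `length` characters in a copy.
def with_prefix_replaced(str, length, replacement)
  copy = str.dup                 # work on a copy so the caller's string survives
  copy[0, length] = replacement  # []= grows or shrinks the copy as needed
  copy
end

word = "wishbone".dup   # dup guarantees a modifiable string
nickname = word         # a second name for the same object

word[0, 1] = "f"        # in-place change, visible through both names
puts nickname                               # fishbone
puts with_prefix_replaced(word, 4, "back")  # backbone
puts word                                   # fishbone (untouched by the helper)
```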

It's sometimes more convenient to count from the end of the string than from the beginning. The last character is considered to be in position -1, the next-to-last in position -2, and so on.

 bar = foo.upcase + foo.reverse # bar == "WEDGEegdew"
 bar[-1,1]      #–> "w"
 bar[-7,4]      #–> "GEeg"
 bar[-7,4] = ""     # bar == "WEDdew"
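
Negative positions make "the last few characters" easy to express. Here is a small sketch; last_chars is a hypothetical helper name. It relies on the fact that [] returns nil when the starting position falls outside the string.

```ruby
# Hypothetical helper: the last n characters, or the whole string if it is shorter.
def last_chars(str, n)
  str[-n, n] || str.dup   # str[-n, n] is nil when n exceeds the length
end

puts last_chars("WEDGEegdew", 3)   # dew
puts last_chars("abc", 5)          # abc
puts "abc"[-5, 5].inspect          # nil
```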

Let's not get lost in notation. If you take a look at bar.methods and scan through the long list it gives you, you'll see "[]" and "[]=" in there. This means that using square brackets to feed position information to a string is just another way of applying a method to an object; the notation may be new, but the concept is no different from what we were talking about on Day 2. "[]" could have been called "substring" or "slice", and "[]=" could have been called "replace_substring."

NOTE

In fact, Ruby offers slice as a synonym for [].

 	 "Radio".slice(2,3) #–> "dio"

If we wanted to be absolutely strictly consistent with method notation, we could do this:

 qaz = "Mona Lisa"
 qaz.[](0,5)   #–> "Mona " (refer to substring)
 qaz.[]=(3,6,"day") # qaz == "Monday" (replace substring)

It looks strange, but it works, and it follows the standard dot notation for applying methods: object, dot, method name, and arguments, in that order. Always remember: In Ruby, all you are doing is applying methods (that is, passing messages) to objects. Sometimes it isn't obvious, because some specialized notation is provided to let you say things in another form, but under the hood, it's always the same story.
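
If you want to see the message-passing story spelled out, one way is to send the method name explicitly. This is only a sketch: slice_by_message and replace_by_message are hypothetical names, and public_send assumes Ruby 1.9 or later.

```ruby
# Ask for a substring by passing the "[]" message by name.
def slice_by_message(str, start, length)
  str.public_send(:[], start, length)
end

# Replace a substring in a copy by passing the "[]=" message by name.
def replace_by_message(str, start, length, replacement)
  copy = str.dup
  copy.public_send(:[]=, start, length, replacement)
  copy
end

qaz = "Mona Lisa"
puts qaz.respond_to?(:[])                       # true
puts slice_by_message(qaz, 0, 5).inspect        # "Mona "
puts replace_by_message(qaz, 3, 6, "day")       # Monday
```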

Providing an alternate way to write something is sometimes known as sugaring the syntax; it doesn't make the language any more nutritious, so to speak, but does make it a little more pleasant to work with. This particular syntax sugar is borrowed from Perl, and it helps Perl programmers feel at home using Ruby's strings.

Individual Characters

The length parameter can be left off when using "[]" or "[]=", in which case the length defaults to one. The results are what you would expect when replacing substrings:

 s = "012345678"
 s[3] = "waffle" # s == "012waffle45678"

But then something surprising happens when you look at an individual character:

 s = "AaBbCc 012"
 s[0] #–> 65 
 s[1] #–> 97
 s[6] #–> 32
 s[7] #–> 48

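What's that all about? Depending upon your experience, you might or might not recognize the above as ASCII codes, a common way of representing characters as numbers. ASCII proper covers the range 0 to 127; because each character is stored in a byte, values up to 255 can also appear. Ruby doesn't have a separate Character type, so when we talk about characters, we really mean these numbers. (This is the behavior of Ruby 1.8 and earlier. From Ruby 1.9 onward, s[0] returns the one-character string "A", and s[0].ord returns 65.)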

NOTE

Don't expect to understand how this works just yet, but if you want to see the ASCII codes of all characters of a string at once, you can try this:

		 "Book".split(//).collect{|c| c[0]}
		   #–> [66, 111, 111, 107] 

Being able to get a visible indication of which ASCII codes are associated with which characters is handy when you're trying to figure out lexicographic (or alphabetical) order. To get characters in this form, you can omit the length parameter to [] as shown above, but to see characters as tiny strings, you can either specify the substring length as 1 or use the chr method to do the necessary conversion. chr is a method of the Integer class.

 s[0,1] #–> "A" 
 s[2].chr #–> "B"
 70.chr #–> "F"
 10.chr #–> "\n" (linefeed)
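
For readers on a newer interpreter, here is a sketch of the same exploration, assuming Ruby 1.9 or later, where indexing with a single number yields a one-character string and ord supplies the code. char_codes is a hypothetical helper name.

```ruby
# Hypothetical helper: the numeric code of every character in a string.
def char_codes(str)
  str.each_char.map(&:ord)
end

s = "AaBbCc 012"
puts s[0].inspect                 # "A"
puts s[0].ord                     # 65
puts s.getbyte(6)                 # 32 (the space)
puts char_codes("Book").inspect   # [66, 111, 111, 107]
puts 70.chr                       # F
```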

You may not often have occasion to use it, but Ruby provides a simple way of expressing characters in ASCII form. A character literal is a question mark followed by a single character.

 87.chr #–> "W"
 ?W  #–> 87
 ?\t  #–> 9 (tab character)
 "W"[0] #–> 87
 "W"[0,1] #–> "W"
 ?W < ?X #–> true (because 87 < 88)
 "W" < "X" #–> true (correct lexicographic order)
 ?W < "X" # error

NOTE

Here is an example of Ruby refusing to try to read your mind. If comparisons between integers and strings were allowed, you would run into situations like this:

		 "4" < ?3

This is ambiguous because the ASCII code for "3" is 51, which is of course larger than 4. So the comparison might be either true or false depending on which conversion you had in mind.

		 "4" < ?3.chr #–> false (comparing 1-character strings)
		 "4".to_i < ?3 #–> true (interpreting "4" as the number it represents)
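
The same refusal can be observed programmatically. The sketch below assumes Ruby 1.9 or later (it uses ord to obtain the code 51), and try_compare is a hypothetical helper that reports :refused when Ruby declines to compare a string with a number.

```ruby
# Hypothetical helper: attempt a < b, reporting :refused for mixed types.
def try_compare(a, b)
  a < b
rescue ArgumentError
  :refused
end

code = "3".ord                      # 51
puts try_compare("4", code.chr)     # false   (text:    "4" < "3")
puts try_compare("4".to_i, code)    # true    (numbers: 4 < 51)
puts try_compare("4", code)         # refused (string versus integer)
```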

Specifying Substrings by Matching

Often it's useful to deal with substrings based on content rather than position. We've been supplying a position and length to [], but if we supply a string instead, Ruby will search the target string for it and figure out the position and length for itself.

 footwear = "blue suede shoes"
 footwear["suede"]    #–> "suede" 
 footwear["leather"]   #–> nil
 footwear["blue"] = "red"  # footwear == "red suede shoes"
 footwear["socks"] = "sandals" # footwear == "red suede shoes"

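Notice that if the search fails, no replacement happens and the target string is simply unaffected. The Ruby version described here does not report an error in that case; current Ruby versions raise an IndexError ("string not matched") instead.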
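
Because the outcome of a failed []= search depends on the Ruby version, a defensive approach is to check before replacing. This is a minimal sketch; replace_if_present is a hypothetical helper, and it returns a modified copy so the original string is never touched.

```ruby
# Hypothetical helper: replace the first occurrence of target, if there is one.
def replace_if_present(str, target, replacement)
  copy = str.dup
  copy[target] = replacement if copy.include?(target)
  copy
end

puts replace_if_present("blue suede shoes", "blue", "red")      # red suede shoes
puts replace_if_present("red suede shoes", "socks", "sandals")  # red suede shoes
puts "red suede shoes"["socks"].inspect                         # nil
```

The built-in sub method offers similar behavior: "red suede shoes".sub("socks", "sandals") returns an unchanged copy when nothing matches.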

NOTE

Looking ahead: You can search not only for an exact substring but also for an abstract pattern. Here we replace the first vowel in a word with an asterisk:

		 s = "strongbox"
		 s[/[aeiou]/i] = "*" # s == "str*ngbox"

We'll learn all about string matching patterns on Day 8.

A Few Useful String Instance Methods

<<

Append either another string or a character:

 x = "one "
 x << 49  #–> "one 1"  (49 is ASCII for "1")
 x << " two " #–> "one 1 two "
 x << ?2  #–> "one 1 two 2" (same as x << 50)
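
It may help to see that << changes the existing string, whereas + builds a new one. The following sketch shows the difference; join_with_shovel is a hypothetical helper, and dup is used so the strings are modifiable even where literals are frozen.

```ruby
# Hypothetical helper: build one string from many by appending in place.
def join_with_shovel(words)
  result = "".dup                       # a new, empty, modifiable string
  words.each do |word|
    result << " " unless result.empty?
    result << word                      # << changes result and returns it
  end
  result
end

x = "one ".dup
y = x             # y and x name the same object
x << "two"
puts y            # one two
x += " three"     # + builds a new string; y keeps the old one
puts y            # one two
puts join_with_shovel(%w[one two three])   # one two three
```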

ljust(length), center(length), rjust(length)

Pad with trailing spaces (ljust), leading spaces (rjust), or both (center) as necessary to grow to the given length.

 "abc".ljust(7) #–> "abc    "
 "abc".rjust(7) #–> "    abc"
 "abc".center(7) #–> "  abc  "
 "abc".center(6) #–> " abc  " (odd spaces go to the right)
 "abc".center(2) #–> "abc"  (no change if length is too small)
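
These methods are handy for lining up columns of output. Here is a small sketch; format_row is a hypothetical helper, and it uses the optional second argument that lets you pad with something other than spaces.

```ruby
# Hypothetical helper: a name padded with dots, then a right-aligned quantity.
def format_row(name, quantity, width = 10)
  name.ljust(width, ".") + quantity.to_s.rjust(4)
end

puts format_row("apples", 12)    # apples....  12
puts format_row("figs", 7)       # figs......   7
puts "abc".center(9, "*")        # ***abc***
```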

count(description)

Return the number of characters that match those in the description string. A range of characters can be specified with a dash, as in "a-c".

 s = "abcde abcde"
 s.count("c")  #–> 2
 s.count("b-e") #–> 8

delete(description)

Like count, but return a copy with all matching characters removed.

 s = "abcde abcde"
 s.delete("ac-e") #–> "b b"
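
A description string can also begin with a caret (^) to mean "everything except these characters." The sketch below uses that with both count and delete; vowel_count and letters_only are hypothetical helper names.

```ruby
# Hypothetical helper: how many vowels, upper or lower case?
def vowel_count(str)
  str.count("aeiouAEIOU")
end

# Hypothetical helper: drop everything that is not an ASCII letter.
def letters_only(str)
  str.delete("^a-zA-Z")
end

puts vowel_count("Blue Suede Shoes")     # 7
puts letters_only("Erie Canal, 1825")    # ErieCanal
puts "hello world".count("^a-z")         # 1 (the space)
```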

downcase, upcase, swapcase, capitalize

Return a copy with capitalization changed.

 "aBc".downcase #–> "abc"
 "aBc".upcase  #–> "ABC"
 "aBc".swapcase #–> "AbC"
 "aBc".capitalize #–> "Abc"
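
Since these methods return copies, they combine nicely with == when capitalization should not matter. This is a minimal sketch, and same_ignoring_case? is a hypothetical helper name.

```ruby
# Hypothetical helper: equality that ignores capitalization.
def same_ignoring_case?(a, b)
  a.downcase == b.downcase
end

title = "rUBY"
puts title.capitalize                         # Ruby
puts title                                    # rUBY (the original is untouched)
puts same_ignoring_case?("Night", "night")    # true
```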

include?(spec)

Return true or false depending on whether the string contains spec, which can be a string or a character.

 "Haystack".include?("needle") #–> false
 "Haystack".include?("sta") #–> true
 "Haystack".include?(72)  #–> true (because ?H is 72)

index(spec, [offset])

Find the index where spec is found, starting either from the beginning or from offset. Again, spec can be either a string or a character.

 "Mississippi".index("ssi") #–> 2
 "Mississippi".index("ssi",3) #–> 5
 "Mississippi".index("sp") #–> nil (not found)
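
The offset argument makes it possible to walk through every match, restarting each search just past the previous hit. Here is one way to sketch that; all_indexes is a hypothetical helper name.

```ruby
# Hypothetical helper: every position at which target occurs in str.
def all_indexes(str, target)
  positions = []
  offset = 0
  while (found = str.index(target, offset))
    positions << found
    offset = found + 1      # resume one character past the last match
  end
  positions
end

puts all_indexes("Mississippi", "ssi").inspect   # [2, 5]
puts all_indexes("Mississippi", "i").inspect     # [1, 4, 7, 10]
puts all_indexes("Mississippi", "sp").inspect    # []
```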

rindex(spec, [limit])

Like index, but find the last match instead of the first. If limit is given, the search starts at that position and works backward, so matches beginning after limit are ignored.

 "Mississippi".rindex("i") #–> 10
 "Mississippi".rindex("i",6) #–> 4
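
A typical use for rindex is splitting a name at its last dot. The following is only a sketch, with split_extension as a hypothetical helper; it combines rindex with the position-and-length form of [].

```ruby
# Hypothetical helper: split a filename at its final dot.
def split_extension(filename)
  dot = filename.rindex(".")
  return [filename, ""] if dot.nil?     # no dot, so no extension
  [filename[0, dot], filename[dot + 1, filename.length]]
end

puts split_extension("archive.tar.gz").inspect   # ["archive.tar", "gz"]
puts split_extension("README").inspect           # ["README", ""]
```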

strip

Remove whitespace (invisible characters such as spaces, tabs, linefeeds, and so forth) from the beginning and end.

 " Erie Canal \n".strip #–> "Erie Canal"
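
When only one end needs trimming, lstrip and rstrip do the job. The sketch below also shows strip being chained with downcase to tidy up typed input; normalize_answer is a hypothetical helper name.

```ruby
# Hypothetical helper: tidy a typed answer before comparing it.
def normalize_answer(raw)
  raw.strip.downcase
end

line = "\t Erie Canal \n"
puts line.lstrip.inspect             # "Erie Canal \n"
puts line.rstrip.inspect             # "\t Erie Canal"
puts normalize_answer("  Yes \n")    # yes
```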

tr(spec, repl)

Short for "translate." Return a copy with characters from spec replaced by the corresponding characters from repl. The first example here simulates the downcase method.

 "DOS_FILE.EXT".tr("A-Z","a-z") #–> "dos_file.ext"
 "Monkey".tr("ym-q","O*:^)?") #–> "M^:keO"
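
Character ranges in tr make simple substitution ciphers a one-liner. As an illustration, here is a sketch of the classic ROT13 rotation; rot13 is a hypothetical helper name, not a built-in method.

```ruby
# Hypothetical helper: rotate each letter 13 places; other characters pass through.
def rot13(text)
  text.tr("A-Za-z", "N-ZA-Mn-za-m")
end

puts rot13("Hello")           # Uryyb
puts rot13(rot13("Hello"))    # Hello (applying it twice restores the text)
```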