Home > Articles > Programming

  • Print
  • + Share This
This chapter is from the book

This chapter is from the book

1.6 Fetching URLs Concurrently

One of the most interesting and novel aspects of Go is its support for concurrent programming. This is a large topic, to which Chapter 8 and Chapter 9 are devoted, so for now we’ll give you just a taste of Go’s main concurrency mechanisms, goroutines and channels.

The next program, fetchall, does the same fetch of a URL’s contents as the previous example, but it fetches many URLs, all concurrently, so that the process will take no longer than the longest fetch rather than the sum of all the fetch times. This version of fetchall discards the responses but reports the size and elapsed time for each one:

gopl.io/ch1/fetchall
   // Fetchall fetches URLs in parallel and reports their times and sizes.
   package main

   import (
       "fmt"
       "io"
       "io/ioutil"
       "net/http"
       "os"
       "time"
   )

   func main() {
       start := time.Now()
       ch := make(chan string)
       for _, url := range os.Args[1:] {
           go fetch(url, ch) // start a goroutine
       }
       for range os.Args[1:] {
           fmt.Println(<-ch) // receive from channel ch
       }
       fmt.Printf("%.2fs elapsed\n", time.Since(start).Seconds())
   }

   func fetch(url string, ch chan<- string) {
       start := time.Now()
       resp, err := http.Get(url)
       if err != nil {
           ch <- fmt.Sprint(err) // send to channel ch
           return
       }

       nbytes, err := io.Copy(ioutil.Discard, resp.Body)
       resp.Body.Close() // don't leak resources
       if err != nil {
           ch <- fmt.Sprintf("while reading %s: %v", url, err)
           return
       }
       secs := time.Since(start).Seconds()
       ch <- fmt.Sprintf("%.2fs  %7d  %s", secs, nbytes, url)
   }

Here’s an example:

$ go build gopl.io/ch1/fetchall
$ ./fetchall https://golang.org http://gopl.io https://godoc.org
0.14s     6852  https://godoc.org
0.16s     7261  https://golang.org
0.48s     2475  http://gopl.io
0.48s elapsed

A goroutine is a concurrent function execution. A channel is a communication mechanism that allows one goroutine to pass values of a specified type to another goroutine. The function main runs in a goroutine and the go statement creates additional goroutines.

The main function creates a channel of strings using make. For each command-line argument, the go statement in the first range loop starts a new goroutine that calls fetch asynchronously to fetch the URL using http.Get. The io.Copy function reads the body of the response and discards it by writing to the ioutil.Discard output stream. Copy returns the byte count, along with any error that occurred. As each result arrives, fetch sends a summary line on the channel ch. The second range loop in main receives and prints those lines.

When one goroutine attempts a send or receive on a channel, it blocks until another goroutine attempts the corresponding receive or send operation, at which point the value is transferred and both goroutines proceed. In this example, each fetch sends a value (ch <- expression) on the channel ch, and main receives all of them (<-ch). Having main do all the printing ensures that output from each goroutine is processed as a unit, with no danger of interleaving if two goroutines finish at the same time.

Exercise 1.10: Find a web site that produces a large amount of data. Investigate caching by running fetchall twice in succession to see whether the reported time changes much. Do you get the same content each time? Modify fetchall to print its output to a file so it can be examined.

Exercise 1.11: Try fetchall with longer argument lists, such as samples from the top million web sites available at alexa.com. How does the program behave if a web site just doesn’t respond? (Section 8.9 describes mechanisms for coping in such cases.)

  • + Share This
  • 🔖 Save To Your Account