- A famous New Yorker cartoon from 1993 showed two dogs at a computer, with one saying to the other, “On the Internet, nobody knows you’re a dog.” That may no longer be true.4
- —Louise Story
Advertising is the fuel behind virtually all free online tools. Advertising is also the means for tracking your web surfing across the Internet. Anytime you visit sites that serve advertisements from a common advertising network, your activities can be logged. These logs can then be used to create precise profiles, facilitating tailored advertising. Importantly, log analysis isn’t a static process. For example, advertising companies are actively developing technologies to anticipate people’s next steps.5 Based on the popularity of the largest web companies, the amount of information they can collect is staggering (see Figure 7-3).
Figure 7-3 New York Times analysis of the data collection conducted by some of the largest online companies.
Web advertisements are big business, and some of the largest services are offered by Google, including AdSense, AdWords, and DoubleClick. However, advertisements are appearing in many other forms. Microsoft is quietly offering an ad-funded version of its Works office suite.6 Pudding Media, a San Jose–based startup, is offering advertisement-supported phone service.7
Google understands the growth potential of advertising. In 2007, the company announced that it would pay $3.1 billion to acquire DoubleClick, a leading online advertiser whose estimated revenues at the time were $150 million.8 A year ago, Yahoo! made a similar, smaller-scale move by acquiring the global online ad network BlueLithium for approximately $300 million.9 At the time of this writing, Yahoo! was reportedly considering an agreement to carry search advertisements from Google amid a potential hostile takeover attempt by Microsoft.10 Although it is impossible to know for sure at this time, such an alliance carries the very real potential of allowing Google to gather search terms, issue cookies, and conduct other activities across Yahoo!’s extremely large user base. Make no mistake: a key component of such acquisitions and alliances is access to user data and the power it provides.
Chapter 3 should have convinced you that users leave behind significant, and often personally identifiable, information as they surf the web. When New York Times reporter Louise Story set out to determine how personally identifiable the information maintained by large web companies is, she asked four large online companies a single question: “Can you show me an advertisement with my name in it?” Story provided the following summary of the responses.12
- Microsoft says it could use only a person’s first name. AOL and Yahoo! could use a full name, but only on their sites, not the other sites on which they place ads. Google isn’t sure—it probably could, but it doesn’t know the names of most of its users.
Although these results are telling in their own right, keep in mind that these are official responses provided to a New York Times reporter. In other words, they reflect legal positions on the subject, not technical capability. What the four companies have the capability to do is an entirely different matter. Advertising networks are the net that allows large online companies to gather this precise information and use it for user profiling, data mining, and targeted advertising. Because of this capability to aggregate and analyze user information, advertising campaigns can follow users as they move between independent sites. You might have encountered this technique when shopping online. For example, if you were searching for flat panel monitors on site X, when you hopped over to visit site Y, lo and behold, there were advertisements for flat panel monitors.
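The flat panel monitor example can be illustrated with a toy model. The following is a minimal sketch, assuming a hypothetical ad network that keys its records on a cookie identifier; the class name, method names, and example site names are invented for illustration and do not describe any real network's implementation.

```python
from collections import Counter, defaultdict


class AdNetwork:
    """Toy model of one ad network embedded on many independent sites."""

    def __init__(self):
        # cookie_id -> Counter of topics the browser was seen viewing
        self.profiles = defaultdict(Counter)
        # Raw visit log: (cookie_id, site, topic) for every ad-carrying page
        self.visits = []

    def record_visit(self, cookie_id, site, topic):
        """Called whenever a page carrying the network's ads is loaded."""
        self.visits.append((cookie_id, site, topic))
        self.profiles[cookie_id][topic] += 1

    def pick_ad(self, cookie_id, default="generic ad"):
        """Choose an ad for the topic this browser has viewed most often."""
        interests = self.profiles.get(cookie_id)
        if not interests:
            return default
        topic, _count = interests.most_common(1)[0]
        return f"ad for {topic}"


network = AdNetwork()
# The user browses monitors on site X, which embeds the network's ads.
network.record_visit("browser-123", "site-x.example", "flat panel monitors")
# Later, unrelated site Y asks the same network for an ad for this browser.
print(network.pick_ad("browser-123"))  # prints "ad for flat panel monitors"
```

Neither site X nor site Y shares anything with the other directly. The link between the two visits exists only inside the network that serves ads to both.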
Microsoft is openly touting an “Engagement Mapping” approach that seeks to move beyond the “outdated ‘last ad clicked’” model by understanding “how each ad exposure—whether display, rich media or search, seen multiple times on multiple sites and across many channels—influenced an eventual purchase.”13 The article also states that Microsoft intends to use data on user behavior before clicking an ad to be able to say that an unclicked ad still made an impression. If the article is to be believed, Microsoft intends to collate that information with search queries and sites visited within the period of a day or a couple of days. Two questions immediately arise. Where does that other data come from (and is it a mere coincidence that Microsoft is seeking to acquire Yahoo! and other search companies)? Now that we are talking about keeping tabs on long-term user behavior, and hypothesizing why a user did this or that, what other kinds of data mining will this open the floodgates to? Microsoft is essentially saying, “We are going to dedicate a lot of resources to watching where you go and what you see on the web.” The fact that Microsoft Windows is the most popular operating system on Earth14 just magnifies the concern. The law surrounding online advertising and the collection of user data is still immature, so advertisers have a very wide lane in which to operate. For example, a U.S. federal court ruled that search engines’ decisions about which advertisements to display are protected as free speech.15
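The difference between the two models can be sketched in a few lines. The example below is purely illustrative: the “last ad clicked” function follows the description above, but the even split across exposures is an assumed stand-in, because the source does not say how Engagement Mapping actually weights each exposure. The function names and the exposure record format are hypothetical.

```python
from collections import defaultdict


def last_click_attribution(exposures, sale_value):
    """Credit the entire sale to the last ad the user clicked."""
    clicked = [e for e in exposures if e["clicked"]]
    if not clicked:
        return {}
    return {clicked[-1]["ad"]: sale_value}


def even_exposure_attribution(exposures, sale_value):
    """Assumed weighting: split credit evenly across every exposure,
    whether or not the ad was clicked."""
    if not exposures:
        return {}
    share = sale_value / len(exposures)
    credit = defaultdict(float)
    for e in exposures:
        credit[e["ad"]] += share
    return dict(credit)


# One user's ad exposures over a couple of days, oldest first.
exposures = [
    {"ad": "display banner", "clicked": False},
    {"ad": "rich media video", "clicked": False},
    {"ad": "sidebar banner", "clicked": False},
    {"ad": "search ad", "clicked": True},
]

print(last_click_attribution(exposures, 100.0))
print(even_exposure_attribution(exposures, 100.0))
```

Under the last-click model, only the clicked search ad needs to be remembered. Crediting the unclicked ads requires a record of everything the user was shown beforehand, which is exactly the long-term behavioral log that raises the questions above.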
Google AdSense,16 sometimes called Google Syndication, is an advertising service Google provides that allows webmasters to earn advertising revenue by hosting AdSense ads (see Figure 7-4). These revenues aren’t trivial, commonly ranging from a few hundred dollars a month to $50,000 or more per year, making the service extremely popular.17 AdSense advertisements are context-sensitive ads served by Google based on the hosting site’s content. Unfortunately, merely visiting a web site hosting these advertisements informs Google of the user’s IP address and gives Google the opportunity to log the user’s visit and tag the user’s browser with a cookie.
Figure 7-4 Screenshot of Google AdSense. Notice the (debatably) unobtrusive advertisement in the bottom-right corner. Embedded advertisements such as these alert the advertising network of your presence on a site.
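To make the disclosure concrete, the following sketch simulates what an ad server could record when a browser fetches an embedded advertisement. It is a simplified illustration, not Google's actual implementation: the function name, the `uid` cookie name, and the log format are assumptions. The sketch also assumes the browser sends a Referer header naming the hosting page, as browsers typically do when fetching embedded content.

```python
import uuid
from http.cookies import SimpleCookie


def handle_ad_request(client_ip, headers, log):
    """Simulate an ad server handling a request for an embedded ad.

    `headers` is a dict of the HTTP request headers sent by the browser.
    Returns a Set-Cookie header value when the browser is tagged for the
    first time, or None when it already carried an identifier.
    """
    cookie = SimpleCookie(headers.get("Cookie", ""))
    set_cookie = None
    if "uid" in cookie:
        # Returning browser: reuse the identifier issued earlier.
        uid = cookie["uid"].value
    else:
        # New browser: tag it with a fresh identifier.
        uid = uuid.uuid4().hex
        set_cookie = f"uid={uid}; Path=/"
    # The hosting page's address arrives in the Referer header.
    page = headers.get("Referer", "unknown")
    log.append({"uid": uid, "ip": client_ip, "page": page})
    return set_cookie


log = []
# First visit: no cookie yet, so the server issues one.
issued = handle_ad_request(
    "203.0.113.7", {"Referer": "http://blog.example/post"}, log
)
uid = log[0]["uid"]
# Later visit to a different site that hosts ads from the same server.
handle_ad_request(
    "203.0.113.7",
    {"Referer": "http://shop.example/monitors", "Cookie": f"uid={uid}"},
    log,
)
for entry in log:
    print(entry)
```

Both log entries carry the same identifier, so the visits to two unrelated sites can be tied to one browser even though the user never visited the ad server directly.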
AdSense isn’t limited to textual ads on traditional web pages. Google is experimenting with AdSense for other forms of content, including RSS Feeds,18 web site search boxes,19 mobile content,20 video, and Cost Per Action AdSense.21, 22 Nor are AdSense and similar services limited to minor sites. Major online retailers also participate. For example, eBay signed deals to run ads from Google and Yahoo!.23 Figure 7-5 shows an example with eBay and Yahoo!. When a user searches for an item on eBay, Yahoo! servers provide contextual advertisements, leaving open the possibility that Yahoo! logs eBay visitors.24
Figure 7-5 Screenshot of Yahoo! advertisements embedded in an eBay web page. Because these ads are pulled directly from Yahoo! servers, user information such as cookies and IP addresses is disclosed directly to Yahoo!.
The future of AdSense is difficult to determine. Some analysts believe that Google is acquiring sites that will provide traffic directly, rather than paying advertising fees to third-party sites.26
AdWords is a fundamental part of Google’s business model. According to the BBC, every time a user conducts a search on Google, the company makes 12¢ in revenue.27 When you consider that Google receives more than 60 billion searches per year in the United States alone, you can see that the program generates enormous revenue. Google believes that AdWords “is the largest program of its kind.”28 Using AdWords, would-be advertisers bid on search terms, and their advertisements are displayed as part of the user’s search results; the better the placement, the higher the cost. AdWords advertisements are relatively unobtrusive but quite effective (see the right side of Figure 7-6).29
Figure 7-6 Screenshot of Google AdWords. Notice the advertisements on the right side of the image.
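The scale implied by the two figures cited for AdWords is easy to work out. The following back-of-the-envelope calculation simply multiplies them; it assumes the 12¢ average applies uniformly to every U.S. search, which is a simplification, so treat the result as a rough order of magnitude rather than a reported figure.

```python
# Figures quoted in the text; the variable names are illustrative.
REVENUE_PER_SEARCH_CENTS = 12
US_SEARCHES_PER_YEAR = 60_000_000_000

# Multiply in cents, then convert to dollars.
annual_revenue_dollars = REVENUE_PER_SEARCH_CENTS * US_SEARCHES_PER_YEAR / 100

print(f"${annual_revenue_dollars / 1e9:.1f} billion per year")
# prints "$7.2 billion per year"
```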
AdWords poses both information-disclosure risks and other security risks. Attackers have used AdWords and similar services to misdirect users to malicious sites; see the “Malicious Ad Serving” section later in the chapter.30 Google’s AdWords partners are also a significant information-disclosure risk because searches from these sites can be sent to Google. According to Google, those who have already joined its “growing advertising network” include AOL, Ask.com, Ask Jeeves, AT&T Worldnet, CompuServe, EarthLink, Excite, and Netscape.31 Even users of third-party search engines that delete their logs locally are still at risk. Take, for example, Ask.com, which took an industry-leading position by offering AskEraser, a function that deletes search activity from Ask.com servers.32 However, Google delivers the bulk of Ask’s advertisements, so user information, including the search query and IP address, is passed back to Google each time a page is served to a visitor.33
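The limitation of local log deletion can be shown with a small simulation. This is a sketch under stated assumptions: the class and method names are hypothetical, and the ad request is modeled as a direct call from the search site to its advertising partner, which may differ from how the data actually travels. The only behavior taken from the text is that the query and IP address reach the partner each time a results page is served.

```python
class AdPartner:
    """Hypothetical third-party network that supplies ads for search results."""

    def __init__(self):
        self.log = []

    def fetch_ads(self, query, client_ip):
        # The partner needs the query to choose relevant ads, so it sees it.
        self.log.append((client_ip, query))
        return [f"sponsored result for '{query}'"]


class SearchSite:
    """Hypothetical search engine that offers to erase its own logs."""

    def __init__(self, ad_partner):
        self.ad_partner = ad_partner
        self.local_log = []

    def search(self, query, client_ip):
        self.local_log.append((client_ip, query))
        # Serving the results page passes the query and IP to the partner.
        return self.ad_partner.fetch_ads(query, client_ip)

    def erase(self):
        """Delete search activity from this site's servers only."""
        self.local_log.clear()


partner = AdPartner()
site = SearchSite(partner)
site.search("flat panel monitors", "203.0.113.7")
site.erase()

print(site.local_log)  # prints []
print(partner.log)     # prints [('203.0.113.7', 'flat panel monitors')]
```

Erasing the local log has no effect on the partner's copy, which is outside the search site's control.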
DoubleClick is a major online advertising service, long criticized for using cookies and IP addresses to track users as they surf the web.34 It is extremely popular and counts a large number of Fortune 500 companies as clients. In 2007, Google announced a definitive agreement to acquire DoubleClick for $3.1 billion in cash. The acquisition drew the attention of the U.S. Federal Trade Commission and European regulators, who investigated antitrust and privacy implications but eventually acquiesced.35, 36, 37, 38 Google closed the acquisition of DoubleClick shortly thereafter.39
The implications of a combined Google–DoubleClick dreadnaught are significant. Google excels in search advertising (AdWords) and simple textual advertisements (AdSense). On the other hand, DoubleClick excels in “display advertising,” such as flashy banner ads and video advertisements, which reach between 80% and 85% of the web population.42, 43 The end result is a broad net that permits Google to track a user’s web searches and web site visits, with the potential to impact the privacy interests of more than 1.1 billion Internet users worldwide.44 This acquisition underscores the fact that mergers and acquisitions are about data, including both existing data stockpiles and access to continued data streams.