
The Importance of the New Googlebot

We have always advised search marketers to limit the use of JavaScript. Why? If content or other assets were hidden within JavaScript or other kinds of scripts, the crawler couldn’t unlock their secrets. Google recently announced that its crawler can “see” inside JavaScript and AJAX code. How does search marketing change in the wake of this announcement? Web specialist and self-proclaimed IBMer James Mathewson (author of Audience, Relevance, and Search: Targeting Web Audiences with Relevant Content) has some ideas.

IBM has a lot of content—millions of URLs supporting thousands of offerings, hundreds of solutions, and dozens of industries. The company for which I serve as search strategy lead publishes in more than 90 countries and some 50 languages. It has a diverse audience from administrators to CEOs and every business role in between.

No matter what the role, every member of the ibm.com audience is time-challenged and attention-starved. To speed up their information tasks, the overwhelming majority of them use search. According to a recent study by TechTarget, search is the leading venue our audience uses to do their research.

Developing a content strategy that optimizes this massive content footprint for our audience’s search behaviors is not optional. But the sheer size and complexity of IBM’s content footprint make optimizing the collection extremely challenging.

One of the few ways we can better manage our content inventory is by building reusable content components that can be assembled dynamically. But the very thing that could help us present less cluttered experiences to users and search crawlers also limits us. Until very recently, search spiders largely ignored content served within scripts or other dynamic content applications such as AJAX. Why? Because they couldn’t execute scripts to “see” inside them.

I say “until very recently” because Google just released a version of its spider that can “see” into JavaScript and other applications to find relevant content. To quote the Twitter feed of Google’s search quality chief Matt Cutts, "Googlebot keeps getting smarter. Now has the ability to execute AJAX/JavaScript to index some dynamic comments."

The context of the quote was Cutts’ confirmation that Google now indexes some Facebook comments. He seemed clear that the new function of Googlebot—the spider that crawls through sites looking for content—is deployed for this special purpose. Still, this is big news. If Googlebot can crawl through dynamic content, it is not hard to imagine a day sometime soon when it will do this pervasively.

When that day comes, search marketers and content strategists will have a much easier time improving user experiences with dynamic content. This article helps them prepare for that day by walking through three examples from the ibm.com environment.

Using JavaScript to Reduce Broken Links

One of the most difficult things about managing such a large content footprint is broken links. We try to be aggressive about retiring old content. But this often causes pages that link to the retired content to serve up error messages to users who click those links. Broken links not only cause usability problems, but they’re also embarrassing to the brand.

One solution to this problem was to build an application called the Merchandizing Trading Exchange (MTE) tool. MTE was originally built to help teams dynamically share merchandizing modules between relevant pages, but teams quickly learned that it could also serve relevant links on pages without the threat of broken links. If a page is retired within MTE, the modules that embed a link to that page automatically delete the link without further maintenance from the content team. Figure 1 shows an example.

Figure 1 A typical MTE module on ibm.com
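
The link-pruning behavior described above can be sketched in a few lines. This is not MTE's actual code (MTE is built in JavaScript, and its internals are not described here). It is a minimal Python illustration, and the names `ModuleLink`, `render_module`, and `retired_urls` are hypothetical. It assumes the module renders its links from shared data at display time, so a retired page simply drops out of the output.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class ModuleLink:
    """One link carried by a shared module (hypothetical structure)."""
    title: str
    url: str


def render_module(links, retired_urls):
    """Render list items only for links whose target pages are still live."""
    live = [link for link in links if link.url not in retired_urls]
    return "\n".join(
        f'<li><a href="{escape(link.url, quote=True)}">{escape(link.title)}</a></li>'
        for link in live
    )


# Hypothetical data: one live page and one retired page.
links = [ModuleLink("Servers", "/servers"), ModuleLink("Old offer", "/old-offer")]
print(render_module(links, retired_urls={"/old-offer"}))
```

Because the filter runs every time the module is rendered, no page owner has to edit a page when a target is retired. A hard-coded link cannot offer that.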

MTE has significantly reduced broken links, but it has had an adverse effect on our search results. We used to get credit for all the hard-coded links we developed between pages in our environment. These links are not as valuable as external links into our pages. But ibm.com is a valued domain whose links provide some equity, or link juice. Also, it is easier to request a relevant link from a colleague in a sister brand or business unit than it is from an external third party such as a publication.

When page owners stopped hard-coding links in favor of using MTE to dynamically render links, they lost that link equity. Why? Because MTE is built in JavaScript. There really is no way to build such a dynamic widget without a scripting language. But Googlebot (and all other crawlers) could not index these links because crawlers couldn’t execute code.
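
To make the problem concrete, here is a rough sketch of how a crawler that only parses markup, and never executes scripts, would see a page. It uses Python's standard `html.parser` module. The page content and the names `StaticLinkExtractor` and `extract_static_links` are invented for illustration, and real crawlers are far more sophisticated than this.

```python
from html.parser import HTMLParser


class StaticLinkExtractor(HTMLParser):
    """Collects href values from <a> tags in the markup, without running scripts."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_static_links(html):
    """Return the links a non-executing crawler could find in the raw HTML."""
    parser = StaticLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: one hard-coded link, one link inserted by a script.
PAGE = """
<html><body>
  <a href="/hard-coded-offering">Hard-coded link</a>
  <div id="module"></div>
  <script>
    document.getElementById("module").innerHTML =
      '<a href="/script-rendered-offering">Script-rendered link</a>';
  </script>
</body></html>
"""

print(extract_static_links(PAGE))
```

The parser treats the contents of the script element as opaque text, so only the hard-coded link is reported. The link that a browser would render inside the module never reaches the index.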

If Googlebot can now execute JavaScript code, it will have a huge impact on ibm.com search results. Pages that look like orphans to the crawler, because all the links into and out of them are built in JavaScript, will suddenly look like they belong to a family of relevant pages. Tens of millions of links for which IBM currently gets no credit will show up in Google’s index.
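
The orphan effect can be illustrated with a toy link graph. The sketch below is an assumption-laden simplification: the page paths, the `find_orphans` function, and the idea of tagging each link as script-rendered or not are all hypothetical, and it says nothing about how Google actually scores links.

```python
def find_orphans(pages, links, executes_scripts):
    """Return pages with no inbound or outbound links visible to the crawler.

    links is an iterable of (source, target, rendered_by_script) tuples.
    """
    connected = set()
    for source, target, rendered_by_script in links:
        if rendered_by_script and not executes_scripts:
            continue  # a non-executing crawler never sees this link
        connected.add(source)
        connected.add(target)
    return sorted(set(pages) - connected)


# Hypothetical site: /c is linked only through script-rendered modules.
PAGES = ["/a", "/b", "/c"]
LINKS = [
    ("/a", "/b", False),  # hard-coded link
    ("/b", "/c", True),   # rendered by a script
    ("/c", "/a", True),   # rendered by a script
]

print(find_orphans(PAGES, LINKS, executes_scripts=False))
print(find_orphans(PAGES, LINKS, executes_scripts=True))
```

With script execution turned off, `/c` has no visible links in either direction. Once the crawler can execute scripts, every page in this small graph is connected.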

The other benefit will be to our efforts to reduce broken links. Because page owners often care more about search results than broken links, they continue to hard-code a lot of their links within the white space of their pages rather than use MTE. When Googlebot crawls links within MTE, that practice should be phased out.
