The Importance of the New Googlebot
IBM has a lot of content: millions of URLs supporting thousands of offerings, hundreds of solutions, and dozens of industries. The company for which I serve as search strategy lead publishes in more than 90 countries and some 50 languages. It has a diverse audience, from administrators to CEOs and every business role in between.
No matter what the role, every member of the ibm.com audience is time-challenged and attention-starved. To speed up their information tasks, the overwhelming majority of them use search. According to a recent study by TechTarget, search is the leading venue our audience uses to do their research.
Developing a content strategy that optimizes this massive content footprint for our audience's search behaviors is not optional. But the sheer size and complexity of IBM's content footprint makes optimizing the collection extremely challenging.
One of the few ways we can better manage our content inventory is by building reusable content components that can be assembled dynamically. But the very thing that could help us present less cluttered experiences to users and search crawlers also limits us. Until very recently, search spiders primarily ignored content served within scripts or other dynamic content applications such as AJAX. Why? Because they couldn’t execute scripts to “see” inside them.
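To make the crawling limitation concrete, here is a minimal sketch (the element names and payload are hypothetical, not taken from ibm.com). Content injected by a script exists only after the script runs, so a crawler that does not execute JavaScript sees the empty container and nothing else:

```javascript
// Stand-in for a DOM element such as document.getElementById("offers").
// Before any script runs, the container is empty -- this is all a
// non-executing crawler would ever "see".
const container = { innerHTML: "" };

function loadOffers(target) {
  // On a real page this payload would arrive via an AJAX request;
  // it is inlined here to keep the sketch self-contained.
  const payload = "<ul><li>Offering A</li><li>Offering B</li></ul>";
  target.innerHTML = payload;
}

// Only after script execution does the content exist to be indexed.
loadOffers(container);
console.log(container.innerHTML);
```

A spider that can execute the script, as the newer Googlebot reportedly can in some cases, would see the populated list rather than an empty container.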
The context of the quote was Cutts' confirmation that Google now indexes some Facebook comments. He seemed clear that this new function of Googlebot, the spider that crawls through sites looking for content, is deployed only for this special purpose. Still, this is big news. If Googlebot can crawl through dynamic content, it is not hard to imagine a day sometime soon when it will do so pervasively.
When that day comes, search marketers and content strategists will have a much easier time improving user experiences with dynamic content. This article helps them prepare for that day by walking through three examples from the ibm.com environment.
One of the most difficult things about managing such a large content footprint is broken links. We try to be aggressive about retiring old content, but doing so often causes pages that link to the retired content to serve error messages to users who click those links. Broken links not only cause usability problems; they are embarrassing to the brand.
One solution to this problem was to build an application called the Merchandizing Trading Exchange (MTE) tool. Originally built to help teams dynamically share merchandizing modules between relevant pages, MTE quickly proved useful for serving relevant links on pages without the threat of broken links. If a page is retired within MTE, the modules that embed a link to that page automatically drop the link, with no further maintenance from the content team. Figure 1 shows an example.
Figure 1: A typical MTE module on ibm.com
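The retirement behavior described above can be sketched in a few lines. This is an illustrative model only; the names (`livePages`, `renderLinks`, the module fields) are assumptions, not the actual MTE implementation:

```javascript
// A registry of pages that are still live. Retiring a page simply
// means removing its URL from this set.
const livePages = new Set([
  "/software/offering-a",
  "/services/solution-b",
]);

// A shared module as a content team might define it. One of its
// targets ("/software/retired-c") has since been retired.
const mteModule = {
  title: "Related offerings",
  links: [
    { label: "Offering A", href: "/software/offering-a" },
    { label: "Retired page", href: "/software/retired-c" },
    { label: "Solution B", href: "/services/solution-b" },
  ],
};

// Render only the links whose targets are still live, so a retired
// page silently disappears from every module that carries it --
// no broken links and no manual cleanup.
function renderLinks(module, live) {
  return module.links.filter((link) => live.has(link.href));
}

console.log(renderLinks(mteModule, livePages).map((link) => link.label));
// keeps "Offering A" and "Solution B"; the retired link is dropped
```

The key design point is that link maintenance happens in one place (the registry of live pages) rather than on every page that hard-codes the link.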
MTE has significantly reduced broken links, but it has had an adverse effect on our search results. We used to get credit for all the hard-coded links we developed between pages in our environment. These links are not as valuable as external links into our pages, but ibm.com is a valued domain whose links provide some equity, or link juice. Also, it is easier to request a relevant link from a colleague in a sister brand or business unit than from an external third party such as a publication.
The other benefit will be to our efforts to reduce broken links. Because page owners often care more about search results than about broken links, they continue to hard-code many of their links in the white space of their pages rather than use MTE. Once Googlebot crawls links within MTE, that practice should be phased out.