Eliminate Spider Traps
Search engines use special programs, known as spiders (or crawlers), to harvest your pages on a regular basis. The spiders come visiting your site and can usually grab your content and store it away in the search index, but not always. What is it about your Web site that traps a spider, anyway? The list is long and distinguished:
- You are telling the spider to scram. Spiders will stay away from parts of your site (or even your whole site) if your robots.txt file tells them to. In addition, robots tags in your Web pages can request that the spider leave those pages out of the search index. Check your robots directives to ensure that nothing is excluded from search indexes in error (see the robots.txt sketch after this list).
- Your navigation requires a human being. If visitors must click buttons in pop-up windows or fill out registration forms to see some of your pages, the spiders won't be able to do that. To open your site to the spiders, you must change these navigation techniques or offer alternatives that use regular links on your pages (see the markup example after this list).
- Your pages are poorly coded. Browsers are incredibly forgiving of incorrect HTML coding, but spiders are far less so. Most Web sites are rife with coding errors that visitors never notice but that may cause content to be lost or misinterpreted if left uncorrected. Make sure that all new and changed Web pages on your site are run through an HTML validator before going public (a before-and-after example follows this list).
- You don't answer the door. You won't know in advance when the spider will come to call, so you need to be ready all the time. If your server is down for maintenance, or just plain slow, spiders will go knock on someone else's door.
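To see how the first trap springs, here is a sketch of a robots.txt file; the paths and comments are invented for illustration, but the directive syntax is standard. A single overly broad Disallow line can keep spiders away from a whole branch of pages you actually want indexed:

```
# robots.txt (hypothetical example)
User-agent: *
Disallow: /scripts/        # intended: keep spiders out of utility code
Disallow: /products        # too broad: also blocks every /products/... page
Allow: /products/catalog/  # not every spider honors Allow; narrowing the Disallow is safer
```

The robots tag inside a page works the same way. This one asks spiders not to index the page or follow its links, so make sure it appears only on pages you truly want left out of the index:

```html
<meta name="robots" content="noindex, nofollow">
```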
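The second trap, navigation only a human can operate, is easiest to see side by side. The page name below is made up; the point is that a script-driven pop-up gives the spider nothing to follow, while an ordinary link does:

```html
<!-- Spider trap: the page opens only through a script-driven pop-up button -->
<button onclick="window.open('specs.html', 'popup', 'width=400,height=300')">
  View specifications
</button>

<!-- Spider-friendly alternative: a regular link the crawler can follow -->
<a href="specs.html">View specifications</a>
```

You can keep the pop-up for visitors if you like, as long as a plain link to the same content exists somewhere the spider can find it.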
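As for poorly coded pages, here is a contrived before-and-after of the kind of mistake a browser papers over but a validator flags. The missing closing quote can swallow the content that follows it, and the tags closed in the wrong order invite misinterpretation:

```html
<!-- Before: renders fine in most browsers, but the markup is broken -->
<div class="promo>
  <p>Spring sale <b><i>ends Friday</b></i></p>
</div>

<!-- After: corrected markup that validates cleanly -->
<div class="promo">
  <p>Spring sale <b><i>ends Friday</i></b></p>
</div>
```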
Every page on your site may contain one or more of these spider traps, so it pays to be vigilant. Every trap you remove allows more pages on your site to be indexed.