Google Automated Scanning
Google frowns on automation: "You may not send automated queries of any sort to Google's system without express permission in advance from Google. Note that 'sending automated queries' includes, among other things:
using any software which sends queries to Google to determine how a web site or web page 'ranks' on Google for various queries;
'meta-searching' Google; and
performing 'offline' searches on Google."
Any user running an automated Google querying tool (with the exception of tools created with Google's extremely limited API) must obtain express permission in advance to do so. The consequences of ignoring these terms of service are unknown, but it seems best to stay on Google's good side.
Gooscan is a UNIX (Linux/BSD/Mac OS X) tool that automates queries against Google search appliances (which are not governed by the same automation restrictions as their web-based brethren). For the security professional, gooscan serves as a front end for an external server assessment and aids in the information-gathering phase of a vulnerability assessment. For the web server administrator, gooscan helps discover what the web community may already know about a site thanks to Google's search appliance.
For more information about this tool, including the ethical implications of its use, see http://johnny.ihackstuff.com.
The term "googledork" was coined by the author and originally meant "An inept or foolish person as revealed by Google." After a great deal of media attention, the term came to describe those who "troll the Internet for confidential goods." Either description is fine, really. What matters is that the term googledork conveys the concept that sensitive stuff is on the web, and Google can help you find it. The official googledorks page lists many different examples of unbelievable things that have been dug up through Google by the maintainer of the page, Johnny Long. Each listing shows the Google search required to find the information, along with a description of why the data found on each page is so interesting.
The concept of a honeypot is very straightforward. According to http://www.techtarget.com, "A honey pot is a computer system on the Internet that is expressly set up to attract and 'trap' people who attempt to penetrate other people's computer systems."
To learn how new attacks might be conducted, the maintainers of a honeypot system monitor, dissect, and catalog each attack, focusing on those attacks that seem unique.
An extension of the classic honeypot system, a web-based honeypot or "page pot" is designed to attract those employing the techniques outlined in this article. Consider a simple googledork entry like this:
inurl:admin inurl:userlist
This entry could easily be replicated with a web-based honeypot by creating an index.html page that references another index.html file in an /admin/userlist directory. If a web search engine such as Google were instructed to crawl the top-level index.html page, it would eventually find the link pointing to /admin/userlist/index.html. This link would satisfy the Google query inurl:admin inurl:userlist, eventually attracting a curious Google hacker.
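The layout described above can be sketched in a few lines of Python. This is only an illustration, not GooPot's actual implementation: the function name build_page_pot, the page contents, and the link text are all hypothetical, and the files are written under whatever directory the caller supplies as the web root.

```python
import tempfile
from pathlib import Path


def build_page_pot(root: Path) -> list:
    """Create a top-level index.html that links to admin/userlist/index.html.

    Hypothetical helper: the HTML bodies below are placeholders, not the
    content of any real page pot.
    """
    # The decoy page whose URL contains both "admin" and "userlist".
    decoy = root / "admin" / "userlist" / "index.html"
    decoy.parent.mkdir(parents=True, exist_ok=True)
    decoy.write_text(
        "<html><body><h1>User list</h1></body></html>\n", encoding="utf-8"
    )

    # The top-level page a crawler is pointed at; it only needs the link.
    top = root / "index.html"
    top.write_text(
        '<html><body><a href="/admin/userlist/index.html">Administration</a>'
        "</body></html>\n",
        encoding="utf-8",
    )
    return [top, decoy]


if __name__ == "__main__":
    # Build the layout in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for path in build_page_pot(Path(tmp)):
            print(path.relative_to(tmp).as_posix())
```

Once a crawler follows the link from the top-level page, the decoy URL contains both terms that the inurl: operators look for.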
The referrer variable (the HTTP Referer request header sent by the visitor's browser) can be inspected to figure out how a web surfer found a web page through Google. This bit of information is critical to the maintainer of a page pot system, because it reveals the exact query the Google searcher used to locate the page pot system. The information aids in protecting other web sites from similar queries.
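As a rough sketch of that inspection, the following Python function pulls the search terms out of a referrer string. It assumes the referrer is a Google search URL that carries the query in a q parameter, and it only recognizes hosts ending in google.com; the function name google_query_from_referer is hypothetical, and a real page pot would read the value from its web server logs or request environment.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def google_query_from_referer(referer: str) -> Optional[str]:
    """Return the search query from a Google referrer URL, or None.

    Assumption: the query travels in the "q" parameter and the host is
    google.com or one of its subdomains.
    """
    parts = urlsplit(referer)
    host = (parts.hostname or "").lower()
    if host != "google.com" and not host.endswith(".google.com"):
        return None
    # parse_qs decodes percent-escapes and turns "+" into spaces.
    terms = parse_qs(parts.query).get("q")
    return terms[0] if terms else None


if __name__ == "__main__":
    sample = "http://www.google.com/search?q=inurl%3Aadmin+inurl%3Auserlist&hl=en"
    print(google_query_from_referer(sample))
```

Logging the returned query alongside each hit gives the maintainer a record of which searches lead visitors to the page pot.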
GooPot, the Google honeypot system, uses enticements based on the many techniques outlined in the googledorks collection and this document. In addition, the GooPot more closely resembles the juicy targets that Google hackers typically go after. Johnny Long, the administrator of the googledorks list, uses the GooPot to discover new search types and to publicize them in the form of googledorks listings, creating a self-sustaining cycle for learning about and protecting against search engine attacks.
The GooPot system is not yet publicly available; expect it to be released early in the second quarter of 2004.