The Robots.txt protocol
The robots.txt protocol, also called the "robots exclusion standard," is designed to keep web spiders from accessing parts of a website. It is a privacy measure more than a true security control, the equivalent of hanging a "Keep Out" sign on your door.
Website administrators use this protocol when there are sections or files that they would rather the rest of the world not access. These could include employee lists or files that are being circulated internally. For example, the White House website uses robots.txt to block crawlers from indexing speeches by the Vice President, a photo essay of the First Lady, and profiles of the 9/11 victims.
How does the protocol work? The site administrator creates a plain-text file that lists the files and directories that shouldn't be scanned, and places it in the top-level directory of the website. The robots.txt protocol was created by consensus in June 1994 by members of the robots mailing list (email@example.com). There is no official standards body or RFC for the protocol, so it's difficult to legislate or mandate that it be followed. In fact, the file is treated as strictly advisory, and there is no guarantee that the listed contents won't be read.
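A minimal robots.txt, using hypothetical paths, might look like the sketch below. Each record names a user agent and the directories or files that agent is asked to stay out of; an asterisk matches every spider, and the file itself sits at the root of the site (for example, http://www.example.com/robots.txt):

    # Applies to every spider (the paths here are made up for illustration)
    User-agent: *
    Disallow: /staff/
    Disallow: /internal/memo.html

    # Asks one particular spider to stay out of the site entirely
    User-agent: BadBot
    Disallow: /

An empty Disallow line, by contrast, tells that agent it may visit everything.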
In effect, robots.txt requires cooperation from the web spider and even the reader, since anything published on the web becomes publicly available. You aren't locking them out of those pages; you are just making it harder for them to get in, and it takes very little effort to ignore the instructions. Worse, the file is itself public and points directly at the listed files, so attackers can easily locate and retrieve them. The rule of thumb is: if it's that sensitive, it shouldn't be on your website to begin with.
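Whether a spider honors the file is entirely up to the spider. As a rough sketch of what a cooperating crawler does, Python's standard urllib.robotparser module can download the file and answer "may I fetch this page?" before each request (the site address and user-agent string below are placeholders, not real values):

    from urllib.robotparser import RobotFileParser

    # Placeholder site and crawler name -- substitute your own.
    ROBOTS_URL = "http://www.example.com/robots.txt"
    USER_AGENT = "ExampleSpider/1.0"

    parser = RobotFileParser()
    parser.set_url(ROBOTS_URL)
    parser.read()  # downloads and parses the robots.txt file

    # True only if the rules allow this agent to fetch the page.
    allowed = parser.can_fetch(USER_AGENT, "http://www.example.com/staff/list.html")
    print("Allowed to crawl:", allowed)

Nothing forces a crawler to run such a check; a spider that skips it sees the "blocked" pages just as easily as any other.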
Care, however, should be taken to ensure that robots.txt doesn't block search engine robots from the areas of the website you do want indexed. That will dramatically affect your search engine ranking, since the crawlers need access to those pages to count keywords, review meta tags, titles, and crossheads, and even register the hyperlinks.
One misplaced hyphen or dash can have catastrophic effects on how much of your site gets indexed, as the sketch below illustrates.
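Using made-up paths again, a single character is the difference between hiding one directory and hiding the whole site from every crawler:

    # Asks crawlers to skip only the /private/ directory (hypothetical path)
    User-agent: *
    Disallow: /private/

    # Asks every crawler to skip the entire site
    User-agent: *
    Disallow: /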