Robots Text File

Every site should have a robots.txt file. It is easy to overlook, but adding one takes very little effort. Wikipedia describes it this way: “The robot exclusion standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention to prevent cooperating web spiders and other web robots from accessing all or part of a website which is otherwise publicly viewable. Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code. The standard complements Sitemaps, a robot inclusion standard for websites.

A robots.txt file on a website will function as a request that specified robots ignore specified files or directories in their search. This might be, for example, out of a preference for privacy from search engine results, or the belief that the content of the selected directories might be misleading or irrelevant to the categorization of the site as a whole, or out of a desire that an application only operate on certain data.
For websites with multiple sub-domains, each sub-domain must have its own robots.txt file. If example.com had a robots.txt file but a.example.com did not, the rules that would apply for example.com will not apply to a.example.com.”
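
To make the quoted description concrete, here is a minimal sketch of how a cooperating robot might read a robots.txt file, using Python's standard urllib.robotparser module. The rules, the paths, and the robot names "ExampleBot" and "BadBot" are made up for illustration, and robots_url is a hypothetical helper that shows why each sub-domain needs its own file: the robot looks for robots.txt on the exact host it is about to crawl.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; the paths and bot names are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def robots_url(page_url: str) -> str:
    """Return where a robot would look for robots.txt for the given page.

    The file is looked up per host, so a.example.com and example.com
    resolve to two different files.
    """
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


parser = build_parser(ROBOTS_TXT)

# Any robot may fetch the home page, but not the /private/ directory.
print(parser.can_fetch("ExampleBot", "http://example.com/index.html"))          # True
print(parser.can_fetch("ExampleBot", "http://example.com/private/notes.html"))  # False

# The robot named BadBot is asked to stay away from the whole site.
print(parser.can_fetch("BadBot", "http://example.com/index.html"))              # False

# Each sub-domain is checked against its own file.
print(robots_url("http://example.com/page.html"))    # http://example.com/robots.txt
print(robots_url("http://a.example.com/page.html"))  # http://a.example.com/robots.txt
```

Keep in mind that these rules are only a request: a well-behaved robot checks them before fetching a page, but nothing in the file itself stops a robot that chooses to ignore it.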
