Robots.txt: Understanding and Directing the Crawlers Part 1

January 21, 2010  |   Internal Architecture

Welcome to part 1 of the Understanding & Directing the Crawlers series. We’re going to discuss Robots.txt. Simply put, it is a text file containing instructions telling crawlers what to crawl and what not to crawl on the site.

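The only time that Robots.txt should be used is when you don’t want a page to be crawled or indexed (e.g. a website admin page, a special membership access page, etc.). Keep in mind that the file only controls crawling: a blocked page can still show up in search results if other sites link to it.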

To help me describe this topic, I’m going to use my pal Steve and his website (www.steveswebsite.com) as an example.  Steve has a few pages that he wants to keep from getting crawled and also from getting indexed in the search engines. Steve needs to put this file in the website’s main folder.  This is the same folder that his index.html file is in.  (If this doesn’t make sense, be sure to talk with your webmaster).
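
For anyone who wants to double-check where the file should live, here is a minimal Python sketch that works out the robots.txt address for any page on a site. The function name robots_txt_url and the http:// prefix are assumptions made for this illustration; Steve doesn't need any code to follow along.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the address where crawlers look for robots.txt for this page's site."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, and replace the rest with /robots.txt
    # (all lowercase, at the top level of the site).
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.steveswebsite.com/personal/SSN"))
# prints: http://www.steveswebsite.com/robots.txt
```

Whatever page you start from, the answer is always the same single file at the top level of the site.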

I don’t want to get too into the technical jargon, so here are some examples from Steve’s website:

If Steve doesn’t want the crawlers to crawl any page on his website, his robots.txt file would be:

User-agent: *
Disallow: /

If Steve wants the crawlers to crawl & index every single page on his website, his robots.txt file would be:

User-agent: *
Disallow:

(Technically, Steve could also have created an empty robots.txt file, or not created one at all. The result is the same.)
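
To see how a crawler might read the "block everything" and "allow everything" files, here is a rough sketch using Python's built-in urllib.robotparser module. Real search engine crawlers use their own parsers, so treat this as an approximation. The helper name can_crawl and the http:// URL are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, user_agent="*"):
    """Parse the robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # nothing may be crawled
ALLOW_ALL = "User-agent: *\nDisallow:\n"     # empty Disallow blocks nothing
EMPTY = ""                                   # an empty file has no rules at all

page = "http://www.steveswebsite.com/index.html"
print(can_crawl(BLOCK_ALL, page))  # False
print(can_crawl(ALLOW_ALL, page))  # True
print(can_crawl(EMPTY, page))      # True
```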

If Steve has a few pages he doesn’t want the crawlers to crawl, his robots.txt file would be:

User-agent: *
Disallow: /personal/
Disallow: /finances/

(www.StevesWebsite.com/personal/SSN & www.StevesWebsite.com/finances/statements will not be crawled)
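
If you'd like to test rules like these before uploading them, one option is Python's built-in urllib.robotparser module. This is only a sketch: real crawlers may interpret edge cases differently, and the helper name is_allowed and the http:// prefix on the URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /personal/
Disallow: /finances/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the rules in robots_txt let user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


site = "http://www.steveswebsite.com"
print(is_allowed(ROBOTS_TXT, site + "/personal/SSN"))         # False
print(is_allowed(ROBOTS_TXT, site + "/finances/statements"))  # False
print(is_allowed(ROBOTS_TXT, site + "/index.html"))           # True
```

Everything under the two blocked folders is off limits, while the rest of the site stays crawlable.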

If Steve wants to block one particular crawler bot, his robots.txt file would be:

User-agent: (name of bot)
Disallow: /

If Steve wants to allow only one particular crawler bot and block all the others, his robots.txt file would be:

User-agent: (name of bot)
Disallow:

User-agent: *
Disallow: /
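
The "one bot in, everyone else out" file can be checked the same way with Python's built-in urllib.robotparser module. In this sketch, ExampleBot and OtherBot are made-up bot names, and the helper name bot_can_crawl is an assumption for illustration; real crawlers use their own parsers.

```python
from urllib.robotparser import RobotFileParser

# "ExampleBot" stands in for (name of bot); it is a hypothetical crawler.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow:

User-agent: *
Disallow: /
"""


def bot_can_crawl(robots_txt, user_agent, url):
    """Return True if user_agent is allowed to crawl url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "http://www.steveswebsite.com/index.html"
print(bot_can_crawl(ROBOTS_TXT, "ExampleBot", page))  # True
print(bot_can_crawl(ROBOTS_TXT, "OtherBot", page))    # False
```

The named bot matches its own section, so the catch-all section below it only applies to every other crawler.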

I want to thank Steve for being so helpful!  If this post was a little confusing, that’s because it sits on the more technical side of SEO, but I think it’s important to at least know what robots.txt is.

One more important thing to remember: make sure the file name is all lowercase.  Use “robots.txt”, NOT “Robots.TXT”.

Stay tuned for the No Index tag.





About the author


Maximus Kang is the Director of SEO Strategy & Founder of Ranking Channel, a Seattle-based SEO consulting agency. With enterprise level experience at Expedia and agency experience at Optify, his SEO knowledge covers a wide spectrum.
