Robots.txt Guidelines

Why do I need a robots.txt file?  What robots.txt options are there?  Where can I download a robots.txt file for free?  In this article I am going to answer all of these questions.

I have just created a new blog at http://irinaovery.com.  A few days after I submitted the blog to Google AdSense, I noticed that AdSense ads had stopped appearing.  I checked the Google AdSense Site Diagnostics page and found that my blog URLs were blocked because of the robots.txt file.  What does this mean?  It can be either of two things:

  • The robots.txt file ‘disallows’ or does not let Google crawl my blog
  • The robots.txt file does not exist in the root directory of my blog

The first thing I had to do was check whether I had a robots.txt file in my blog's root directory.  For this I used the FileZilla FTP client.  It is free, and you can download it from the link below.

http://filezilla-project.org/

I opened my blog root directory and found that a robots.txt file did not exist.
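Crawlers look for the file only at the top level of the host, so the address to check can be worked out from any page URL.  Below is a minimal Python sketch using only the standard library.  The function names robots_url and robots_exists are made up for illustration, and the post path in the usage line is hypothetical.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def robots_exists(page_url, timeout=10):
    """Return True if the server answers 200 for robots.txt, False on an HTTP error.

    Network failures (no connection, unknown host) are not caught here.
    """
    request = urllib.request.Request(robots_url(page_url), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False


print(robots_url("http://irinaovery.com/some-post/"))
# prints http://irinaovery.com/robots.txt
```

Calling robots_exists("http://irinaovery.com") would make a real request, so it is left out of the example run.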

The next step was to create a robots.txt file.  The purpose of a robots.txt file is to tell search engine spiders which pages they should crawl and which they should not.  In theory, if you do not have a robots.txt file, search engines will crawl and index all your pages.  In practice it can turn out differently, and that is what happened in my case.

Even if you already have the file, I would recommend amending it so that search engines crawl your pages the way you want.

For example, suppose your website has a lot of images.  These images could be spidered by image search engines such as Google Images.  That could increase your server's bandwidth usage and create a problem.  For this reason you may not want to let search engine spiders crawl your image folder in the root directory.
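The image-folder case can be tried out before anything is uploaded.  The sketch below feeds a two-line rule set to Python's standard urllib.robotparser and asks what a crawler may fetch.  The folder name /images/, the agent name ExampleBot and the file names are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "/images/" stands in for wherever the image folder lives
IMAGE_RULES = """\
User-agent: *
Disallow: /images/
"""


def is_allowed(rules, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(IMAGE_RULES, "ExampleBot", "http://irinaovery.com/images/photo.jpg"))
# prints False
print(is_allowed(IMAGE_RULES, "ExampleBot", "http://irinaovery.com/about/"))
# prints True
```

This only shows how one parser reads the rules; each search engine applies its own parser.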

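Another reason why you may not want Google to crawl your pages is that you sell a copyrighted digital product on your website.  If you keep search engine spiders from crawling the page where your product is located, the page is less likely to turn up in search results, so it is harder for someone to find the product and download it without paying.  Keep in mind that robots.txt is only a request to crawlers: it does not stop anyone who already knows the URL from opening it.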

You can create a robots.txt file in any text editor, for example Notepad.  However, if you know nothing about writing a robots.txt file, you can use the robots.txt generator at http://www.mcanerin.com/EN/search-engine/robots-txt.asp
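If you would rather script the file than type it, the text can also be assembled in a few lines of Python.  This is only a sketch: build_robots_txt is a made-up helper, and the agent name ExampleBot and the path /private/ are placeholders.

```python
def build_robots_txt(groups):
    """Render (user_agent, [disallowed paths]) pairs as robots.txt text."""
    blocks = []
    for user_agent, disallowed in groups:
        lines = [f"User-agent: {user_agent}"]
        # An empty list means "block nothing", written as a bare Disallow: line
        lines += [f"Disallow: {path}" for path in disallowed] or ["Disallow:"]
        blocks.append("\n".join(lines))
    # One blank line between groups; lines inside a group stay together
    return "\n\n".join(blocks) + "\n"


text = build_robots_txt([("ExampleBot", ["/private/"]), ("*", [])])
print(text, end="")
```

The resulting text can then be saved as robots.txt with any editor or script.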

For my blog I created the following code:

User-agent: googlebot-image
Disallow: /

User-agent: psbot
Disallow: /

User-agent: *
Disallow:
Disallow: /cgi-bin/

The ‘User-agent’ line names a search engine spider.  The line after it is the instruction for that spider.  I do not want Google Images to crawl the images on my pages, so I tell ‘googlebot-image’ not to spider anything: ‘Disallow: /’.

I also did not allow MSN PicSearch to crawl my pages:

User-agent: psbot
Disallow: /

All other search engine spiders (User-agent: *) are permitted to crawl my pages; an empty ‘Disallow:’ blocks nothing.

The /cgi-bin/ directory is restricted for every search engine.
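One way to sanity-check rules like these is to run them through Python's standard urllib.robotparser.  The sketch below is an approximation, not a statement of how Google or any other engine reads the file.  It leaves out the empty ‘Disallow:’ line, on the assumption that a parser which applies rules in order could let that line mask the /cgi-bin/ rule; Python's parser appears to work that way.  The paths /some-post/ and /cgi-bin/form.cgi are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The blog's rules, minus the empty "Disallow:" line (an assumption, see above)
RULES = """\
User-agent: googlebot-image
Disallow: /

User-agent: psbot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(user_agent, path):
    """Report whether user_agent may fetch path on the blog."""
    return parser.can_fetch(user_agent, "http://irinaovery.com" + path)


print(allowed("googlebot-image", "/"))             # prints False
print(allowed("psbot", "/some-post/"))             # prints False
print(allowed("Googlebot", "/some-post/"))         # prints True
print(allowed("Googlebot", "/cgi-bin/form.cgi"))   # prints False
```

Note that each User-agent line sits directly above its Disallow lines.  A blank line between them starts a new group in this parser and the rule is lost.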

Thus a robots.txt file lets you tell search engine spiders which pages to crawl and which to leave alone.

Do not forget to upload your robots.txt file to the root directory of your server using an FTP application.
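If you prefer a script to an FTP client, the upload can be sketched with Python's standard ftplib.  The host, login details and remote directory below are placeholders, and upload_robots is a made-up helper.  Your hosting provider may use a different root folder or require a secure connection instead of plain FTP.

```python
from ftplib import FTP


def upload_robots(host, user, password, local_path="robots.txt", remote_dir="/"):
    """Upload local_path into remote_dir on the FTP server as robots.txt."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        # remote_dir should be the folder that maps to the site's root
        ftp.cwd(remote_dir)
        with open(local_path, "rb") as handle:
            ftp.storbinary("STOR robots.txt", handle)


# Example call with placeholder values (not run here):
# upload_robots("ftp.example.com", "username", "password", remote_dir="/public_html")
```

After uploading, open http://irinaovery.com/robots.txt in a browser to confirm that the file is served from the root.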
