There is a lot of talk about one little file: robots.txt. Is it really useful? What should you put inside it?

Useful?

In fact, robots.txt implements the robot/spider exclusion protocol. Every spider/robot should read it and apply the rules it contains. Its directives can be used to keep a spider/robot from crawling a specific directory, a file, or more generally any resource on your hosting space/account.

 

robots.txt is a simple text file that you should create at the root of your website. You should be able to view it at: www.mysite.com/robots.txt
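Because the file always sits at the root of the site, its address can be derived from any page URL. Here is a minimal Python sketch of that idea; the helper name robots_url and the sample page URL are illustrative assumptions, not part of any standard.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL, used only for illustration.
print(robots_url("http://www.mysite.com/pictures/holiday.html"))
# prints http://www.mysite.com/robots.txt
```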

To disallow crawling of your /pictures/ directory, write these lines in robots.txt:

 

User-agent: *
Disallow: /pictures/
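If you want to check what these two lines actually block before uploading them, Python's standard urllib.robotparser module can evaluate them locally. This is only a sketch: the robot name "MyBot" and the sample URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The same rules as above, kept in a string so nothing is fetched from the network.
RULES = """\
User-agent: *
Disallow: /pictures/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "MyBot" is a hypothetical robot name; it falls under the "*" group.
blocked = parser.can_fetch("MyBot", "http://www.mysite.com/pictures/cat.jpg")
allowed = parser.can_fetch("MyBot", "http://www.mysite.com/index.html")

print(blocked)  # False: the URL is inside /pictures/
print(allowed)  # True: no rule matches this URL
```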

 

In detail:

  • User-agent: specifies the name of the spider/robot to which the following rules apply. You can name a specific robot, for example Googlebot, Bingbot, or Slurp (Yahoo). Here we are using *, which is a wildcard that matches all robots.
  • Disallow: specifies what should not be crawled. It accepts a folder, a file, or even the wildcard *, which the major crawlers understand. For example, if you have many picture directories such as /pictures-a/, /pictures-b/, ... you can simply write Disallow: /pictures-* (rules match by prefix, so Disallow: /pictures- has the same effect).
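To see how the two directives combine, here is a small Python sketch using the standard urllib.robotparser module. Several things in it are assumptions for illustration: the /private/ directory, the robot name "OtherBot", and the sample URLs. Note also that, as far as I know, Python's parser does plain prefix matching and does not interpret * inside a path, so the sketch uses the prefix form /pictures- rather than /pictures-*.

```python
from urllib.robotparser import RobotFileParser

# Two groups: one for Googlebot only, one for every other robot.
# /private/ is a hypothetical directory used only for this example.
RULES = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /pictures-
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

base = "http://www.mysite.com"

# A robot without its own group follows the "*" rules (prefix match).
print(parser.can_fetch("OtherBot", base + "/pictures-a/x.jpg"))    # False
print(parser.can_fetch("OtherBot", base + "/pictures/z.jpg"))      # True

# Googlebot has its own group, so only that group's rules apply to it.
print(parser.can_fetch("Googlebot", base + "/pictures-a/x.jpg"))   # True
print(parser.can_fetch("Googlebot", base + "/private/notes.txt"))  # False
```

The last two lines show a detail that is easy to miss: a robot that finds a group with its own name uses that group alone, and does not add the rules from the * group on top.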

But be warned: even though every spider/robot is supposed to read robots.txt, some of them can bypass these directives. People can read this file too and, by analysing it, get a precise idea of which CMS or script you use on your website (by investigating the directory structure, specific readme files, etc.). Honestly, even if you worry about that, there are plenty of other ways to find this out, so it is no reason to give up on robots.txt. Just use it in a smart way.

What about adding a sitemap to it?

Yes, robots.txt is an efficient way to tell every spider/robot crawling your site that you also have a sitemap.xml available. This can be done simply by adding this line to the robots.txt file:

 

Sitemap: http://www.mon-domaine.fr/sitemap.xml

 

A "good" robot will easily find the sitemap.xml, and you won't have to submit it manually to every known/unknown search engine. If you have more than one sitemap to declare, just add one line per sitemap.
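As a rough illustration of what a robot does with these lines, the Python sketch below reads the Sitemap entries back out of a robots.txt using the standard urllib.robotparser module (its site_maps() method needs Python 3.8 or later). The second sitemap, sitemap-pictures.xml, is a made-up name added only to show the one-line-per-sitemap case.

```python
from urllib.robotparser import RobotFileParser

# sitemap-pictures.xml is a hypothetical second sitemap for this example.
RULES = """\
User-agent: *
Disallow: /pictures/

Sitemap: http://www.mon-domaine.fr/sitemap.xml
Sitemap: http://www.mon-domaine.fr/sitemap-pictures.xml
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# site_maps() returns None when no Sitemap line is present, hence the "or []".
sitemaps = parser.site_maps() or []

for url in sitemaps:
    print(url)
```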

For more details on this subject, I suggest some further reading:
