robots.txt
A robots.txt file stored in the root of your website tells web robots, like search engine spiders, which directories and files they are allowed to crawl. A robots.txt file is easy to use, but there are some things you should remember:
- Black hat web robots will ignore your robots.txt file. The most common types are malware bots and robots looking for email addresses to harvest.
- Some new programmers will write robots that ignore the robots.txt file. This is usually done by mistake.
- Anyone can see your robots.txt file. It is always named robots.txt and is always stored at the root of the website, for example: http://www.samplesite.com/robots.txt
- Finally, if a page that is not excluded links to a file or directory that your robots.txt file does exclude, search engines may find it anyway.
Don’t use robots.txt files to hide anything important. Instead, put important information behind secure passwords or leave it off the web entirely.
How to Use These Sample Files
Copy the text from the sample that is closest to what you want to do, and paste it into your robots.txt file. Change the robot, directory, and file names to match your preferred configuration.

Two Basic Robots.txt Files
User-agent: *
Disallow: /
This file says that any robot (User-agent: *) that accesses it should ignore every page on the site (Disallow: /).
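You can test how a crawler would interpret this file with Python’s standard-library urllib.robotparser module, which parses the same format. This is just an illustrative sketch; the bot name and URL are made up:

```python
# Check the "block everything" file with Python's built-in robots.txt parser.
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Every path is off-limits to every robot.
allowed = rp.can_fetch("ExampleBot", "http://www.samplesite.com/page.html")
print(allowed)  # False
```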
User-agent: *
Disallow:
This file says that any robot (User-agent: *) that accesses it is allowed to view every page on the site (Disallow:).
You can get the same effect by leaving your robots.txt file blank or by not having one on your site at all.
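The same parser confirms that an empty Disallow line permits everything (again, the bot name and URL are only placeholders):

```python
# An empty Disallow value means "nothing is disallowed" -- allow everything.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])

allowed = rp.can_fetch("ExampleBot", "http://www.samplesite.com/page.html")
print(allowed)  # True
```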
Protect Specific Directories from Robots
User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
This file says that any robot (User-agent: *) that accesses it should ignore the directories /cgi-bin/ and /temp/ (Disallow: /cgi-bin/ and Disallow: /temp/).

Protect Specific Pages from Robots
User-agent: *
Disallow: /jenns-stuff.htm
Disallow: /private.php
This file says that any robot (User-agent: *) that accesses it should ignore the files /jenns-stuff.htm and /private.php (Disallow: /jenns-stuff.htm and Disallow: /private.php).

Prevent a Specific Robot from Accessing Your Site
User-agent: Lycos/x.x
Disallow: /
This file says that the Lycos bot (User-agent: Lycos/x.x) is not allowed access anywhere on the site (Disallow: /).

Allow Only One Specific Robot Access
User-agent: *
Disallow: /
User-agent: Googlebot
Disallow:
This file first disallows all robots, as in the first example, and then explicitly lets Googlebot (User-agent: Googlebot) have access to everything (Disallow:).

Combine Multiple Lines to Get Exactly the Exclusions You Want
While it’s better to start with an inclusive User-agent line, like User-agent: *, you can be as specific as you like. Remember that a robot follows only the group of rules under the User-agent line that matches it most specifically. So in the example above, the wildcard group blocks everything, but Googlebot ignores that group in favor of its own, which allows everything.
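The group-matching behavior can be demonstrated with urllib.robotparser using the two-group file from the example above (the second bot name is hypothetical):

```python
# A specific User-agent group overrides the wildcard group for that robot.
from urllib import robotparser

rules = [
    "User-agent: *",
    "Disallow: /",        # everyone else: blocked everywhere
    "",
    "User-agent: Googlebot",
    "Disallow:",          # Googlebot: allowed everywhere
]
rp = robotparser.RobotFileParser()
rp.parse(rules)

googlebot_ok = rp.can_fetch("Googlebot", "http://www.samplesite.com/page.html")
other_ok = rp.can_fetch("OtherBot", "http://www.samplesite.com/page.html")
print(googlebot_ok)  # True
print(other_ok)      # False
```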
If you’re not sure whether you’ve written your robots.txt file correctly, you can use Google’s Webmaster Tools (now Google Search Console) to check your robots.txt file or write a new one.
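You can also sanity-check a draft locally before uploading it. Here is a minimal sketch using urllib.robotparser against the directory-blocking example from earlier; the bot name and URLs are made up for illustration:

```python
# Verify a draft robots.txt against a few URLs you care about.
from urllib import robotparser

draft = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
"""

rp = robotparser.RobotFileParser()
rp.parse(draft.splitlines())

for url in (
    "http://www.samplesite.com/cgi-bin/script.pl",  # should be blocked
    "http://www.samplesite.com/temp/draft.htm",     # should be blocked
    "http://www.samplesite.com/index.htm",          # should be allowed
):
    result = rp.can_fetch("ExampleBot", url)
    print(url, "->", "allowed" if result else "blocked")
```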