robots.txt Sample Files

A robots.txt file stored in the root of your website tells web robots, such as search engine spiders, which directories and files they are allowed to crawl. A robots.txt file is easy to use, but there are some things you should remember:

  1. Black hat web robots will ignore your robots.txt file. The most common offenders are malware bots and robots that harvest email addresses.
  2. Some inexperienced programmers will write robots that ignore the robots.txt file, usually by mistake.
  3. Anyone can see your robots.txt file. It is always named robots.txt and always stored at the root of the website, for example at http://www.samplesite.com/robots.txt.
  4. If a page that is not excluded links to a file or directory that your robots.txt file excludes, search engines may find it anyway.
  5. Don't use robots.txt files to hide anything important. Instead, put sensitive information behind secure passwords or leave it off the web entirely.

    How to Use These Sample Files

    Copy the text from the sample that is closest to what you want to do, and paste it into your robots.txt file. Change the robot, directory, and file names to match your preferred configuration.

    Two Basic Robots.txt Files

    User-agent: *
    Disallow: /
    This file says that any robot (User-agent: *) that accesses it should ignore every page on the site (Disallow: /).
    User-agent: *
    Disallow:
    This file says that any robot (User-agent: *) that accesses it is allowed to view every page on the site (Disallow:).
    You can get the same effect by leaving your robots.txt file blank or by not having one on your site at all.
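If you want to check how these two files behave before publishing them, Python's standard-library urllib.robotparser can parse a rule set and answer allow/deny questions. The bot name MyBot and the samplesite.com URL below are placeholders, not part of any real site's configuration:

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /" blocks every page on the site...
block_all = RobotFileParser()
block_all.parse(["User-agent: *", "Disallow: /"])

# ...while an empty Disallow value allows every page.
allow_all = RobotFileParser()
allow_all.parse(["User-agent: *", "Disallow:"])

print(block_all.can_fetch("MyBot", "http://www.samplesite.com/index.html"))  # False
print(allow_all.can_fetch("MyBot", "http://www.samplesite.com/index.html"))  # True
```

The same parser is what well-behaved Python crawlers use before fetching each URL.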

    Protect Specific Directories from Robots

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /temp/
    This file says that any robot (User-agent: *) that accesses it should ignore the directories /cgi-bin/ and /temp/ (Disallow: /cgi-bin/ Disallow: /temp/).

    Protect Specific Pages from Robots

    User-agent: *
    Disallow: /jenns-stuff.htm
    Disallow: /private.php
    This file says that any robot (User-agent: *) that accesses it should ignore the files /jenns-stuff.htm and /private.php (Disallow: /jenns-stuff.htm Disallow: /private.php).
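As a quick sanity check of this file-level exclusion, the sketch below feeds the same rules to urllib.robotparser and confirms that the listed files are blocked while everything else stays open. MyBot and the second URL are illustrative placeholders:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /jenns-stuff.htm",
    "Disallow: /private.php",
])

# The two named files are excluded; an unrelated page is not.
print(rp.can_fetch("MyBot", "http://www.samplesite.com/private.php"))    # False
print(rp.can_fetch("MyBot", "http://www.samplesite.com/jenns-stuff.htm"))  # False
print(rp.can_fetch("MyBot", "http://www.samplesite.com/public.html"))    # True
```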

    Prevent a Specific Robot from Accessing Your Site

    User-agent: Lycos/x.x
    Disallow: /
    This file says that the Lycos bot (User-agent: Lycos/x.x) is not allowed to access any part of the site (Disallow: /).

    Allow Only One Specific Robot Access

    User-agent: *
    Disallow: /
    User-agent: Googlebot
    Disallow:
    This file first disallows all robots as we did above, and then explicitly lets the Googlebot (User-agent: Googlebot) have access to everything (Disallow:).
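You can verify this two-group file with urllib.robotparser as well: a robot obeys the rules in the User-agent group that matches it, so Googlebot gets the permissive group while everyone else falls back to the blanket Disallow: /. OtherBot and the URL are placeholders for illustration:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
    "",
    "User-agent: Googlebot",
    "Disallow:",
])

# Googlebot matches its own group and may crawl; other robots may not.
print(rp.can_fetch("Googlebot", "http://www.samplesite.com/page.html"))  # True
print(rp.can_fetch("OtherBot", "http://www.samplesite.com/page.html"))   # False
```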

    Combine Multiple Lines to Get Exactly the Exclusions You Want

    While it’s better to use a very inclusive User-agent line, like User-agent: *, you can be as specific as you like. Remember that a robot obeys only the group of rules under the User-agent line that matches it most specifically. So if the file first blocks all robots from everything and then gives a named robot its own permissive group, that robot will follow its own group and ignore the blanket block, as in the Googlebot example above.
    If you’re not sure whether you’ve written your robots.txt file correctly, you can use the robots.txt tester in Google’s Webmaster Tools (now Search Console) to check your file or write a new one.
