September 8

Stone Ji: The magical robots file that witnesses a site's rise and fall

I had promised to write an article for Ah Bin to thank him for his help, but never got around to it. A few days ago I saw Zhuo asking several questions about robots, so I have collected what I know about robots here.

The robots.txt file sits in the root directory of a website and is the first file a search engine looks at when it visits the site. When a search spider arrives, it first checks whether robots.txt exists in the site root. If it does, the spider decides which parts of the site it may access according to the contents of the file; if it does not, spiders can reach every page on the site that is not password protected. Every site should have a robots.txt: it tells search engines which parts of the site may not be crawled and which pages they are welcome to crawl.

Several ways of using robots

1. Block all search engines. If the site is personal, such as a private blog, and you do not want too many people to know about it, you can use robots to shut out every search engine:

    User-agent: *
    Disallow: /

2. Allow only one search engine. If, for example, I want my site to be indexed by Baidu but not by any other search engine, I can set:

    User-agent: Baiduspider
    Allow: /

    User-agent: *
    Disallow: /

The Baiduspider group needs a rule of its own (here Allow: /). Without one, a crawler may read the two User-agent lines as a single group, and Disallow: / would then apply to Baiduspider as well.

3. Use wildcards to block by file type. Suppose I do not want any of the pictures on my site to be crawled. The common picture formats are BMP, JPG, GIF and JPEG, so the rules are:

    User-agent: *
    Disallow: /*.jpg$
    Disallow: /*.jpeg$
    Disallow: /*.gif$
    Disallow: /*.bmp$

4. Wildcards can also block whole classes of URLs. Some sites do not want search engines to crawl their dynamic addresses (typically URLs that contain a query string); a Disallow rule that uses the * wildcard can match them.
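
To sanity-check rules like these before uploading them, a short Python sketch can help. It is only an illustration under stated assumptions. The first part feeds a Baidu-only rule set to the standard library's urllib.robotparser. The second part uses two hypothetical helpers, pattern_to_regex and is_blocked, because urllib.robotparser does plain prefix matching and does not interpret * or $. The example.com URLs, the bot name SomeOtherBot and the /*?* pattern for dynamic addresses are assumptions made for the example, not taken from any real site.

```python
import re
from urllib.robotparser import RobotFileParser

# Part 1: "only Baidu may crawl", checked with the standard library parser.
BAIDU_ONLY = """\
User-agent: Baiduspider
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BAIDU_ONLY.splitlines())

# example.com is a placeholder host; SomeOtherBot is a made-up crawler name.
baidu_ok = parser.can_fetch("Baiduspider", "http://example.com/post.html")
other_ok = parser.can_fetch("SomeOtherBot", "http://example.com/post.html")


# Part 2: wildcard patterns. These helpers are hypothetical and only
# approximate how engines that support * and $ interpret a Disallow path.
def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of
    the URL path. Everything else is matched literally from the start.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """Return True if any Disallow pattern matches the given path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


IMAGE_RULES = ["/*.jpg$", "/*.jpeg$", "/*.gif$", "/*.bmp$"]
DYNAMIC_RULE = ["/*?*"]  # assumed pattern for URLs with a query string

if __name__ == "__main__":
    print("Baiduspider allowed:", baidu_ok)
    print("Other bot allowed:", other_ok)
    print("photo.jpg blocked:", is_blocked("/images/photo.jpg", IMAGE_RULES))
    print("list.php?id=3 blocked:", is_blocked("/list.php?id=3", DYNAMIC_RULE))
```

Note that a pattern written without the wildcard, such as /.jpg$, matches only the literal path /.jpg, so is_blocked("/images/photo.jpg", ["/.jpg$"]) is false in this sketch. That is why the image rules need the * after the leading slash.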

Posted September 8, 2017

