Business Process Outsourcing: Controlling Crawling and Indexing

Search engines generally make content available in search results in two main stages: crawling and indexing. Crawling is the act of search engine crawlers accessing publicly available web pages. This involves looking at the web pages and following the links on those pages, just as a human user would do. Indexing involves gathering together information about a page so that it can be made available (“served”) through search results.
Automated website crawlers are powerful tools that help crawl and index content on the web. As a webmaster, you may wish to guide them towards your useful content and away from irrelevant content. The robots.txt file controls crawling, while the robots meta tag and the X-Robots-Tag HTTP header control indexing. The robots.txt standard predates Google and is the accepted method of controlling crawling of a website.
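As a rough illustration of how robots.txt rules steer a crawler, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules, the www.example.com URLs, and the "ExampleBot" user agent are all made up for the example; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""


def build_parser(robots_txt):
    """Parse robots.txt text (no network access) and return the parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up crawler name; it falls under the "*" rules.
allowed_public = parser.can_fetch("ExampleBot", "https://www.example.com/index.html")
allowed_private = parser.can_fetch("ExampleBot", "https://www.example.com/private/report.html")

print(allowed_public)   # True
print(allowed_private)  # False
```

A well-behaved crawler would run a check like this before requesting a page. Note that this only governs crawling; it says nothing about whether a URL may appear in the index.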

How to use robots.txt: It’s a simple text file that tells search robots which pages you don’t want them to crawl.
How to use robots meta tags: These are for users who can’t edit the robots.txt file, such as Blogspot users. Blogger users can keep their content out of search engines by using robots meta tags.
How to use the X-Robots-Tag header: This is another way to restrict what search engines index. You can also use it to keep PDF files from being indexed.
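To make the two indexing controls more concrete, the following Python sketch checks a page for a noindex directive in either a robots meta tag or an X-Robots-Tag header. It is only an illustration under simplifying assumptions: the function names are invented for this example, the sample HTML and headers are hypothetical, and only the plain comma-separated form of the header is handled (not user-agent-specific values).

```python
from html.parser import HTMLParser


def split_directives(value):
    """Split a comma-separated directive string into a lowercase set."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            self.directives |= split_directives(attr_map.get("content", ""))


def meta_directives(html):
    """Return the robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def header_directives(headers):
    """Return directives from an X-Robots-Tag header (plain form only)."""
    directives = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives |= split_directives(value)
    return directives


def is_indexable(html, headers):
    """A page is treated as indexable unless 'noindex' appears somewhere."""
    return "noindex" not in (meta_directives(html) | header_directives(headers))


# Hypothetical HTML page carrying a robots meta tag.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
# Hypothetical response headers for a PDF, which cannot carry a meta tag.
pdf_headers = {"Content-Type": "application/pdf", "X-Robots-Tag": "noindex"}

print(is_indexable(page, {}))        # False
print(is_indexable("", pdf_headers)) # False
```

The PDF case shows why the header exists: a non-HTML file has nowhere to put a meta tag, so the directive has to travel in the HTTP response instead.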

Controlling Crawling

Location of the robots.txt file


Wednesday, December 22, 2010

