Search engines collect information about your site using scan robots (also called crawlers, spiders, or bots) that constantly scan the Internet. These robots are programs (fairly primitive, although they are constantly improving) whose purpose is to download web pages into a database, find links to new pages, download those pages, and so on.
A robot friendly website

Since robots are fairly primitive, they like simple sites. Websites that are based on advanced technologies risk not being understood by the robots. Technologies that you should avoid:
Let's repeat the basics: simple is good.

GoogleBot - Google's scan robot

Here are some points you should know about Google's scan robot, GoogleBot:

Scanning frequency: GoogleBot scans pages and websites at a varying frequency. The parameters GoogleBot uses to determine which pages should be scanned more often are the PageRank (PR), the number of links leading to the page, and a number of URL parameters (e.g., whether it is a dynamic PHP or ASP page). There are obviously additional parameters, but these are hard to figure out.

Id parameter: GoogleBot may not scan dynamic sites that include a parameter called "id", since this variable is often used only for saving the session id. It is quite possible that you should avoid using these two letters even inside a longer name (e.g., catid), but there is no concrete verification of this.

Website scan levels

Robots that scan the Internet have three main scan levels:

Scanning for new pages: This scan is performed in order to locate new pages that do not yet exist in the search engine's database. The robot can "discover" a page if it was submitted through the search engine's "add a website" page, or if it encounters a link to the new page in one of the pages already in its database.

Shallow scan of important pages: This scan includes only the most important pages on the website (usually the homepage), and is performed more often.

Deep scan: This scan includes all of the site's pages that appear in the database, in order to locate new pages and alterations to existing pages. This type of scan is not performed frequently.

Limiting robot access

Sometimes you will want to prevent search robots from accessing a certain area of your site. A basic example is a folder that you don't want to expose by mistake, or a page that is no longer updated.
There are two ways to block robots' access to certain areas of your site: the robots.txt file and the robots meta tag.

Robots.txt file

You will often want to block a certain search engine robot's access to your site (or to part of it), or to block all robots' access to a certain area. This is what the robots.txt file is for. Note: blocking a search engine from accessing a certain page will indeed prevent it from collecting that page's content, but if there are links to that page from other pages that are not blocked, the page may still appear in the search results, just without its information (title, description, etc.). If you want to completely prevent the page from being displayed, you should use the other method (the robots meta tag). The robots.txt file should be placed in your site's root folder (it does not appear there by default; you need to create it). Each section of the file names a type of robot and the limitations that apply to it. In addition, there may be limitations that apply to all robots.
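To make this concrete, here is a sketch of how a crawler interprets a robots.txt file, using Python's standard `urllib.robotparser` module. The rules and URLs below are hypothetical examples, not taken from any real site; note that when a specific user-agent section exists (here, Googlebot), a compliant robot follows that section instead of the general `*` section.

```python
from urllib import robotparser

# A hypothetical robots.txt: all robots are blocked from /private/,
# and Googlebot specifically is blocked from /drafts/.
rules = """
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The generic "*" rules block /private/ for unlisted robots.
print(rp.can_fetch("*", "http://example.com/private/page.html"))     # False
# Googlebot follows its own section, which blocks /drafts/.
print(rp.can_fetch("Googlebot", "http://example.com/drafts/a.html"))  # False
# Pages not covered by any Disallow rule remain fetchable.
print(rp.can_fetch("Googlebot", "http://example.com/index.html"))     # True
```

In a real deployment you would save the rules as a plain-text file named robots.txt in the site's root folder (e.g., http://example.com/robots.txt), and compliant robots would fetch and apply it themselves.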
Robots meta tag

To control the way search robots process certain pages on your site, you can use the robots meta tag. This tag controls the following:
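As an illustration, here is a minimal robots meta tag placed in a page's head section. The standard directives are index/noindex (whether the page may be listed in search results) and follow/nofollow (whether the robot may follow the page's links); the combination below blocks both.

```html
<!-- Illustrative example: ask robots not to index this page
     and not to follow any of its links. -->
<head>
  <meta name="robots" content="noindex, nofollow">
</head>
```

Unlike robots.txt, this tag is read only when the page itself is fetched, which is why it can fully remove a page from the results rather than merely blocking content collection.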
© 2004-2013 SEO Israel Technologies Ltd.