1.2 Common search engine principles
To understand SEO, you need to be aware of the architecture of search engines. All search
engines contain the following main components:
Spider - a browser-like program that downloads web pages.
Crawler - a program that automatically follows all of the links on each web page.
Indexer - a program that analyzes web pages downloaded by the spider and the crawler.
Database - storage for downloaded and processed pages.
Results engine - extracts search results from the database.
Web server - a server that is responsible for interaction between the user and other search engine components.
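As a rough illustration of how the Indexer, Database and Results engine fit together, here is a minimal Python sketch built on an inverted index (a map from each word to the pages that contain it). The names `Database`, `index_page` and `search` are hypothetical; real search engines use far more elaborate storage and ranking, and this sketch does no ranking at all, it only returns pages that contain every query word.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Database:
    """Stores processed pages plus an inverted index (word -> set of URLs)."""

    def __init__(self) -> None:
        self.pages: dict[str, str] = {}
        self.index: dict[str, set[str]] = defaultdict(set)


def index_page(db: Database, url: str, text: str) -> None:
    """Indexer: analyze a downloaded page and record which words it contains."""
    db.pages[url] = text
    for word in tokenize(text):
        db.index[word].add(url)


def search(db: Database, query: str) -> list[str]:
    """Results engine: return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    # .get() avoids inserting empty entries into the defaultdict for unknown words.
    matches = set.intersection(*(db.index.get(w, set()) for w in words))
    return sorted(matches)
```

For example, after indexing `"Search engines use spiders"` under one URL and `"Spiders download HTML"` under another, `search(db, "spiders")` returns both URLs, while `search(db, "spiders html")` returns only the second.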
Specific implementations of these mechanisms may differ. For example, the
Spider+Crawler+Indexer component group might be implemented as a single program
that downloads web pages, analyzes them, and then uses their links to find new resources.
However, the components listed above are inherent to all search engines, and the SEO
principles are the same.
Spider. This program downloads web pages just like a web browser. The difference is
that a browser displays the information presented on each page (text, graphics, etc.),
while a spider does not have any visual components and works directly with the underlying
HTML code of the page. You may already know that standard web browsers have an option
to view a page's source HTML code.
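The core of a spider can be sketched in a few lines of Python using only the standard library: it downloads a URL and returns the raw HTML as text, without rendering anything. This is a minimal sketch, not how any particular search engine works; the function name `fetch_html` and the `ExampleSpider/0.1` User-Agent string are hypothetical.

```python
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its raw HTML source, without rendering it."""
    # Hypothetical User-Agent; real spiders identify themselves with their own name.
    request = Request(url, headers={"User-Agent": "ExampleSpider/0.1"})
    with urlopen(request, timeout=timeout) as response:
        raw = response.read()
        # Use the charset declared in the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Calling `fetch_html("https://example.com/")` returns the same markup you would see with the browser's view-source option, which is exactly what the spider works with.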
Crawler. This program finds all the links on each page. Its task is to determine where the
spider should go, either by evaluating the links or according to a predefined list of
addresses. The crawler follows these links and tries to find documents not already known
to the search engine.
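Below is a minimal Python sketch of a crawler, assuming the spider is passed in as a `fetch` function that takes a URL and returns its HTML. The `seeds` list plays the role of the predefined list of addresses, and links found on each page are resolved against the page URL and queued only if they are not already known. The names `LinkExtractor`, `crawl` and `max_pages` are illustrative; a real crawler would also honor robots.txt and rate limits, which this sketch ignores.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href targets of all <a> tags on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links and drop #fragments.
                    absolute, _ = urldefrag(urljoin(self.base_url, value))
                    self.links.append(absolute)


def crawl(seeds: list[str], fetch: Callable[[str], str], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl: follow links to find pages not already known."""
    known = set(seeds)         # every URL discovered so far
    frontier = deque(seeds)    # URLs waiting to be downloaded by the spider
    visited: list[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to download (urllib errors are OSErrors)
        visited.append(url)
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if link not in known:
                known.add(link)
                frontier.append(link)
    return visited
```

Breadth-first order is only the simplest choice here; "evaluating the links" would correspond to replacing the plain queue with a priority queue ordered by some estimate of each link's value.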