Welcome to Net66!
Today, we’re going to take a look at how a search engine works from the inside.
First, we’ve got the World Wide Web. Next, there’s the spider: a specialised computer program, built by search engines, that crawls (or traverses) the World Wide Web.
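To make the spider’s traversal concrete, here is a minimal Python sketch of breadth-first crawling. It is not how any particular search engine’s spider is built: the toy link graph `TOY_WEB`, the `crawl` function and the `max_pages` limit are hypothetical. A real spider would fetch each page over HTTP and parse out its links rather than read them from a dictionary.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
# A real spider would download each page and extract these links itself.
TOY_WEB = {
    "https://example.com/": ["https://example.com/about", "https://example.com/blog"],
    "https://example.com/about": ["https://example.com/"],
    "https://example.com/blog": ["https://example.com/blog/post-1", "https://example.com/about"],
    "https://example.com/blog/post-1": [],
}


def crawl(seed: str, web: dict[str, list[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first traversal from a seed URL, visiting each page at most once."""
    frontier = deque([seed])  # URLs waiting to be visited
    seen = {seed}             # URLs already queued, so nothing is visited twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


print(crawl("https://example.com/", TOY_WEB))
```

The `seen` set is what stops the spider looping forever when pages link back to each other, as `/about` and `/` do here.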
The spider passes its findings on to the search engine, where the data is extracted and tokenised, meaning it is broken up into small, discrete pieces such as title tags and h1 tags. The useful pieces are then selected, unwanted ones are discarded, and the rest is stored.
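The extract, tokenise and select steps can be sketched with Python’s standard `html.parser` module. This is an illustrative assumption rather than any search engine’s real pipeline: the `TagExtractor` class, the `extract` and `tokenise` helpers, and the choice to keep only `<title>` and `<h1>` text are all hypothetical.

```python
from html.parser import HTMLParser


class TagExtractor(HTMLParser):
    """Keeps the text inside selected tags and discards everything else."""

    def __init__(self, wanted=("title", "h1")):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None  # the wanted tag we are currently inside, if any
        self.fields = {tag: [] for tag in wanted}

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        # Text outside the wanted tags (e.g. paragraph text) is simply dropped.
        if self.current is not None and data.strip():
            self.fields[self.current].append(data.strip())


def extract(html: str) -> dict[str, list[str]]:
    """Extract step: split the page into the discrete pieces we care about."""
    parser = TagExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


def tokenise(text: str) -> list[str]:
    """Very simple tokeniser: lower-case the text and split it on whitespace."""
    return text.lower().split()


page = ("<html><head><title>Example Page</title></head>"
        "<body><h1>How Search Works</h1><p>Body text.</p></body></html>")
fields = extract(page)
tokens = {tag: tokenise(" ".join(texts)) for tag, texts in fields.items()}
print(tokens)
```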
The stored data then goes into the data warehouse, which houses the search engine’s entire cache.
Now let’s look at what happens on the other side, when a user makes a query in the search engine. Starting with the user here, the query is passed first to the proxy server, which is responsible for all local communication to and from the user.
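One common way to organise stored pages so that queries can be answered quickly is an inverted index, which maps each word to the pages that contain it. The sketch below swaps that technique in for the data warehouse described above; the `Warehouse` class and its `store` and `lookup` methods are hypothetical, and real search engine storage is distributed across many machines and far more elaborate.

```python
from collections import defaultdict


class Warehouse:
    """Hypothetical store: the cached text of each page plus an inverted index
    that maps every token to the set of URLs containing it."""

    def __init__(self) -> None:
        self.pages: dict[str, str] = {}
        self.index: dict[str, set[str]] = defaultdict(set)

    def store(self, url: str, text: str) -> None:
        self.pages[url] = text  # keep a cached copy of the page's text
        for token in text.lower().split():
            self.index[token].add(url)

    def lookup(self, token: str) -> set[str]:
        # .get() avoids creating empty entries in the defaultdict on a miss.
        return set(self.index.get(token.lower(), set()))


warehouse = Warehouse()
warehouse.store("https://example.com/a", "Search engines crawl the web")
warehouse.store("https://example.com/b", "Engines rank pages")
print(warehouse.lookup("engines"))
```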
Next, it is passed to the personalisation server, which handles the user’s geographical and other localised preferences.
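How a personalisation server weighs these preferences isn’t described here, so the following is only one plausible illustration of applying a geographical preference: results tagged with the user’s region get a score boost before being re-ranked. The `Result` fields, the `personalise` function and the `boost` value are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    score: float  # relevance score from the ranking stage
    region: str   # hypothetical: the region a page is most relevant to


def personalise(results: list[Result], user_region: str, boost: float = 0.5) -> list[Result]:
    """Re-rank results, boosting pages tagged with the user's region."""
    def adjusted(r: Result) -> float:
        return r.score + (boost if r.region == user_region else 0.0)
    # sorted() is stable even with reverse=True, so ties keep their original order.
    return sorted(results, key=adjusted, reverse=True)


results = [
    Result("https://a.example.com", 1.0, "US"),
    Result("https://b.example.co.uk", 0.8, "UK"),
    Result("https://c.example.com", 0.6, "US"),
]
print([r.url for r in personalise(results, "UK")])
```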
From there, the query is passed to the web server. This is like any other web server on the internet: it primarily handles HTTP traffic carried over TCP/IP, by default on TCP port 80.
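To show what the web server actually receives, here is a small sketch that pulls the search terms out of a raw HTTP GET request using only the standard library’s `urllib.parse`. The `/search` path, the `q` parameter name and the `query_from_request` function are hypothetical choices for this example, not a documented search engine interface.

```python
from urllib.parse import parse_qs, urlsplit


def query_from_request(raw_request: str) -> str:
    """Return the search terms from a raw HTTP GET request.

    Assumes (hypothetically) that the terms arrive in a query-string
    parameter named 'q'; returns "" if it is absent.
    """
    request_line = raw_request.split("\r\n", 1)[0]  # e.g. "GET /search?q=... HTTP/1.1"
    method, target, _version = request_line.split(" ")
    if method != "GET":
        raise ValueError(f"unsupported method: {method}")
    params = parse_qs(urlsplit(target).query)  # parse_qs decodes '+' as a space
    return params.get("q", [""])[0]


raw = "GET /search?q=how+search+engines+work HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
print(query_from_request(raw))
```

A production web server does this parsing itself; the point here is only that the query travels inside an ordinary HTTP request.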
From there, the query is passed into the search engine, which looks up the matching data in the data stores and passes it to the data server.
The data server serves the data from the data warehouse to the semantic algorithms, which are responsible for turning it into a search engine results page listing results 1 to 10.
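The semantic algorithms themselves aren’t spelled out here, so the sketch below swaps in a deliberately simple stand-in: each page is scored by how often the query terms appear in it, and the best ten become results 1 to 10. The `rank` function, its scoring rule and the `page_size` parameter are hypothetical and far cruder than a real ranking system.

```python
def rank(query: str, pages: dict[str, str], page_size: int = 10) -> list[tuple[int, str]]:
    """Return up to `page_size` (position, url) pairs, best match first."""
    terms = query.lower().split()

    def score(text: str) -> int:
        # Count how many times any query term appears in the page.
        words = text.lower().split()
        return sum(words.count(term) for term in terms)

    scored = [(score(text), url) for url, text in pages.items()]
    matches = [(s, url) for s, url in scored if s > 0]  # drop pages with no match
    # Highest score first; ties broken alphabetically by URL so the order is stable.
    matches.sort(key=lambda m: (-m[0], m[1]))
    return [(pos, url) for pos, (_, url) in enumerate(matches[:page_size], start=1)]


pages = {
    "https://example.com/a": "search engine search results",
    "https://example.com/b": "engine",
    "https://example.com/c": "cooking recipes",
    "https://example.com/d": "search",
}
print(rank("search engine", pages))
```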
This becomes the search results page we see on Google, for example.
The search results are then passed to the web server, and from there they’re served back to the personalisation and geo-targeting server, then to the proxy server, and finally to the user as a list of results ranked 1 to 10.
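The round trip described above can be summarised as an ordered list of hops. In this sketch the stage names come from the walkthrough, while the `FRONT_END`/`BACK_END` grouping and the `request_path` function are hypothetical; it shows that the results retrace the front-end hops in reverse on their way back to the user.

```python
# Stage names follow the walkthrough; grouping them into a front and back end is an assumption.
FRONT_END = ["proxy server", "personalisation server", "web server"]
BACK_END = ["search engine", "data server", "semantic algorithms"]


def request_path() -> list[str]:
    """Every hop, in order, from the user's query to the results coming back."""
    inbound = ["user", *FRONT_END, *BACK_END]
    # The finished results go straight back to the web server, then retrace
    # the front-end hops in reverse until they reach the user.
    outbound = [*reversed(FRONT_END), "user"]
    return inbound + outbound


print(" -> ".join(request_path()))
```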
And that, briefly, is how a search engine works from the inside.
Thanks for watching!