Google Dance: The Google Update
Topics: the update of Google's archives; Google Dance (the monthly one) is dead... almost; fresh crawl and deep crawl;
the so-called Florida Update; LocalRank; Topic-Sensitive PageRank™
The update of Google's archives (also known as Google Dance)
Google's crawlers (Googlebot) are "agents": small pieces of software that continuously run through the web looking for new content on websites and web pages.
These crawlers perform two types of crawl:
1. Fresh-crawl (each day)
It updates pages that are already indexed and adds content that has appeared since the last deep crawl. This kind of crawl keeps an up-to-date cached version of sites whose content changes very often, such as many news sites.
GOOGLE'S EVERFLUX: the fresh crawl adds its findings to a temporary database whose results are merged with those returned from the main archive. This lets Google continue its normal update cycle while still returning very fresh, up-to-date content. Some confusion may come from the fact that the fresh-crawl database is rewritten every day with the results of the latest round of crawling. This means that a page that was in the temporary database one day may completely disappear the next. In particular, a new site can be added to the temporary database only to be overwritten and vanish the following day.
If this happens to your site, don't worry: new sites that are found and then disappear will almost always reappear permanently once the deep crawl (see below) indexes them and they are added to the main Google archive.
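The interplay between the daily-overwritten temporary database and the permanent main archive can be illustrated with a toy model. This is a sketch under the assumptions stated in the text (fresh results override the archive; the fresh database is replaced wholesale each day), not Google's real design; `SearchIndex` and its methods are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SearchIndex:
    """Toy model: a stable main archive plus a fresh-crawl database."""
    main: dict[str, str] = field(default_factory=dict)   # url -> cached content
    fresh: dict[str, str] = field(default_factory=dict)  # replaced every day

    def daily_fresh_crawl(self, pages: dict[str, str]) -> None:
        # Overwrite, don't append: pages found yesterday but not today
        # drop out of the temporary database.
        self.fresh = dict(pages)

    def monthly_deep_crawl(self, pages: dict[str, str]) -> None:
        # Deep-crawl findings become part of the permanent main archive.
        self.main.update(pages)

    def lookup(self, url: str) -> str | None:
        # Fresh results take precedence; otherwise fall back to the archive.
        return self.fresh.get(url, self.main.get(url))
```

A new site found by the fresh crawl is visible on day one, disappears when the next day's fresh crawl misses it, and comes back for good once the deep crawl adds it to `main`.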
2. Deep-crawl (once per month)
The deep crawl is performed once per month: the whole web is scanned, page by page and document by document, updating the indexes, rankings, PageRank and cache.
When the deep crawl is finished, Google needs about 6 to 8 days to completely refresh its indexes and propagate them to the 9 data centers (listed below). This period is called the Google Dance because results can change often and look very different from one query to the next. After a few days the results become stable.
The progress of the Google Dance can be watched by querying the domains of Google's data centers.
| Data Center Domain | IP Address |
| www-ex.google.com | 216.239.33.100 |
| www-sj.google.com | 216.239.35.100 |
| www-va.google.com | 216.239.37.100 |
| www-dc.google.com | 216.239.39.100 |
| www-ab.google.com | 216.239.51.100 |
| www-in.google.com | 216.239.53.100 |
| www-zu.google.com | 216.239.55.100 |
| www-cw.google.com | 216.239.57.100 |
| www-fi.google.com | 216.239.41.100 |
Note that querying an IP address directly will redirect you to the main URL, www.google.com.
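To follow the dance, the same query has to be sent to every data-center domain and the results compared. A minimal sketch that builds those query URLs, assuming the usual `/search?q=` query format; `DATA_CENTERS` and `query_urls` are illustrative names, and the historical hosts from the table no longer exist.

```python
from urllib.parse import urlencode

# Domain -> IP, copied from the table above (historical values).
DATA_CENTERS = {
    "www-ex.google.com": "216.239.33.100",
    "www-sj.google.com": "216.239.35.100",
    "www-va.google.com": "216.239.37.100",
    "www-dc.google.com": "216.239.39.100",
    "www-ab.google.com": "216.239.51.100",
    "www-in.google.com": "216.239.53.100",
    "www-zu.google.com": "216.239.55.100",
    "www-cw.google.com": "216.239.57.100",
    "www-fi.google.com": "216.239.41.100",
}


def query_urls(query: str) -> dict[str, str]:
    """Build one search URL per data-center domain for the same query.

    Domains are used instead of raw IPs because querying an IP directly
    redirects to www.google.com.
    """
    params = urlencode({"q": query})  # spaces become '+'
    return {domain: f"http://{domain}/search?{params}" for domain in DATA_CENTERS}
```

Opening each URL and comparing the top results shows which data centers are already serving the new index.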
Those who keep an eye on Google's index updates often think that the Google Dance is over when they see the new index at www.google.com, or when they haven't seen the old index there for some time. In fact, the update is not finished until all the domains listed above return results from the new index.
At each individual data center, the index update seems to happen at a single point in time: as soon as a data center shows results from the new index, it won't switch back to the old one. This is most likely because the index is stored redundantly at each data center, and at first only part of the servers (possibly half of them) is updated. During this period, only the other half of the servers is active and provides search results. As soon as the first half has been updated, those servers become active and serve results while the other half receives the new index. Thus, from the user's perspective, the update of a data center happens at a single point in time.
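The half-and-half swap described above can be modelled in a few lines. This is a toy simulation of the paragraph's own hypothesis (two redundant halves per data center), not Google's actual deployment; `DataCenter`, `rolling_update` and `dance_finished` are hypothetical names. It shows why users see one switch from the old index to the new one, and encodes the rule that the dance is over only when every data center serves the new index.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    """One data center with two redundant halves of servers."""
    name: str
    half_a: str = "old"
    half_b: str = "old"
    active: str = "a"  # which half currently answers queries

    def serving(self) -> str:
        return self.half_a if self.active == "a" else self.half_b


def rolling_update(dc: DataCenter) -> list[str]:
    """Update dc in place; return the index version users see after each step."""
    seen = [dc.serving()]
    # Step 1: the idle half receives the new index; the active half keeps
    # answering from the old one.
    dc.half_b = "new"
    seen.append(dc.serving())
    # Step 2: the updated half goes live in a single switch...
    dc.active = "b"
    seen.append(dc.serving())
    # Step 3: ...and the now-idle half receives the new index too.
    dc.half_a = "new"
    seen.append(dc.serving())
    return seen


def dance_finished(centers: list[DataCenter]) -> bool:
    # The dance is over only when every data center serves the new index.
    return all(dc.serving() == "new" for dc in centers)
```

`rolling_update(DataCenter("www-ex"))` returns `["old", "old", "new", "new"]`: users never see a mix, and never see the old index again after the switch.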
www2 and www3 Test Domains
When the Google Dance begins, webmasters who are interested in their upcoming rankings can watch them on the test domains www2.google.com and www3.google.com. These domains have static DNS entries, so each resolves to a single IP address. Before the Google Dance starts, one of the test domains is assigned the IP address of the data center that receives the new index first.
The reason for having www2 and www3 is to show the new index to webmasters before ordinary users can see it. The SERPs seen on these test domains are the same SERPs that will soon appear on the main domain www.google.com, provided the update proceeds normally, without malfunctions.
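Because each test domain resolves to a single IP address, you can tell which data center it currently points to by resolving it and looking the IP up in the data-center table. A minimal sketch; the function name and the injectable `resolve` parameter are hypothetical, and the IPs are the historical ones, so on today's internet the lookup will almost certainly return `None`.

```python
from __future__ import annotations

import socket
from typing import Callable

# IP -> data-center domain, inverted from the data-center table.
IP_TO_DATA_CENTER = {
    "216.239.33.100": "www-ex.google.com",
    "216.239.35.100": "www-sj.google.com",
    "216.239.37.100": "www-va.google.com",
    "216.239.39.100": "www-dc.google.com",
    "216.239.51.100": "www-ab.google.com",
    "216.239.53.100": "www-in.google.com",
    "216.239.55.100": "www-zu.google.com",
    "216.239.57.100": "www-cw.google.com",
    "216.239.41.100": "www-fi.google.com",
}


def data_center_behind(host: str,
                       resolve: Callable[[str], str] = socket.gethostbyname
                       ) -> str | None:
    """Return the data-center domain whose IP `host` resolves to, or None.

    `resolve` defaults to a real DNS lookup; pass a stub to try it offline.
    """
    return IP_TO_DATA_CENTER.get(resolve(host))
```

For example, `data_center_behind("www2.google.com")` would have named the data center receiving the new index first.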
New PageRank Values during the Google Dance
The PageRank of any page is provided to the Toolbar by the 9 Google data centers. As the dance progresses, these data centers (and their PageRank values) are updated. Checking each center during the dance reveals the new PageRank values as they gradually spread through the centers. If a page's PageRank isn't going to change, all the centers show the same value throughout, of course.
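Comparing the values reported by the centers makes the spread visible. A small hypothetical helper, assuming you already have one Toolbar PageRank value per data center (obtained with the requests explained in the next section) and know the page's old value:

```python
def pagerank_spread(values: dict[str, int], old_value: int) -> tuple[list[str], bool]:
    """Summarise how far a page's new PageRank has spread.

    values maps each data-center IP to the PageRank it reports.
    Returns (centers already reporting a changed value, whether all agree).
    """
    updated = sorted(ip for ip, pr in values.items() if pr != old_value)
    settled = len(set(values.values())) == 1  # every center reports the same value
    return updated, settled
```

If the PageRank is not changing, `updated` stays empty and `settled` is `True` from the start, matching the last sentence above.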
Querying the data centers
For this, you need the Google Toolbar installed with the PageRank indicator switched on. Every time the browser receives a page, the Toolbar requests its PageRank from one of Google's data centers. The information is returned as a one-line text file and stored in the Temporary Internet Files folder.
The Toolbar's request URL includes the URL of the page whose PageRank it wants (the target page) and a checksum computed from that URL; the checksum must match the target page's URL for the request to succeed.
A fat URL for a typical Toolbar request:
http://216.239.33.102/search?client=navclient-auto&ch=5150615727&features=Rank:FVN&q=info:http%3A%2F%2Fwww%2Eyoursiteurl%2Ecom%2F
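For reference, the example fat URL breaks down into three variable parts: the data-center IP, the `ch` checksum, and the target page escaped after `info:`. The sketch below reassembles such a URL from those parts; `fat_url` is a hypothetical helper, the parameter layout is copied from the example, and it does not compute the checksum, which has to be taken from a real Toolbar request.

```python
from urllib.parse import quote


def fat_url(ip: str, checksum: str, target: str) -> str:
    """Assemble a Toolbar-style PageRank request URL from its parts."""
    # Escape every reserved character, and '.' as well, to match the
    # %2E form used in the example request.
    escaped = quote(target, safe="").replace(".", "%2E")
    return (f"http://{ip}/search?client=navclient-auto&ch={checksum}"
            f"&features=Rank:FVN&q=info:{escaped}")
```

Called with `"216.239.33.102"`, `"5150615727"` and `"http://www.yoursiteurl.com/"`, it reproduces the example URL exactly.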
Do not cut and paste that fat URL into your browser: you will get the typical 403 error page, because the target page and checksum don't match (it's just an example of the request URL). To get the new PageRank for a particular page, you need to make the same request that the Toolbar makes for it, i.e. you need the fat URL that the Toolbar uses, and you need to send that request to all 9 of Google's data centers. The method is a bit long-winded, but it works. Here's how to do it:
1. Use your browser to browse to the page. This makes sure that both the page and the Toolbar's PageRank request are in your Temporary Internet Files folder. You only need to do this once, not every time.
2. Open the index.dat file from the Temporary Internet Files folder in a text editor and search it for the target page. You'll find the entire fat URL for the Toolbar's PageRank request, similar to the one above.
NOTE: Because the target page is escaped in the fat URL, search only for an unescaped part of it, e.g. "exampledomain".
3. When you've found the fat URL, copy and paste it into your browser's address box and press Return or click Go. If the page is in Google's directory, the returned line includes the directory path. The last element in the first part of the line is the Toolbar PageRank value for the target page.
4. To see the page's new PageRank spread across the centers during the dance, use the same fat URL but replace the IP address with each of the 9 data centers in turn. This is also a good way to watch the progress of the dance in general.
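Searching index.dat and reading the returned line can also be scripted. A minimal sketch under two loudly stated assumptions: that fat URLs are stored in index.dat as plain ASCII terminated by a NUL or other non-printable byte, and that the response's first whitespace-separated part is colon-separated with the PageRank last (e.g. a hypothetical `Rank_1:1:6`). Both function names are illustrative.

```python
import re

# A Toolbar request starts like the example fat URL and runs until the
# first non-printable or space byte (assumed terminator in index.dat).
FAT_URL = re.compile(rb"http://[0-9.]+/search\?client=navclient-auto[\x21-\x7e]*")


def find_fat_urls(index_dat: bytes, hint: str) -> list[str]:
    """Return the Toolbar request URLs whose escaped target contains `hint`,
    an unescaped fragment of the target such as "exampledomain"."""
    found: list[str] = []
    for match in FAT_URL.finditer(index_dat):
        url = match.group().decode("ascii")
        if hint in url and url not in found:  # skip duplicates, keep order
            found.append(url)
    return found


def toolbar_pagerank(response_line: str) -> int:
    """Take the last element of the first part of the response line."""
    first_part = response_line.split()[0]
    return int(first_part.split(":")[-1])
```

Read the file in binary mode (`open(path, "rb").read()`) and pass the bytes to `find_fat_urls`; the PageRank parser should be checked against a real response before relying on it.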
Data centers
216.239.33.100
216.239.35.100
216.239.37.100
216.239.39.100
216.239.41.100
216.239.51.100
216.239.53.100
216.239.55.100
216.239.57.100
If you want to check the same pages during future dances, save the fat URLs in a text document so that you don't have to dig them out of the Temporary Internet Files folder each time.
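Swapping the IP address for each data center and saving the resulting URLs for later dances can be sketched as follows. Only the host part of the URL is changed, so the target page and its checksum are left untouched; the helper names are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit

# The nine data-center IPs listed above.
DATA_CENTER_IPS = (
    "216.239.33.100", "216.239.35.100", "216.239.37.100",
    "216.239.39.100", "216.239.41.100", "216.239.51.100",
    "216.239.53.100", "216.239.55.100", "216.239.57.100",
)


def fat_urls_for_all_centers(fat_url: str,
                             ips: tuple[str, ...] = DATA_CENTER_IPS) -> list[str]:
    """Return one copy of the fat URL per data center, changing only the host."""
    parts = urlsplit(fat_url)
    return [urlunsplit(parts._replace(netloc=ip)) for ip in ips]


def save_fat_urls(urls: list[str], path: str) -> None:
    """Write one URL per line so they can be reused in future dances."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="ascii")
```

Opening each generated URL in turn shows the PageRank that each data center currently reports for the page.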