How Google Works - Overview, Features, Hardware, Limitations

Topics: Google hardware architecture, servers, what Google indexes, features and limitations, Google ranking principles and tips, Googleplex

Before showing you WHAT Google is, and before demonstrating the optimization techniques you can apply to your website, you must learn the THREE FUNDAMENTAL RULES of Google ranking. If you want your website ranked high, you must THINK like Google does. And how does Google think? Well, to put it simply, Google acts like a PARANOID, LAZY, CURIOUS HUMAN ON A SLOW CONNECTION would (check out the "random surfer model", which applies not only to PageRank technology but to Google's whole crawling philosophy).

So:
1. Server Speed:
Your server must be as fast as possible. Your website's pages must download at nearly the speed of light. Yes, really: Google gives more visibility to websites hosted on fast servers.
Weird but reasonable: Google puts authoritative sites higher in its archives, and a webmaster who spends MONEY on a good server or hosting plan presumably owns a website he wants to be important (or at least he is trying to make it so).
Google also prefers fast servers because it is supposed to give its users links to relevant resources that are quick to access. 'Nuff said: if you are planning to get a lot of quality traffic from Google, you must choose very good hosting. If you are scared of spending too much, remember how many customers Google can bring you for FREE.
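
If you want to see how fast your own server answers, a minimal Python sketch like the one below can time a full page download. It is only an illustration: the function name time_download and its fetch and clock parameters are our own inventions, and the number it reports is not the number Google measures.

```python
import time
import urllib.request


def time_download(url, fetch=None, clock=time.perf_counter):
    """Return (seconds_elapsed, bytes_received) for one full download of url.

    fetch and clock are hypothetical hooks added for this sketch so that
    the timing logic can be tried without a network connection.
    """
    if fetch is None:
        # Default fetcher: a plain HTTP GET with a 10 second timeout.
        def fetch(u):
            with urllib.request.urlopen(u, timeout=10) as response:
                return response.read()

    start = clock()
    body = fetch(url)
    elapsed = clock() - start
    return elapsed, len(body)


# Example call (replace the placeholder address with a page of your own site):
#   seconds, size = time_download("http://www.example.com/")
```

Run it a few times at different hours: a single measurement says little, while a page that is consistently slow is a good reason to look at another hosting plan.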

DNS: choose an ISP that assigns your website (even if it is virtually hosted) a unique DNS. Virtually hosted sites usually share their DNS with dozens of other sites, and this makes it hard for a search engine to find your website (Google clearly has this limitation). Lately many providers have understood the importance of a unique DNS, and most ISPs now offer it as an option. Needless to say, those are the ones to choose.
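
To check whether your site shares its address with other sites, you can resolve a few hostnames and group them by the address they return. The sketch below assumes that "unique DNS" in practice means a hostname that resolves to an IP address no other site uses; that reading, the function name group_by_address and the resolve parameter are our assumptions, not something stated by your ISP or by Google.

```python
import socket
from collections import defaultdict


def group_by_address(hostnames, resolve=socket.gethostbyname):
    """Group hostnames by the IPv4 address they resolve to.

    resolve is a hypothetical hook for this sketch; by default it is the
    standard library resolver socket.gethostbyname.
    """
    groups = defaultdict(list)
    for name in hostnames:
        groups[resolve(name)].append(name)
    return dict(groups)


def shared_addresses(groups):
    """Keep only the addresses used by more than one hostname."""
    return {ip: names for ip, names in groups.items() if len(names) > 1}
```

If your hostname shows up in shared_addresses together with sites you know nothing about, your hosting is shared in the sense described above.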

2. Site Updating:
Googlebot can check WHEN your pages were uploaded to the server. You can take advantage of this feature instead of being penalized by it (it definitely penalizes "old" pages, almost regardless of their contents) by keeping your pages FRESH and updated.
Obviously it is impossible to update many pages every day, and you will not really have new content every day either, but here's the big trick: YOU CAN JUST UPLOAD ALL THE PAGES OF YOUR WEBSITE TO YOUR SERVER ON A DAILY BASIS, EVEN IF THEY HAVEN'T CHANGED. Let Google know that your pages are published often.
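
A crawler can learn the upload date from the Last-Modified header that most web servers send with each page. Assuming that this header is what Googlebot reads (the guide does not say which signal it uses), the following Python sketch shows how old a page looks from the outside. The function name page_age_days is ours.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def page_age_days(last_modified, now=None):
    """Days elapsed since the date in an HTTP Last-Modified header value."""
    modified = parsedate_to_datetime(last_modified)
    if modified.tzinfo is None:
        # Some servers omit the time zone; HTTP dates are meant to be GMT.
        modified = modified.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - modified).total_seconds() / 86400


# Example with a header value as a server might send it:
#   page_age_days("Wed, 01 Jun 2005 12:00:00 GMT")
```

After you re-upload a page, the header should show the new date and the age should drop back towards zero.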

3. Lots of light HTML pages:
Google adores simple websites with hundreds of pages. Strange as it may sound, Google doesn't much like LONG, BORING pages. According to the technical details of Google crawling, Googlebot indexes up to 101KB of code, but you can be sure it stops 'loving' your page well before those 101KB!
Here's the deal: produce HTML 4 pages, lighter than 50KB, and keep them as simple as you can. If you are building a page that (because of its extensive contents) is going to be larger than 50KB, split it into two or three pages.
Remember:
MANY PAGES, HTML format, LESS THAN 50KB (no more than 3 or 4 paragraphs). This Strategy Guide will teach you HOW TO MAKE THOSE 50KB pages (and much more).
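
The size rule is easy to check automatically. Here is a minimal Python sketch that sorts a page into one of three groups using the two limits quoted above. The names classify_page, SOFT_LIMIT and HARD_LIMIT are ours, and we assume 1KB means 1024 bytes and that the page is saved as UTF-8.

```python
SOFT_LIMIT = 50 * 1024   # recommended maximum page size in this guide
HARD_LIMIT = 101 * 1024  # indexing cutoff quoted in this guide


def classify_page(html):
    """Classify a page by its encoded size against the two limits above."""
    size = len(html.encode("utf-8"))
    if size > HARD_LIMIT:
        return "truncated"  # beyond the quoted indexing cutoff
    if size > SOFT_LIMIT:
        return "split"      # indexable, but worth splitting into more pages
    return "ok"
```

Run it over every HTML file of your site before uploading and split whatever comes back as "split" or "truncated".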
Sit back and relax! Enjoy.

Google has a very complex hardware architecture. Its 8 data centers* hold almost 10,000 servers. Those 10,000 servers are something like "thin clients", with large EIDE hard disks and powerful LAN cards that deliver high throughput and satisfy many queries at once.
Google's operating system is a modified version of the well-known Red Hat Linux; the web server is a particular version of Apache called GWS (Google Web Server).
As you can easily see, this distributed architecture guarantees power, low costs and, most of all, flexibility. To expand the system, Google only has to buy some more servers and put them to work.
If you want to know more about Google's file system and architecture, here are two very detailed documents:
The Google File System: http://www.cs.rochester.edu/sosp2003/papers/p125-ghemawat.pdf
The Google (cluster) Architecture: http://www.computer.org/micro/mi2003/m2022.pdf
Google does not index only web pages, but also newsgroup posts, images, news and items like books, software and so on. The file types it recognizes are html, htm and dynamic pages (asp, jsp...). It also indexes Microsoft Word files (.doc), Rich Text Format (.rtf), text files (.txt) and Acrobat PDF (.pdf). Many people go beyond the default "keyword AND keyword" search: knowing exactly how to use Google and its many features is essential when optimizing your own website.
* Data centers list: www-sj.google.com ; www-fi.google.com ; www-cw.google.com ; www-dc.google.com ; www-va.google.com ; www-ab.google.com ; www-ex.google.com ; www-in.google.com
See also the "Google update" section.
Default search:

All keywords as if connected by AND. If all the words occur as a phrase, that page will rank higher.
Search options:

Understands the Boolean operators +, - and OR. They cannot be nested using ().
Use the * (Asterisk) as a Wildcard in a Phrase Search, such as:
"Lord * * Flies"
"George * Bush"
(cannot be used for part of a word, such as "George Bu*")
To search within a site, use site: e.g. examinations site:www.stanford.edu
To find out who links to a site, use link:www.yoursite.com
Title search: allintitle:googlerank. The only variation allowed is the minus symbol, e.g. allintitle:googlerank -This tutorial
URL search works the same way, with minus symbols allowed: allinurl:yahoo -shopping
File type: to search for specific file types, add filetype:pdf or filetype:txt etc. after your keyword(s).
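
If you run these searches often (for example, to check your competitors every week), you can assemble them in a script. The Python sketch below joins keywords and the operators listed above into one query string and turns it into a search address. The function names build_query and search_url are ours; the only thing we take from Google is that the search page reads the query from the q parameter.

```python
from urllib.parse import urlencode


def build_query(keywords, site=None, filetype=None, exclude=()):
    """Join keywords and the operators listed above into one query string."""
    parts = list(keywords)
    parts += [f"-{word}" for word in exclude]  # minus symbol: leave a word out
    if site:
        parts.append(f"site:{site}")           # search within one site
    if filetype:
        parts.append(f"filetype:{filetype}")   # search one file type only
    return " ".join(parts)


def search_url(query):
    """Return a Google search address for the given query string."""
    return "http://www.google.com/search?" + urlencode({"q": query})
```

For example, build_query(["examinations"], site="www.stanford.edu") gives back the site search shown above, ready to be passed to search_url.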
Results:
Google ignores capitalization. Any page larger than 101KB will be listed as 101KB (Google only indexes the first 101KB of any page).

Google indexes Acrobat (pdf), Microsoft Word (doc), Rich Text (rtf), Text (txt), Excel (xls), Postscript (ps) and PowerPoint (ppt) files.
Fresh: 3 million sites considered important or very relevant are indexed daily.
Google’s Cache: a copy of each page is stored in Google’s cache, as it was when it was last indexed. For web sites with daily-updated content (e.g. news sites) the cached copy may no longer match the current contents. The page is rated as it was when it was cached.
Special Features:
Google has a long list of special features (US phone listings, stock quotes, dictionary, spell checker...) that are almost unknown to ordinary users.
Google limitations:
Google ignores capitalization. This makes most users' queries easier, but it can penalize searches that depend on capitalization (e.g. a movie or song title).
Google does not yet index Flash content.
I have heard a lot about the "GOOGLEPLEX"... what exactly is it?
Nothing more complicated than the Google company's offices.
[Picture: the Googleplex]
Google's world headquarters building is located in Mountain View, California, a stone's throw from the Shoreline Regional Park wetlands.
(source: http://www.google.com/corporate/culture.html)
While Google is a play on the word "Googol" (a number equal to 10^100, i.e. 1 followed by 100 zeros), Googleplex is a play on "Googolplex" (a number equal to 10^(10^100), i.e. 1 followed by a googol zeros).
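
Just to give an idea of the sizes involved, here is a small Python sketch. A googol can be written out in full, while a googolplex cannot, so we only count its digits. The variable names are ours.

```python
# A googol: 1 followed by 100 zeros. Python integers have no size limit,
# so this value can be computed exactly.
googol = 10 ** 100
googol_digits = len(str(googol))

# A googolplex is 10 ** googol: far too large to compute or store.
# Written out, it would be a 1 followed by a googol zeros, so its
# digit count is simply googol + 1.
googolplex_digits = googol + 1

print(googol_digits)
```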
The controversial March 2005 US patent application no. 20050071741.
Acclaimed by most as the definitive vault of Google ranking secrets, it is more likely a spam prevention method.
Read about it in "The Definitive Google Ranking Strategy Guide".


The Definitive Google Ranking Strategy Guide - Copyright 2005 Googlerank.com