Topics: Google hardware architecture, servers, what Google indexes,
features and limitations, Google ranking principles and tips,
Googleplex
Before showing you WHAT Google is, and before demonstrating all the
optimization techniques you can apply to your website, you must learn
the THREE FUNDAMENTAL RULES of Google ranking. If you want your website
to rank high, you must THINK like Google does. And how does Google
think? Well, to make it simple, Google acts like a PARANOID, LAZY,
CURIOUS HUMAN ON A SLOW CONNECTION would (check out the "random surfer
model", which applies not only to PageRank technology but also to the
whole Google crawling philosophy).
So:
1. Server Speed:
Your server must be as fast as possible. Your website pages must
download nearly at the speed of light. Yes, it's true: Google gives
more visibility to websites that are hosted on fast servers.
Weird but reasonable: Google puts authoritative sites higher in its
index, and a webmaster who spends MONEY on a good server/hosting
must also own a website he wants to be important (at least he tries
to make it so).
Google also prefers fast servers because it is supposed to give its
users links to relevant resources that are fast to access. 'Nuff said:
if you are planning to get a lot of quality traffic from Google, you
must choose a very good hosting provider. If you are scared of spending
too much, remember how many customers Google can bring you for FREE.
DNS: you must choose an ISP that assigns your website (even if it is
virtually hosted) a unique DNS. Usually, virtually hosted sites share
their DNS with dozens of other sites, which makes it hard for a search
engine to find your website (Google clearly has this limitation).
Lately, however, many providers have understood the importance of a
unique DNS, and the unique DNS option is now offered by most ISPs.
Needless to say, those are the ones you have to choose.
2. Site Updating:
Googlebot is able to check WHEN your pages were uploaded to the
server. You can take advantage of this feature rather than being
penalized by it (it definitely penalizes "old" pages, almost
regardless of their contents) by keeping your pages FRESH and updated.
Obviously it's impossible to update many pages each day, and it's just
as unlikely that you really have new content each day, but here's the
big trick: YOU CAN JUST UPLOAD ALL PAGES OF YOUR WEBSITE TO
YOUR SERVER ON A DAILY BASIS, EVEN IF THEY HAVEN'T CHANGED.
Let Google know that your pages are published often.
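As a rough illustration of the daily re-publishing trick, here is a
minimal Python sketch that resets the modification time of every HTML
file in a local copy of the site before it is uploaded again. It
assumes that your server reports a page's date from the file's
modification time, which is common for static files but not
guaranteed. The function name refresh_mtimes and the suffix list are
made up for this example.

```python
import os
import time
from pathlib import Path


def refresh_mtimes(site_dir, suffixes=(".html", ".htm")):
    """Set the modification time of every matching file under site_dir to now.

    Hypothetical helper: it only changes timestamps, never file contents.
    Returns the sorted list of paths that were touched.
    """
    now = time.time()
    touched = []
    for path in Path(site_dir).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            os.utime(path, (now, now))  # (access time, modification time)
            touched.append(path)
    return sorted(touched)
```

You would run this once a day on the folder that holds your pages and
then upload the folder as usual.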
3. Lots of light HTML pages:
Google adores simple websites with hundreds of pages. Even if it may
sound strange, Google doesn't like LONG BORING pages too much.
According to the technical details of Google crawling, Googlebot
indexes up to 101KB of code, but be sure that it stops 'loving' your
page well before those 101KB!
Here's the deal: produce HTML 4 pages lighter than 50KB, and keep them
as simple as you can. If you are building a page that (because of its
extensive contents) is going to be larger than 50KB, split it into two
or three pages.
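To make the 50KB rule concrete, here is a small Python sketch that
measures a page and groups its paragraphs into several lighter pages.
It is only an illustration: the names MAX_PAGE_BYTES, page_size_bytes
and split_paragraphs are invented for this example, the size is
measured assuming UTF-8 encoding, and the extra bytes of your page
template (head, menus, footer) are not counted.

```python
MAX_PAGE_BYTES = 50 * 1024  # the 50KB target suggested in this guide


def page_size_bytes(html):
    """Size of the text in bytes, assuming it is served as UTF-8."""
    return len(html.encode("utf-8"))


def split_paragraphs(paragraphs, max_bytes=MAX_PAGE_BYTES):
    """Greedily group paragraphs into pages of at most max_bytes each.

    A single paragraph larger than max_bytes still gets a page of its own.
    """
    pages, current, current_size = [], [], 0
    for para in paragraphs:
        size = page_size_bytes(para)
        # Start a new page when adding this paragraph would exceed the limit.
        if current and current_size + size > max_bytes:
            pages.append(current)
            current, current_size = [], 0
        current.append(para)
        current_size += size
    if current:
        pages.append(current)
    return pages
```

Each inner list returned by split_paragraphs is the content of one
page, in the original order.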
Remember: MANY PAGES, HTML format, LESS THAN 50KB (no more than 3 or 4
paragraphs).
This Strategy Guide will teach you HOW TO MAKE THOSE 50KB pages (and
much more).
Sit back and relax!
Enjoy.
Google has a very complex hardware architecture. Its 8 data centers*
handle almost 10,000 servers. Those 10,000 servers are something like
"thin clients", with large EIDE hard disks and powerful LAN cards that
deliver high throughput and satisfy multiple queries at once.
Google's operating system is a modified version of the well-known Red
Hat Linux; the web server is a particular version of Apache called GWS
(Google Web Server).
As is easy to understand, this distributed architecture guarantees
power, low costs and, most of all, flexibility. To expand this system,
Google only has to buy some more servers and get them working.
Google does not only index web pages, but also newsgroup posts,
images, news and even items like books, software, etc. The file types
it recognizes are html, htm, and dynamic pages (asp, jsp...). It also
indexes Microsoft Word files (.doc), Rich Text Format (.rtf), text
files (.txt) and Acrobat PDF (.pdf). Many people use more than the
default "keyword AND keyword" search: knowing exactly how to use Google
and its many features is more than important when optimizing your own
website.
Default search:
All keywords as if connected by AND. If all the words occur as a
phrase, that page will rank higher.
Search options:
Understands the Boolean operators +, - and OR.
Operators cannot be nested using ().
Use the * (asterisk) as a wildcard in a phrase search, such as:
"Lord * * Flies"
"George * Bush"
(it cannot be used for part of a word, such as "George Bu*")
To search within a site, use site:, e.g. examinations
site:www.stanford.edu
To find out who links to a site, use link:www.yoursite.com
Title search: allintitle:googlerank, where the only variation allowed
is the minus symbol, e.g. allintitle:googlerank -This tutorial
URL search works the same way, with minus symbols allowed:
allinurl:yahoo -shopping
File type: to search for specific file types, use
filetype:pdf or filetype:txt etc. after your keyword(s).
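The operators listed above can be combined into one query string. The
following Python sketch shows one way to assemble such a string and
turn it into a search link. The function names build_query and
search_url are made up for this example, and the
https://www.google.com/search?q= address is an assumption about how the
search link is formed, not something this guide specifies.

```python
from urllib.parse import quote_plus


def build_query(keywords=(), phrase=None, exclude=(), site=None, filetype=None):
    """Assemble a query string from the operators described in this guide."""
    parts = list(keywords)
    if phrase:
        parts.append(f'"{phrase}"')  # phrase search, may contain * wildcards
    parts.extend(f"-{word}" for word in exclude)  # minus symbol excludes a word
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query):
    """URL-encode the query (assumed search address, see the note above)."""
    return "https://www.google.com/search?q=" + quote_plus(query)
```

For example, build_query(["examinations"], site="www.stanford.edu")
gives the same "examinations site:www.stanford.edu" query shown above.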
Results:
Google ignores capitalization. Any page larger than 101k will be
listed as 101k (Google only indexes the first 101k of any page).
Google indexes Acrobat (pdf), Microsoft Word (doc), Rich Text (rtf),
Text (txt), Excel (xls), PostScript (ps) and PowerPoint (ppt) files.
Fresh: 3 million sites considered important/very relevant are indexed
daily.
Google’s Cache: a copy of each page is stored in Google’s cache, as it
was when it was last indexed. For web sites with daily-updated content
(e.g. news sites) the current contents may not be what Google sees:
the page is rated as it was when it was cached.
Special Features:
Google has a long list of special features (US phone listings, stock
quotes, dictionary, spell checker...), almost unknown to common users.
The full list of these features is available on the dedicated Google
page.
Google limitations:
Google ignores capitalization (this makes most users’ queries easier,
but it can penalize searches that require capitalization, e.g. a
movie or song title).
Google does not yet index Flash content.
I have heard a lot about "GOOGLEPLEX"... what exactly is it?
Nothing more complicated than the Google company's offices. We also
have a picture below...

Google's world headquarters building
is located in Mountain View, California, a stone's throw from the
Shoreline Regional Park wetlands
(source:
http://www.google.com/corporate/culture.html)
While Google is a play on the word "Googol" (which is a number equal
to 1.0 × 10^100), Googleplex is a play on "Googolplex" (which is a
number equal to 10^(10^100), that is, 10 raised to the power of a
googol).
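To get a feel for the size of these two numbers, here is a tiny Python
sketch. The variable names are chosen just for this example. A googol
fits comfortably in memory, while a googolplex (10 raised to the power
of a googol) cannot be computed or stored, so only its number of digits
is worked out.

```python
googol = 10 ** 100

# A googol is written as a 1 followed by 100 zeros, so 101 digits in total.
digits_in_googol = len(str(googol))

# A googolplex is 10 ** googol. It is far too large to compute or store,
# but its digit count follows the same pattern: googol + 1 digits.
digits_in_googolplex = googol + 1
```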
The controversial March 2005 US patent application 20050071741.
Acclaimed by most as the definitive vault of Google ranking secrets,
it's more likely to be a spam prevention method.
Read it on "The Definitive Google Ranking Strategy Guide"