Google Spam Prevention - patent 20050071741

Topics: How Google keeps potential spam sites out of its index. The controversial "Information retrieval based on historical data" - United States Patent Application 20050071741 - March 31, 2005

See also: The Google Sandbox theory

"Information retrieval based on historical data" - United States Patent Application 20050071741
Preliminary statements:
1. We know, and you know, that when a big company (whose unique technology is studied, analyzed and debated) files a patent, it is mainly to mislead competitors rather than to reveal its secrets.
You don't need to protect something that nobody can really understand, copy or steal. Something like Google's algorithm, for example (with the exception of PageRank, which has been patented - see the PageRank section).
Therefore: don't take this patent application (made public in March 2005) as the greatest Google revelation ever.
2. This patent seems to describe a stereotyped, predictable, standard spamming activity, and then the way Google's robots are set up to prevent it.
Many parts of this patent reveal misleading intentions rather than true technical details. It all seems realistic and logical. Turn on your brain, read it twice and you'll notice a strange smell of hot air all around...
3. What is described in the patent looks like a "method" rather than a new add-on to Google's algorithm. As the patent's title implies, it's about a method to retrieve information and filter it on a historical basis. The name "Google" NEVER appears in the patent's contents. Never.
Since this is a Google ranking strategy guide, we thought writing about this patent was our duty. Whether or not it describes something that is actually implemented in Google's algorithm (we believe it is not), you can use this patent to refine your website's features (e.g. by adding fresh content on a regular basis) and strategies (e.g. by looking for qualified inbound links and forgetting link farms forever).
4. On the Net you can find articles about this patent that are more accurate and detailed than the one you're reading now, especially on websites whose owners are convinced that document 20050071741 is actually a revealed part of the algorithm.
If you want to read other opinions about it (and we expect you do), try a search on Google.

Read the original United States Patent Application 20050071741 document.
What follows is a summary of the most interesting contents we have extracted from it.

Quote from patent: "A system identifies a document and obtains one or more types of history data associated with the document. The system may generate a score for the document based, at least in part, on the one or more types of history data."
And here's how Google (oops! the authors of the patented document) may score a document:
a. Inbound Links
The quality, number and text of links are taken into account; that's not news. The news is the history data used for the calculation: how often are links to your page updated? What is the anchor text of those links? A bunch of links with the same anchor text may indicate spam activity. Many links with different anchor text may indicate fresh and interesting content. Historical data also include the time your website takes to gain inbound links: too many inbound links in too short a period of time may indicate spam activity.
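
To make the two link signals above more concrete, here is a minimal Python sketch. The patent publishes no formula, so everything in it is an assumption made for illustration: the function names, the sample anchors and dates, and the idea of measuring "same anchor text" as the share of the most common anchor and "too many links too fast" as a count of new links per week.

```python
from collections import Counter
from datetime import date


def anchor_text_concentration(anchors):
    """Share of links using the single most common anchor text (0.0 to 1.0)."""
    if not anchors:
        return 0.0
    # Normalize case and whitespace so "Cheap Pills" and "cheap pills" match.
    counts = Counter(a.strip().lower() for a in anchors)
    return counts.most_common(1)[0][1] / len(anchors)


def links_per_week(first_seen_dates):
    """Count newly discovered inbound links per ISO (year, week)."""
    weeks = Counter()
    for d in first_seen_dates:
        iso_year, iso_week, _ = d.isocalendar()
        weeks[(iso_year, iso_week)] += 1
    return dict(weeks)


# Hypothetical sample data, not taken from the patent.
anchors = ["cheap pills", "cheap pills", "Cheap Pills", "cheap pills", "my blog"]
dates = [date(2005, 3, 7), date(2005, 3, 8), date(2005, 3, 9), date(2005, 3, 21)]

concentration = anchor_text_concentration(anchors)  # 4 of 5 links share one anchor
weekly = links_per_week(dates)                      # 3 links in one week, 1 in another
print(concentration, weekly)
```

A high concentration value, or one week that dwarfs all the others, is the kind of pattern the text describes as a possible spam indication. Where to draw the line is not stated anywhere in the patent.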

b. Contents
Relevant and fresh content is preferred. Content scoring is also related to users' behaviour (see point f. below), because even stale content may still be useful (e.g. a biography).
Significant changes in content are taken into account. The date a page is updated is recorded.
Website content should be updated and expanded on a regular basis, like "organic growth". Too many pages added in a short period may be an indication of spam (the same applies to links, see point a.).
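
One way to picture what a "significant change" in content could mean is to compare two versions of a page word by word. The sketch below uses Python's standard difflib module. The 0.3 threshold, the function names and the sample texts are our own assumptions; the patent does not say how a change is measured.

```python
from difflib import SequenceMatcher


def change_ratio(old_text, new_text):
    """How much of the page changed: 0.0 (identical) to 1.0 (nothing in common)."""
    old_words = old_text.split()
    new_words = new_text.split()
    # ratio() returns the similarity of the two word sequences (1.0 = identical).
    similarity = SequenceMatcher(None, old_words, new_words).ratio()
    return 1.0 - similarity


def is_significant_update(old_text, new_text, threshold=0.3):
    """True when the change ratio reaches a (hypothetical) threshold."""
    return change_ratio(old_text, new_text) >= threshold


# Hypothetical page versions.
old = "Our ski resort opens in December every year"
minor = "Our ski resort opens in December every year."
major = "New snowboard park, night skiing and a heated lift open this season"

print(is_significant_update(old, minor))  # only the punctuation changed
print(is_significant_update(old, major))  # the whole text was rewritten
```

With this measure, fixing a typo or changing a date in the footer would not count as fresh content, while a rewritten page would.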


c. Domain Names
It seems spammers register domains for just one year, often providing false admin and contact details. Domains registered for more than one year may get a higher score.
The patent does not refer to any historical data related to domain age: it seems to focus on domain expiration rather than domain creation. Therefore, even if your domain is 5 years old, which would suggest you're serious about it, you could still be penalized because you renew it yearly.
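
The difference between looking at expiration and looking at age can be shown with a few lines of Python. This is only a sketch under our own assumptions: the 365-day cutoff, the function names and the dates are hypothetical, and nothing here comes from the patent text.

```python
from datetime import date


def days_until_expiry(expires_on, today):
    """Number of days left before the registration runs out."""
    return (expires_on - today).days


def looks_short_term(expires_on, today, min_days=365):
    """True when the domain expires within min_days (hypothetical cutoff)."""
    return days_until_expiry(expires_on, today) <= min_days


today = date(2005, 4, 1)
yearly_renewal = date(2005, 11, 15)  # old domain, renewed one year at a time
multi_year = date(2010, 4, 1)        # domain paid five years ahead

print(looks_short_term(yearly_renewal, today))  # flagged, whatever its age
print(looks_short_term(multi_year, today))      # not flagged
```

Note that the creation date never enters the calculation, which is exactly why a long-established domain renewed year by year would look the same as a throwaway one.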


d. CTR (click-thru rate)
CTR is monitored and recorded to find out which kind of content is preferred in a certain period of time (seasonal, fashion or trend related). For example, a ski website receives more clicks in winter, and so on.


e. Traffic
Traffic factors such as how much time visitors spend on your page are taken into account.


f. Users' behaviour
Duration of visits, bookmarks.
Quote: "Information relating to how often the document is selected when the document is included in a set of search results". This means that there's a historical variable that calculates the ratio between the number of times your page appears in the results and the number of times users click on it.
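
The ratio described in the quote is simply clicks divided by appearances in the results. A minimal Python sketch follows, using made-up monthly figures for the ski website mentioned in point d. The numbers and names are assumptions for illustration only.

```python
def click_through_rate(impressions, clicks):
    """Clicks divided by the number of times the page appeared in results."""
    if impressions == 0:
        return 0.0  # never shown, so there is no meaningful rate
    return clicks / impressions


# Hypothetical (impressions, clicks) per month for a ski website.
monthly = {"Jan": (1000, 120), "Jul": (1000, 15)}

ctr_by_month = {
    month: click_through_rate(shown, clicked)
    for month, (shown, clicked) in monthly.items()
}
print(ctr_by_month)
```

Keeping one value per period, rather than a single lifetime average, is what would let a system notice that the same page is popular in winter and ignored in summer.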

In short, this patent assigns a higher score to websites that are developed gradually, in a natural way, with day-by-day marketing and regular growth of content. CTR and user-behaviour monitoring are applied to make sure you really work on your content to make it interesting, fresh and useful.

--------------------------------------------------------------------------------

Google's actual Spam indicators

While information retrieval based on historical data is (or may be) a great method to keep spammers out, there are some indicators Google actually uses to catch spam sites.
1. Multiple domains with the same content. It is not uncommon to see many similar websites rank high for certain keywords and then disappear from the index after a certain period of time (usually one month).
2. Google's abuse service, based on users' input. Users can report a website they suspect of spamming the search engine to Google's abuse service. The website will be investigated (by a human) and, if it is caught in some spamming activity, banned.
3. Keyword stuffing. Pages that repeat the same keyword over and over usually rank high for a short period of time and then disappear from the index. Keyword stuffing is in fact considered the most annoying spam practice.
4. Don't forget the so-called Sandbox effect (read about it in Google Sandbox effect).
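
If you want to check your own pages for the keyword stuffing described in point 3, a simple keyword density count is a reasonable first look. The Python sketch below is our own illustration, not a description of how Google detects stuffing. The function name and the sample text are hypothetical, and no official density limit is known.

```python
import re
from collections import Counter


def keyword_density(text, top_n=1):
    """Return the top_n most frequent words with their share of all words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return []
    counts = Counter(words)
    return [(word, count / len(words)) for word, count in counts.most_common(top_n)]


# Hypothetical page text.
stuffed = "cheap hotels cheap hotels cheap hotels book cheap hotels now"

top = keyword_density(stuffed, top_n=2)
print(top)
```

When two words alone make up most of a page, as in this sample, the text reads as written for robots rather than for visitors.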



The Definitive Google Ranking Strategy Guide - Copyright 2005 Googlerank.com