Glossary
Agent :
A browser, or any other piece of software that can access web servers
and browse their content. Examples: Microsoft Internet Explorer,
Netscape, search engine spiders.
Algorithm :
The set of rules a search engine is programmed with to determine
rankings; it is how the engine is "tuned". An algorithm may take only
certain factors into account, such as keywords in the URL or in the
title tag.
Automatic Update :
When the spider returns to your pages at periodic intervals to check to
see if you have made any changes.
Boolean Search :
A search allowing the inclusion or exclusion of documents containing
certain words through the use of operators such as AND, NOT and OR.
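As a rough illustration of how AND, NOT and OR can combine, here is a minimal sketch over a tiny in-memory collection. The function name `boolean_search`, its parameters and the sample documents are assumptions made for this example; real search engines use inverted indexes rather than scanning every document.

```python
def boolean_search(docs, must=(), must_not=(), any_of=()):
    """Return the ids of documents that satisfy the word constraints.

    must     -> every word has to be present (AND)
    must_not -> none of the words may be present (NOT)
    any_of   -> at least one word has to be present (OR), if given
    """
    hits = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if not all(w in words for w in must):
            continue
        if any(w in words for w in must_not):
            continue
        if any_of and not any(w in words for w in any_of):
            continue
        hits.append(doc_id)
    return sorted(hits)


# Hypothetical sample collection.
docs = {
    "a": "google ranking tutorial",
    "b": "google spam report",
    "c": "yahoo ranking tutorial",
}

# "google AND NOT spam"
print(boolean_search(docs, must=["google"], must_not=["spam"]))
```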
Cloaking :
A method of delivering different content to different agents, used to
send optimized pages to specific search engines. A very risky technique:
search engines strongly disapprove of it and may flag the site as spam.
Clustering :
Listing of one page from each website within a search engine or
directory. This avoids occupation of all the top results by a small
number of web sites and makes the list of results clearer and more
useful to the user.
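The idea of clustering can be sketched as keeping only the first (best ranked) page per host in an already ordered result list. The function name `cluster_results` and the example URLs are hypothetical; actual engines may group by domain rather than by exact host name.

```python
from urllib.parse import urlparse


def cluster_results(ranked_urls):
    """Keep only the highest-ranked URL for each host, preserving order."""
    seen_hosts = set()
    clustered = []
    for url in ranked_urls:
        host = urlparse(url).netloc.lower()
        if host not in seen_hosts:
            seen_hosts.add(host)
            clustered.append(url)
    return clustered


# Hypothetical ranked list, best result first.
ranked = [
    "http://example.com/page1.html",
    "http://example.com/page2.html",
    "http://example.org/index.html",
    "http://example.com/page3.html",
]
print(cluster_results(ranked))
```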
Crawler :
(also, Spider) A crawler goes to your site and finds (=crawls) your
pages. It then stores those pages in a database for future retrieval by
the search engine.
Cross-Linking :
Links between different websites/domains whose IP addresses share the
same third block (xxx.xxx.YYY.xxx). This usually means they are hosted
by the same ISP, so Google is thought to treat them as belonging to a
single webmaster who links to himself. As a result, such inbound and
outbound links carry little value, and in some cases the whole IP class
can be heavily penalized.
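A quick way to check whether two sites might be affected is to compare their IP addresses. The sketch below assumes that "same third block" means the first three blocks of the two IPv4 addresses are identical (the same class C range); the function name `same_third_block` and the sample addresses are made up for illustration, and nothing here reflects how Google actually evaluates this.

```python
import ipaddress


def same_third_block(ip_a, ip_b):
    """Return True if two IPv4 addresses share their first three blocks.

    Assumption: "same third block" is read as "same xxx.xxx.YYY prefix".
    """
    a = ipaddress.IPv4Address(ip_a).packed  # 4 raw bytes
    b = ipaddress.IPv4Address(ip_b).packed
    return a[:3] == b[:3]


print(same_third_block("192.0.2.10", "192.0.2.200"))
print(same_third_block("192.0.2.10", "192.0.3.10"))
```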
CSS :
Acronym for Cascading Style Sheets. Check out the CSS validator.
Doorway (page) :
Also known as a Gateway page or Keyword-rich page. A search engine
friendly web page, generally optimized for a single keyword, whose
purpose is to rank high in search engine result pages and bring traffic
to the main site.
Dynamic Content :
Information on web pages that is generated from database content or
user information. Sometimes it is possible to spot that this technique
is being used, e.g. if the URL ends with .asp, .cfm, .cgi or .shtml.
Search engines currently index dynamic content in a similar fashion to
static content, with some limitations (e.g. just one variable after the
main URL).
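The two hints mentioned above (the file extension and the number of variables after the main URL) can be checked with a few lines of code. This is only a sketch: the function name `describe_url` and the example URLs are assumptions, and the extension list simply repeats the ones named in this entry.

```python
import posixpath
from urllib.parse import urlparse, parse_qsl

# Extensions listed in the glossary entry as hints of dynamic content.
DYNAMIC_EXTENSIONS = {".asp", ".cfm", ".cgi", ".shtml"}


def describe_url(url):
    """Report whether a URL looks dynamic and how many variables it carries."""
    parts = urlparse(url)
    extension = posixpath.splitext(parts.path)[1].lower()
    variables = parse_qsl(parts.query, keep_blank_values=True)
    return {
        "looks_dynamic": extension in DYNAMIC_EXTENSIONS,
        "variables": len(variables),
    }


print(describe_url("http://example.com/catalog.asp?id=7"))
print(describe_url("http://example.com/catalog.asp?id=7&lang=en"))
```

A URL reporting more than one variable would fall outside the "just one variable" limitation described above.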
FFA (Free for all) :
Also known as a Link farm. An FFA is a web page whose purpose is to
allow users to add their links for free. An FFA page therefore looks
like a long list of thousands of URLs of any kind, with no filtering
and no topic relevancy. Inbound links from an FFA are definitely
useless. In addition, if YOU link TO an FFA page, your website will be
heavily penalized.
Font and Background Spoofs :
Various techniques used to place invisible text in a web page in order
to improve positioning without affecting the appearance of the page.
These are mostly based on setting the font and background colors to the
same value. Most search engines now detect these tricks.
Frames :
A technique for combining separate HTML documents within a single
browser screen. A framed web site can cause problems for search engines
and may not be indexed correctly, although some search engines do
support framed pages.
FTP :
File Transfer Protocol. A protocol used specifically to transfer files
between a server and a local machine (an FTP client is required).
Gateway (page): see "Doorway page"
Googleplex :
Google's physical headquarters building, located in Mountain View,
California (the name is a play on the word googolplex).
Googlerank: (1) an unofficial term (Google NEVER uses it) used by many
people (and many websites) to describe their website's position in
Google results pages.
Googlerank: (2) a website, active since June 2002, dedicated to Google
ranking tutorials and software. See www.googlerank.com
HTTP :
HyperText Transfer Protocol - the (main) protocol used to communicate
between web servers and web browsers (clients).
Inbound Links :
A hypertext link to a particular page from elsewhere, bringing traffic
to that page. Inbound links are counted to produce a measure of the
page popularity.
Indexing :
When the search engine takes the pages from the database that the
spider has created and places them in an order based on the algorithms
of that engine. All search engines have a different indexing process -
due to different algorithms - that is why you
IP delivery :
A technique for presenting different content depending on the IP
address of the client.
Javascript :
A simple scripting language used for small programming tasks within
HTML web pages. The scripts are normally interpreted (or run) on the
client computer by the web browser.
Keyword Dilution :
The opposite of keyword stuffing. Too many (unrelated) keywords on a
page weaken the relevance of the most important ones.
Keyword Stuffing :
The repeating of keywords and keyword phrases in META tags or elsewhere.
OOP (Over Optimization Penalty) :
Since the Florida Update, a penalty has been applied to sites that,
even if not spamming, are too heavily optimized. Google considers over
optimization a negative factor when ranking a website.
Outbound Link :
A hypertext link from a particular page to another page elsewhere,
sending traffic to the linked page (e.g. from your website to another).
Page swapping :
An odd, now obsolete technique that is considered spamming. Webmasters
submit pages designed to appeal to search engines and obtain high
rankings for certain keywords. Knowing that the Google Dance (see
section) happened once per month, they waited until those pages were
indexed and then quickly replaced them with different pages at the same
URLs. Google users who clicked through from a high-ranking result
therefore reached pages that differed from the ones Google had cached.
When the Google Dance was about to begin again, the webmasters put the
old pages back, and then swapped them again after indexing, and so on.
(update): since Google now updates its index continuously, page swapping
is no longer practical.
Proximity Search :
A search where users can specify that the documents returned should
have the words near each other.
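A proximity test can be sketched by comparing word positions within a document. The function name `within_distance`, the `max_gap` parameter and the sample sentence are assumptions for this example; each search engine defines "near" in its own way.

```python
def within_distance(text, word_a, word_b, max_gap):
    """Return True if word_a and word_b occur at most max_gap positions apart."""
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == word_a]
    positions_b = [i for i, w in enumerate(words) if w == word_b]
    return any(abs(i - j) <= max_gap for i in positions_a for j in positions_b)


text = "search engine ranking depends on the algorithm"
print(within_distance(text, "search", "ranking", 2))
print(within_distance(text, "search", "algorithm", 2))
```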
Recall :
Related to precision, this is the degree to which a search engine
returns all the matching documents in a collection. There may be 100
matching documents, but a search engine may only find 80 of them. It
would then list these 80 and have a recall of 80%.
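The arithmetic in this example (80 found out of 100 matching documents) can be written as a small function. The name `recall` and the use of integer document ids are illustrative choices only.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were actually retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved)) / len(relevant)


matching = range(100)  # 100 documents match the query
found = range(80)      # the engine finds only 80 of them
print(f"{recall(found, matching):.0%}")
```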
Robots :
Any browser program which follows hypertext links and accesses web
pages but is not directly under human control. Examples are the search
engine spiders and the "harvesting" programs which extract e-mail
addresses and other data from web pages and various inputs.
Sandbox (Google's) :
A relatively new (and theoretical) filter that appears to have been
added to the algorithm in March 2004. The sandbox filter places all new
websites in a "quarantine" status; after a certain period, they are
included with other established sites and ranked properly (see Google
Sandbox theory).
Search Query :
A string that contains one word or more, sent to a search engine by the
users.
Spamming (Search Engines') :
Anything done to defeat the main purpose of a search engine (providing
relevant results in response to a user's query) is considered search
engine spamming. This includes, but is not limited to, keyword stuffing,
IP/agent cloaking, hidden text, doorway pages redirecting to the main
site, and page swapping.
Spider:
see "Crawler"
Stemming :
The ability of a search to include the "stem" of words. For example,
stemming allows a user to enter "swimming" and also get back results
for the stem word "swim." Stemming technology is also used by Google,
with some limitations.
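To make the "swimming" / "swim" example concrete, here is a deliberately naive suffix-stripping sketch. It is not the stemmer Google or any other engine uses; the suffix list, the doubled-consonant rule and the name `naive_stem` are all assumptions chosen so that the example above works, and many English words would be handled badly.

```python
# Suffixes tried in order; longer ones first so "ers" wins over "s".
SUFFIXES = ("ing", "ers", "er", "ed", "s")


def naive_stem(word):
    """Strip one common suffix and undo a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # "swimm" -> "swim": drop one letter of a doubled consonant.
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word


for w in ("swimming", "swims", "swimmer", "swim"):
    print(w, "->", naive_stem(w))
```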
Stop Words :
Conjunctions, prepositions and articles and other words such as AND, TO
and A that appear often in documents yet alone may contain little
meaning.
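Dropping stop words from a query before matching can be sketched as follows. The word list here is a small made-up sample, not the list used by any particular search engine, and `remove_stop_words` is a hypothetical helper name.

```python
# Small illustrative stop-word list (an assumption, not an official list).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "or"}


def remove_stop_words(query):
    """Return the query words that are not stop words, in original order."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]


print(remove_stop_words("How to rank a site in Google"))
```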