It all sounds Geek to me! [Gautam Arora]

Thursday, April 28, 2005

The Mystery of Google's PageRank Algorithm

Chris Ridings, a custom search engine developer and owner of http://www.searchenginesystems.net/, wrote a paper about the enigmatic Google PageRank (PR). The following contains parts of his original paper (“PageRank Explained or Everything you’ve always wanted to know about PageRank”, v1.1, 9 Nov 2001):

PageRank is Google's method of measuring a page's "importance." When all other factors such as Title tag and keywords are taken into account, Google uses PageRank to adjust results so that sites that are more "important" will move up in the results page of a user's search accordingly.

That is, the order of ranking in Google works like this:

1. Find all pages matching the keywords of the search.
2. Rank accordingly using "on the page factors" such as keywords.
3. Calculate in the inbound anchor text.
4. Adjust the results by PageRank scores.


Many factors work together to decide which pages are listed at the top of search engines like Yahoo and Google, and PR is one of them.

A few points to note before we move on:

1. PageRank is a number that assesses solely the voting ability of all incoming links to a page, and how much they recommend that page.
2. Every unique page of a site that is indexed in Google has a PageRank. People often, mistakenly, think of the PageRank of a site being the PageRank of that site’s home page.
3. Internal site links do count in passing PageRank to other pages of the site.
4. PageRank stands on its own; it's not tied in with the anchor text (titling) of links, etc. Sure, they’re related, but saying they’re the same thing is like saying Title tags are the same as keywords in text.


The closest we can get to knowing a page's PR is by using the Google Toolbar (but it's not completely accurate; don't expect Google to give out the real PR!).
Also note that the PR scale is not linear. Say, a jump from PR3 to PR4 is not as big as one from PR4 to PR5 (the closer you get to the peak, the tougher it is to climb higher!).

The formula for PR calculation is:

* PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where,

* PR(A) is the PageRank of Page A
* d is a damping factor (approx. 0.85)
* PR(T1) is the PageRank of a page T1 pointing to Page A
* C(T1) is the number of outbound links on that page
* PR(Tn)/C(Tn) means we do that for each page Tn pointing to Page A
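
To make the formula concrete, here is a minimal Python sketch that applies it repeatedly until the values settle. The three-page link graph, the starting guess of 1.0 per page, and the fixed iteration count are all assumptions made for illustration; none of them come from the paper, and this is certainly not how Google runs the calculation at scale.

```python
def pagerank(links, d=0.85, iterations=50):
    """Repeatedly apply PR(A) = (1-d) + d * sum(PR(T)/C(T)).

    `links` maps each page to the list of pages it links out to.
    Every page is assumed to have at least one outbound link.
    """
    pr = {page: 1.0 for page in links}  # arbitrary starting guess (assumption)
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Sum PR(T)/C(T) over every page T that links to `page`.
            incoming = sum(pr[t] / len(links[t]) for t in links if page in links[t])
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical three-page site: A links to B and C, B links to C, C links to A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(example_links)

for page, value in sorted(ranks.items()):
    print(page, round(value, 3))
```

In this toy graph, Page C collects links from both A and B, so it ends up with the highest value, while B, which only receives half of A's vote, ends up lowest.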

A crucial aspect of the PR algorithm is this:
The PR of a page is a measure of its vote, which it splits between its outbound links. Simply stated, if Page A has a PR of 1, then it can provide:

* 1 link-out of PR 0.85
* 2 link-outs of PR 0.425
* 3 link-outs of PR 0.283

Note: d=0.85
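
The numbers in the list above are just d * PR / (number of link-outs). A tiny Python sketch of that arithmetic follows; the function name `vote_per_link` is made up for this example.

```python
def vote_per_link(pr, link_outs, d=0.85):
    """Share of a page's vote passed through each outbound link: d * PR / C."""
    return d * pr / link_outs


# A page with PR 1 and one, two, or three link-outs.
for n in (1, 2, 3):
    print(n, "link-out(s):", round(vote_per_link(1.0, n), 3))
```

The more links a page carries, the smaller the share each linked page receives.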

Up to this point everything moves along smoothly, until the author introduces a parallel concept, MiniRank ("This should help us to better understand it. We’ll call it MiniRank.").

The author then presents two iterations for four web pages, with the MiniRank calculation for each, and a detailed analysis of PageRank feedback.

The extent of similarity between MiniRank (MR) and PR is never discussed. The following issues were raised by Ian Rogers:
1. The equation for the PR calculation is altered:

* PR(A) = PR(A’) + (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

which will never converge to settled values and will instead spiral ever upwards.
2. Erroneous analysis of feedback loops
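
The first issue is easy to see with a small experiment. The Python sketch below runs both versions of the equation on a hypothetical pair of pages that link only to each other. It assumes that PR(A’) means the value Page A held in the previous iteration; that reading is my own assumption, since the term is not defined above.

```python
def iterate(links, d=0.85, iterations=20, carry_previous=False):
    """Run the PR update on `links` (page -> list of pages it links to).

    With carry_previous=True, the page's previous value is added on top,
    which is the assumed meaning of the PR(A') term in the altered equation.
    """
    pr = {page: 1.0 for page in links}  # arbitrary starting guess (assumption)
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            incoming = sum(pr[t] / len(links[t]) for t in links if page in links[t])
            value = (1 - d) + d * incoming
            if carry_previous:
                value += pr[page]  # the extra PR(A') term
            new_pr[page] = value
        pr = new_pr
    return pr


# Hypothetical two-page loop: A and B link only to each other.
loop = {"A": ["B"], "B": ["A"]}
standard = iterate(loop)
altered = iterate(loop, carry_previous=True)

print("standard:", standard)
print("altered: ", altered)
```

With the standard equation both pages sit at 1.0 forever, while the altered version nearly doubles on every pass and never settles.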

The author also covers PR analysis for:
* Links to your site
* Links out from your site
* Internal structure and Linkages

The paper might not have completely solved the PR mystery, but it's a great start and a must-read, along with the original 'Google PR' paper by Sergey Brin and Lawrence Page (Google's founders).
