It all sounds Geek to me! [Gautam Arora]

Sunday, April 24, 2005

Web Mining and Google's PageRank

Web mining is concerned with finding interesting content (or patterns) on the WWW. The concept builds on data mining of large data warehouses.

The issues involved with Web Mining include:
  • Lots of distributed data
  • Volatile data
  • Unstructured and redundant data
  • Problems with quality of data
  • Heterogeneous data

In comparison, the advantages are:
  • Structural framework provided by HTML
  • Link structure of the web

The web mining taxonomy is:
  • Web Content Mining
    • Web Page Content Mining
    • Search Result Mining
  • Web Structure Mining
  • Web Usage Mining
    • General Access Pattern Tracking
    • Customized Usage Tracking

Let's focus on 'Web Structure Mining', which mines the structure (links, graph) of the web using techniques such as PageRank and CLEVER.

PageRank is Google's "original" algorithm and a key reason for its success as the most powerful search engine today, and likely for years to come. (Try YaGoohoo!gle)

It's the technique used to prioritize the pages returned by a search. The importance of a page is calculated from the number of pages that point to it, i.e. its backlinks. Weighting gives more importance to backlinks coming from important pages.
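To make the difference between counting and weighting concrete, here is a minimal Python sketch of the naive approach, which simply counts backlinks and treats them all equally. The graph representation (a dict mapping each page to the pages it links to) and the function name backlink_counts are hypothetical, chosen only for illustration.

```python
def backlink_counts(links):
    """Count how many distinct pages link to each page (no weighting).

    links: hypothetical dict mapping each page to the list of pages it links to.
    """
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        # set() so that repeated links from the same page are counted once
        for target in set(targets):
            counts[target] = counts.get(target, 0) + 1
    return counts


# A tiny example web: A links to B and C, B links to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(backlink_counts(web))  # {'A': 1, 'B': 1, 'C': 2}
```

Here C looks the most important (two backlinks) and A looks no better than B. Plain counting ignores that A's single backlink comes from C, which is exactly what the weighting in PageRank is meant to capture.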
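The formula used to calculate the PageRank of a page p can be stated as:
  • PR(p) = c(PR(1)/N1 + ... + PR(n)/Nn)
    where pages 1 to n are the pages that link to p (its backlinks), Ni is the number of outgoing links on page i, and c is a normalization constant.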
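One way to compute this formula is by iteration: give every page the same starting rank, repeatedly apply PR(p) = c(PR(1)/N1 + ... + PR(n)/Nn), and choose c so that the ranks sum to 1. Below is a minimal Python sketch under stated assumptions: it implements only the simplified formula, with no damping factor, so it assumes every page has at least one outgoing link and every link target also appears as a key; on graphs with dead ends or rank sinks it may not behave well. The function name pagerank_simple and its parameters are hypothetical.

```python
def pagerank_simple(links, tol=1e-10, max_iter=1000):
    """Iterate PR(p) = c * sum(PR(q) / N_q for each page q linking to p).

    links: hypothetical dict mapping each page to the list of pages it links to.
    Assumes every page has at least one outgoing link (no damping factor).
    """
    pages = list(links)
    # Start with equal rank for every page.
    pr = {p: 1.0 / len(pages) for p in pages}
    for _ in range(max_iter):
        new = {p: 0.0 for p in pages}
        # Each page q passes PR(q) / N_q to every page it links to.
        for q, targets in links.items():
            share = pr[q] / len(targets)
            for p in targets:
                new[p] += share
        # c is chosen here so that the ranks sum to 1.
        c = 1.0 / sum(new.values())
        new = {p: c * r for p, r in new.items()}
        # Stop once no rank changes by more than tol between iterations.
        if max(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new
    return pr


# A tiny example web: A links to B and C, B links to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank_simple(web)
print({p: round(r, 3) for p, r in ranks.items()})  # {'A': 0.4, 'B': 0.2, 'C': 0.4}
```

A ends up level with C even though it has only one backlink, because that backlink comes from C, a highly ranked page. Google's actual ranking involves more than this (for example, the widely described damping factor), so treat the sketch as an illustration of the formula only.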

There are concerns that Google's PageRank may no longer be updated comprehensively these days, as bloggers "mess things up" :)

Google's PageRank Explained is a must-read.

6 comments:

Vikram said...

Yeah, with thousands of blogs around, bloggers linking to each other's links might mess up search results.

By the way, did you know it was possible to hijack search priorities using a Google hack ages ago?

Can't recall how it was done, or whether it can still be done, though.

Anonymous said...

I must be missing something with the whole PageRank thing. I mean, if I go after a keyword and get it to the top position in the search engines, why should I care at all about the PR that shows in the Google bar? It shouldn't matter at all, to anyone. After all, the whole point is to grab those top positions, and you can do that just fine without having a high Google PR.

Increased Pagerank said...

Hi Gautam Arora, I found your blog while searching for the latest blog news covering Pagerank, and although this post isn't a perfect match to what I am looking for, I certainly like the look and feel of your blog.

PageRank said...

Hey Gautam Arora, Your this post message is well received. I am just out searching for information on Page Rank and related and ended up on your blog. Although I'm not an avid "blogger", I have decided to save yours and come back since the information provided has substance.

Gautam Arora said...

>> After all, the whole point is to grab those top positions, and you can do that just fine without having high google pr

To get to the top position for a keyword in Google, your website's PR does matter...

The Google toolbar is just there to show you the approximate PR... it doesn't affect your PR; it's just an indicator for the user...

Pagerank said...

Page Rank is the information that I was originally looking for, I do admit though, that this post does in fact hold my interest just as well.... Thanks for the great post!