

Get Google PageRank Programmatically

By Jonathan Wood on 12/18/2010 (Updated on 4/30/2012)
Language: C#
Technology: WinForms
Platform: Windows
License: CPOL

Screenshot of Demo Program

Download Source Code

Introduction

If you've ever installed the Google Toolbar, you've probably seen those little green bars that rate each web page from 0 to 10. These bars display Google's PageRank for a particular URL.

PageRank is an internal metric Google maintains that indicates how popular Google considers a web page to be. The value is based on how many websites link to that page and how popular those linking websites are.
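To make the "links from popular pages count more" idea concrete, here is a minimal sketch of the textbook PageRank formula in its normalized form, computed by power iteration on a tiny link graph. Each page's score is (1 - d)/N plus d times the sum, over every page q that links to it, of q's score divided by the number of links on q. The damping factor d = 0.85, the fixed iteration count, the even spreading of scores from pages with no outgoing links, and the class name PageRankSketch are all assumptions for illustration; Google's internal computation certainly differs, and the 0-to-10 toolbar scale is not modeled.

```csharp
using System;

public static class PageRankSketch
{
    // links[i] lists the indexes of the pages that page i links to.
    // Returns one score per page; the scores always sum to 1.
    public static double[] Compute(int[][] links, double damping = 0.85, int iterations = 50)
    {
        int n = links.Length;
        double[] rank = new double[n];
        for (int i = 0; i < n; i++) rank[i] = 1.0 / n;

        for (int iter = 0; iter < iterations; iter++)
        {
            // Every page starts with the same small "random jump" share.
            double[] next = new double[n];
            for (int i = 0; i < n; i++) next[i] = (1 - damping) / n;

            for (int i = 0; i < n; i++)
            {
                if (links[i].Length == 0)
                {
                    // Page with no outgoing links: spread its score over all pages.
                    for (int j = 0; j < n; j++) next[j] += damping * rank[i] / n;
                }
                else
                {
                    // Each outgoing link passes an equal share of page i's score.
                    foreach (int target in links[i])
                        next[target] += damping * rank[i] / links[i].Length;
                }
            }
            rank = next;
        }
        return rank;
    }
}
```

For a graph where pages 1 and 2 both link to page 0 and page 0 links to page 1, the scores settle at roughly 0.49, 0.46 and 0.05: page 0 collects score from two sources, page 1 inherits most of page 0's score through a single link, and page 2, with no inbound links, keeps only the random-jump share.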

Although Google recalculates its internal PageRank continually, it also maintains a public PageRank, which is updated from the internal value several times a year. The Google Toolbar uses the public PageRank to show those web page rankings.

The Importance of PageRank

PageRank is one of the metrics that determine the order in which the search engine displays results. That order has become a hot topic: a website can make a lot of money if it shows up on the first page of Google's search results, and far less if it appears on page two or three. As a result, many webmasters have become quite interested in PageRank, and an entire industry has emerged around selling and exchanging links to increase it.

Since Google's goal is to rank sites according to which sites the user is most likely to be looking for, selling and exchanging links is a bad thing in their eyes. Google has tried to downplay the importance of PageRank, and points out that PageRank is only one of over 200 metrics used to determine ranking. Google encourages webmasters to just produce good content if they want to improve their ranking on Google.

But the cat is out of the bag now, and PageRank remains an important and very public rating of a website. Many advertisers decide how much they will pay to show an ad on a site based on that site's PageRank. Rightly or wrongly, PageRank is still considered fairly important in the search engine optimization (SEO) industry.

Accessing PageRank Programmatically

Okay, so after that little introduction, we can start to talk about how you can find out the PageRank of a site. If you don't have the Google Toolbar, there are still many sites on the web that will tell you how your site measures up. In addition (and the point of this article), you can also obtain a site's PageRank programmatically, straight from Google.

Listing 1 shows my GooglePageRank class. This class has only one public method, GetPageRank(), which is static. It returns the PageRank for the specified URL: a value from 0 through 10, or -1 if the URL is not in Google's index or an error occurred.

Listing 1: The GooglePageRank Class

using System;
using System.IO;
using System.Net;
using System.Text;

public class GooglePageRank
{
    /// <summary>
    /// Returns the PageRank of the given URL. Return values are 0 through 10 or
    /// -1 (N/A), which indicates there was an error or the URL is not in the
    /// Google index.
    /// </summary>
    /// <param name="url">URL to test</param>
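    /// <returns>The PageRank (0 through 10), or -1 on error or if the URL is not indexed</returns>
    public static int GetPageRank(string url)
    {
        int rank = -1;

        try
        {
            // Form the complete query URL: a checksum of the URL plus the URL itself
            string queryUrl = String.Format("http://toolbarqueries.google.com/tbr" +
                "?client=navclient-auto&features=Rank&ch={0}&q=info:{1}",
                ComputeHash(url), UrlEncode(url));

            // Download the response, disposing the response and reader when done
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(queryUrl);
            using (WebResponse webResponse = request.GetResponse())
            using (StreamReader reader = new StreamReader(webResponse.GetResponseStream()))
            {
                string response = reader.ReadToEnd();

                // Expect three colon-separated fields; the last one is the rank
                string[] arr = response.Split(':');
                int value;
                if (arr.Length == 3 && int.TryParse(arr[2].Trim(), out value))
                    rank = value;
            }
        }
        catch (Exception)
        {
            // Network or parse error: fall through and return -1
        }
        return rank;
    }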

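    /// <summary>
    /// URL-encodes the given URL. Handy when HttpUtility is not available.
    /// </summary>
    /// <param name="url">URL to encode</param>
    /// <returns>The encoded URL</returns>
    private static string UrlEncode(string url)
    {
        StringBuilder builder = new StringBuilder();

        for (int i = 0; i < url.Length; i++)
        {
            char c = url[i];
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
                builder.Append(c);
            else if (c == ' ')
                builder.Append('+');
            else if ("()*-_.!".IndexOf(c) >= 0)
                builder.Append(c);
            else
            {
                // Percent-encode the character's UTF-8 bytes. Casting a char
                // straight to byte would truncate non-ASCII characters.
                string s = c.ToString();
                if (char.IsHighSurrogate(c) && i + 1 < url.Length)
                {
                    // Keep a surrogate pair together so it encodes as one character
                    s = url.Substring(i, 2);
                    i++;
                }
                foreach (byte b in Encoding.UTF8.GetBytes(s))
                    builder.AppendFormat("%{0:X2}", b);
            }
        }
        return builder.ToString();
    }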

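    /// <summary>
    /// Computes the checksum Google requires: Bob Jenkins' lookup2 hash of
    /// "info:" + url, with c starting at 0xE6359A60, prefixed with "6".
    /// It hashes UTF-16 chars, so it matches a byte-wise lookup2 only for ASCII URLs.
    /// </summary>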
    private static string ComputeHash(string url)
    {
        UInt32 a, b;
        UInt32 c = 0xE6359A60;
        int k = 0;
        int len;

        // Modify URL
        url = string.Format("info:{0}", url);

        a = b = 0x9E3779B9;
        len = url.Length;

        while (len >= 12)
        {
            a += (UInt32)(url[k + 0] + (url[k + 1] << 8) + (url[k + 2] << 16) + (url[k + 3] << 24));
            b += (UInt32)(url[k + 4] + (url[k + 5] << 8) + (url[k + 6] << 16) + (url[k + 7] << 24));
            c += (UInt32)(url[k + 8] + (url[k + 9] << 8) + (url[k + 10] << 16) + (url[k + 11] << 24));
            Mix(ref a, ref b, ref c);
            k += 12;
            len -= 12;
        }

        c += (UInt32)url.Length;
        switch (len)
        {
            case 11:
                c += (UInt32)(url[k + 10] << 24);
                goto case 10;
            case 10:
                c += (UInt32)(url[k + 9] << 16);
                goto case 9;
            case 9:
                c += (UInt32)(url[k + 8] << 8);
                goto case 8;
            case 8:
                b += (UInt32)(url[k + 7] << 24);
                goto case 7;
            case 7:
                b += (UInt32)(url[k + 6] << 16);
                goto case 6;
            case 6:
                b += (UInt32)(url[k + 5] << 8);
                goto case 5;
            case 5:
                b += (UInt32)(url[k + 4]);
                goto case 4;
            case 4:
                a += (UInt32)(url[k + 3] << 24);
                goto case 3;
            case 3:
                a += (UInt32)(url[k + 2] << 16);
                goto case 2;
            case 2:
                a += (UInt32)(url[k + 1] << 8);
                goto case 1;
            case 1:
                a += (UInt32)(url[k + 0]);
                break;
            default:
                break;
        }
        Mix(ref a, ref b, ref c);
        return string.Format("6{0}", c);
    }

    /// <summary>
    /// ComputeHash() helper: the lookup2 mix step, which scrambles three 32-bit values
    /// </summary>
    private static void Mix(ref UInt32 a, ref UInt32 b, ref UInt32 c)
    {
        a -= b; a -= c; a ^= c >> 13;
        b -= c; b -= a; b ^= a << 8;
        c -= a; c -= b; c ^= b >> 13;
        a -= b; a -= c; a ^= c >> 12;
        b -= c; b -= a; b ^= a << 16;
        c -= a; c -= b; c ^= b >> 5;
        a -= b; a -= c; a ^= c >> 3;
        b -= c; b -= a; b ^= a << 10;
        c -= a; c -= b; c ^= b >> 15;
    }
}

The GetPageRank() method starts by constructing a URL that queries Google's toolbarqueries.google.com domain. It must pass several arguments, including the URL being tested and a numeric checksum computed from that URL. The checksum is Bob Jenkins' lookup2 hash function (Jenkins is not affiliated with Google), run over the string "info:" followed by the URL.
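The mix() shift sequence in Listing 1 is that of Bob Jenkins' lookup2 hash, which is defined over bytes. The sketch below implements lookup2 over an arbitrary byte array, with the starting value of c as a parameter, and replaces the listing's fall-through switch with a loop. For a plain-ASCII URL, "6" + Lookup2.Hash(Encoding.ASCII.GetBytes("info:" + url), 0xE6359A60) should produce the same string as the listing's ComputeHash(url). The class name Lookup2 and the byte-array signature are my own choices for illustration, not part of the article's code.

```csharp
using System;

public static class Lookup2
{
    // lookup2 mix step: the same shift sequence as the listing's Mix().
    private static void Mix(ref uint a, ref uint b, ref uint c)
    {
        unchecked
        {
            a -= b; a -= c; a ^= c >> 13;
            b -= c; b -= a; b ^= a << 8;
            c -= a; c -= b; c ^= b >> 13;
            a -= b; a -= c; a ^= c >> 12;
            b -= c; b -= a; b ^= a << 16;
            c -= a; c -= b; c ^= b >> 5;
            a -= b; a -= c; a ^= c >> 3;
            b -= c; b -= a; b ^= a << 10;
            c -= a; c -= b; c ^= b >> 15;
        }
    }

    // Reads four bytes starting at offset as a little-endian 32-bit word.
    private static uint Word(byte[] k, int offset)
    {
        return (uint)k[offset] | ((uint)k[offset + 1] << 8) |
               ((uint)k[offset + 2] << 16) | ((uint)k[offset + 3] << 24);
    }

    // Hashes k; initVal is the starting value of c (0xE6359A60 in the listing).
    public static uint Hash(byte[] k, uint initVal)
    {
        unchecked
        {
            uint a = 0x9E3779B9, b = 0x9E3779B9, c = initVal;
            int i = 0, len = k.Length;

            // Consume the input 12 bytes at a time.
            while (len >= 12)
            {
                a += Word(k, i);
                b += Word(k, i + 4);
                c += Word(k, i + 8);
                Mix(ref a, ref b, ref c);
                i += 12;
                len -= 12;
            }

            // Fold in the total length; the low byte of c is reserved for it.
            c += (uint)k.Length;

            // Add the last 0-11 bytes: bytes 0-3 go into a, 4-7 into b, and
            // 8-10 into c starting at bit 8 (same as the listing's switch).
            for (int j = 0; j < len; j++)
            {
                uint v = k[i + j];
                if (j < 4) a += v << (8 * j);
                else if (j < 8) b += v << (8 * (j - 4));
                else c += v << (8 * (j - 7));
            }

            Mix(ref a, ref b, ref c);
            return c;
        }
    }
}
```

Hashing bytes rather than chars also makes the intended behavior for non-ASCII URLs explicit: you choose the encoding (UTF-8, for example) before hashing. Whether Google's service expected that for such URLs is an assumption I haven't verified.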

The method then downloads the results and parses out the PageRank value.

Note that the class includes its own UrlEncode() method. In an ASP.NET application, you could use the method of the same name in HttpUtility instead. However, HttpUtility lives in System.Web, which a WinForms application doesn't normally reference. You could add that reference, but including the method here makes the class easier to use in a WinForms application.
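If you'd rather not maintain a hand-written encoder, Uri.EscapeDataString in the System namespace is available to a WinForms application without referencing System.Web. The sketch below rebuilds the query URL from Listing 1 with it. QueryUrl.Build is a hypothetical helper, and checksum stands for the value ComputeHash() returns. Unlike the listing's UrlEncode(), EscapeDataString encodes a space as %20 rather than +; whether Google's service treated the two identically is an assumption.

```csharp
using System;

public static class QueryUrl
{
    // Hypothetical helper: builds the toolbar query URL from Listing 1,
    // using Uri.EscapeDataString in place of the hand-written UrlEncode().
    // The checksum is still computed over the raw, unencoded URL.
    public static string Build(string url, string checksum)
    {
        return "http://toolbarqueries.google.com/tbr" +
            "?client=navclient-auto&features=Rank&ch=" + checksum +
            "&q=info:" + Uri.EscapeDataString(url);
    }
}
```

Because EscapeDataString percent-encodes reserved characters such as ':' and '/', the whole target URL travels safely as the value of the q parameter.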

Using the Code

Using the code is very easy. Just call GetPageRank() with the URL of a web page, and it returns the PageRank value for that page. This function may take a few seconds or even longer to execute because it needs to request and download information from one of Google's websites.
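Because the call blocks while it waits on the network, calling it directly from a WinForms event handler freezes the window until it returns. One option, assuming .NET 4.5 or later for Task.Run and async/await, is to run the lookup on a thread-pool thread. The helper below is a hypothetical sketch; it takes the lookup as a delegate so it doesn't depend on the network and can be tried with a stand-in function.

```csharp
using System;
using System.Threading.Tasks;

public static class PageRankRunner
{
    // Runs a slow, blocking rank lookup (for example GooglePageRank.GetPageRank)
    // on a thread-pool thread and returns a Task the caller can await.
    public static Task<int> GetPageRankAsync(string url, Func<string, int> lookup)
    {
        if (lookup == null) throw new ArgumentNullException("lookup");
        return Task.Run(() => lookup(url));
    }
}
```

In an async button-click handler you might then write int rank = await PageRankRunner.GetPageRankAsync(urlTextBox.Text, GooglePageRank.GetPageRank); where urlTextBox is a hypothetical control name. Code after the await resumes on the UI thread, so it can update controls directly.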

Conclusion

I feel the need to include a disclaimer here: This article is for informational purposes only. Directly accessing the PageRank this way violates Google's Terms of Service for pretty much all their products and services. If you run one or more websites that you want to remain in good standing with Google, actually running this code is not recommended.

That's all there is to it. The demo program associated with this article is a desktop application that uses the class to report the PageRank for any URL you enter.

Update History

4/30/2012: The URL to obtain PageRank data from Google has changed and I updated the code to work with the new URL.

End-User License

Use of this article and any related source code or other files is governed by the terms and conditions of The Code Project Open License.

Author Information

Jonathan Wood

I'm a software/website developer working out of the greater Salt Lake City area in Utah. I've developed many websites including Black Belt Coder, Insider Articles, and others.