Analysis of PageRank on Wikipedia

Tadakamala, Anirudh

Files

AnirudhTadakamala2014.pdf (970.25 KB)

Date

2014-04-28

Authors

Tadakamala, Anirudh

Publisher

Kansas State University

Abstract

With the massive explosion of data in recent times, and with people depending more and more on search engines for all kinds of information, it has become increasingly difficult for search engines to return the most relevant results to users. PageRank is one algorithm that has revolutionized the way search engines work. It was developed by Google's Larry Page and Sergey Brin to rank websites and display them in order of rank in Google's search engine results. PageRank is a link analysis algorithm that assigns a weight to each document in a corpus, measuring that document's relative importance within the corpus. The purpose of my project is to extract all the English Wikipedia data using the MediaWiki API and JWPL (Java Wikipedia Library), build the PageRank algorithm, and analyze its performance on this data set. Since the data set is too big to process on a single-node Hadoop cluster, the analysis is done on Beocat, a high-performance computing cluster provided by the Department of Computing and Information Sciences at Kansas State University.
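
To illustrate the weighting idea described in the abstract, here is a minimal single-machine sketch of PageRank by power iteration. It is not the report's Hadoop/MapReduce implementation. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the uniform redistribution of rank from pages without outgoing links, and the four-page toy graph are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    links: dict mapping each page to the list of pages it links to.
    damping and iterations are illustrative defaults (assumptions).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly
        # over all pages (one common convention, assumed here).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: C is linked from A, B, and D.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_graph)
```

In this toy graph the scores sum to 1, and page C, which receives the most incoming links, ends up with the largest weight. A MapReduce formulation would distribute the per-page share computation across mappers and the summation across reducers, but that design is not shown here.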

Keywords

Hadoop, PageRank, MapReduce

Graduation Month

May

Degree

Master of Science

Department

Department of Computing and Information Science

Major Professor

Daniel A. Andresen

Date

2014

Type

Report

URI

http://hdl.handle.net/2097/17609

Collections

K-State Electronic Theses, Dissertations, and Reports: 2004 -
