Analysis of PageRank on Wikipedia

dc.contributor.author: Tadakamala, Anirudh
dc.date.accessioned: 2014-04-28T14:51:25Z
dc.date.available: 2014-04-28T14:51:25Z
dc.date.graduationmonth: May
dc.date.issued: 2014-04-28
dc.date.published: 2014
dc.description.abstract: With the massive explosion of data in recent times, and with people depending more and more on search engines for all kinds of information, it has become increasingly difficult for search engines to return the most relevant results to users. PageRank is one algorithm that has revolutionized the way search engines work. It was developed by Google's Larry Page and Sergey Brin to rank websites and display them in order of rank in its search engine results. PageRank is a link analysis algorithm that assigns a weight to each document in a corpus, measuring that document's relative importance within the corpus. The purpose of my project is to extract all the English Wikipedia data using the MediaWiki API and JWPL (Java Wikipedia Library), build the PageRank algorithm, and analyze its performance on this data set. Since the data set is too big to process on a single-node Hadoop cluster, the analysis is done on a high-computation cluster called Beocat, provided by the Kansas State University Computing and Information Sciences Department.
dc.description.advisor: Daniel A. Andresen
dc.description.degree: Master of Science
dc.description.department: Department of Computing and Information Science
dc.description.level: Masters
dc.identifier.uri: http://hdl.handle.net/2097/17609
dc.language.iso: en_US
dc.publisher: Kansas State University
dc.rights: © the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Hadoop
dc.subject: PageRank
dc.subject: MapReduce
dc.subject.umi: Computer Science (0984)
dc.title: Analysis of PageRank on Wikipedia
dc.type: Report

Files

Original bundle

Name: AnirudhTadakamala2014.pdf
Size: 970.25 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission