Parallelization of backward deleted distance calculation in graph based features using Hadoop

dc.contributor.authorPillamari, Jayachandran
dc.date.accessioned2013-04-01T14:45:23Z
dc.date.available2013-04-01T14:45:23Z
dc.date.graduationmonthMayen_US
dc.date.issued2013-04-01
dc.date.published2013en_US
dc.description.abstractThis project presents an approach to parallelizing the calculation of Backward Deleted Distance (BDD) in Graph-Based Features (GBF) computation using Hadoop. The issues involved in calculating BDD are identified, and parallel computing technology, specifically Hadoop, is applied to solve them. The aim of the project is to parallelize the calculation of BDD and thereby reduce GBF computation time.

GBF is one of the feature sets used in ontology classifiers. In this project, GBF features are used to predict the friendship relationship between users whose direct link has been deleted. The computation involves calculating the BDD between all pairs of users: the BDD for a user pair is the length of the shortest path between them when their direct link is deleted, that is, the shortest distance between them other than the direct path. The project uses train and test data sets consisting of positive and negative instances; positive instances are user pairs that have a friendship link between them, whereas negative instances have no direct link.

Since the BDD calculation involves computing the shortest paths between all pairs of given users, it can be viewed as an All Pairs Shortest Path (APSP) problem. The process of BDD calculation was examined to identify the key places where it could be parallelized, and the internal structure and implementation of the Hadoop MapReduce framework were studied and applied to the APSP problem. The project introduces a new algorithm to parallelize the APSP computation in BDD calculation using Hadoop MapReduce, implemented in Java on Hadoop. Apache Hadoop is an emerging technology for scalable, distributed computing across clusters of computers; its MapReduce framework supports applications that process large amounts of data in parallel on large clusters. (An illustrative sketch of the MapReduce shortest-path pattern follows this record.)

The project was developed and implemented successfully and was tested for reliability and performance using different data sets, chosen to cover various factors and typical graph representations. The test results were analyzed to characterize the behavior of the system and show a significant speedup, reducing processing time from roughly 10 hours to about 20 minutes.en_US
dc.description.advisorDaniel A. Andresenen_US
dc.description.degreeMaster of Scienceen_US
dc.description.departmentDepartment of Computing & Information Sciencesen_US
dc.description.levelMastersen_US
dc.identifier.urihttp://hdl.handle.net/2097/15433
dc.language.isoen_USen_US
dc.publisherKansas State Universityen
dc.subjectParallelization of APSPen_US
dc.subjectHadoopen_US
dc.subjectMap Reduceen_US
dc.subject.umiComputer Science (0984)en_US
dc.subject.umiInformation Technology (0489)en_US
dc.titleParallelization of backward deleted distance calculation in graph based features using Hadoopen_US
dc.typeReporten_US
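
The abstract above reduces BDD computation to repeated shortest-path searches parallelized with Hadoop MapReduce. As an illustration only, not the report's actual code, the following minimal Java sketch shows one iteration of the standard parallel breadth-first-search pattern on Hadoop MapReduce for an unweighted friendship graph. Here BDD(u, v) is the shortest-path distance between u and v in the graph with the direct edge (u, v) removed, so the adjacency lists are assumed to have that edge deleted before the search starts. The class name BfsIteration and the tab-separated "node, distance, adjacency-list" record layout are assumptions made for this example.

```java
// One iteration of parallel BFS on Hadoop MapReduce (illustrative sketch,
// not the report's code). Input lines are assumed to look like:
//   nodeId <TAB> distance <TAB> n1,n2,n3
// where distance is the current best distance from the source, or "INF".
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class BfsIteration {

    public static class BfsMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t");
            String node = parts[0];
            String dist = parts[1];
            String adj  = parts.length > 2 ? parts[2] : "";

            // Re-emit the node's own record so its adjacency list survives.
            ctx.write(new Text(node), new Text("NODE\t" + dist + "\t" + adj));

            // If this node is reachable, offer distance+1 to each neighbor.
            if (!"INF".equals(dist) && !adj.isEmpty()) {
                int d = Integer.parseInt(dist);
                for (String nbr : adj.split(",")) {
                    ctx.write(new Text(nbr), new Text("DIST\t" + (d + 1)));
                }
            }
        }
    }

    public static class BfsReducer
            extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text node, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            int best = Integer.MAX_VALUE;
            String adj = "";
            for (Text v : values) {
                String[] parts = v.toString().split("\t");
                if ("NODE".equals(parts[0])) {
                    adj = parts.length > 2 ? parts[2] : "";
                    if (!"INF".equals(parts[1])) {
                        best = Math.min(best, Integer.parseInt(parts[1]));
                    }
                } else { // "DIST": a distance candidate offered by a neighbor
                    best = Math.min(best, Integer.parseInt(parts[1]));
                }
            }
            String dist = (best == Integer.MAX_VALUE) ? "INF"
                                                      : String.valueOf(best);
            // Output has the same layout as the input, ready for the next pass.
            ctx.write(node, new Text(dist + "\t" + adj));
        }
    }
}
```

A driver (omitted here) would run this mapper/reducer pair repeatedly, one pass per BFS frontier level, until no distance changes; the converged distance at the target node is then the BDD for that source. Repeating the search over every query pair yields the all-pairs (APSP) view described in the abstract.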

Files

Original bundle
Name: JayachandranPillamari2013.pdf
Size: 1.44 MB
Format: Adobe Portable Document Format
Description: Masters Report

License bundle
Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission