Parallelization of backward deleted distance calculation in graph based features using Hadoop

dc.contributor.authorPillamari, Jayachandran
dc.date.accessioned2013-04-01T14:45:23Z
dc.date.available2013-04-01T14:45:23Z
dc.date.graduationmonthMay
dc.date.issued2013-04-01
dc.date.published2013
dc.description.abstractThe current project presents an approach to parallelize the calculation of Backward Deleted Distance (BDD) in Graph Based Features (GBF) computation using Hadoop. In this project the issues concerned with the calculation of BDD are identified and parallel computing technologies like Hadoop are applied to solve them. The project introduces a new algorithm to parallelize the APSP problem in BDD calculation using Hadoop Map Reduce feature. The project is implemented in Java and Hadoop technologies. The aim of this project is to parallelize the calculation of BDD thereby reducing GBF computation time. The process of BDD calculation is examined to identify the key places where it could be parallelized. Since the BDD calculation involves calculating the shortest paths between all pairs of given users, it can viewed as All Pairs Shortest Path (APSP) problem. The internal structure and implementation of Hadoop Map-Reduce framework is studied and applied to the process of APSP problem. The GBF features are one of the features set used in the Ontology classifiers. In the current project, GBF features are used to predict the friendship relationship between the users whose direct link is deleted. The computation involves calculating BDD between all pairs of users. The BDD for a user pair represents the shortest path between them when their direct link is deleted. In real terms, it is the shortest distance between them other than the direct path. The project uses train and test data sets consisting of positive instances and negative instances. The positive instances consist of user pairs having a friendship link between them whereas the negative instances do not have any direct link between them. Apache Hadoop is a latest emerging technology in the market introduced for scalable, distributed computing across clusters of computers. It has a Map Reduce framework used for developing applications which process large amounts of data in parallel on large clusters. The project is developed and implemented successfully and has the best time complexity. The project is tested for its reliability and performance. Different data sets are used in this testing by considering various factors and typical graph representations. The test results were analyzed to predict the behavior of the system. The test results show that the system has best speedup and considerably decreased the processing time from 10 hours to 20 minutes which is rewarding.
dc.description.advisorDaniel A. Andresen
dc.description.degreeMaster of Science
dc.description.departmentDepartment of Computing & Information Sciences
dc.description.levelMasters
dc.identifier.urihttp://hdl.handle.net/2097/15433
dc.language.isoen_US
dc.publisherKansas State University
dc.rights© the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0/
dc.subjectParallelization of APSP
dc.subjectHadoop
dc.subjectMap Reduce
dc.subject.umiComputer Science (0984)
dc.subject.umiInformation Technology (0489)
dc.titleParallelization of backward deleted distance calculation in graph based features using Hadoop
dc.typeReport

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
JayachandranPillamari2013.pdf
Size:
1.44 MB
Format:
Adobe Portable Document Format
Description:
Masters Report

License bundle

Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
1.62 KB
Format:
Item-specific license agreed upon to submission
Description: