A framework for automatic optimization of MapReduce programs based on job parameter configurations.

dc.contributor.author: Lakkimsetti, Praveen Kumar
dc.date.accessioned: 2011-08-11T20:23:42Z
dc.date.available: 2011-08-11T20:23:42Z
dc.date.graduationmonth: August
dc.date.issued: 2011-08-11
dc.date.published: 2011
dc.description.abstract: Recently, cost-effective and timely processing of large datasets has been playing an important role in the success of many enterprises and the scientific computing community. Two promising trends ensure that applications will be able to deal with ever-increasing data volumes: first, the emergence of cloud computing, which provides transparent access to a large number of processing, storage, and networking resources; and second, the development of the MapReduce programming model, which provides a high-level abstraction for data-intensive computing. MapReduce has been widely used for large-scale data analysis in the Cloud [5]. The system is well recognized for its elastic scalability and fine-grained fault tolerance. However, even to run a single program in a MapReduce framework, a number of tuning parameters have to be set by users or system administrators to increase the efficiency of the program. Users often run into performance problems because they are unaware of how to set these parameters, or because they do not even know that these parameters exist. With MapReduce being a relatively new technology, it is not easy to find qualified administrators [4]. The major objective of this project is to provide a framework that optimizes MapReduce programs that run on large datasets. This is done by executing the MapReduce program on a portion of the dataset under each stored parameter combination, configuring the program with the most efficient combination found, and then running the tuned program over the full datasets. Many MapReduce programs are used over and over again in applications such as daily weather analysis, log analysis, and daily report generation. So, once the parameter combination is set, it can be applied efficiently to a number of datasets. This feature can go a long way towards improving the productivity of users who lack the skills to optimize programs themselves due to unfamiliarity with MapReduce or with the data being processed.
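The tuning loop described in the abstract can be sketched as follows. This is a minimal simulation of the idea, not code from the report: the Hadoop-style parameter names (`num_reducers`, `io_sort_mb`), the stored combinations, and the helper functions are all illustrative assumptions.

```python
import itertools
import time

# Hypothetical stored parameter combinations, in the spirit of Hadoop
# job settings such as the number of reduce tasks and the sort buffer
# size. The actual parameters and values used by the framework are
# assumptions for illustration.
STORED_COMBINATIONS = [
    {"num_reducers": r, "io_sort_mb": m}
    for r, m in itertools.product([1, 4, 8], [100, 200])
]

def run_job(sample, params):
    """Stand-in for submitting the MapReduce job on a data sample
    with the given configuration; returns elapsed wall-clock seconds."""
    start = time.perf_counter()
    # ... submit the job configured with `params`, wait for completion ...
    return time.perf_counter() - start

def best_combination(sample, runner=run_job):
    """Benchmark every stored combination on the sample and return
    the combination with the lowest measured runtime, mirroring the
    'execute on a portion of the dataset, keep the most efficient
    combination' step described in the abstract."""
    timings = {
        tuple(sorted(p.items())): runner(sample, p)
        for p in STORED_COMBINATIONS
    }
    return dict(min(timings, key=timings.get))
```

Once `best_combination` has been run on a sample, the winning configuration can be reused across recurring jobs (daily log analysis, report generation, and so on) without repeating the sweep.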
dc.description.advisor: Mitchell L. Neilsen
dc.description.degree: Master of Science
dc.description.department: Department of Computing and Information Sciences
dc.description.level: Masters
dc.identifier.uri: http://hdl.handle.net/2097/12011
dc.language.iso: en_US
dc.publisher: Kansas State University
dc.rights: © the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Hadoop MapReduce
dc.subject: Optimization
dc.subject: Performance
dc.subject: Parallel processing
dc.subject: Job configuration parameters
dc.subject: Distributed computing
dc.subject.umi: Computer Science (0984)
dc.title: A framework for automatic optimization of MapReduce programs based on job parameter configurations.
dc.type: Report

Files

Original bundle
Name: PraveenKumarLakkimsetti2011.pdf
Size: 1.33 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission