A framework for automatic optimization of MapReduce programs based on job parameter configurations.

dc.contributor.author: Lakkimsetti, Praveen Kumar
dc.date.accessioned: 2011-08-11T20:23:42Z
dc.date.available: 2011-08-11T20:23:42Z
dc.date.graduationmonth: August
dc.date.issued: 2011-08-11
dc.date.published: 2011
dc.description.abstract: Recently, cost-effective and timely processing of large datasets has played an important role in the success of many enterprises and of the scientific computing community. Two promising trends ensure that applications will be able to deal with ever-increasing data volumes: first, the emergence of cloud computing, which provides transparent access to a large number of processing, storage, and networking resources; and second, the development of the MapReduce programming model, which provides a high-level abstraction for data-intensive computing. MapReduce has been widely used for large-scale data analysis in the cloud [5]. The system is well recognized for its elastic scalability and fine-grained fault tolerance. However, even to run a single program in a MapReduce framework, a number of tuning parameters must be set by users or system administrators for the program to run efficiently. Users often run into performance problems because they do not know how to set these parameters, or are unaware that the parameters exist at all. With MapReduce being a relatively new technology, it is not easy to find qualified administrators [4]. The major objective of this project is to provide a framework that optimizes MapReduce programs that run on large datasets. This is done by executing the MapReduce program on a portion of the dataset with each of a set of stored parameter combinations, configuring the program with the most efficient combination, and then running the configured program over the full datasets. Many MapReduce programs are reused repeatedly in applications such as daily weather analysis, log analysis, and daily report generation, so once the parameter combination is set, it can be applied efficiently to many datasets.
This feature can go a long way toward improving the productivity of users who lack the skills to optimize programs themselves, due to unfamiliarity with MapReduce or with the data being processed.
dc.description.advisor: Mitchell L. Neilsen
dc.description.degree: Master of Science
dc.description.department: Department of Computing and Information Sciences
dc.description.level: Masters
dc.identifier.uri: http://hdl.handle.net/2097/12011
dc.language.iso: en_US
dc.publisher: Kansas State University
dc.subject: Hadoop MapReduce
dc.subject: Optimization
dc.subject: Performance
dc.subject: Parallel processing
dc.subject: Job configuration parameters
dc.subject: Distributed computing
dc.subject.umi: Computer Science (0984)
dc.title: A framework for automatic optimization of MapReduce programs based on job parameter configurations.
dc.type: Report
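The tuning loop the abstract describes — run the job on a sample of the dataset once per stored parameter combination, then keep the fastest configuration for the full runs — can be sketched as follows. This is a minimal illustration, not code from the report: the parameter names are real Hadoop 2.x configuration keys, but the value grids are illustrative, and `benchmark_on_sample` is a hypothetical stand-in for submitting and timing an actual sample job.

```python
import itertools

# Hypothetical search space of Hadoop job configuration parameters.
# The keys are genuine Hadoop 2.x parameter names; the candidate
# values are illustrative, not taken from the report.
PARAM_GRID = {
    "mapreduce.job.reduces": [2, 4, 8],
    "mapreduce.task.io.sort.mb": [100, 200],
    "mapreduce.map.sort.spill.percent": [0.8, 0.9],
}

def benchmark_on_sample(params):
    """Stand-in for executing the MapReduce program on a portion of
    the dataset with the given configuration and returning its
    wall-clock time. A real implementation would submit the job
    (e.g. `hadoop jar ... -D key=value`) and time it; this synthetic
    cost function merely keeps the sketch runnable."""
    return (abs(params["mapreduce.job.reduces"] - 4)
            + params["mapreduce.task.io.sort.mb"] / 200.0
            - params["mapreduce.map.sort.spill.percent"])

def best_configuration(grid):
    """Try every stored parameter combination on the sample and
    return the combination with the lowest measured cost."""
    best, best_cost = None, float("inf")
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        cost = benchmark_on_sample(params)
        if cost < best_cost:
            best, best_cost = params, cost
    return best

if __name__ == "__main__":
    # The winning combination would then be written into the job
    # configuration and reused across recurring datasets (daily
    # logs, reports, etc.), amortizing the one-time tuning cost.
    print(best_configuration(PARAM_GRID))
```

Because recurring jobs such as daily log analysis reuse the same program, the exhaustive search over the sample is paid once, and the selected configuration is reused on every subsequent dataset.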

Files

Original bundle
Name: PraveenKumarLakkimsetti2011.pdf
Size: 1.33 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.61 KB
Description: Item-specific license agreed upon submission