A framework for automatic optimization of MapReduce programs based on job parameter configurations.

dc.contributor.author: Lakkimsetti, Praveen Kumar
dc.date.accessioned: 2011-08-11T20:23:42Z
dc.date.available: 2011-08-11T20:23:42Z
dc.date.graduationmonth: August
dc.date.issued: 2011-08-11
dc.date.published: 2011
dc.description.abstract: Recently, cost-effective and timely processing of large datasets has been playing an important role in the success of many enterprises and the scientific computing community. Two promising trends ensure that applications will be able to deal with ever-increasing data volumes: first, the emergence of cloud computing, which provides transparent access to a large number of processing, storage, and networking resources; and second, the development of the MapReduce programming model, which provides a high-level abstraction for data-intensive computing. MapReduce has been widely used for large-scale data analysis in the Cloud [5]. The system is well recognized for its elastic scalability and fine-grained fault tolerance. However, even to run a single program in a MapReduce framework, a number of tuning parameters have to be set by users or system administrators to increase the efficiency of the program. Users often run into performance problems because they are unaware of how to set these parameters, or because they do not even know that these parameters exist. With MapReduce being a relatively new technology, it is not easy to find qualified administrators [4]. The major objective of this project is to provide a framework that optimizes MapReduce programs that run on large datasets. This is done by executing the MapReduce program on a portion of the dataset under each stored parameter combination, configuring the program with the most efficient combination found, and then running the tuned program over the full datasets. Many MapReduce programs are used over and over again in applications such as daily weather analysis, log analysis, and daily report generation. So, once the parameter combination is set, it can be applied efficiently to a number of datasets. This feature can go a long way towards improving the productivity of users who lack the skills to optimize programs themselves due to unfamiliarity with MapReduce or with the data being processed.
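The tuning loop described in the abstract can be sketched as follows. This is a minimal simulation of the idea, not code from the report: the Hadoop-style parameter names (`num_reducers`, `io_sort_mb`), the stored combinations, and the helper functions are all illustrative assumptions.

```python
import itertools
import time

# Hypothetical stored parameter combinations, in the spirit of Hadoop
# job settings such as the number of reduce tasks and the sort buffer
# size. The actual parameters and values used by the framework are
# assumptions for illustration.
STORED_COMBINATIONS = [
    {"num_reducers": r, "io_sort_mb": m}
    for r, m in itertools.product([1, 4, 8], [100, 200])
]

def run_job(sample, params):
    """Stand-in for submitting the MapReduce job on a data sample
    with the given configuration; returns elapsed wall-clock seconds."""
    start = time.perf_counter()
    # ... submit the job configured with `params`, wait for completion ...
    return time.perf_counter() - start

def best_combination(sample, runner=run_job):
    """Benchmark every stored combination on the sample and return
    the combination with the lowest measured runtime, mirroring the
    'execute on a portion of the dataset, keep the most efficient
    combination' step described in the abstract."""
    timings = {
        tuple(sorted(p.items())): runner(sample, p)
        for p in STORED_COMBINATIONS
    }
    return dict(min(timings, key=timings.get))
```

Once `best_combination` has been run on a sample, the winning configuration can be reused across recurring jobs (daily log analysis, report generation, and so on) without repeating the sweep.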
dc.description.advisor: Mitchell L. Neilsen
dc.description.degree: Master of Science
dc.description.department: Department of Computing and Information Sciences
dc.description.level: Masters
dc.identifier.uri: http://hdl.handle.net/2097/12011
dc.language.iso: en_US
dc.publisher: Kansas State University
dc.rights: © the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Hadoop MapReduce
dc.subject: Optimization
dc.subject: Performance
dc.subject: Parallel processing
dc.subject: Job configuration parameters
dc.subject: Distributed computing
dc.subject.umi: Computer Science (0984)
dc.title: A framework for automatic optimization of MapReduce programs based on job parameter configurations.
dc.type: Report

Files

Original bundle
Name: PraveenKumarLakkimsetti2011.pdf
Size: 1.33 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission