Implementing a Lambda Architecture to perform real-time updates

dc.contributor.author: Gudipati, Pramod Kumar
dc.date.accessioned: 2016-11-17T21:55:47Z
dc.date.available: 2016-11-17T21:55:47Z
dc.date.graduationmonth: December
dc.date.issued: 2016-12-01
dc.description.abstract: The Lambda Architecture is a new paradigm for big data processing that balances throughput, latency, and fault tolerance. No single tool provides a complete solution that combines high accuracy, low latency, and high throughput. This motivated the idea of using a set of tools and techniques to build a complete big data system. The Lambda Architecture defines a set of layers into which these tools and techniques fit: the Batch Layer, the Serving Layer, and the Speed Layer. Each layer satisfies a set of properties and builds upon the functionality provided by the layers beneath it. The Batch Layer stores the master dataset, an immutable, append-only set of raw data. It also precomputes results using a distributed processing system, such as Hadoop or Apache Spark, that can handle large quantities of data. The Speed Layer captures new data arriving in real time and processes it. The Serving Layer contains a parallel-processing query engine, which takes results from both the Batch and Speed Layers and responds to queries in real time with low latency. Stack Overflow is a question-and-answer forum with a huge user community and millions of posts, and it has grown rapidly over the years. This project demonstrates the Lambda Architecture by constructing a data pipeline that adds a new “Recommended Questions” section to the Stack Overflow user profile and updates the suggested questions in real time. In addition, statistics such as trending tags and user performance numbers such as UpVotes and DownVotes are shown in the user dashboard by querying the batch processing layer.
dc.description.advisor: William H. Hsu
dc.description.degree: Master of Science
dc.description.department: Department of Computing and Information Sciences
dc.description.level: Masters
dc.identifier.uri: http://hdl.handle.net/2097/34517
dc.language.iso: en_US
dc.publisher: Kansas State University
dc.rights: © the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Big Data
dc.subject: Stack Overflow
dc.subject: computer science
dc.subject: Lambda Architecture
dc.title: Implementing a Lambda Architecture to perform real-time updates
dc.type: Report
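
The way the three layers in the abstract fit together can be illustrated with a small in-memory sketch. This is not the report's implementation, which the abstract describes only at a high level. The names `TagEvent`, `LambdaPipeline`, `record`, `run_batch`, `count`, and `trending` are hypothetical, tag counting stands in for the "trending tags" statistic, and the sketch assumes that no events arrive while a batch run is in progress.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TagEvent:
    """One immutable raw record (hypothetical): a question was posted with a tag."""
    tag: str


@dataclass
class LambdaPipeline:
    """Toy model of the three layers; all names here are illustrative assumptions."""
    master: list = field(default_factory=list)            # Batch Layer: append-only raw data
    batch_view: Counter = field(default_factory=Counter)  # Batch Layer: precomputed view
    speed_view: Counter = field(default_factory=Counter)  # Speed Layer: recent data only

    def record(self, tag: str) -> None:
        """A new event is appended to the master dataset and counted by the Speed Layer."""
        event = TagEvent(tag)
        self.master.append(event)          # raw data is never updated or deleted
        self.speed_view[event.tag] += 1    # incremental, low-latency update

    def run_batch(self) -> None:
        """Recompute the batch view from the whole master dataset.

        Simplifying assumption: nothing arrives during the run, so the
        Speed Layer's view can be discarded once the batch view is rebuilt.
        """
        self.batch_view = Counter(e.tag for e in self.master)
        self.speed_view.clear()

    def count(self, tag: str) -> int:
        """Serving Layer query: merge the batch and speed results."""
        return self.batch_view[tag] + self.speed_view[tag]

    def trending(self, n: int) -> list:
        """Serving Layer query: the n most frequent tags across both views."""
        return (self.batch_view + self.speed_view).most_common(n)


pipeline = LambdaPipeline()
for tag in ["python", "spark", "python"]:
    pipeline.record(tag)
pipeline.run_batch()
pipeline.record("python")          # arrives after the batch run
print(pipeline.count("python"))    # 3: two from the batch view, one from the speed view
```

In a real deployment the batch recomputation would run on a system such as Hadoop or Apache Spark, and the views would live in separate stores. The point of the sketch is only that a query merges a complete but stale batch view with a small, fresh speed view.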

Files

Original bundle

Name: PramodKumarGudipati2016.pdf
Size: 2.75 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission