An application of topic modeling algorithms to text analytics in business intelligence

dc.contributor.authorAlsadhan, Majed
dc.date.accessioned2014-04-25T21:51:19Z
dc.date.available2014-04-25T21:51:19Z
dc.date.graduationmonthMay
dc.date.issued2014-04-25
dc.date.published2014
dc.description.abstractIn this work, we focus on the task of clustering businesses in the state of Kansas based on the content of their websites and their business listing information. Our goal is to cluster the businesses and overcome the challenges facing current approaches such as: data noise, low number of clustered businesses, and lack of evaluation approach. We propose an LSA-based approach to analyze the businesses’ data and cluster those businesses by using Bisecting K-Means algorithm. In this approach, we analyze the businesses’ data by using LSA and produce businesses’ representations in a reduced space. We then use the businesses’ representations to cluster the businesses by applying the Bisecting K-Means algorithm. We also apply an existing LDA-based approach to cluster the businesses and compare the results with our proposed LSA-based approach at the end. In this work, we evaluate the results by using a human-expert-based evaluation procedure. At the end, we visualize the clusters produced in this work by using Google Earth and Tableau. According to our evaluation procedure, the LDA-based approach performed slightly bet- ter then the LSA-based approach. However, with the LDA-based approach, there were some limitations which are: low number of clustered businesses, and not being able to produce a hierarchical tree for the clusters. With the LSA-based approach, we were able to cluster all the businesses and produce a hierarchical tree for the clusters.
dc.description.advisorDoina Caragea
dc.description.advisorWilliam H. Hsu
dc.description.degreeMaster of Science
dc.description.departmentDepartment of Computing and Information Sciences
dc.description.levelMasters
dc.identifier.urihttp://hdl.handle.net/2097/17580
dc.language.isoen_US
dc.publisherKansas State University
dc.rights© the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0/
dc.subjectLSA
dc.subjectLDA
dc.subjectClustering
dc.subjectBusinesses
dc.subjectBisecting Kmeans
dc.subject.umiComputer Science (0984)
dc.subject.umiEconomics (0501)
dc.titleAn application of topic modeling algorithms to text analytics in business intelligence
dc.typeThesis

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
MajedAlsadhan2014.pdf
Size:
8.84 MB
Format:
Adobe Portable Document Format

License bundle

Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
1.62 KB
Format:
Item-specific license agreed upon to submission
Description: