An application of topic modeling algorithms to text analytics in business intelligence

dc.contributor.author: Alsadhan, Majed (en_US)
dc.date.accessioned: 2014-04-25T21:51:19Z
dc.date.available: 2014-04-25T21:51:19Z
dc.date.graduationmonth: May (en_US)
dc.date.issued: 2014-04-25
dc.date.published: 2014 (en_US)
dc.description.abstract: In this work, we focus on the task of clustering businesses in the state of Kansas based on the content of their websites and their business listing information. Our goal is to cluster the businesses while overcoming the challenges facing current approaches: noisy data, a low number of clustered businesses, and the lack of an evaluation approach. We propose an LSA-based approach that analyzes the businesses' data with LSA to produce a representation of each business in a reduced space, and then clusters those representations with the Bisecting K-Means algorithm. We also apply an existing LDA-based approach to cluster the businesses and compare its results with those of our proposed LSA-based approach. We evaluate the results with a human-expert-based evaluation procedure and visualize the resulting clusters using Google Earth and Tableau. According to our evaluation procedure, the LDA-based approach performed slightly better than the LSA-based approach. However, the LDA-based approach had two limitations: it clustered only a low number of businesses, and it could not produce a hierarchical tree for the clusters. With the LSA-based approach, we were able to cluster all the businesses and produce a hierarchical tree for the clusters. (en_US)
dc.description.advisor: Doina Caragea (en_US)
dc.description.advisor: William H. Hsu (en_US)
dc.description.degree: Master of Science (en_US)
dc.description.department: Department of Computing and Information Sciences (en_US)
dc.description.level: Masters (en_US)
dc.identifier.uri: http://hdl.handle.net/2097/17580
dc.language.iso: en_US (en_US)
dc.publisher: Kansas State University (en)
dc.subject: LSA (en_US)
dc.subject: LDA (en_US)
dc.subject: Clustering (en_US)
dc.subject: Businesses (en_US)
dc.subject: Bisecting Kmeans (en_US)
dc.subject.umi: Computer Science (0984) (en_US)
dc.subject.umi: Economics (0501) (en_US)
dc.title: An application of topic modeling algorithms to text analytics in business intelligence (en_US)
dc.type: Thesis (en_US)
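
The abstract describes clustering reduced-space business representations with Bisecting K-Means and obtaining a hierarchical tree of clusters. The following is a minimal sketch of that clustering step only, not the thesis's implementation. It assumes the LSA step has already produced a small dense vector per business; the `vectors` data, the function names, the rule of always splitting the cluster with the largest sum of squared errors, and the deterministic farthest-point initialization are all illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors, ids):
    """Component-wise mean of the vectors selected by ids."""
    return [sum(col) / len(ids) for col in zip(*(vectors[i] for i in ids))]


def sse(vectors, ids):
    """Sum of squared distances from each member to its cluster centroid."""
    c = centroid(vectors, ids)
    return sum(sq_dist(vectors[i], c) for i in ids)


def split_in_two(vectors, ids, max_iters=50):
    """Run 2-means on one cluster, starting from two far-apart members."""
    c = centroid(vectors, ids)
    # Assumed initialization: the member farthest from the centroid,
    # then the member farthest from that one (no randomness involved).
    first = max(ids, key=lambda i: sq_dist(vectors[i], c))
    second = max(ids, key=lambda i: sq_dist(vectors[i], vectors[first]))
    centers = [vectors[first], vectors[second]]
    left, right = [], []
    for _ in range(max_iters):
        new_left = [i for i in ids
                    if sq_dist(vectors[i], centers[0]) <= sq_dist(vectors[i], centers[1])]
        chosen = set(new_left)
        new_right = [i for i in ids if i not in chosen]
        # Stop on a degenerate split or when the assignment no longer changes.
        if not new_left or not new_right or (new_left, new_right) == (left, right):
            break
        left, right = new_left, new_right
        centers = [centroid(vectors, left), centroid(vectors, right)]
    return left, right


def bisecting_kmeans(vectors, k):
    """Split the highest-SSE cluster repeatedly until k clusters exist.

    Returns the final clusters (lists of row indices) and the list of
    splits (parent, left, right), which together describe a binary tree.
    """
    clusters = [list(range(len(vectors)))]
    tree = []
    while len(clusters) < k:
        splittable = [ids for ids in clusters if len(ids) > 1]
        if not splittable:
            break
        target = max(splittable, key=lambda ids: sse(vectors, ids))
        left, right = split_in_two(vectors, target)
        if not left or not right:
            break  # all members identical; nothing left to separate
        clusters.remove(target)
        clusters.extend([left, right])
        tree.append((target, left, right))
    return clusters, tree


# Hypothetical 2-D reduced-space vectors for seven businesses.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
           [8.0, 8.0], [9.0, 8.0],
           [20.0, 0.0], [20.0, 1.0]]
clusters, tree = bisecting_kmeans(vectors, 3)
print(clusters)
for parent, left, right in tree:
    print(parent, "->", left, "|", right)
```

Because every split is recorded, the sequence of `(parent, left, right)` entries can be read as a hierarchy over the clusters, which is the property the abstract attributes to the LSA-based approach.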

Files

Original bundle
Name: MajedAlsadhan2014.pdf
Size: 8.84 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission