An application of topic modeling algorithms to text analytics in business intelligence



Journal Title

Journal ISSN

Volume Title


Kansas State University


In this work, we focus on the task of clustering businesses in the state of Kansas based on the content of their websites and their business listing information. Our goal is to cluster the businesses and overcome the challenges facing current approaches such as: data noise, low number of clustered businesses, and lack of evaluation approach. We propose an LSA-based approach to analyze the businesses’ data and cluster those businesses by using Bisecting K-Means algorithm. In this approach, we analyze the businesses’ data by using LSA and produce businesses’ representations in a reduced space. We then use the businesses’ representations to cluster the businesses by applying the Bisecting K-Means algorithm. We also apply an existing LDA-based approach to cluster the businesses and compare the results with our proposed LSA-based approach at the end. In this work, we evaluate the results by using a human-expert-based evaluation procedure. At the end, we visualize the clusters produced in this work by using Google Earth and Tableau. According to our evaluation procedure, the LDA-based approach performed slightly bet- ter then the LSA-based approach. However, with the LDA-based approach, there were some limitations which are: low number of clustered businesses, and not being able to produce a hierarchical tree for the clusters. With the LSA-based approach, we were able to cluster all the businesses and produce a hierarchical tree for the clusters.



LSA, LDA, Clustering, Businesses, Bisecting Kmeans

Graduation Month



Master of Science


Department of Computing and Information Sciences

Major Professor

Doina Caragea; William H. Hsu