A Gensim-based approach to continuous-time infinite dynamic topic modeling
dc.contributor.author | Tucker, Timothy Darryl | |
dc.date.accessioned | 2024-08-12T13:27:24Z | |
dc.date.available | 2024-08-12T13:27:24Z | |
dc.date.graduationmonth | August | |
dc.date.issued | 2024 | |
dc.description.abstract | This report focuses on the problem of dynamic topic modeling for a document stream with a variable number of topics in continuous time. Previous solution approaches include Dynamic Latent Dirichlet Allocation (D-LDA) and online Hierarchical Dirichlet Processes (oHDP). However, models like D-LDA require a fixed number of topics over time, while models like oHDP can only make inferences in discrete time steps over a minimum time quantum to allow synchronous updates to the number of topics. The dissertation of Elshamy (2012) introduced a hybrid approach consisting of two topic models: a top-level model whose hyperparameters govern the number of extant topics, and a second model that uses the number of topics to inform the evolution of topics in continuous time. This report presents a new implementation of that model using the Gensim library. Its purpose is to facilitate empirical exploration of dynamic topic model (DTM) design aspects, such as hyperparameter updating and parameter interpretation for visualization of topics over time, using timestamped text corpora. The resulting implementation is evaluated on commonly used full-text benchmark corpora consisting of timestamped documents. Our objective is to improve the log likelihood (and perplexity) on validation documents through an understanding of hyperparameter effects on topic generation and modification. The results of comparing a prior baseline DTM to the hybrid DTM are reported, along with basic visualization output for the topics found. | |
dc.description.advisor | William H. Hsu | |
dc.description.degree | Master of Science | |
dc.description.department | Department of Computer Science | |
dc.description.level | Masters | |
dc.identifier.uri | https://hdl.handle.net/2097/44463 | |
dc.language.iso | en_US | |
dc.publisher | Kansas State University | |
dc.rights | © the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). | |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | |
dc.subject | Dynamic topic modeling | |
dc.title | A Gensim-based approach to continuous-time infinite dynamic topic modeling | |
dc.type | Report |