A Gensim-based approach to continuous-time infinite dynamic topic modeling

dc.contributor.author Tucker, Timothy Darryl
dc.date.accessioned 2024-08-12T13:27:24Z
dc.date.available 2024-08-12T13:27:24Z
dc.date.graduationmonth August
dc.date.issued 2024
dc.description.abstract This report focuses on the problem of dynamic topic modeling for a document stream with a variable number of topics in continuous time. Previous approaches include Dynamic Latent Dirichlet Allocation (D-LDA) and online Hierarchical Dirichlet Processes (oHDP). However, models like D-LDA require a fixed number of topics over time, while models like oHDP can only make inferences in discrete time steps over a minimum time quantum so that the number of topics can be updated synchronously. The dissertation of Elshamy (2012) introduced a hybrid approach consisting of two topic models: a top-level model whose hyperparameters govern the number of extant topics, and a second model that uses the number of topics to inform the evolution of topics in continuous time. This report presents a new implementation of that model using the Gensim library, intended to facilitate empirical exploration of dynamic topic model (DTM) design aspects such as hyperparameter updating and parameter interpretation for visualizing topics over time in timestamped text corpora. The resulting implementation is evaluated on commonly used full-text benchmark corpora consisting of timestamped documents. Our objective is to improve the log likelihood (and perplexity) on validation documents through an understanding of how hyperparameters affect topic generation and modification. We report the results of comparing a prior baseline DTM to the hybrid DTM, along with basic visualizations of the topics found.
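The evaluation criterion above links held-out log likelihood to perplexity. As a minimal sketch, assuming the standard per-token definition with natural logarithms (perplexity = exp(-log p(w_valid) / N)), which may not match the exact normalization used in the report, the relationship looks like this. The function name `heldout_perplexity` is hypothetical.

```python
import math


def heldout_perplexity(total_log_likelihood: float, num_tokens: int) -> float:
    """Perplexity of validation documents from their total log likelihood.

    Assumes total_log_likelihood is the natural-log likelihood summed over
    all num_tokens validation words (hypothetical helper, standard formula).
    A higher log likelihood yields a lower perplexity.
    """
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return math.exp(-total_log_likelihood / num_tokens)


# Sanity check: a model that is uniform over a 1000-word vocabulary assigns
# each token log-probability -log(1000), so its perplexity is about 1000.
n = 500
print(heldout_perplexity(n * -math.log(1000), n))
```

Because perplexity is a monotone decreasing function of the per-token log likelihood, improving one is equivalent to improving the other when both are measured on the same validation tokens.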
dc.description.advisor William H. Hsu
dc.description.degree Master of Science
dc.description.department Department of Computer Science
dc.description.level Masters
dc.identifier.uri https://hdl.handle.net/2097/44463
dc.language.iso en_US
dc.publisher Kansas State University
dc.rights © the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.uri http://rightsstatements.org/vocab/InC/1.0/
dc.subject Dynamic topic modeling
dc.title A Gensim-based approach to continuous-time infinite dynamic topic modeling
dc.type Report

Files

Original bundle

Name:
TimothyTucker2024.pdf
Size:
275.69 KB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.6 KB
Format:
Item-specific license agreed upon to submission
Description: