Inferencing with sparse spatio-temporal data in biological systems

Tharzeen, Aabila

dc.contributor.author	Tharzeen, Aabila
dc.date.accessioned	2025-05-07T15:01:15Z
dc.date.available	2025-05-07T15:01:15Z
dc.date.graduationmonth	August
dc.date.issued	2025
dc.description.abstract	Spatio-temporal data analysis plays a crucial role in many scientific domains, including biological systems, earth sciences, and autonomous vehicles, providing critical insights into how spatially coherent entities evolve over time. Particularly in biological systems, accurately understanding the underlying cause-effect relationships requires systematic exploitation of both spatial and temporal variations. One of the major challenges in biological systems is that available data can be limited due to high experimental costs, ethical considerations, or logistical constraints, which can hinder accurate modeling of the underlying complex spatio-temporal phenomena. Classical statistical approaches, while interpretable, frequently struggle to model complex nonlinear dependencies, whereas purely data-driven machine learning (ML) methods risk overfitting and poor generalization when dealing with limited data. Thus, addressing challenges associated with sparse spatio-temporal data is crucial to reliably inferring meaningful insights from biological systems. This dissertation addresses challenges associated with sparse spatio-temporal inference by systematically integrating uncertainty quantification (UQ) into ML frameworks specifically tailored for biological systems. Missing data is often handled by imputation based on available observations. However, the missingness itself can contain critical information. Imputation can introduce bias and information loss while failing to effectively capture the underlying spatio-temporal relationships. To address this “imputation dilemma,” this dissertation proposes and theoretically analyzes a novel Informative Missing Indicator Method (IMIM) specifically designed for neural networks. IMIM helps decide when imputation should occur without introducing bias or loss of information in the data.
Furthermore, a graph neural network combined with a recurrent neural network-based spatio-temporal imputation framework is developed to systematically capture spatio-temporal relationships, significantly enhancing predictive capabilities in downstream tasks by effectively increasing the amount of informative data available. Additionally, learning ML models from limited data can also be challenging due to the high risk of overfitting and poor generalizability. To mitigate these issues, this dissertation introduces innovative methods to integrate prior knowledge that represents high-level abstractions of natural phenomena into ML frameworks either as observational bias or learning bias. This ensures that model predictions conform to known scientific principles. The framework also facilitates inference of uncertain parameters not directly observable from data using the uncertainty quantified on the model predictions. In safety-critical biological applications, confidence in predictions is as crucial as prediction accuracy itself. Therefore, recognizing the critical importance of uncertainty quantification, this dissertation presents a generic, task-agnostic UQ framework utilizing neural stochastic differential equations (Neural SDEs). This framework analytically captures epistemic uncertainty in both traditional neural networks and graph neural networks, thereby enhancing model reliability and interoperability. Additionally, this dissertation proposes an uncertainty-guided active learning framework that analytically propagates spatio-temporal measurement uncertainty to strategically select the most informative samples. This approach effectively reduces overall prediction uncertainty, optimizing resource usage and improving predictive accuracy. The methods developed in this dissertation are highly beneficial for healthcare and other data-scarce, safety-critical biological applications where reliability, accuracy, and informed decision-making are essential.
dc.description.advisor	Balasubramaniam Natarajan
dc.description.degree	Doctor of Philosophy
dc.description.department	Department of Electrical and Computer Engineering
dc.description.level	Doctoral
dc.description.sponsorship	This material is based upon work supported by the National Science Foundation (NSF) CNS 2039014
dc.identifier.uri	https://hdl.handle.net/2097/45020
dc.language.iso	en_US
dc.subject	Sparse spatio-temporal data
dc.subject	Uncertainty quantification
dc.subject	Machine learning
dc.subject	Active learning
dc.subject	Physics informed machine learning
dc.subject	Neural stochastic differential equation
dc.title	Inferencing with sparse spatio-temporal data in biological systems
dc.type	Dissertation

Files

Original bundle

Name: AabilaTharzeen2025.pdf
Size: 6.83 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.65 KB
Format: Item-specific license agreed upon to submission

Collections

K-State Electronic Theses, Dissertations, and Reports: 2004 -