Inferencing with sparse spatio-temporal data in biological systems

dc.contributor.author: Tharzeen, Aabila
dc.date.accessioned: 2025-05-07T15:01:15Z
dc.date.available: 2025-05-07T15:01:15Z
dc.date.graduationmonth: August
dc.date.issued: 2025
dc.description.abstract: Spatio-temporal data analysis plays a crucial role in many scientific domains, including biological systems, earth sciences, and autonomous vehicles, providing critical insights into how spatially coherent entities evolve over time. In biological systems in particular, accurately understanding the underlying cause-effect relationships requires systematic exploitation of both spatial and temporal variations. A major challenge in biological systems is that available data are often limited by high experimental costs, ethical considerations, or logistical constraints, which hinders accurate modeling of the underlying complex spatio-temporal phenomena. Classical statistical approaches, while interpretable, frequently struggle to model complex nonlinear dependencies, whereas purely data-driven machine learning (ML) methods risk overfitting and poor generalization when dealing with limited data. Thus, addressing the challenges associated with sparse spatio-temporal data is crucial to reliably inferring meaningful insights from biological systems. This dissertation addresses these challenges by systematically integrating uncertainty quantification (UQ) into ML frameworks specifically tailored for biological systems. Missing data is often handled by imputation based on available observations. However, the missingness itself can contain critical information, and imputation can introduce bias and information loss while failing to effectively capture the underlying spatio-temporal relationships. To address this "imputation dilemma," this dissertation proposes and theoretically analyzes a novel Informative Missing Indicator Method (IMIM) specifically designed for neural networks. IMIM helps decide when imputation should occur without introducing bias or loss of information in the data.
Furthermore, a spatio-temporal imputation framework that combines a graph neural network with a recurrent neural network is developed to systematically capture spatio-temporal relationships, significantly enhancing predictive capabilities in downstream tasks by effectively increasing the amount of informative data available. Learning ML models from limited data is also challenging due to the high risk of overfitting and poor generalizability. To mitigate these issues, this dissertation introduces innovative methods to integrate prior knowledge that represents high-level abstractions of natural phenomena into ML frameworks, either as observational bias or as learning bias. This ensures that model predictions conform to known scientific principles. The framework also facilitates inference of uncertain parameters that are not directly observable from data, using the uncertainty quantified on the model predictions. In safety-critical biological applications, confidence in predictions is as crucial as prediction accuracy itself. Therefore, this dissertation presents a generic, task-agnostic UQ framework utilizing neural stochastic differential equations (Neural SDEs). This framework analytically captures epistemic uncertainty in both traditional neural networks and graph neural networks, thereby enhancing model reliability and interoperability. Additionally, this dissertation proposes an uncertainty-guided active learning framework that analytically propagates spatio-temporal measurement uncertainty to strategically select the most informative samples. This approach effectively reduces overall prediction uncertainty, optimizing resource usage and improving predictive accuracy. The methods developed in this dissertation are highly beneficial for healthcare and other data-scarce, safety-critical biological applications where reliability, accuracy, and informed decision-making are essential.
dc.description.advisor: Balasubramaniam Natarajan
dc.description.degree: Doctor of Philosophy
dc.description.department: Department of Electrical and Computer Engineering
dc.description.level: Doctoral
dc.description.sponsorship: This material is based upon work supported by the National Science Foundation (NSF) under grant CNS 2039014.
dc.identifier.uri: https://hdl.handle.net/2097/45020
dc.language.iso: en_US
dc.subject: Sparse spatio-temporal data
dc.subject: Uncertainty quantification
dc.subject: Machine learning
dc.subject: Active learning
dc.subject: Physics informed machine learning
dc.subject: Neural stochastic differential equation
dc.title: Inferencing with sparse spatio-temporal data in biological systems
dc.type: Dissertation

Files

Original bundle

Name:
AabilaTharzeen2025.pdf
Size:
6.83 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.65 KB
Format:
Item-specific license agreed upon to submission