Inferencing with sparse spatio-temporal data in biological systems

Date

2025

Abstract

Spatio-temporal data analysis plays a crucial role in many domains, including biological systems, earth sciences, and autonomous vehicles, providing critical insights into how spatially coherent entities evolve over time. In biological systems in particular, accurately understanding the underlying cause-effect relationships requires systematic exploitation of both spatial and temporal variations. A major challenge in biological systems is that available data are often limited by high experimental costs, ethical considerations, or logistical constraints, which hinders accurate modeling of the underlying complex spatio-temporal phenomena. Classical statistical approaches, while interpretable, frequently struggle to model complex nonlinear dependencies, whereas purely data-driven machine learning (ML) methods risk overfitting and poor generalization when data are limited. Addressing the challenges of sparse spatio-temporal data is therefore crucial to reliably inferring meaningful insights from biological systems. This dissertation does so by systematically integrating uncertainty quantification (UQ) into ML frameworks specifically tailored for biological systems.

Missing data are often handled by imputation based on available observations. However, the missingness itself can contain critical information, and imputation can introduce bias and information loss while failing to effectively capture the underlying spatio-temporal relationships. To address this “imputation dilemma,” this dissertation proposes and theoretically analyzes a novel Informative Missing Indicator Method (IMIM) specifically designed for neural networks. IMIM helps decide when imputation should occur without introducing bias or loss of information in the data. Furthermore, a spatio-temporal imputation framework that combines a graph neural network with a recurrent neural network is developed to systematically capture spatio-temporal relationships, significantly enhancing predictive capabilities in downstream tasks by effectively increasing the amount of informative data available. Learning ML models from limited data is also challenging because of the high risk of overfitting and poor generalizability. To mitigate these issues, this dissertation introduces methods that integrate prior knowledge, representing high-level abstractions of natural phenomena, into ML frameworks as either observational bias or learning bias. This ensures that model predictions conform to known scientific principles. The framework also facilitates inference of uncertain parameters that are not directly observable from data, using the uncertainty quantified on the model predictions.

In safety-critical biological applications, confidence in predictions is as crucial as prediction accuracy itself. This dissertation therefore presents a generic, task-agnostic UQ framework based on neural stochastic differential equations (Neural SDEs). The framework analytically captures epistemic uncertainty in both traditional neural networks and graph neural networks, thereby enhancing model reliability and interoperability. Additionally, this dissertation proposes an uncertainty-guided active learning framework that analytically propagates spatio-temporal measurement uncertainty to strategically select the most informative samples. This approach effectively reduces overall prediction uncertainty, optimizing resource usage and improving predictive accuracy.

The methods developed in this dissertation are highly beneficial for healthcare and other data-scarce, safety-critical biological applications where reliability, accuracy, and informed decision-making are essential.

Keywords

Sparse spatio-temporal data, Uncertainty quantification, Machine learning, Active learning, Physics-informed machine learning, Neural stochastic differential equation

Graduation Month

August

Degree

Doctor of Philosophy

Department

Department of Electrical and Computer Engineering

Major Professor

Balasubramaniam Natarajan

Type

Dissertation