Constraining galaxy-halo connection using machine learning

Date

2024

Abstract

In this dissertation, we explore the galaxy-halo connection using machine learning techniques to improve the accuracy and efficiency of modeling small-scale galaxy clustering and constraining Halo Occupation Distribution (HOD) parameters. The galaxy-halo connection, which describes how galaxies are distributed within dark matter halos, is critical to understanding cosmic structure formation. Traditional studies of this connection rely heavily on statistical models such as the HOD, but these approaches are computationally expensive and, because galaxy formation is complex, often lead to biased parameter estimates. To address these challenges, we apply a range of machine learning algorithms, including artificial neural networks (ANNs), random forests, and Bayesian regression models.

Our analysis shows that although many machine learning algorithms report good statistical fits, they can still yield likelihood contours whose means and variances are biased relative to the true model parameters. This underscores the importance of careful data processing, algorithm selection, and transformation of the training data for achieving unbiased and robust predictions. Among the methods tested, ANNs outperform random forests and ridge regression when the HOD parameter space is appropriately restricted. These machine learning tools make it possible to explore the HOD parameter space at a much lower computational cost than traditional brute-force methods. Using our ANN-based pipeline, we reproduce standard results from the literature for key clustering statistics: the projected two-point correlation function, wₚ(rₚ); the angular multipoles of the correlation function, ξℓ(r); and the void probability function (VPF). We find that combining wₚ(rₚ) with the VPF tightens the parameter constraints, whereas adding the multipoles ξ₀, ξ₂, and ξ₄ to wₚ(rₚ) does not significantly improve them.
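The HOD parameters that the emulators are trained to constrain enter through the mean number of galaxies hosted by a halo of mass M. The abstract does not specify which parameterization is used, so the following is only a sketch assuming the standard five-parameter form of Zheng et al. (2007): ⟨N_cen(M)⟩ = ½[1 + erf((log M − log M_min)/σ_logM)] for centrals and ⟨N_sat(M)⟩ = ((M − M₀)/M₁)^α for M > M₀ (zero otherwise) for satellites. The function names, the `PARAMS` dictionary, and its fiducial values are hypothetical and chosen only for illustration; they are not results from this dissertation.

```python
import math

# Hypothetical, illustrative HOD parameters (NOT values fitted in this work).
PARAMS = {
    "log_m_min": 12.0,    # log10 halo mass at which <N_cen> = 1/2
    "sigma_log_m": 0.3,   # width of the central-occupation step
    "log_m0": 12.0,       # log10 cutoff mass below which there are no satellites
    "log_m1": 13.3,       # log10 mass scale of the satellite power law
    "alpha": 1.0,         # power-law slope of the satellite occupation
}


def mean_n_cen(log_m: float, p: dict) -> float:
    """Mean number of central galaxies in a halo of mass 10**log_m."""
    return 0.5 * (1.0 + math.erf((log_m - p["log_m_min"]) / p["sigma_log_m"]))


def mean_n_sat(log_m: float, p: dict) -> float:
    """Mean number of satellite galaxies; zero at or below the cutoff mass M0."""
    m, m0, m1 = 10.0 ** log_m, 10.0 ** p["log_m0"], 10.0 ** p["log_m1"]
    if m <= m0:
        return 0.0
    return ((m - m0) / m1) ** p["alpha"]


def mean_n_gal(log_m: float, p: dict) -> float:
    """Total mean occupation <N_gal> = <N_cen> + <N_sat>."""
    return mean_n_cen(log_m, p) + mean_n_sat(log_m, p)


if __name__ == "__main__":
    # Tabulate the occupation functions over a range of halo masses.
    for log_m in (11.5, 12.0, 13.0, 14.0, 15.0):
        print(f"log10 M = {log_m:4.1f}  <N_cen> = {mean_n_cen(log_m, PARAMS):.3f}  "
              f"<N_sat> = {mean_n_sat(log_m, PARAMS):.3f}")
```

Some variants multiply ⟨N_sat⟩ by ⟨N_cen⟩ so that satellites appear only in halos likely to host a central. In an emulator approach like the ANN pipeline described above, a parameter vector such as `PARAMS` would be mapped directly to statistics such as wₚ(rₚ) or the VPF, instead of populating simulated halos for every trial parameter set.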
Additionally, we explore two auxiliary projects that apply deep learning outside cosmology: a deep-learning-based fingerprint verification system and a zero-shot classification framework for bioacoustics, specifically bird-call recognition. Although these projects are not directly related to galaxy-halo studies, they demonstrate the adaptability of machine learning techniques across diverse fields. The methods developed for biometric verification and species classification could inspire similar approaches in cosmology, particularly for handling sparse data and complex classification problems.

In summary, this dissertation advances cosmology by integrating modern machine learning approaches into the study of the galaxy-halo connection. Our results improve the efficiency and accuracy of modeling galaxy clustering and highlight the broader applicability of machine learning techniques, suggesting new directions for future research in cosmology and other scientific domains.

Keywords

Large-scale structure of Universe, Cosmology, Halos, Machine learning, Deep learning, Galaxies

Graduation Month

December

Degree

Doctor of Philosophy

Department

Department of Physics

Major Professor

Lado Samushia

Type

Dissertation