Predicting harmful algal blooms and uncovering mortgage bias: a data-intensive thesis

Abstract

This thesis presents two data science approaches to important environmental and social problems: predicting harmful algal blooms (HABs) caused by cyanobacteria and identifying racial biases inherent in home mortgage systems.

In the first chapter, a machine learning model is developed to forecast HABs in Marion Reservoir, Kansas. HABs threaten water resources because they release toxic chemicals that are harmful to agriculture and aquatic species. Early prediction of algal growth can help manage blooms and prevent further growth. Several models are used for the prediction, including Random Forest, Support Vector Machine, Gaussian Bayes, Decision Tree, Long Short-Term Memory, and XGBoost. In addition, feature analysis identified several factors that do not significantly affect prediction accuracy. The research also compares the algal bloom trends observed in Owasco Lake, New York, with those in Marion Reservoir. The findings highlight the capacity of data science methods to address environmental problems and offer insight into proactive regulation of water ecosystems.

The second chapter examines an extensive dataset of federal home mortgage data in the United States. The dataset covers 13 years and includes a vast number of loans. Using machine learning methods, we find a significant association between borrowers' personal attributes, particularly race, and their loan data. This suggests that borrower race plays a significant role in the observed racial disparities in mortgage lending. Although other historical and present prejudices may be at play, this study offers quantitative evidence of racial bias across the home mortgage system. By identifying and examining these biases, the study contributes to a better understanding of social concerns about equality and discrimination within the financial industry.

Together, these chapters emphasize the significance of data-driven research methods for addressing complex environmental challenges and uncovering disparities in social equity. They highlight the multidisciplinary capacity of data science in pursuit of a more sustainable and equitable future.

Keywords

Machine Learning, HABs, Algal Bloom Prediction, Data Science, Racial Bias, Federal Home Loan

Graduation Month

May

Degree

Master of Science

Department

Department of Computer Science

Major Professor

Lior Shamir

Date

2024

Type

Thesis
