Ethanol Plant Predictive Regression Models: The Importance of Plant Data Analytics




Data analytics is not new to production processes, but the ethanol industry's reliance on it has increased substantially in recent years. The ability to pull larger amounts of data is important for monitoring an ethanol plant's key performance indicators (KPIs). The next phase facing the ethanol business at the plant level is moving beyond monitoring to regression models and predictive analysis that accurately determine ethanol yield. Simple regressions, together with data from an industry survey, can help ethanol plant personnel and vendor data scientists work together to turn the hundreds of plant variables gathered daily into predictive guidance. Over the years, the ethanol industry has also become a crucial business for imports and exports in countries such as the United States, Canada, and Brazil. However, the industry faces many risks and challenges, many of which are beyond an ethanol plant's control. For this reason, the purpose of this thesis is to examine the importance and impact of data analytics in ethanol production and to determine the value that predictive modeling of ethanol yield can have for an ethanol plant. A survey aimed at the ethanol industry was developed and conducted, and its data were summarized. The dependent and independent variables for the regression analyses were examined for trends over 2010 using an Excel extract provided by Plant ABC. A linear regression model of the plant data was used to examine the contemporaneous dependence of ethanol yield, the dependent variable, on fifteen different variables. Regression models with alternative functional forms (semi-log, double-log, and quadratic) were also estimated to determine which factors are statistically significant in predicting ethanol yield.
The estimated linear regression of ethanol yield showed that the independent variables ratio milo, Drop pH, Drop DP4+, Drop Glucose, Drop Lactic Acid, and Drop Acetic Acid had p-values under 1%, indicating statistically significant relationships with ethanol yield. The quadratic model yielded the lowest RMSE (root mean squared error) of the four models estimated, indicating that it predicted ethanol yield best.
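
The comparison of functional forms by RMSE described above can be illustrated with a minimal sketch. It is not the thesis's code or data: the two regressors, the synthetic yield series, and the helper names `design` and `fit_and_rmse` are hypothetical. It also assumes that "semi-log" means the log of the dependent variable regressed on regressors in levels, that RMSE is computed in-sample, and that fitted values from the log models are converted back to yield units by simple exponentiation so that all four RMSE values are comparable.

```python
import numpy as np


def design(X, form):
    """Build the regressor matrix (with intercept) for a functional form."""
    ones = np.ones((X.shape[0], 1))
    if form == "double_log":
        return np.hstack([ones, np.log(X)])
    if form == "quadratic":
        return np.hstack([ones, X, X ** 2])
    return np.hstack([ones, X])  # linear and semi_log keep X in levels


def fit_and_rmse(X, y, form):
    """Fit by ordinary least squares; return in-sample RMSE in yield units."""
    A = design(X, form)
    log_target = form in ("semi_log", "double_log")
    target = np.log(y) if log_target else y
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    fitted = A @ beta
    # Assumption: back-transform log models by plain exponentiation.
    y_hat = np.exp(fitted) if log_target else fitted
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


# Synthetic, strictly positive data (hypothetical; NOT Plant ABC data).
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(1.0, 5.0, size=(n, 2))
y = (2.0 + 0.3 * X[:, 0] - 0.05 * X[:, 0] ** 2 + 0.1 * X[:, 1]
     + rng.normal(0.0, 0.02, n))

forms = ["linear", "semi_log", "double_log", "quadratic"]
results = {form: fit_and_rmse(X, y, form) for form in forms}
best = min(results, key=results.get)
for form in forms:
    print(f"{form:>10}: RMSE = {results[form]:.4f}")
print("lowest RMSE:", best)
```

Because the synthetic series is generated with a squared term, the ranking this sketch prints says nothing about real plant data; it only shows the mechanics of estimating each form and comparing the models on a common RMSE scale.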



Ethanol, Data Analytics, Regression, Predictive Analysis, Production Process



Master of Agribusiness


Department of Agricultural Economics

Major Professor

Jason S. Bergtold