A covariate-adjusted classification model for multiple biomarkers in disease screening and diagnosis






Classification methods based on a linear combination of multiple biomarkers are widely used to improve accuracy in disease screening and diagnosis. However, covariates such as gender and age at diagnosis are seldom included in these classification procedures. Because biomarkers and patient outcomes are often associated with covariates in practice, including covariates may further improve both predictive power and classification accuracy. In this study, we focus on classification methods for multiple biomarkers that adjust for covariates. First, we propose a covariate-adjusted classification model for multiple cross-sectional biomarkers. It is a two-stage method: the first stage combines the biomarkers using a parametric or non-parametric approach, and the second stage incorporates the covariates using maximum rank correlation estimators. Specifically, the coefficients associated with the covariates are estimated by maximizing the area under the receiver operating characteristic (ROC) curve (AUC). The asymptotic properties of these estimators are also discussed. An extensive simulation study is conducted to evaluate the finite-sample performance of the proposed method. Data on colorectal cancer and pancreatic cancer are used to illustrate the proposed methodology for multiple cross-sectional biomarkers.
We further extend our classification method to longitudinal biomarkers. Using a natural cubic spline basis, each subject's longitudinal biomarker profile is characterized by its spline coefficients, which substantially reduces the dimension of the data. The greatest reduction is achieved by controlling the number of knots, or degrees of freedom, in the spline, and the coefficients are obtained by ordinary least squares. Each spline coefficient is then treated as a "biomarker" in the cross-sectional method, and the optimal linear combination of the spline coefficients is obtained by a stepwise method without any distributional assumption. Covariates are then included in the second stage by maximizing the corresponding AUC. The proposed method is applied to longitudinal Alzheimer's disease data and to the primary biliary cirrhosis data for illustration. We conduct a simulation study to assess the finite-sample performance of the proposed method for longitudinal biomarkers.
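
The second stage described above, choosing covariate coefficients to maximize the AUC, can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. It assumes that the first stage has already produced one combined biomarker score per subject, that there is a single covariate, that the coefficient of the combined score is fixed at 1 for identifiability, and that a plain grid search stands in for whatever optimizer the author used. The function names, the grid, and the toy data are all hypothetical.

```python
from itertools import product


def empirical_auc(case_scores, control_scores):
    """Empirical (Mann-Whitney) AUC: the share of (case, control) pairs in
    which the case has the higher score, counting ties as one half."""
    wins = 0.0
    for s_case, s_ctrl in product(case_scores, control_scores):
        if s_case > s_ctrl:
            wins += 1.0
        elif s_case == s_ctrl:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def fit_covariate_coefficient(scores, covariate, labels, grid):
    """Grid search (an assumed stand-in optimizer) for the coefficient beta
    that maximizes the empirical AUC of score + beta * covariate.
    The coefficient of the combined score is held at 1."""
    best_beta, best_auc = None, -1.0
    for beta in grid:
        adjusted = [s + beta * z for s, z in zip(scores, covariate)]
        cases = [a for a, y in zip(adjusted, labels) if y == 1]
        controls = [a for a, y in zip(adjusted, labels) if y == 0]
        auc = empirical_auc(cases, controls)
        if auc > best_auc:  # keep the first beta that reaches the maximum
            best_beta, best_auc = beta, auc
    return best_beta, best_auc


# Toy data (hypothetical): stage-one combined scores, a binary covariate,
# and disease status (1 = case, 0 = control).
scores = [3.0, 1.0, 2.0, 0.0]
covariate = [0, 1, 0, 1]
labels = [1, 1, 0, 0]
grid = [0.5 * i for i in range(9)]  # candidate beta values 0.0, 0.5, ..., 4.0

unadjusted_auc = empirical_auc([3.0, 1.0], [2.0, 0.0])
best_beta, best_auc = fit_covariate_coefficient(scores, covariate, labels, grid)
print(unadjusted_auc, best_beta, best_auc)
```

Because the empirical AUC is a step function of the coefficient, many values can attain the same maximum. The sketch simply keeps the first one found on the grid.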



Classification, AUC, Disease diagnosis, Receiver operating characteristic curve




Doctor of Philosophy


Department of Statistics

Major Professor

Wei-Wen Hsu