# Model adequacy tests for exponential family regression models

## Abstract

The problem of testing for lack of fit in exponential family regression models is considered. Such nonlinear models are the natural extension of Normal nonlinear regression models and generalized linear models. As is usually the case, inadequately specified models have an adverse impact on statistical inference and scientific discovery. Models of interest are curved exponential families determined by a sequence of predictor settings and a mean regression function, considered as a sub-manifold of the full exponential family. Constructed general alternative models are based on clusterings in the mean parameter components and allow likelihood ratio testing for lack of fit associated with the mean (equivalently, the natural parameter) for a proposed null model. A maximin clustering methodology is defined in this context to determine suitable clusterings for assessing lack of fit. In addition, a geometrically motivated goodness of fit test statistic for exponential family regression based on the information metric is introduced. This statistic is applied to the cases of logistic regression and Poisson regression, and in both cases it can be seen to be equal to a form of the Pearson $\chi^2$ statistic. The same holds for multinomial regression. In addition, the problem of testing for equal means in a heteroscedastic Normal model is discussed. In particular, a saturated three-parameter exponential family model is developed which allows for equal means testing with unequal variances. A simulation study was carried out for the logistic and Poisson regression models to investigate the comparative performance of the likelihood ratio test, the deviance test and the goodness of fit test based on the information metric. For logistic regression, the Hosmer-Lemeshow test was also included in the simulations.
Notably, the likelihood ratio test had comparable power with that of the Hosmer-Lemeshow test under both m- and n-asymptotics, with superior power for constructed alternatives. A distance function defined between densities and based on the information metric is also given. For logistic models, as the natural parameters go to plus or minus infinity, the densities become more and more deterministic and limits of this distance function are shown to play an important role in the lack of fit analysis. A further simulation study investigated the power of a likelihood ratio test and a geometrically derived test based on the information metric for testing equal means in heteroscedastic Normal models.
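
For readers unfamiliar with the two classical statistics named above, the following minimal Python sketch computes the standard textbook Pearson statistic and the Poisson deviance from observed counts and fitted means. It is an illustration only, not the implementation used in this work: the function names `pearson_chi2` and `poisson_deviance`, and the sample counts and fitted means, are hypothetical and chosen purely for demonstration.

```python
import numpy as np


def pearson_chi2(y, mu, var):
    """Textbook Pearson statistic: squared residuals scaled by the model variance."""
    y, mu, var = (np.asarray(a, dtype=float) for a in (y, mu, var))
    return float(np.sum((y - mu) ** 2 / var))


def poisson_deviance(y, mu):
    """Textbook Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu))."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # By convention y * log(y / mu) is 0 when y == 0; substitute a ratio of 1
    # there so the logarithm is 0 and no warning is raised.
    ratio = np.where(y > 0, y / mu, 1.0)
    return float(2.0 * np.sum(y * np.log(ratio) - (y - mu)))


# Hypothetical observed counts and fitted means (illustrative values only).
y = np.array([2, 0, 5, 3, 7])
mu = np.array([1.5, 0.8, 4.2, 3.9, 6.6])

# For a Poisson model the variance equals the mean.
x2 = pearson_chi2(y, mu, var=mu)
dev = poisson_deviance(y, mu)
print(f"Pearson chi-squared: {x2:.4f}, deviance: {dev:.4f}")
```

For grouped binomial (logistic) data with `m` trials per setting and fitted probability `p`, the same `pearson_chi2` function applies with `mu = m * p` and `var = m * p * (1 - p)`.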