PhD Dissertation

Empirical comparison of classification rules and of error rate estimators in discriminant analysis

Ph.D., Gembloux Agricultural University

A Monte Carlo study was carried out to compare, for two-group discriminant analysis, three classification rules and twenty estimators of the error rates associated with these rules. The comparison covered 480 situations defined by the type of distribution, the overlap of the populations, the number of variables, the sample size and the degree of heteroscedasticity of the populations, which was measured by a parameter defined in the study. The results showed that the quadratic rule was the best only for severely heteroscedastic normal populations with high overlap. The linear rule was the best for homoscedastic normal or moderately non-normal models with low overlap. The logistic rule was the best for severely non-normal models, except under homoscedasticity. In the other situations, the linear and logistic rules performed almost equally well. When the parameters computed from the data samples are considered, the linear rule is the best under normality, whereas the logistic rule performs well under non-normality. As for the error rate estimators, the e632 estimator performed best for estimating the actual error rate associated with the discriminant rules used. The parametric estimators eOS, eO, eM and eL can also be recommended for estimating the actual error rate, but their performance depends on the normality and the degree of heteroscedasticity of the populations. Moreover, a study of the effect of the number of groups on the performance of the classification rules and error rate estimators, carried out for normal populations, showed that this performance did not vary with the number of groups, except for the logistic rule, whose performance decreased as the number of groups increased.
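
As an illustration of the kind of error rate estimator compared in this study, the following Python sketch computes a .632 bootstrap estimate for a two-group linear rule. It assumes that e632 denotes Efron's .632 bootstrap estimator, that is, 0.368 times the resubstitution error plus 0.632 times the out-of-bag bootstrap error; the abstract does not state the definition, so this reading is an assumption. The function names, the number of bootstrap replicates and the equal-prior linear rule are hypothetical choices made for illustration and are not taken from the dissertation.

```python
import numpy as np


def fit_linear_rule(X, y):
    """Fit a two-group linear discriminant rule (labels 0 and 1, equal priors assumed)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix.
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(S, m1 - m0)   # discriminant direction
    c = w @ (m0 + m1) / 2.0           # cut-off at the midpoint of the group means
    return w, c


def predict(rule, X):
    """Assign each row of X to group 1 if its discriminant score exceeds the cut-off."""
    w, c = rule
    return (X @ w > c).astype(int)


def e632_estimate(X, y, n_boot=200, seed=0):
    """Hypothetical helper: .632 bootstrap estimate of the error rate of the linear rule."""
    rng = np.random.default_rng(seed)
    n = len(y)

    # Resubstitution (apparent) error: the rule is evaluated on its own training data.
    resub = np.mean(predict(fit_linear_rule(X, y), X) != y)

    # Out-of-bag error: each observation is scored only by rules fitted without it.
    err_sum = np.zeros(n)
    count = np.zeros(n)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # bootstrap resample, drawn with replacement
        # Skip resamples that leave a group too small to estimate a mean and covariance.
        if min(np.sum(y[idx] == 0), np.sum(y[idx] == 1)) < 2:
            continue
        oob = np.setdiff1d(np.arange(n), idx)  # observations left out of this resample
        if oob.size == 0:
            continue
        rule = fit_linear_rule(X[idx], y[idx])
        err_sum[oob] += predict(rule, X[oob]) != y[oob]
        count[oob] += 1

    seen = count > 0
    e0 = np.mean(err_sum[seen] / count[seen])

    # Weighted combination of the optimistic and the pessimistic estimate.
    return 0.368 * resub + 0.632 * e0
```

A call such as `e632_estimate(X, y)` takes an `(n, p)` array of observations and a vector of 0/1 group labels. Substituting a quadratic or logistic rule for `fit_linear_rule` and `predict` would give the corresponding estimate for those rules, under the same assumptions.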