For over forty years, choosing a statistical model from data has consisted in optimizing a criterion based on penalized likelihood (H. Akaike, 1973) or penalized least squares (C. Mallows, 1973). These methods are valid for predictive model choice (regression, classification) and for descriptive models (clustering, mixtures). Most of their properties are asymptotic, but a non-asymptotic theory emerged at the end of the last century (Birgé-Massart, 1997). Instead of choosing the best model among several candidates, model aggregation combines different models, often linearly, which can yield better predictions. Bayesian statistics provide a useful framework for model choice and model aggregation with Bayesian Model Averaging.
In a purely predictive context and with very few assumptions, ensemble methods or meta-algorithms, such as boosting and random forests, have proven effective.
This volume originates from the collaboration of high-level specialists: Christophe Biernacki (Université de Lille I), Jean-Michel Marin (Université de Montpellier), Pascal Massart (Université de Paris-Sud), Cathy Maugis-Rabusseau (INSA de Toulouse), Mathilde Mougeot (Université Paris Diderot), and Nicolas Vayatis (École Normale Supérieure de Cachan), who were all speakers at the 16th biennial workshop on advanced statistics organized by the French Statistical Society. In this book, the reader will find a synthesis of the foundations of these methodologies and of recent work and applications in various fields.
The French Statistical Society (SFdS) is a non-profit organization that promotes the development of statistics, as well as a professional body for all kinds of statisticians working in public and private sectors. Founded in 1997, SFdS is the heir of the Société de Statistique de Paris, established in 1860. SFdS is a corporate member of the International Statistical Institute and a founding member of FENStatS—the Federation of European National Statistical Societies.
1. A Model Selection Tale. 2. Model’s Introduction. 3. Non Linear Gaussian Model Selection. 4. Bayesian Model Choice. 5. Some Computational Aspects of Bayesian Model Choice. 6. Randomization and Aggregation for Predictive Modeling with Classification Data. 7. Mixture Models. 8. Calibration of Penalties. 9. High Dimensional Clustering. 10. Clustering of Co-expressed Genes. 11. Forecasting the French National Electricity Consumption: from Sparse Models to Aggregated Forecasts.