Hugendubel.info - The B2B Online Bookstore


Rank-Based Methods for Shrinkage and Selection

With Application to Machine Learning
Wiley, published 01.07.2022
Available formats
Book, hardcover: EUR 138.50
E-book, PDF (Adobe DRM / Adobe Ebook Reader): EUR 109.99
E-book, EPUB (Adobe DRM): EUR 109.99

Product

Blurb

Rank-Based Methods for Shrinkage and Selection
A practical and hands-on guide to the theory and methodology of statistical estimation based on rank.
Robust statistics is an important field in contemporary mathematics and applied statistical methods. Rank-Based Methods for Shrinkage and Selection: With Application to Machine Learning describes techniques for producing higher-quality data analysis in shrinkage and subset selection, obtaining parsimonious models with outlier-free prediction. The book is intended for statisticians, economists, biostatisticians, data scientists and graduate students.
Rank-Based Methods for Shrinkage and Selection elaborates on rank-based theory and its application in machine learning to robustify the least squares methodology. It also includes:
- Development of rank theory and application of shrinkage and selection
- Methodology for robust data science using penalized rank estimators
- Theory and methods of penalized rank dispersion for ridge, LASSO and Enet
- Topics such as Liu regression, high-dimensional settings and AR(p) models
- Novel rank-based logistic regression and neural networks
- Problem sets with R code demonstrating its use in machine learning



A. K. Md. Ehsanes Saleh, PhD, is Professor Emeritus and Distinguished Professor in the School of Mathematics and Statistics, Carleton University, Ottawa, Canada. He is a Fellow of the IMS and the ASA and an honorary member of the SSC, Canada.

Mohammad Arashi, PhD, is an Associate Professor at Ferdowsi University of Mashhad, Iran, and an Extraordinary Professor and C2-rated researcher at the University of Pretoria, Pretoria, South Africa. He is an elected member of the ISI.
Resve A. Saleh, MSc, PhD (Berkeley), is Professor Emeritus in the Department of ECE at the University of British Columbia, Vancouver, Canada, and was formerly with the University of Illinois and Stanford University. He is the author of four books and a Fellow of the IEEE.
Mina Norouzirad, PhD, is a postdoctoral researcher at the Center for Mathematics and Applications (CMA) of NOVA University of Lisbon, Portugal.
Details
Other ISBN/GTIN: 9781119625421
Product type: E-book
Binding: E-book
Format: EPUB
Publisher: Wiley
Year of publication: 2022
Publication date: 01.07.2022
Pages: 480
Language: English
File size: 29948
Item no.: 9146040
Categories
Genre: 9201

Contents/Reviews

Reading sample

Preface

The objective of this book is to introduce the audience to the theory and application of robust statistical methodologies using rank-based methods. We present a number of new ideas and research directions in machine learning and statistical analysis that the reader can and should pursue in the future. We begin by noting that the well-known least squares and likelihood principles are traditional methods of estimation in machine learning and data science. One of the most widely read books is An Introduction to Statistical Learning (James et al., 2013), which describes these and other methods. However, it also properly identifies many of their shortcomings, especially in terms of robustness in the presence of outliers. Our book describes a number of novel ideas and concepts to resolve these problems, many of which are worthy of further investigation. Our goal is to motivate more researchers to pursue further activities in this field. We build on this motivation to carry out a rigorous mathematical analysis of rank-based penalty estimators.

From our point of view, outliers are present in almost all real-world data sets. They may be the result of human error, transmission error, measurement error or simply the nature of the data being collected. Whatever the reason, we must first recognize that all data sets have some form of outliers and then build solutions based on this fact. Outliers may greatly affect the estimates and lead to poor prediction accuracy. As a result, operations such as data cleaning, outlier detection and robust regression are extremely important in building models that provide suitably accurate prediction. Here, we describe rank-based methods to address many such problems; indeed, many researchers are now working on these and other methods for robust data science. Most of the methods and results presented in this book were derived from our implementations in R and Python, languages used routinely by statisticians and by practitioners in machine learning and data science. Some of the problems at the end of each chapter involve the use of R. The reader will be well served by following the descriptions in the book while implementing the ideas, wherever possible, in R or Python. This is the best way to get the most out of this book.

Rank regression is based on the linear rank dispersion function described by Jaeckel (1972). The dispersion function replaces the least squares loss function to enable estimates based on the median rather than the mean. This book is intended to guide the reader in this direction, starting with basic principles such as the importance of the median vs. the mean, comparisons of rank vs. least squares methods on simple linear problems, and the role of penalty functions in improving prediction accuracy. We present new practical methods of data cleaning, subset selection and shrinkage estimation in the context of rank-based methods. We then begin our theoretical journey with basic rank statistics for location and simple linear models, and move on to multiple regression, ANOVA and problems in a high-dimensional setting. We conclude with ideas, not published elsewhere in the literature, on rank-based logistic regression and neural networks for classification problems in machine learning and data science.
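To make the contrast concrete, here is a minimal sketch (our own illustration, not taken from the book) using the Rfit package acknowledged at the end of this preface; the data are simulated. It fits the same simple linear model by least squares and by Wilcoxon rank regression, which minimizes Jaeckel's dispersion D(beta) = sum_i a(R(e_i)) e_i over the residuals e_i = y_i - x_i'beta, after a single gross outlier is injected:

## Minimal sketch: rank regression vs. least squares with one outlier.
## Assumes the Rfit package (Kloke and McKean, 2012); data are simulated.
library(Rfit)

set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- 2 + 3 * x + rnorm(50)          # true intercept 2, true slope 3
y[25] <- y[25] + 40                 # inject a single gross outlier

fit_ls   <- lm(y ~ x)               # least squares: pulled toward the outlier
fit_rank <- rfit(y ~ x)             # minimizes Jaeckel's rank dispersion

coef(fit_ls)                        # slope visibly biased by the outlier
coef(fit_rank)                      # slope stays close to the true value 3

The rank fit weights residuals through their ranks rather than their squared magnitudes, which is why a single wild observation has limited influence.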

We believe that most practitioners today are still employing least squares and log-likelihood methods that are not robust in the presence of outliers. This is due to the long history of these estimation methods in statistics and their natural adoption in the machine learning community over the past two decades. However, the history of estimation theory actually changed its course radically many decades prior, when Stein (1956) and James and Stein (1961) proved that the sample mean based on a sample from a p-dimensional multivariate normal distribution is inadmissible under a quadratic loss function for p ≥ 3. This result gave birth to a class of shrinkage estimators in various forms and set-ups. Due to the immense impact of Stein's theory, scores of technical papers appeared in the literature covering many areas of application. Beginning in the 1970s, the pioneering work of Saleh and Sen (1978, 1983, 1984a,b, 1985a,b,c,d,e, 1986, 1987) expanded the scope of this class of shrinkage estimators using the quasi-empirical Bayes method to obtain robust (R-, L- and M-estimation) Stein-type estimators. Details are provided in Saleh (2006).
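For readers encountering this result for the first time: in its simplest form (standard notation, not specific to this book), for a single observation Y ~ N_p(theta, I_p) with p ≥ 3, the James-Stein estimator

\hat{\theta}_{\mathrm{JS}} = \left(1 - \frac{p-2}{\lVert Y \rVert^2}\right) Y

shrinks the maximum likelihood estimator Y toward the origin and has uniformly smaller quadratic risk, which is precisely the sense in which Y is inadmissible.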

Of particular interest here is the use of penalty estimators in the context of robust R-estimation. Next-generation shrinkage estimators, known as ridge regression estimators for the multiple linear regression model, were developed by Hoerl and Kennard (1970) based on Tikhonov's regularization (Tikhonov, 1963). The ridge regression (RR) estimator is the result of minimizing the penalized least squares criterion using an L2-penalty function. Ridge regression laid the foundation of penalty estimation. Later, Tibshirani (1996) proposed the least absolute shrinkage and selection operator (LASSO) by minimizing the penalized least squares criterion using an L1-penalty function, which went viral in the area of model selection.
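In standard notation (not necessarily the book's own), the two criteria differ only in the penalty:

\hat{\beta}_{\mathrm{ridge}} = \arg\min_{\beta}\left\{ \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 \right\},
\qquad
\hat{\beta}_{\mathrm{LASSO}} = \arg\min_{\beta}\left\{ \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \right\},

with lambda ≥ 0 controlling the amount of shrinkage. The rank-based versions studied in this book replace the least squares loss with Jaeckel's dispersion while keeping the penalties.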

Unlike the RR estimator, LASSO simultaneously selects and estimates variables. It is reminiscent of subset selection. The subset selection rule is extremely variable due to its inherent discreteness (Breiman, 1996; Fan and Li, 2001) and is often trapped in a locally optimal solution rather than the globally optimal one. LASSO is a continuous and stable process; however, it is not recommended in multicollinear situations. Zou and Hastie (2005) proposed a compromise penalty function, a combination of the L1- and L2-penalties, giving rise to the elastic net (Enet) estimator. It can select groups of correlated variables. Metaphorically, it is like a stretchable fishing net that retains all potentially big fish.
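A brief sketch (our own illustration on simulated data) of these three classical, least-squares-based penalty estimators using the glmnet package acknowledged at the end of this preface; glmnet's alpha parameter interpolates between ridge (alpha = 0) and LASSO (alpha = 1):

## Minimal sketch: ridge, LASSO and elastic net via glmnet on simulated data.
library(glmnet)

set.seed(2)
X <- matrix(rnorm(100 * 10), 100, 10)
beta <- c(3, -2, 0, 0, 1.5, rep(0, 5))  # sparse true coefficient vector
y <- drop(X %*% beta + rnorm(100))

fit_ridge <- glmnet(X, y, alpha = 0)    # L2 penalty: shrinks, never zeroes
fit_lasso <- glmnet(X, y, alpha = 1)    # L1 penalty: shrinks and selects
fit_enet  <- glmnet(X, y, alpha = 0.5)  # Enet: compromise, keeps correlated groups

cv <- cv.glmnet(X, y, alpha = 0.5)      # choose lambda by cross-validation
coef(cv, s = "lambda.min")              # sparse estimate at the selected lambda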

Although LASSO simultaneously estimates and selects variables, it does not possess oracle properties in general. To overcome this problem, Fan and Li (2001) proposed the smoothly clipped absolute deviation (SCAD) penalty function. Following Fan and Li (2001), Zou (2006) modified LASSO using a weighted L1-penalty function and called the resulting estimator the adaptive LASSO (aLASSO). Later, Zhang (2010) suggested the minimax concave penalty (MCP) estimator. All results found in the above literature are based on the penalized least squares criterion.
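In its usual form (again standard notation rather than the book's), the adaptive LASSO replaces the L1-penalty with a weighted version,

\hat{\beta}_{\mathrm{aLASSO}} = \arg\min_{\beta}\left\{ \lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} \hat{w}_j \lvert \beta_j \rvert \right\},
\qquad
\hat{w}_j = 1 / \lvert \tilde{\beta}_j \rvert^{\gamma},

where tilde-beta is a root-n-consistent initial estimate and gamma > 0; the data-dependent weights are what restore the oracle property.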

This book contains a thorough study of rank-based estimation with three basic penalty estimators, namely ridge regression, LASSO and the elastic net. It also includes preliminary test and Stein-type R-estimators for completeness. Efforts are made to present a clear and balanced introduction to rank-based estimators, with mathematical comparisons of the properties of the various estimators considered. The book is directed towards graduate students and researchers in statistics, economics and biostatistics, and towards applied statisticians, economists, computer scientists and data scientists, among others. The literature on robust penalty and other shrinkage estimators in the context of rank-based estimation is very limited. Here, we provide both theoretical and practical aspects of the subject matter.

The book is spread over twelve chapters. Chapter 1 begins with an introductory examination of the median, outliers and robust rank-based methods, along with a brief look at penalty estimators. Chapter 2 continues with the characteristics of rank-based penalty estimators and demonstrates their enormous value in machine learning. Chapter 3 provides the preliminaries of rank-based theory and various aspects of it, along with a description of penalty estimators, which are then applied to location and simple linear models. Chapter 4 deals with ANOVA and Chapter 5 with seemingly unrelated simple linear models. Chapter 6 considers the multiple linear model and Chapter 7 expands on the partially linear regression model (PLM). The Liu regression estimator is discussed in Chapter 8. Chapter 9 introduces the AR(p) model. Chapter 10 covers selection and shrinkage of variables in high-dimensional data analysis. Chapter 11 deals with multivariate rank-based logistic regression models. Finally, Chapter 12 concludes with applications of rank-based neural networks.

To our knowledge, this is one of the first books to combine advanced statistical analysis with advanced machine learning. Each chapter is self-contained, but those interested in machine learning may focus on Chapters 1, 2, 11 and 12, while those interested in statistics may focus on Chapters 3-10. A good mix of the two would be Chapters 1-4, 11 and 12. It is our hope that readers in both fields will find something of value, and that this will lead to many areas of future research.

The authors wish to thank the developers of Rfit (Kloke and McKean, 2012) and glmnet (Stanford University), which are extremely useful packages for R-estimation and penalized maximum likelihood estimation, respectively. We also thank Professor Brent Johnson (University of Rochester) for the rank-based LASSO and aLASSO code (Johnson and Peng, 2008) provided on his website.

Professor A.K. Md. E. Saleh is grateful to NSERC for supporting his research for more than four decades and is...

Author