Fundamentals of machine learning for predictive data analytics. Algorithms, worked examples, and case studies (Q2794333)

scientific article; zbMATH DE number 6553317

Language	Label	Description	Also known as
English	Fundamentals of machine learning for predictive data analytics. Algorithms, worked examples, and case studies	scientific article; zbMATH DE number 6553317

Statements

instance of

10 March 2016

zbMATH Keywords

regression

smoothing

normalization

machine learning

sampling

decision trees

data analytics

data exploration

missing values

performance measures

features

Bayes' theorem

probability-based learning

evaluation on a test set

information-based learning

error-based learning

similarity-based learning

multivariable-based regression

misclassification rate

MaRDI profile type

Publication

title

Fundamentals of machine learning for predictive data analytics. Algorithms, worked examples, and case studies (English)

review text

The book is an accessible yet thorough textbook structured in eight chapters, two case studies, an epilogue and three appendices. Its main distinctive feature is the balance between clear overviews of popular algorithms and methods and fully worked examples, which not only illustrate the theoretical concepts but also exemplify the steps for approaching real-world problems (in particular the case studies on customer churn and galaxy classification, discussed in detail in Chapters 9 and 10, respectively).

The first chapter is an introductory overview of machine learning and its use in predictive data analytics. The second chapter focuses on decisions and tackles approaches for converting different types of data into features; the main example analysed throughout this chapter and the next relates to motor insurance fraud. In the third chapter the authors present different methods for data exploration, discussing how to handle missing values and outliers and how to progress beyond standard reports to investigate and visualize relationships between features, e.g. through covariance and correlation. The next four chapters present data analysis from four distinct angles: the information-based view (Chapter 4), the similarity-based one (Chapter 5), the probability-based one (Chapter 6) and the error-based one (Chapter 7). In the fourth chapter, approaches for information-based learning are presented, including decision trees (the standard ID3 approach) and entropy analyses. Next, the similarity-based methods are introduced, using the "feature space" concept and the nearest neighbour algorithm. The extensions include the effect of noisy data and the role of normalization. In the sixth chapter the authors introduce Bayes' theorem as the pivot of probability-based learning. The main algorithm is naïve Bayes, which can be improved through either smoothing or binning. The seventh chapter focuses on different types of regression in order to illustrate error-based learning. Following the introduction of simple linear regression, the authors also present multiple linear regression with gradient descent and discuss the effect of setting the learning rate using weight decay as well as how non-linear relationships can be modelled.

In the eighth chapter the most widely used methods for evaluation are presented, notably the use of a hold-out test set. Various performance measures are discussed for categorical, continuous and multinomial targets. The epilogue surveys different perspectives on prediction models and discusses which approaches to use depending on the data or on the question being asked. The three appendices provide additional support for understanding the data features revealed by different types of plots (Appendix A), for probability-based approaches (Appendix B) and for differentiation (Appendix C).

The book was fundamentally built as a textbook for undergraduates; however, its style, its balance between algorithmic detail and additional explanations of the various features, and its numerous examples recommend it to a wider audience of scholars interested in acquiring a fundamental background in data analytics.

reviewed by

Irina Ioana Mohorianu

Identifiers

zbMATH Open document ID

1393.68007

Mathematics Subject Classification ID

Sitelinks

Mathematics (1 entry)

mardi Publication:2794333