Fundamentals of machine learning for predictive data analytics. Algorithms, worked examples, and case studies (Q2794333)

scientific article; zbMATH DE number 6553317

    Statements

    10 March 2016
    regression
    smoothing
    normalization
    machine learning
    sampling
    decision trees
    data analytics
    data exploration
    missing values
    performance measures
    features
    Bayes' theorem
    probability-based learning
    evaluation on a test set
    information-based learning
    error-based learning
    similarity-based learning
    multivariable-based regression
    misclassification rate
    Fundamentals of machine learning for predictive data analytics. Algorithms, worked examples, and case studies (English)
    The book is an accessible yet thorough textbook structured in eight chapters, two case studies, an epilogue and three appendices. Its main distinctive feature is the balance between clear overviews of popular algorithms and methods and completely worked examples, which not only illustrate the theoretical concepts but also exemplify the steps for approaching real-world problems (in particular the case studies on customer churn and galaxy classification, discussed in detail in Chapters 9 and 10, respectively).

    The first chapter is an introductory overview of machine learning and its use in predictive data analytics. The second chapter focuses on decisions and tackles approaches for converting different types of data into features; the main example analysed throughout this chapter and the next relates to motor insurance fraud. In the third chapter the authors present different methods for data exploration, discussing how to handle missing values and outliers, and how to progress beyond standard reports to investigate and visualize relationships between features, e.g. covariance and correlation. The next four chapters present data analysis from four distinct angles: the information-based view (Chapter 4), the similarity-based view (Chapter 5), the probability-based view (Chapter 6) and the error-based view (Chapter 7). In the fourth chapter, approaches for information-based learning are presented, including decision trees (the standard ID3 approach) and entropy analyses. Next, the similarity-based methods are introduced, using the "feature space" concept and the nearest neighbour algorithm. The extensions include the effect of noisy data and the role of normalization. In the sixth chapter the authors introduce Bayes' theorem as the pivot for probability-based learning. The main algorithm is naïve Bayes, which can be refined through smoothing or binning. The seventh chapter focuses on different types of regression in order to illustrate error-based learning. Following the introduction of simple linear regression, the authors present multiple linear regression with gradient descent and discuss the effect of setting the learning rate using weight decay and how non-linear relationships can be modelled.

    In the eighth chapter the most commonly used evaluation methods are presented, such as the use of a hold-out test set. Various performance measures and their use with categorical, continuous or multinomial targets are discussed. The epilogue gives an overview of different perspectives on prediction models and debates the use of various approaches depending on the data or on the question to be asked. The three appendices provide additional support for understanding the data features revealed by different types of plots (Appendix A), for probability-based approaches (Appendix B) and for differentiation (Appendix C).

    The book was fundamentally built as a textbook for undergraduates; however, its style, its balance between algorithmic detail and additional explanation, and its numerous examples recommend it to a wider audience of scholars interested in acquiring a fundamental background in data analytics.
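
    As an illustration of the information-based view summarised in the review (Chapter 4: decision trees, ID3 and entropy), the following is a minimal Python sketch of Shannon entropy and information gain, the quantity ID3 uses to pick the feature to split on. It is not taken from the book: the function names, the feature names (`injury_claim`, `urban`) and the four-row toy data set are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, feature, target):
    """Entropy of the target minus the weighted entropy left after
    partitioning the rows by the values of `feature`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy data (an assumption for illustration, not from the book).
rows = [
    {"injury_claim": "yes", "urban": "yes", "fraud": "yes"},
    {"injury_claim": "yes", "urban": "no", "fraud": "yes"},
    {"injury_claim": "no", "urban": "yes", "fraud": "no"},
    {"injury_claim": "no", "urban": "no", "fraud": "no"},
]

# An ID3-style learner would split on the feature with the highest gain.
gains = {f: information_gain(rows, f, "fraud") for f in ("injury_claim", "urban")}
best_feature = max(gains, key=gains.get)
```

    In this toy set `injury_claim` separates the two classes perfectly while `urban` carries no information about the target, so an ID3-style learner would choose `injury_claim` for the root split.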

    Identifiers