Abstract
Predictive Analytics (PA) models are assuming an increasing role in the world of big data for decision-making in many industries: marketing, banking, insurance, telecommunication, healthcare, cyber, and more. Even in the era of data mining and machine learning, the leading predictive models still belong to the realm of regression. While regression models were originally developed to explain phenomena, find relationships between variables, and draw conclusions, the main objective in prediction is to build models that are general enough to predict unseen data, even at the expense of some model accuracy. Therefore, models with good explanatory power are not necessarily models with good predictive power, and vice versa. Focusing on regression models, we discuss in this article the differences between explanation and prediction models, propose several principles for building good predictive models, present several performance measures for assessing the quality of prediction results in classification problems based on logistic regression, and then discuss the process of deploying model results for decision-making. We end the article by briefly reviewing the non-parametric decision tree approach to building a PA model.
Original language | English |
---|---|
Title of host publication | Machine Learning for Data Science Handbook |
Subtitle of host publication | Data Mining and Knowledge Discovery Handbook, Third Edition |
Publisher | Springer International Publishing |
Pages | 751-777 |
Number of pages | 27 |
ISBN (Electronic) | 9783031246289 |
ISBN (Print) | 9783031246272 |
State | Published - 1 Jan 2023 |