Statistical Thinking for Industrial Problem Solving
A free online statistics course
Predictive Modeling and Text Mining
Predictive analytics uses data and statistical algorithms to predict what is likely to happen next, given the current process and environment.
In this module, you will learn some of the core techniques used in building predictive models, including how to address overfitting, select the best predictive model, and use multiple linear regression and logistic regression. You will also see how to fit other types of predictive models, including penalized regression, decision trees, and neural networks. Finally, you will learn how to extract information and meaning from unstructured text data, such as survey responses.
Estimated time to complete this module: 3 to 4 hours
Specific topics covered in this module include:
Essentials of Predictive Modeling
- Introduction to Predictive Modeling
- Overfitting and Model Validation
- Assessing Model Performance: Prediction Models
- Assessing Model Performance: Classification Models
- Receiver Operating Characteristic (ROC) Curves
Decision Trees
- Introduction to Decision Trees
- Classification Trees
- Regression Trees
- Decision Trees with Validation
- Random (Bootstrap) Forests
Neural Networks
- What Is a Neural Network?
- Interpreting Neural Networks
- Predictive Modeling with Neural Networks
Generalized Regression
- Introduction to Generalized Regression
- Fitting Models Using Maximum Likelihood
- Introduction to Penalized Regression
Model Comparison and Selection
- Comparing Predictive Models
Introduction to Text Mining
- Introduction to Text Mining
- Processing Text Data
- Curating the Term List
- Visualizing and Exploring Text Data
- Analyzing (Mining) Text Data