February 27, 2015

CRISP-DM Approach for Data Mining

The CRISP-DM approach is widely used in industry for data mining tasks. CRISP-DM stands for Cross Industry Standard Process for Data Mining.

Advantages of the CRISP-DM approach:
1] It is neutral with respect to the tools being used.
2] It provides a uniform framework for guidelines and for documenting experience.
3] It is flexible enough to account for differences in business/agency problems as well as different types of data sets.
4] Standardizing the process makes the methodology easier for new users to pick up.


The overall flow of the process is shown in the following figure. The steps do not have to be executed in strict sequence; moving back and forth between phases is usually required.

Figure 1: CRISP-DM Process Flow



There are six steps in this approach.

Business Understanding: 
  • Understanding the project objectives and requirements
  • Defining the Data Mining problem
  • Designing a preliminary plan to achieve the objectives
Data Understanding: 
  • Initial data collection and familiarization with data features
  • Identification of problems in data quality 
  • Finding initial interesting insights in the data
Data Preparation: 
  • Feature selection
  • Constructing the final data set from initial raw data 
  • Transformation, formatting and cleaning the data
Modeling: 
  • Selection and application of modeling techniques 
  • Calibrating the model parameters
  • Assessing the model performance
Evaluation: 
  • Thorough evaluation of the model
  • Evaluation of all important business objectives & issues
  • Reviewing the process
Deployment:
  • Deploying the resulting model
  • Generating a report
  • Implementing a repeatable data scoring process
  • Planning monitoring and maintenance
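
The six phases listed above can be written down as a small data structure, which is sometimes handy for building a project checklist. The following is only an illustrative sketch in Python: the names Phase, TASKS, next_phase and checklist are hypothetical and are not part of CRISP-DM itself. It also shows that a project path may revisit an earlier phase, since the order is not strict.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six CRISP-DM phases in their nominal order (names are illustrative)."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Typical tasks per phase, taken from the list in this post.
TASKS = {
    Phase.BUSINESS_UNDERSTANDING: [
        "Understand the project objectives and requirements",
        "Define the data mining problem",
        "Design a preliminary plan",
    ],
    Phase.DATA_UNDERSTANDING: [
        "Collect initial data and get familiar with its features",
        "Identify data quality problems",
        "Find initial insights",
    ],
    Phase.DATA_PREPARATION: [
        "Select features",
        "Construct the final data set from raw data",
        "Transform, format and clean the data",
    ],
    Phase.MODELING: [
        "Select and apply modeling techniques",
        "Calibrate model parameters",
        "Assess model performance",
    ],
    Phase.EVALUATION: [
        "Evaluate the model thoroughly",
        "Evaluate business objectives and issues",
        "Review the process",
    ],
    Phase.DEPLOYMENT: [
        "Deploy the resulting model",
        "Generate a report",
        "Implement a repeatable scoring process",
        "Plan monitoring and maintenance",
    ],
}


def next_phase(current):
    """Return the phase after `current` in the nominal order, or None after deployment."""
    if current is Phase.DEPLOYMENT:
        return None
    return Phase(current + 1)


def checklist(path):
    """Flatten a sequence of phases into 'PHASE: task' lines.

    `path` may repeat phases, because going back to an earlier phase is allowed.
    """
    return [f"{phase.name}: {task}" for phase in path for task in TASKS[phase]]


if __name__ == "__main__":
    # Example path: modeling reveals a data problem, so data preparation is repeated.
    path = [Phase.DATA_PREPARATION, Phase.MODELING, Phase.DATA_PREPARATION]
    for line in checklist(path):
        print(line)
```

Using an ordered enumeration keeps the nominal sequence explicit, while the path passed to checklist is free to jump back to any earlier phase.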

Sources:
[1] What is CRISP-DM methodology? 
[2] CRISP-DM
