1] Forward Feature Elimination
2] Backward Feature Elimination
3] Feature Elimination with Support Vector Machine
4] Principal Component Analysis
5] Independent Component Analysis
April 14, 2015
February 27, 2015
CRISP-DM Approach for Data Mining
The CRISP-DM approach is widely used in industry for data mining tasks. CRISP-DM stands for Cross-Industry Standard Process for Data Mining.
Advantages of the CRISP-DM approach:
1] It is neutral with respect to the tools being used.
2] It provides a uniform framework for guidelines and for documenting experience.
3] It is flexible enough to account for differences in business/agency problems as well as different types of data sets.
4] Standardizing the process makes the methodology easier for new users to pick up.
The overall chart for this process is shown in the following figure. The steps need not be executed in a strict sequence; moving back and forth between phases is often required.
Figure 1: CRISP-DM Process Flow
There are six steps in this approach.
Business Understanding:
- Understanding the project objectives and requirements
- Defining the data mining problem
- Designing a preliminary plan to achieve the objectives

Data Understanding:
- Initial data collection and familiarization with data features
- Identification of problems in data quality
- Finding initial interesting insights in the data

Data Preparation:
- Feature selection
- Constructing the final data set from initial raw data
- Transformation, formatting and cleaning of the data

Modeling:
- Selection and application of modeling techniques
- Calibration of model parameters
- Assessing the model performance

Evaluation:
- Thorough evaluation of the model
- Evaluation of all important business objectives & issues
- Reviewing the process

Deployment:
- Deployment of the resulting model
- Generating a report
- Implementing a repeatable data scoring process
- Planning monitoring and maintenance
Sources:
[1] What is CRISP-DM methodology?
[2] CRISP-DM
February 01, 2015
Anomaly Detection in Data Mining
Finding anomalies (also known as outliers) is a critical step in data mining. What is an anomaly? Anomalies are data points that differ considerably from the remainder of the data.
Anomaly detection is very useful in applications such as credit card fraud detection. In such a database, only a few data points (transactions) are suspicious, and these need to be separated from the rest.
In model-based anomaly detection techniques, a model is built for the given data. For unsupervised models, anomalies are those points which distort the model or don't fit it well. For supervised models, anomalies are those data points which belong to some rare class.
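As a rough illustration of the unsupervised, model-based idea, the sketch below fits a one-dimensional Gaussian (mean and standard deviation) to the data and flags points that fit it poorly. The function name `gaussian_anomalies`, the `z_threshold` cutoff of 2.0 and the sample transaction amounts are all assumptions made for this example, not something prescribed by any particular method.

```python
import statistics

def gaussian_anomalies(values, z_threshold=2.0):
    """Fit a 1-D Gaussian to values and return the points that fit it poorly.

    z_threshold is an illustrative cutoff (an assumption): a point is flagged
    when it lies more than that many standard deviations from the mean.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all points are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / std > z_threshold]

# Hypothetical transaction amounts with one unusually large value.
amounts = [10, 11, 9, 10, 12, 11, 10, 95]
flagged = gaussian_anomalies(amounts)
print(flagged)
```

Note that a single extreme value also inflates the fitted mean and standard deviation, which is one way an anomaly "distorts" the model; more robust statistics (such as the median) are a common alternative.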
Commonly used anomaly detection techniques are:
1] Proximity-based: Points far away from other data points
2] Density-based: Points in regions of very low density
3] Pattern Matching: Finding atypical patterns
4] Probabilistic Approach: Points with a low probability with respect to a probability distribution model of the data
5] Statistical-based Likelihood Approach
6] Distance-based Approach
7] Clustering-Based Approaches: A data point is an anomaly if it does not strongly belong to any cluster
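To make the proximity-based technique from the list above more concrete, here is a minimal sketch that scores each point by its distance to its k-th nearest neighbor, so that points far away from the others receive a high score. The function name `knn_distance_scores`, the choice `k=2` and the sample points are assumptions for illustration only.

```python
import math

def knn_distance_scores(points, k=2):
    """Return, for each point, the distance to its k-th nearest neighbor.

    A larger score means the point is farther from the rest of the data.
    Assumes len(points) > k.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical 2-D data: four nearby points and one isolated point.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (8.0, 8.0)]
scores = knn_distance_scores(points, k=2)
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(outlier_index, scores)
```

This brute-force version compares every pair of points, which is fine for small data sets; larger ones would normally use a spatial index.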
References:
[1] Tan, P.-N., Steinbach, M., and Kumar, V. 2005. Introduction to Data Mining. Addison-Wesley.