Finding anomalies (also known as outliers) is an important step in data mining. What is an anomaly? Anomalies are data points that differ considerably from the remainder of the data.
Anomaly detection is very useful in applications such as credit card fraud detection. In such a database, a small number of data points (transactions) need to be separated from the rest.
In model-based anomaly detection techniques, a model is built for the given data. For unsupervised models, anomalies are those points which distort the model or don't fit it well. For supervised models, anomalies are those data points which belong to some rare class.
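As a rough illustration of the unsupervised case, here is a minimal sketch in which the "model" is just a robust summary of the data (its median and median absolute deviation) and a point is flagged when it fits that summary poorly. The function names, the sample transaction amounts, and the 3.5 cutoff are assumptions chosen for this example, not something prescribed by the text above.

```python
from statistics import median


def modified_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    center = median(values)
    # Median absolute deviation: a spread estimate that a few extreme
    # points cannot easily distort.
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # No measurable spread; this sketch simply scores everything as 0.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    # when the data is roughly normal.
    return [0.6745 * (v - center) / mad for v in values]


def find_anomalies(values, threshold=3.5):
    """Return the values whose absolute score exceeds the (assumed) threshold."""
    scores = modified_z_scores(values)
    return [v for v, z in zip(values, scores) if abs(z) > threshold]


# Hypothetical transaction amounts: nine ordinary ones and one very large one.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]
anomalies = find_anomalies(amounts)
print(anomalies)
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme value can inflate the latter enough to hide itself, especially in a small sample.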
Commonly used anomaly detection techniques are:
1] Proximity-based: Points far away from other data points
2] Density-based: Points in regions of very low density
3] Pattern Matching: Finding atypical patterns
4] Probabilistic Approach: Points with a low probability with respect to a probability distribution model of the data
5] Statistical-based Likelihood Approach
6] Distance-based Approach
7] Clustering-Based Approaches: A data point is an anomaly if it does not strongly belong to any cluster
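To make the proximity-based idea from the list concrete, here is a small sketch that scores each point by the distance to its k-th nearest neighbor, so that points far away from the others get a high score. The function name, the choice of k = 2, and the sample points are assumptions made for illustration; the brute-force search is only suitable for small data sets.

```python
import math


def knn_distance_scores(points, k=2):
    """Return, for each point, the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Euclidean distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical 2-D data: four points close together and one far away.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (8.0, 8.0)]
scores = knn_distance_scores(points, k=2)

# The point with the largest score is the strongest anomaly candidate.
most_anomalous = max(range(len(points)), key=scores.__getitem__)
print(points[most_anomalous], scores[most_anomalous])
```

A density-based variant could be built on the same scores, for instance by treating the inverse of the k-th neighbor distance as a crude density estimate and flagging points where it is very low.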
References:
[1] Tan, P.-N., Steinbach, M., and Kumar, V. 2005. Introduction to Data Mining. Addison-Wesley.