Title: | A Collection of Outlier Ensemble Algorithms |
---|---|
Description: | Ensemble functions for outlier/anomaly detection. The package proposes a new ensemble method based on Item Response Theory. Existing outlier ensemble methods from Schubert et al. (2012) <doi:10.1137/1.9781611972825.90>, Chiang et al. (2017) <doi:10.1016/j.jal.2016.12.002> and Aggarwal and Sathe (2015) <doi:10.1145/2830544.2830549> are also included. |
Authors: | Sevvandi Kandanaarachchi [aut, cre] |
Maintainer: | Sevvandi Kandanaarachchi <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.1 |
Built: | 2025-01-13 03:49:15 UTC |
Source: | https://github.com/sevvandi/outlierensembles |
This function uses the mean of the outlier scores across detection methods as the ensemble score for each observation.
average_ensemble(X)
X | The input data containing the outlier scores in a dataframe, matrix or tibble format. Rows contain observations and columns contain outlier detection methods. |
The ensemble scores.
set.seed(123)
X <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
X[199, ] <- c(4, 4)
X[200, ] <- c(-3, 5)
y1 <- DDoutlier::KNN_AGG(X)
y2 <- DDoutlier::LOF(X)
y3 <- DDoutlier::COF(X)
y4 <- DDoutlier::INFLO(X)
y5 <- DDoutlier::KDEOS(X)
y6 <- DDoutlier::LDF(X)
y7 <- DDoutlier::LDOF(X)
Y <- cbind.data.frame(y1, y2, y3, y4, y5, y6, y7)
ens <- average_ensemble(Y)
ens
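As a rough illustration of mean ensembling, the sketch below averages a small matrix of made-up scores row by row in base R. The helper name `mean_ensemble_sketch` and the min-max rescaling step are assumptions made for illustration; the description above does not say whether `average_ensemble` rescales the columns before averaging.

```r
# Hypothetical helper for illustration; not part of outlierensembles.
mean_ensemble_sketch <- function(S) {
  S <- as.matrix(S)
  # Assumption: rescale each method's scores to [0, 1] so that no single
  # method dominates the average purely because of its scale.
  # (A constant column would produce NaN here; the sketch does not guard against it.)
  rescaled <- apply(S, 2, function(s) (s - min(s)) / (max(s) - min(s)))
  # One ensemble score per observation (row).
  rowMeans(rescaled)
}

# Made-up scores: 4 observations (rows) scored by 2 methods (columns).
S <- cbind(m1 = c(1, 2, 3, 10), m2 = c(0.1, 0.2, 0.1, 0.9))
ens_sketch <- mean_ensemble_sketch(S)

# Index of the observation with the largest ensemble score.
which.max(ens_sketch)
```

Here both methods give their highest score to the fourth observation, so it receives the largest averaged score.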
This function computes an ensemble score using the greedy algorithm from the paper "On Evaluation of Outlier Rankings and Outlier Scores" by Schubert et al. (2012) <doi:10.1137/1.9781611972825.90>. The greedy ensemble is detailed in Section 4.3 of that paper.
greedy_ensemble(X, kk = 5)
X | The input data containing the outlier scores in a dataframe, matrix or tibble format. Rows contain observations and columns contain outlier detection methods. |
kk | The number of estimated outliers. |
A list with the components:
scores | The ensemble scores. |
methods | The methods that are chosen for the ensemble. |
chosen | The chosen subset of original anomaly scores. |
set.seed(123)
X <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
X[199, ] <- c(4, 4)
X[200, ] <- c(-3, 5)
y1 <- DDoutlier::KNN_AGG(X)
y2 <- DDoutlier::LOF(X)
y3 <- DDoutlier::COF(X)
y4 <- DDoutlier::INFLO(X)
y5 <- DDoutlier::KDEOS(X)
y6 <- DDoutlier::LDF(X)
y7 <- DDoutlier::LDOF(X)
Y <- cbind.data.frame(y1, y2, y3, y4, y5, y6, y7)
ens <- greedy_ensemble(Y, kk = 5)
ens$scores
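To give a feel for the role `kk` might play, the sketch below builds a 0/1 target vector from the union of each method's `kk` highest-scoring observations. This is only a guess at the first step of a greedy ensemble in the spirit of Schubert et al. (2012); the helper name `top_k_target_sketch` is hypothetical, and the later steps (selecting and combining methods against this target) are not shown.

```r
# Hypothetical helper for illustration; not part of outlierensembles.
top_k_target_sketch <- function(S, kk = 5) {
  S <- as.matrix(S)
  # For each method (column), indices of its kk highest-scoring observations.
  top_idx <- apply(S, 2, function(s) order(s, decreasing = TRUE)[seq_len(kk)])
  # Assumption: the target marks the union of these indices as estimated outliers.
  target <- numeric(nrow(S))
  target[unique(as.vector(top_idx))] <- 1
  target
}

# Made-up scores: 5 observations (rows) scored by 2 methods (columns).
S <- cbind(m1 = c(0.1, 0.9, 0.2, 0.8, 0.3),
           m2 = c(0.7, 0.1, 0.2, 0.9, 0.3))
target <- top_k_target_sketch(S, kk = 2)
target
```

With `kk = 2`, the first method nominates observations 2 and 4 and the second nominates 4 and 1, so the target marks observations 1, 2 and 4.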
This function computes an ensemble score using inverse cluster weighted averaging, as described in the paper "A Study on Anomaly Detection Ensembles" by Chiang et al. (2017) <doi:10.1016/j.jal.2016.12.002>. The ensemble is detailed in Algorithm 2 of that paper.
icwa_ensemble(X)
X | The input data containing the outlier scores in a dataframe, matrix or tibble format. Rows contain observations and columns contain outlier detection methods. |
The ensemble scores.
set.seed(123)
X <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
X[199, ] <- c(4, 4)
X[200, ] <- c(-3, 5)
y1 <- DDoutlier::KNN_AGG(X)
y2 <- DDoutlier::LOF(X)
y3 <- DDoutlier::COF(X)
y4 <- DDoutlier::INFLO(X)
y5 <- DDoutlier::KDEOS(X)
y6 <- DDoutlier::LDF(X)
y7 <- DDoutlier::LDOF(X)
Y <- cbind.data.frame(y1, y2, y3, y4, y5, y6, y7)
ens <- icwa_ensemble(Y)
ens
This function computes an ensemble score using Item Response Theory (IRT). This was proposed as an ensemble method for anomaly/outlier detection in Kandanaarachchi (2021) <doi:10.13140/RG.2.2.18355.96801>.
irt_ensemble(X)
X | The input data containing the outlier scores in a dataframe, matrix or tibble format. Rows contain observations and columns contain outlier detection methods. |
For outlier detection, higher ensemble scores indicate higher levels of anomalousness. This ensemble uses IRT's latent trait to uncover the hidden ground truth, which is used as the ensemble score. It uses the R packages airt and EstCRM to fit the IRT models. It can also be used for other ensembling tasks.
A list with the components:
scores | The ensemble scores. |
model | The IRT model. |
set.seed(123)
X <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
X[199, ] <- c(4, 4)
X[200, ] <- c(-3, 5)
y1 <- DDoutlier::KNN_AGG(X)
y2 <- DDoutlier::LOF(X)
y3 <- DDoutlier::COF(X)
y4 <- DDoutlier::INFLO(X)
y5 <- DDoutlier::KDEOS(X)
y6 <- DDoutlier::LDF(X)
y7 <- DDoutlier::LDOF(X)
Y <- cbind.data.frame(y1, y2, y3, y4, y5, y6, y7)
ens <- irt_ensemble(Y)
ens$scores
This function computes an ensemble score using the maximum score for each observation as detailed in Aggarwal and Sathe (2015) <doi:10.1145/2830544.2830549>.
max_ensemble(X)
X | The input data containing the outlier scores in a dataframe, matrix or tibble format. Rows contain observations and columns contain outlier detection methods. |
The ensemble scores.
set.seed(123)
X <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
X[199, ] <- c(4, 4)
X[200, ] <- c(-3, 5)
y1 <- DDoutlier::KNN_AGG(X)
y2 <- DDoutlier::LOF(X)
y3 <- DDoutlier::COF(X)
y4 <- DDoutlier::INFLO(X)
y5 <- DDoutlier::KDEOS(X)
y6 <- DDoutlier::LDF(X)
y7 <- DDoutlier::LDOF(X)
Y <- cbind.data.frame(y1, y2, y3, y4, y5, y6, y7)
ens <- max_ensemble(Y)
ens
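The idea of taking the maximum score per observation can be sketched in base R as follows. The helper name `max_ensemble_sketch` is hypothetical, and standardizing each method's scores to z-scores before taking the maximum is an assumption made so that methods on different scales are comparable; the description above does not state how `max_ensemble` normalizes its input.

```r
# Hypothetical helper for illustration; not part of outlierensembles.
max_ensemble_sketch <- function(S) {
  # Assumption: standardize each method (column) to mean 0 and sd 1.
  z <- scale(as.matrix(S))
  # Ensemble score: the largest standardized score each observation receives.
  apply(z, 1, max)
}

# Made-up scores: 4 observations (rows) scored by 2 methods (columns).
# Observation 4 stands out only under m1, observation 2 only under m2.
S <- cbind(m1 = c(1, 2, 3, 10), m2 = c(5, 50, 6, 7))
ens_sketch <- max_ensemble_sketch(S)

# Indices of the two observations with the largest ensemble scores.
order(ens_sketch, decreasing = TRUE)[1:2]
```

Because the maximum keeps the strongest signal from any single method, an observation flagged by only one method can still rank highly, which averaging would dilute.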
This function computes an ensemble score by aggregating values above the mean as detailed in Aggarwal and Sathe (2015) <doi:10.1145/2830544.2830549>.
threshold_ensemble(X)
X | The input data containing the outlier scores in a dataframe, matrix or tibble format. Rows contain observations and columns contain outlier detection methods. |
The ensemble scores.
set.seed(123)
X <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
X[199, ] <- c(4, 4)
X[200, ] <- c(-3, 5)
y1 <- DDoutlier::KNN_AGG(X)
y2 <- DDoutlier::LOF(X)
y3 <- DDoutlier::COF(X)
y4 <- DDoutlier::INFLO(X)
y5 <- DDoutlier::KDEOS(X)
y6 <- DDoutlier::LDF(X)
y7 <- DDoutlier::LDOF(X)
Y <- cbind.data.frame(y1, y2, y3, y4, y5, y6, y7)
ens <- threshold_ensemble(Y)
ens
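One way to read "aggregating values above the mean" is sketched below in base R. The helper name `threshold_ensemble_sketch` is hypothetical, and both the z-score standardization and the use of a sum as the aggregate are assumptions; the actual `threshold_ensemble` function may differ in these details.

```r
# Hypothetical helper for illustration; not part of outlierensembles.
threshold_ensemble_sketch <- function(S) {
  # Assumption: standardize each method (column) so its mean is 0.
  z <- scale(as.matrix(S))
  # Scores at or below a method's mean contribute nothing.
  z[z < 0] <- 0
  # Assumption: aggregate the remaining above-mean scores by summing per observation.
  rowSums(z)
}

# Made-up scores: 4 observations (rows) scored by 2 methods (columns).
S <- cbind(m1 = c(1, 2, 3, 10), m2 = c(5, 50, 6, 7))
ens_sketch <- threshold_ensemble_sketch(S)
ens_sketch
```

Observations 1 and 3 are below the mean under both methods, so their ensemble score in this sketch is exactly zero, while observations 2 and 4 keep their above-mean contributions.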