Title: | Streaming Events and their Early Classification |
---|---|
Description: | Implements event extraction and early classification of events in data streams in R. It can generate 2-dimensional data streams with events belonging to 2 classes. These events can be extracted and their features computed. The event features extracted from incomplete events can be classified using a partial-observations-classifier (Kandanaarachchi et al. 2018) <doi:10.1371/journal.pone.0236331>. |
Authors: | Sevvandi Kandanaarachchi [aut, cre] |
Maintainer: | Sevvandi Kandanaarachchi |
License: | MIT + file LICENSE |
Version: | 0.1.1 |
Built: | 2024-11-02 02:56:38 UTC |
Source: | https://github.com/sevvandi/eventstream |
This function extracts events from a 2D or 3D data stream using a moving window and computes a set of features for each event: 22 features for 2D streams and 13 features for 3D streams. 2D data streams with class labels can be generated using the function gen_stream. In the supervised setting, the class labels of the extracted events are obtained by matching each event's position with the details of the events, which are part of the output of gen_stream.
extract_event_ftrs( stream, supervised = FALSE, details = NULL, win_size = 200, step_size = 20, thres = 0.95, folder = NULL, vis = FALSE, tt = 10, epsilon = 5, miniPts = 10, rolling = TRUE )
stream |
A data stream. This can be the output of either the |
supervised |
If |
details |
Event details. This is also an output of the gen_stream function. |
win_size |
The window length of the moving window model. Default is set to 200. |
step_size |
The window is moved by step_size at each step. Default is set to 20. |
thres |
The cut-off quantile. Default is set to 0.95. |
folder |
If set to a local folder, JPEG images of the window data and the extracted events are saved there for a 2D data stream. |
vis |
If |
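tt |
Related to event ages. Features are computed at event ages tt, 2tt, 3tt and 4tt. Default is set to 10. |
epsilon |
The epsilon (eps) parameter of DBSCAN clustering. Default is set to 5. |
miniPts |
The minPts parameter of DBSCAN clustering. Default is set to 10. |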
rolling |
This parameter is set to |
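To make the moving-window model concrete, here is a small base R sketch of the row ranges a window of length win_size visits when it advances by step_size. It assumes, as in the gen_stream examples where rows are plotted as time, that the window slides along the rows of the stream matrix; window_bounds is a hypothetical helper rather than part of the package, and it does not model what extract_event_ftrs does with a final partial window.

```r
# Hypothetical helper (not part of eventstream): row ranges visited by a
# moving window of length win_size that advances step_size rows at a time.
window_bounds <- function(n_rows, win_size = 200, step_size = 20) {
  if (n_rows < win_size) stop("the stream has fewer rows than win_size")
  starts <- seq(1, n_rows - win_size + 1, by = step_size)
  data.frame(start = starts, end = starts + win_size - 1)
}

# A 350-row stream (the size of one gen_stream file) with the default settings
wb <- window_bounds(350)
nrow(wb)     # 8 windows
tail(wb, 1)  # last window covers rows 141 to 340
```

Rows after the last full window (341 to 350 here) are not covered by this sketch.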
An Nx22x4 array is returned for 2D data streams and an Nx13x4 array for 3D data streams. Here N is the total number of events extracted from all windows. The second dimension has m features and the class label for the supervised setting. The third dimension has 4 different event ages: tt, 2tt, 3tt and 4tt.
For example, the element at [10,6,3] has the 6th feature of the 10th extracted event when the age of the event is 3tt. The features for 2D streams are listed below. For 3D streams, the features cluster_id, pixels, length, width, height, total_value, l2w_ratio, centroid_x, centroid_y, centroid_z, mean, std_dev and sd_from_global_mean are computed.
cluster_id |
An identification number for each event. |
pixels |
The number of pixels of each event. |
length |
The length of the event. |
width |
The width of the event. |
total_value |
The total value of the pixels. |
l2w_ratio |
Length to width ratio of event. |
centroid_x |
x coordinate of event centroid. |
centroid_y |
y coordinate of event centroid. |
mean |
Mean value of event pixels. |
std_dev |
Standard deviation of event pixels. |
avg_slope |
The slope of an |
quad_1 |
The linear coefficient of a second order polynomial fitted to event pixels using |
quad_2 |
The quadratic coefficient of a second order polynomial fitted to event pixels using |
2sd_from_mean |
The proportion of event pixels/cells that have values greater than 2 global standard deviations from the global mean of the window. |
3sd_from_mean |
The proportion of event pixels/cells that have values greater than 3 global standard deviations from the global mean of the window. |
4sd_from_mean |
The proportion of event pixels/cells that have values greater than 4 global standard deviations from the global mean of the window. |
5iqr_from_median |
The column medians and column IQRs of a small portion of each window are used to construct two smoothing splines: a median spline and an IQR spline. The value of the median spline at each event centroid is used as the local median for that event; similarly, the value of the IQR spline at each event centroid is used as the local IQR for that event. This feature gives the proportion of event pixels/cells that have values greater than 5 local IQRs from the local median. |
6iqr_from_median |
The proportion of event pixels/cells that have values greater than 6 local IQRs from the local median computed using splines. |
7iqr_from_median |
The proportion of event pixels/cells that have values greater than 7 local IQRs from the local median computed using splines. |
8iqr_from_median |
The proportion of event pixels/cells that have values greater than 8 local IQRs from the local median computed using splines. |
iqr_from_median |
Let us denote the 75th percentile of the event pixels value by |
sd_from_mean |
Let us denote the 80th percentile of the event pixels value by |
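The threshold-exceedance features above all reduce to the same computation with a different centre and spread. The sketch below assumes a one-sided reading of "greater than k standard deviations (or IQRs) from the mean (or median)", i.e. values above center + k * spread; prop_exceeding is a hypothetical helper, and the package may instead use absolute deviations.

```r
# Hypothetical sketch of the threshold-exceedance features. For the
# "k sd_from_mean" features, center = global window mean and spread = global
# window SD; for the "k iqr_from_median" features, center = local median and
# spread = local IQR read off the splines at the event centroid.
prop_exceeding <- function(vals, center, spread, k) {
  # proportion of event cells whose value exceeds center + k * spread
  vapply(k, function(kk) mean(vals > center + kk * spread), numeric(1))
}

event_vals <- c(1, 3, 5, 7, 9)
prop_exceeding(event_vals, center = 0, spread = 1, k = c(2, 4, 6, 8))
# [1] 0.8 0.6 0.4 0.2
```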
```r
library(eventstream)
# 2D data stream example
out <- gen_stream(1, sd = 15)
zz <- as.matrix(out$data)
features <- extract_event_ftrs(zz, supervised = TRUE, details = out$details)
features

# 3D data stream example
set.seed(1)
arr <- array(rnorm(12000), dim = c(40, 25, 30))
arr[25:33, 12:20, 20:23] <- 10
# getting events
ftrs <- extract_event_ftrs(arr, supervised = FALSE, win_size = 10, step_size = 2, tt = 2, thres = 0.985)
ftrs
```
This function generates a two-dimensional data stream containing events of two classes. The data stream can be saved as separate files with images by specifying the argument folder.
gen_stream( n, folder = NULL, sd = 1, vis = FALSE, muAB = c(4, 3), sdAB = c(2, 3) )
n |
The number of files to generate. Each file consists of a 350x250 data matrix. |
folder |
If this is set to a local folder, the data matrices are saved in |
sd |
This specifies the seed. |
vis |
If |
muAB |
The starting event pixels of class A and B events are normally distributed with mean values specified by muAB. Default set to c(4, 3). |
sdAB |
The starting standard deviations of class A and B events. Default set to c(2, 3). |
There are events of two classes in the data matrices: A and B. Events of class A have only one shape, while events of class B have three different shapes, including class A's shape. This was motivated by a real-world example. The details of the events of each class are given below.
Feature | class A | class B |
---|---|---|
Starting cell/pixel values | N(4,2) | N(3,3) |
Ending cell/pixel values | N(8,2) | N(5,3) |
Maximum age of event - shape 1 | U(20,30) | U(20,30) |
Maximum age of event - shape 2 | NA | U(100,150) |
Maximum age of event - shape 3 | NA | U(100,150) |
Maximum width of event - shape 1 | U(20,26) | U(20,26) |
Maximum width of event - shape 2 | NA | U(30,38) |
Maximum width of event - shape 3 | NA | U(50,58) |
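The distributions in the class table can be read as per-event random draws. This base R sketch assumes N(m, s) denotes a normal with mean m and standard deviation s (consistent with muAB and sdAB) and U(a, b) a continuous uniform; draw_event_params and its end_mu and end_sd arguments are hypothetical, and the sketch does not reproduce how gen_stream shapes or grows events.

```r
# Hypothetical sketch: draw the per-event parameters listed in the class table
# for one event. N(m, s) is read as mean m and standard deviation s, and
# U(a, b) as a continuous uniform on [a, b]. This does not reproduce how
# gen_stream grows events over time.
draw_event_params <- function(cls = c("A", "B"), shape = 1,
                              muAB = c(4, 3), sdAB = c(2, 3),
                              end_mu = c(8, 5), end_sd = c(2, 3)) {
  cls <- match.arg(cls)
  stopifnot(shape %in% 1:3)
  if (cls == "A" && shape != 1) stop("class A events only have shape 1")
  i <- if (cls == "A") 1 else 2
  age_rng <- if (shape == 1) c(20, 30) else c(100, 150)
  width_rng <- list(c(20, 26), c(30, 38), c(50, 58))[[shape]]
  list(
    start_value = rnorm(1, muAB[i], sdAB[i]),
    end_value   = rnorm(1, end_mu[i], end_sd[i]),
    max_age     = runif(1, age_rng[1], age_rng[2]),
    max_width   = runif(1, width_rng[1], width_rng[2])
  )
}

set.seed(1)
str(draw_event_params("B", shape = 3))
```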
A list with the following components:
data |
The data stream returned as a data frame. |
details |
A data frame containing the details of the events: their positions, class labels, etc. This is needed for identifying the class labels of events during event extraction. |
eventlabs |
A matrix with 1 at event locations and 0 elsewhere. |
```r
library(eventstream)
out <- gen_stream(1, sd = 15)
zz <- as.matrix(out$data)
image(1:nrow(zz), 1:ncol(zz), zz, xlab = "Time", ylab = "Location")
```
This function extracts events from a two-dimensional (1 spatial x 1 time) data stream.
get_clusters( dat, filename = NULL, thres = 0.95, vis = FALSE, epsilon = 5, miniPts = 10, rolling = TRUE )
dat |
The data matrix |
filename |
If set, the figure of extracted events is saved under this file name. The |
thres |
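The cut-off quantile. Default is set to 0.95. |
vis |
If |
epsilon |
The epsilon (eps) parameter of DBSCAN clustering. Default is set to 5. |
miniPts |
The minPts parameter of DBSCAN clustering. Default is set to 10. |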
rolling |
This parameter is set to |
A list with the following components:
clusters |
The cluster assignment according to DBSCAN output. |
data |
The data of this cluster assignment. |
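get_clusters keeps the cells above the thres quantile and groups them with DBSCAN. A minimal base R sketch of the first step follows, assuming the quantile is taken over the whole matrix; high_cells is a hypothetical helper. Its x and y columns could then be clustered, for example with dbscan::dbscan(coords, eps = epsilon, minPts = miniPts) from the dbscan package, which is not shown here. For a 3D array, which(..., arr.ind = TRUE) returns three index columns instead of two.

```r
# Hypothetical sketch of the thresholding step: keep the cells whose values
# exceed the thres quantile of the whole matrix, in (x, y, value) form.
high_cells <- function(dat, thres = 0.95) {
  cutoff <- quantile(dat, probs = thres, names = FALSE)
  idx <- which(dat > cutoff, arr.ind = TRUE)  # row/column indices of high cells
  data.frame(x = idx[, 1], y = idx[, 2], value = dat[idx])
}

m <- matrix(1:100, nrow = 10)
high_cells(m, thres = 0.95)  # the 5 largest cells, values 96 to 100
```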
```r
library(eventstream)
out <- gen_stream(2, sd = 15)
zz <- as.matrix(out$data)
clst <- get_clusters(zz, vis = TRUE)
```
This function extracts events from a three-dimensional (2D spatial x 1D time) data stream.
get_clusters_3d(dat, thres = 0.95, epsilon = 3, miniPts = 15)
dat |
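The 3D data array. |
thres |
The cut-off quantile. Default is set to 0.95. |
epsilon |
The epsilon (eps) parameter of DBSCAN clustering. Default is set to 3. |
miniPts |
The minPts parameter of DBSCAN clustering. Default is set to 15. |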
A list with the following components:
clusters |
The cluster assignment according to DBSCAN output. |
data |
The data of this cluster assignment. |
```r
library(eventstream)
set.seed(1)
arr <- array(rnorm(12000), dim = c(40, 25, 30))
arr[25:33, 12:20, 20:23] <- 10
# getting events
out <- get_clusters_3d(arr, thres = 0.985)
# plots
oldpar <- par(mfrow = c(1, 3))
plot(out$data[, c(1, 2)], xlab = "x", ylab = "y", col = as.factor(out$clusters$cluster))
plot(out$data[, c(1, 3)], xlab = "x", ylab = "z", col = as.factor(out$clusters$cluster))
plot(out$data[, c(2, 3)], xlab = "y", ylab = "z", col = as.factor(out$clusters$cluster))
par(oldpar)
```
This function computes event features of 2D events.
get_features( dat.xyz, res.cluster, normal.stats.splines, win_size = 200, tt = 10 )
dat.xyz |
The data in a cluster friendly format. The first two columns have |
res.cluster |
Cluster details from |
normal.stats.splines |
The background statistics, output from |
win_size |
The window length of the moving window model. Default is set to 200. |
tt |
Related to event ages. Features are computed at event ages tt, 2tt, 3tt and 4tt. Default is set to 10. |
An Nx22x4 array is returned for 2D data streams and an Nx13x4 array for 3D data streams. Here N is the total number of events extracted from all windows. The second dimension has m features and the class label for the supervised setting. The third dimension has 4 different event ages: tt, 2tt, 3tt and 4tt.
For example, the element at [10,6,3] has the 6th feature of the 10th extracted event when the age of the event is 3tt. The features for 2D streams are listed below. For 3D streams, the features cluster_id, pixels, length, width, height, total_value, l2w_ratio, centroid_x, centroid_y, centroid_z, mean, std_dev and sd_from_global_mean are computed.
cluster_id |
An identification number for each event. |
pixels |
The number of pixels of each event. |
length |
The length of the event. |
width |
The width of the event. |
total_value |
The total value of the pixels. |
l2w_ratio |
Length to width ratio of event. |
centroid_x |
x coordinate of event centroid. |
centroid_y |
y coordinate of event centroid. |
mean |
Mean value of event pixels. |
std_dev |
Standard deviation of event pixels. |
avg_slope |
The slope of an |
quad_1 |
The linear coefficient of a second order polynomial fitted to event pixels using |
quad_2 |
The quadratic coefficient of a second order polynomial fitted to event pixels using |
2sd_from_mean |
The proportion of event pixels/cells that have values greater than 2 global standard deviations from the global mean of the window. |
3sd_from_mean |
The proportion of event pixels/cells that have values greater than 3 global standard deviations from the global mean of the window. |
4sd_from_mean |
The proportion of event pixels/cells that have values greater than 4 global standard deviations from the global mean of the window. |
5iqr_from_median |
The column medians and column IQRs of a small portion of each window are used to construct two smoothing splines: a median spline and an IQR spline. The value of the median spline at each event centroid is used as the local median for that event; similarly, the value of the IQR spline at each event centroid is used as the local IQR for that event. This feature gives the proportion of event pixels/cells that have values greater than 5 local IQRs from the local median. |
6iqr_from_median |
The proportion of event pixels/cells that have values greater than 6 local IQRs from the local median computed using splines. |
7iqr_from_median |
The proportion of event pixels/cells that have values greater than 7 local IQRs from the local median computed using splines. |
8iqr_from_median |
The proportion of event pixels/cells that have values greater than 8 local IQRs from the local median computed using splines. |
iqr_from_median |
Let us denote the 75th percentile of the event pixels value by |
sd_from_mean |
Let us denote the 80th percentile of the event pixels value by |
```r
library(eventstream)
out <- gen_stream(1, sd = 15)
zz <- as.matrix(out$data)
clst <- get_clusters(zz, vis = TRUE)
sstats <- spline_stats(zz[1:100, ])
ftrs <- get_features(clst$data, clst$clusters$cluster, sstats)
```
This function computes event features of 3D events.
get_features_3d(dat.xyz, res.cluster, normal.stats, win_size, tt)
dat.xyz |
The data in a cluster friendly format. The first three columns have |
res.cluster |
Cluster details from |
normal.stats |
The background statistics, output from |
win_size |
The window length of the moving window model. |
tt |
Related to event ages. Features are computed at event ages tt, 2tt, 3tt and 4tt. |
An Nx22x4 array is returned. Here N is the total number of events extracted in all windows. The second dimension has 30 features and the class label for the supervised setting. The third dimension has 4 different event ages: tt, 2tt, 3tt and 4tt.
For example, the element at [10,6,3] has the 6th feature of the 10th extracted event when the age of the event is 3tt. The features are listed below:
cluster_id |
An identification number for each event. |
pixels |
The number of pixels of each event. |
length |
The length of the event. |
width |
The width of the event. |
total_value |
The total value of the pixels. |
l2w_ratio |
Length to width ratio of event. |
centroid_x |
x coordinate of event centroid. |
centroid_y |
y coordinate of event centroid. |
centroid_z |
z coordinate of event centroid. |
mean |
Mean value of event pixels. |
std_dev |
Standard deviation of event pixels. |
slope |
Slope of a linear model fitted to the event. |
quad1 |
First coefficient of a quadratic model fitted to the event. |
quad2 |
Second coefficient of a quadratic model fitted to the event. |
sd_from_mean |
Let us denote the 80th percentile of the event pixels value by |
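Several of the features listed for get_features_3d are simple summaries of an event's cells. This sketch computes them for one 3D event from its cell coordinates and values; basic_event_features is hypothetical, and it assumes length, width and height are the extents along the first, second and third array dimensions, which the package may define differently. The model-based features (slope, quad1, quad2) and sd_from_mean are omitted.

```r
# Hypothetical sketch: basic shape features of one 3D event, given the integer
# coordinates (x, y, z) of its cells and their values.
basic_event_features <- function(x, y, z, value) {
  len <- diff(range(x)) + 1   # extent along the first array dimension
  wid <- diff(range(y)) + 1   # extent along the second dimension
  hgt <- diff(range(z)) + 1   # extent along the third dimension
  c(pixels = length(value), length = len, width = wid, height = hgt,
    total_value = sum(value), l2w_ratio = len / wid,
    centroid_x = mean(x), centroid_y = mean(y), centroid_z = mean(z),
    mean = mean(value), std_dev = sd(value))
}

# The block event planted in the examples: arr[25:33, 12:20, 20:23] <- 10
cells <- expand.grid(x = 25:33, y = 12:20, z = 20:23)
basic_event_features(cells$x, cells$y, cells$z, value = rep(10, nrow(cells)))
# pixels 324, length 9, width 9, height 4, l2w_ratio 1, centroid (29, 16, 21.5)
```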
```r
library(eventstream)
set.seed(1)
arr <- array(rnorm(12000), dim = c(40, 25, 30))
arr[25:33, 12:20, 20:23] <- 10
# getting events
out <- get_clusters_3d(arr, thres = 0.985)
# background statistics from a corner of the array that contains no event
mean_sd <- stats_3d(arr[1:20, 1:6, 1:8])
ftrs <- get_features_3d(out$data, out$clusters$cluster, mean_sd, win_size = 40, tt = 2)
```
This dataset contains smoothed NO2 data from March to September 2010
NO2_2010
An array of 7 x 179 x 360 dimensions.
Each NO2_2010[t, , ] contains NO2 data for a given month, with t=1 corresponding to March and t=7 corresponding to September.
Each NO2_2010[ ,x, y] contains the NO2 concentration at a given position on the world map.
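The indexing conventions of the NO2 arrays can be tried on a stand-in array. This sketch assumes the first dimension indexes the seven months from March (t = 1) to September (t = 7); the random values are placeholders, not NO2 data, and with the package loaded a dataset such as NO2_2010 would take the place of no2.

```r
# Stand-in array with the documented layout: month x grid position x grid position.
# Random values only; with the package loaded, use e.g. NO2_2010 instead.
set.seed(1)
no2 <- array(runif(7 * 179 * 360), dim = c(7, 179, 360))

months <- month.name[3:9]                          # t = 1 is March, t = 7 is September
march_map <- no2[1, , ]                            # 179 x 360 map for March
series <- setNames(no2[, 90, 180], months)         # monthly values at one position
monthly_mean <- setNames(apply(no2, 1, mean), months)
```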
This dataset contains smoothed NO2 data from March to September 2011
NO2_2011
An array of 7 x 179 x 360 dimensions.
Each NO2_2011[t, , ] contains NO2 data for a given month, with t=1 corresponding to March and t=7 corresponding to September.
Each NO2_2011[ ,x, y] contains the NO2 concentration at a given position on the world map.
This dataset contains smoothed NO2 data from March to September 2012
NO2_2012
An array of 7 x 179 x 360 dimensions.
Each NO2_2012[t, , ] contains NO2 data for a given month, with t=1 corresponding to March and t=7 corresponding to September.
Each NO2_2012[ ,x, y] contains the NO2 concentration at a given position on the world map.
This dataset contains smoothed NO2 data from March to September 2013
NO2_2013
An array of 7 x 179 x 360 dimensions.
Each NO2_2013[t, , ] contains NO2 data for a given month, with t=1 corresponding to March and t=7 corresponding to September.
Each NO2_2013[ ,x, y] contains the NO2 concentration at a given position on the world map.
This dataset contains smoothed NO2 data from March to September 2014
NO2_2014
An array of 7 x 179 x 360 dimensions.
Each NO2_2014[t, , ] contains NO2 data for a given month, with t=1 corresponding to March and t=7 corresponding to September.
Each NO2_2014[ ,x, y] contains the NO2 concentration at a given position on the world map.
This dataset contains smoothed NO2 data from March to September 2015
NO2_2015
An array of 7 x 179 x 360 dimensions.
Each NO2_2015[t, , ] contains NO2 data for a given month, with t=1 corresponding to March and t=7 corresponding to September.
Each NO2_2015[ ,x, y] contains the NO2 concentration at a given position on the world map.
This dataset contains smoothed NO2 data from March to September 2016
NO2_2016
An array of 7 x 179 x 360 dimensions.
Each NO2_2016[t, , ] contains NO2 data for a given month, with t=1 corresponding to March and t=7 corresponding to September.
Each NO2_2016[ ,x, y] contains the NO2 concentration at a given position on the world map.
This dataset contains smoothed NO2 data from March to September 2017
NO2_2017
An array of 7 x 179 x 360 dimensions.
Each NO2_2017[t, , ] contains NO2 data for a given month, with t=1 corresponding to March and t=7 corresponding to September.
Each NO2_2017[ ,x, y] contains the NO2 concentration at a given position on the world map.
This dataset contains smoothed NO2 data from March to September 2018
NO2_2018
An array of 7 x 179 x 360 dimensions.
Each NO2_2018[t, , ] contains NO2 data for a given month, with t=1 corresponding to March and t=7 corresponding to September.
Each NO2_2018[ ,x, y] contains the NO2 concentration at a given position on the world map.
This dataset contains smoothed NO2 data from March to September 2019
NO2_2019
An array of 7 x 179 x 360 dimensions.
Each NO2_2019[t, , ] contains NO2 data for a given month, with t=1 corresponding to March and t=7 corresponding to September.
Each NO2_2019[ ,x, y] contains the NO2 concentration at a given position on the world map.
Predicts using the incomplete-event-classifier.
predict_tdl(model, t, X, probs = FALSE)
model |
The fitted incomplete-event-classifier. |
t |
The age of events. |
X |
The event features. |
probs |
If |
The predicted values using the model object. If probs = TRUE, the predicted probabilities are returned instead.
```r
library(eventstream)
# Generate data
N <- 1000
t <- sort(rep(1:10, N))
set.seed(821)
for (kk in 1:10) {
  if (kk == 1) {
    X <- seq(-11, 9, length = N)
  } else {
    temp <- seq((-11 - kk + 1), (9 - kk + 1), length = N)
    X <- c(X, temp)
  }
}
real.a.0 <- seq(2, 20, by = 2)
real.a.1 <- rep(2, 10)
# one logistic noise draw per observation (length(X) is 10 * N)
Zstar <- real.a.0[t] + real.a.1[t] * X + rlogis(length(X), scale = 0.5)
Z <- 1 * (Zstar > 0)
# Plot data for t=1 and t=8
oldpar <- par(mfrow = c(1, 2))
plot(X[t == 1], Z[t == 1], main = "t=1 data")
abline(v = -1, lty = 2)
plot(X[t == 8], Z[t == 8], main = "t=8 data")
abline(v = -8, lty = 2)
par(oldpar)
# Fit model on every second observation of each age
train_inds <- c()
for (i in 0:9) {
  train_inds <- c(train_inds, i * N + 2 * (1:499))
}
model_td <- td_logistic(t[train_inds], X[train_inds], Z[train_inds])
# Prediction on the held-out observations and accuracy
preds <- predict_tdl(model_td, t[-train_inds], X[-train_inds])
sum(preds == Z[-train_inds]) / length(preds)
```
This dataset contains the locations of class A events in the real_stream dataset, which can be used for classifying the events in real_stream.
real_details
A data frame with 4 rows and 3 variables:
Original file name.
Class of event, A or B.
y |
y coordinate in the original file, relating to the location of the event. |
x |
x coordinate in the original file, relating to the start time of the event. |
x |
x coordinate in real_stream, relating to the start time of the event. |
y |
y coordinate in real_stream, relating to the location of the event. |
A dataset containing fibre optic cable signals. A pulse is periodically sent through the cable, which results in a data matrix where each horizontal row (real_stream[x, ]) gives the strength of the signal at a fixed location x, and each vertical column (real_stream[ ,t]) gives the strength of the signal along the cable at a fixed time t.
real_stream
A matrix with 587 rows and 379 columns.
This function computes four splines from the median, IQR, mean and standard deviation values of the data matrix.
spline_stats(dat)
dat |
The data matrix |
A list with the following components:
med.spline |
The spline computed from the median values. |
iqr.spline |
The spline computed from IQR values. |
mean.spline |
The spline computed from mean values. |
sd.spline |
The spline computed from standard deviation values. |
mean.dat |
The mean of the data matrix. |
sd.dat |
The standard deviation of the data matrix. |
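The spline statistics can be illustrated with stats::smooth.spline. This sketch assumes, following the description of the iqr_from_median features, that medians and IQRs are computed per column (location) of a background window and smoothed along the column index; column_splines is hypothetical, covers only two of the four splines, and is not the package's implementation.

```r
# Hypothetical sketch of the spline idea: summarise each column (location) of a
# background window by its median and IQR, then smooth those summaries along
# the column index with stats::smooth.spline.
column_splines <- function(dat) {
  loc <- seq_len(ncol(dat))                       # column index = location
  list(
    med.spline = smooth.spline(loc, apply(dat, 2, median)),
    iqr.spline = smooth.spline(loc, apply(dat, 2, IQR))
  )
}

# Simulated background window: 100 time steps x 50 locations, with a mean
# that drifts from 0 to 2 across locations
set.seed(1)
bg <- matrix(rnorm(100 * 50, mean = rep(seq(0, 2, length.out = 50), each = 100)),
             nrow = 100)
sp <- column_splines(bg)

# Local median and IQR at a (possibly non-integer) event centroid location
predict(sp$med.spline, x = 25.5)$y
predict(sp$iqr.spline, x = 25.5)$y
```

Evaluating the splines with predict() at an event centroid mirrors how the local median and local IQR are described for the iqr_from_median features.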
```r
library(eventstream)
out <- gen_stream(1, sd = 15)
zz <- as.matrix(out$data)
sstats <- spline_stats(zz[1:100, ])
oldpar <- par(mfrow = c(2, 1))
image(1:ncol(zz), 1:nrow(zz), t(zz), xlab = "Location", ylab = "Time")
plot(sstats[[1]], type = "l")  # median spline
par(oldpar)
```
This function computes the mean and standard deviation of a 3D data array; these background statistics are used for 3D event extraction and feature computation.
stats_3d(dat)
dat |
The data array |
A list with the following components:
mean.dat |
The mean of the data array |
sd.dat |
The standard deviation of the data array |
```r
library(eventstream)
set.seed(1)
arr <- array(rnorm(12000), dim = c(40, 25, 30))
arr[25:33, 12:20, 20:23] <- 10
mean_sd <- stats_3d(arr[1:20, 1:6, 1:8])
mean_sd
```
Generates a two-dimensional data stream from the data files in a given folder.
stream_from_files(folder)
folder |
The folder with the data files. |
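```r
## Not run:
library(eventstream)
# use a dedicated subfolder so that unlink() does not delete the session's tempdir()
folder <- file.path(tempdir(), "eventstream_demo")
dir.create(folder)
out <- gen_stream(2, folder = folder)
stream <- stream_from_files(file.path(folder, "data"))
dim(stream)
unlink(folder, recursive = TRUE)
## End(Not run)
```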
This function fits a classifier for incomplete events. Events grow with time, and the input vector t denotes the age of each event. The classifier uses the growing event features X and includes an L2 penalty for smoothness.
td_logistic( t, X, Y, lambda = 1, scale = TRUE, num_bins = 4, quad = TRUE, interact = FALSE, logg = TRUE )
t |
The age of events. |
X |
The event features. |
Y |
The class labels. |
lambda |
The penalty coefficient. Default is 1. |
scale |
If |
num_bins |
The number of time slots to use. |
quad |
If |
interact |
if |
logg |
If |
A list with the following components:
par |
The parameters of the incomplete-event-classifier, after it is fitted. |
convergence |
The difference between the final two output values. |
scale |
If |
t |
The age of events |
quad |
The value of |
interact |
The value of |
See predict_tdl for prediction.
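The idea of an age-dependent logistic classifier with a smoothness penalty can be sketched in base R. The sketch assumes the penalty is lambda times the squared differences between the coefficients of neighbouring age bins, and that ages are split into num_bins equal-width bins; fit_binned_logistic and predict_binned_logistic are hypothetical, use stats::optim rather than whatever optimiser td_logistic uses, and are only an illustration of the technique, not the package's classifier.

```r
# Hypothetical sketch (not td_logistic itself): logistic regression whose
# coefficients vary across age bins, with an L2 penalty on the differences
# between coefficients of neighbouring bins to keep them smooth in age.
age_bins <- function(t, breaks) findInterval(t, breaks, all.inside = TRUE)

fit_binned_logistic <- function(t, X, Y, lambda = 1, num_bins = 4) {
  breaks <- seq(min(t), max(t), length.out = num_bins + 1)  # equal-width age bins
  bins <- age_bins(t, breaks)
  Xd <- cbind(1, as.matrix(X))                              # intercept + features
  p <- ncol(Xd)
  objective <- function(theta) {
    B <- matrix(theta, nrow = num_bins, ncol = p)           # one coefficient row per bin
    eta <- rowSums(Xd * B[bins, , drop = FALSE])            # linear predictor
    log1pexp <- pmax(eta, 0) + log1p(exp(-abs(eta)))        # stable log(1 + exp(eta))
    negloglik <- -sum(Y * eta - log1pexp)
    negloglik + lambda * sum(diff(B)^2)                     # smoothness penalty
  }
  fit <- optim(rep(0, num_bins * p), objective, method = "BFGS",
               control = list(maxit = 500))
  list(coef = matrix(fit$par, nrow = num_bins), breaks = breaks,
       convergence = fit$convergence)
}

predict_binned_logistic <- function(fit, t, X) {
  Xd <- cbind(1, as.matrix(X))
  B <- fit$coef[age_bins(t, fit$breaks), , drop = FALSE]
  plogis(rowSums(Xd * B))                                   # class-1 probabilities
}

# Usage on simulated data
set.seed(1)
t <- rep(1:8, each = 100)
X <- rnorm(800)
Y <- rbinom(800, 1, plogis(0.5 + 2 * X))
fit <- fit_binned_logistic(t, X, Y, lambda = 1, num_bins = 4)
fit$coef
mean((predict_binned_logistic(fit, t, X) > 0.5) == Y)  # training accuracy
```

Each row of fit$coef holds the (intercept, slope) pair for one age bin; a larger lambda pulls neighbouring rows closer together.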
```r
library(eventstream)
# Generate data
N <- 1000
t <- sort(rep(1:10, N))
set.seed(821)
for (kk in 1:10) {
  if (kk == 1) {
    X <- seq(-11, 9, length = N)
  } else {
    temp <- seq((-11 - kk + 1), (9 - kk + 1), length = N)
    X <- c(X, temp)
  }
}
real.a.0 <- seq(2, 20, by = 2)
real.a.1 <- rep(2, 10)
# one logistic noise draw per observation (length(X) is 10 * N)
Zstar <- real.a.0[t] + real.a.1[t] * X + rlogis(length(X), scale = 0.5)
Z <- 1 * (Zstar > 0)
# Plot data for t=1 and t=8
oldpar <- par(mfrow = c(1, 2))
plot(X[t == 1], Z[t == 1], main = "t=1 data")
abline(v = -1, lty = 2)
plot(X[t == 8], Z[t == 8], main = "t=8 data")
abline(v = -8, lty = 2)
par(oldpar)
# Fit model
model_td <- td_logistic(t, X, Z)
```
This function finds the best parameters for 2D event detection using labeled data.
tune_cpdbee_2D( x, cl, alpha_min = 0.95, alpha_max = 0.98, alpha_step = 0.01, epsilon_min = 2, epsilon_max = 12, epsilon_step = 2, minPts_min = 4, minPts_max = 12, minPts_step = 2 )
x |
The data in an mxn matrix or data frame. |
cl |
The actual locations of the events. |
alpha_min |
The minimum threshold value. |
alpha_max |
The maximum threshold value. |
alpha_step |
The incremental step size for alpha. |
epsilon_min |
The minimum epsilon value for DBSCAN clustering. |
epsilon_max |
The maximum epsilon value for DBSCAN clustering. |
epsilon_step |
The incremental step size for epsilon for DBSCAN clustering. |
minPts_min |
The minimum minPts value for DBSCAN clustering. |
minPts_max |
The maximum minPts value for DBSCAN clustering. |
minPts_step |
The incremental step size for minPts for DBSCAN clustering. |
A list with the following components:
best |
The best threshold, epsilon and minPts for 2D event detection and the associated Jaccard index. |
all |
All parameter values used and the associated Jaccard Index values. |
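Parameter tuning compares detected and actual event locations with the Jaccard index. This sketch computes that index for two sets of cell locations and builds the grid implied by the default ranges of tune_cpdbee_2D; it assumes every combination is tried, and jaccard_cells is a hypothetical helper rather than the package's code.

```r
# Hypothetical sketch: Jaccard index between detected and actual event cells,
# each given as a two-column matrix of (row, column) locations.
jaccard_cells <- function(detected, actual) {
  key <- function(m) paste(m[, 1], m[, 2], sep = ",")
  d <- unique(key(detected))
  a <- unique(key(actual))
  u <- length(union(d, a))
  if (u == 0) return(NA_real_)
  length(intersect(d, a)) / u   # |intersection| / |union|
}

# Parameter grid with the default ranges of tune_cpdbee_2D
grid <- expand.grid(alpha = seq(0.95, 0.98, by = 0.01),
                    epsilon = seq(2, 12, by = 2),
                    minPts = seq(4, 12, by = 2))
nrow(grid)  # 4 * 6 * 5 = 120 combinations
```

For each row of grid, events would be detected with those settings and scored with jaccard_cells against cl; the highest score gives the best combination.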
```r
## Not run:
library(eventstream)
out <- gen_stream(1, sd = 15)
zz <- as.matrix(out$data)
clst <- get_clusters(zz, filename = NULL, thres = 0.95, vis = TRUE, epsilon = 5, miniPts = 10, rolling = FALSE)
clst_loc <- clst$data[, 1:2]
out <- tune_cpdbee_2D(zz, clst_loc)
out$best
## End(Not run)
```
This function finds the best parameters for 3D event detection using labeled data.
tune_cpdbee_3D( x, cl, alpha_min = 0.95, alpha_max = 0.98, alpha_step = 0.01, epsilon_min = 2, epsilon_max = 12, epsilon_step = 2, minPts_min = 8, minPts_max = 16, minPts_step = 2 )
x |
The data, given as a 3D array. |
cl |
The actual locations of the events. |
alpha_min |
The minimum threshold value. |
alpha_max |
The maximum threshold value. |
alpha_step |
The incremental step size for alpha. |
epsilon_min |
The minimum epsilon value for DBSCAN clustering. |
epsilon_max |
The maximum epsilon value for DBSCAN clustering. |
epsilon_step |
The incremental step size for epsilon for DBSCAN clustering. |
minPts_min |
The minimum minPts value for DBSCAN clustering. |
minPts_max |
The maximum minPts value for DBSCAN clustering. |
minPts_step |
The incremental step size for minPts for DBSCAN clustering. |
A list with the following components:
best |
The best threshold, epsilon and minPts for 3D event detection and the associated Jaccard index. |
all |
All parameter values used and the associated Jaccard Index values. |
```r
## Not run:
library(eventstream)
set.seed(1)
arr <- array(rnorm(12000), dim = c(40, 25, 30))
arr[25:33, 12:20, 20:23] <- 10
# Getting events
out <- get_clusters_3d(arr, thres = 0.985)
out <- tune_cpdbee_3D(arr, out$data[, 1:3])
out$best
## End(Not run)
```