Title: | Class Cover Catch Digraph Classification |
---|---|
Description: | Fit Class Cover Catch Digraph Classification models that can be used in machine learning. Pure and proper and random walk approaches are available. Methods are explained in Priebe et al. (2001) <doi:10.1016/S0167-7152(01)00129-8>, Priebe et al. (2003) <doi:10.1007/s00357-003-0003-7>, and Manukyan and Ceyhan (2016) <doi:10.48550/arXiv.1904.04564>. |
Authors: | Fatih Saglam [aut, cre] |
Maintainer: | Fatih Saglam <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.3 |
Built: | 2025-03-08 05:16:42 UTC |
Source: | https://github.com/fatihsaglam/rcccd |
pcccd_classifier
fits a Pure and Proper Class Cover Catch
Digraph (PCCCD) classification model.
pcccd_classifier(x, y, proportion = 1)
x |
feature matrix or dataframe. |
y |
class factor variable. |
proportion |
proportion of covered samples. A real number between 0 and 1. |
Multiclass framework for PCCCD. For each target class, PCCCD determines a dominant point
set and its circular cover area by choosing balls B(x, r), with radii r, around a minimum
number of dominant points such that no ball contains a non-target sample (pure) and the
union of the balls contains all target samples (proper).
This guarantees that balls of the target class never cover any non-target samples (pure) and that the balls cover all target samples (proper).
For detail, please refer to Priebe et al. (2001), Priebe et al. (2003), and Manukyan and Ceyhan (2016).
Note: much faster than the cccd package.
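The pure-and-proper construction described above can be sketched in a few lines of base R. This is a simplified illustration, not the package's implementation: the function name pcccd_balls_sketch is hypothetical, radii are capped just below the distance to the nearest non-target sample (pure), and a greedy set cover selects dominant points until every target sample is covered (proper).

```r
# Sketch of PCCCD ball construction for one target class
# (illustration only, not the package's implementation).
pcccd_balls_sketch <- function(x_target, x_other) {
  n_t <- nrow(x_target)
  d_all <- as.matrix(dist(rbind(x_target, x_other)))
  # pure: cap each candidate radius just below the distance
  # to the nearest non-target sample
  d_other <- d_all[1:n_t, -(1:n_t), drop = FALSE]
  radii <- apply(d_other, 1, min) * (1 - 1e-8)
  # covers[i, j]: does the ball around target sample i cover target sample j?
  d_target <- d_all[1:n_t, 1:n_t, drop = FALSE]
  covers <- d_target <= radii  # column-major recycling: row i vs radii[i]
  # proper: greedily add the ball covering the most uncovered target samples
  uncovered <- rep(TRUE, n_t)
  dominant <- integer(0)
  while (any(uncovered)) {
    gain <- rowSums(covers[, uncovered, drop = FALSE])
    i <- which.max(gain)
    dominant <- c(dominant, i)
    uncovered[covers[i, ]] <- FALSE
  }
  list(i_dominant = dominant, radii = radii[dominant])
}
```

The greedy cover keeps the number of dominant points small; the proportion argument of pcccd_classifier plausibly plays a similar role, stopping the cover once the requested fraction of target samples is covered.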
an object of class "cccd_classifier", which includes:
i_dominant_list |
dominant sample indexes. |
x_dominant_list |
dominant samples from feature matrix, x |
radii_dominant_list |
Radii of the balls for dominant samples. |
class_names |
class names |
k_class |
number of classes |
proportions |
proportion of samples covered in each class. |
Fatih Saglam, [email protected]
Priebe, C. E., DeVinney, J., & Marchette, D. J. (2001). On the distribution of the domination number for random class cover catch digraphs. Statistics & Probability Letters, 55(3), 239–246. https://doi.org/10.1016/s0167-7152(01)00129-8
Priebe, C. E., Marchette, D. J., DeVinney, J., & Socolinsky, D. A. (2003). Classification Using Class Cover Catch Digraphs. Journal of Classification, 20(1), 3–23. https://doi.org/10.1007/s00357-003-0003-7
Manukyan, A., & Ceyhan, E. (2016). Classification of imbalanced data with a geometric digraph family. Journal of Machine Learning Research, 17(1), 6504–6543. https://jmlr.org/papers/volume17/15-604/15-604.pdf
n <- 1000
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
x <- cbind(x1, x2)
y <- as.factor(ifelse(3 < x1 & x1 < 7 & 3 < x2 & x2 < 7, "A", "B"))

m_pcccd <- pcccd_classifier(x = x, y = y)

# dataset
plot(x, col = y, asp = 1)

# dominant samples of first class
x_center <- m_pcccd$x_dominant_list[[1]]

# radii of balls for first class
radii <- m_pcccd$radii_dominant_list[[1]]

# balls
for (i in 1:nrow(x_center)) {
  xx <- x_center[i, 1]
  yy <- x_center[i, 2]
  r <- radii[i]
  theta <- seq(0, 2*pi, length.out = 100)
  xx <- xx + r*cos(theta)
  yy <- yy + r*sin(theta)
  lines(xx, yy, type = "l", col = "green")
}

# testing the performance
i_train <- sample(1:n, round(n*0.8))
x_train <- x[i_train,]
y_train <- y[i_train]
x_test <- x[-i_train,]
y_test <- y[-i_train]

m_pcccd <- pcccd_classifier(x = x_train, y = y_train)
pred <- predict(object = m_pcccd, newdata = x_test)

# confusion matrix
table(y_test, pred)

# test accuracy
sum(y_test == pred)/nrow(x_test)
pcccd_ensemble_classifier
fits an Ensemble Pure and Proper Class Cover Catch
Digraph (PCCCD) classification model.
pcccd_ensemble_classifier(
  x,
  y,
  n_model = 30,
  n_var = ncol(x),
  replace = FALSE,
  prop_sample = ifelse(replace, 1, 0.67),
  min_proportion = 0.7,
  max_proportion = 1,
  verbose = TRUE
)
x |
feature matrix or dataframe. |
y |
class factor variable. |
n_model |
an integer. Number of weak classifiers. |
n_var |
an integer. number of variables in weak classifiers. |
replace |
a bool. Should replacement be used in data sampling? |
prop_sample |
a value between 0 and 1. Proportion the number of resampled samples to the number of samples in x. |
min_proportion |
Minimum cover proportion used in weak classifiers. |
max_proportion |
Maximum cover proportion used in weak classifiers. |
Bagging framework for PCCCD.
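The bagging scheme can be sketched as below. The names fit_ensemble_sketch and fit_fun are hypothetical; fit_fun stands in for a weak learner such as pcccd_classifier, and the resampling details are a plausible reading of the arguments above rather than the package's exact code.

```r
# Sketch of the ensemble's bagging loop (hypothetical names; a plausible
# reading of the arguments, not the package's implementation).
fit_ensemble_sketch <- function(x, y, fit_fun, n_model = 30, n_var = ncol(x),
                                replace = FALSE,
                                prop_sample = if (replace) 1 else 0.67,
                                min_proportion = 0.7, max_proportion = 1) {
  n <- nrow(x)
  lapply(seq_len(n_model), function(i) {
    # resample rows, with or without replacement
    i_row <- sample(n, size = round(n * prop_sample), replace = replace)
    # random feature subset for this weak classifier
    i_col <- sort(sample(ncol(x), size = n_var))
    # random cover proportion drawn between the given bounds
    prop <- runif(1, min_proportion, max_proportion)
    list(i_col = i_col,
         model = fit_fun(x = x[i_row, i_col, drop = FALSE],
                         y = y[i_row], proportion = prop))
  })
}
```

With fit_fun = pcccd_classifier this reproduces the spirit of the ensemble fit; predictions would then be aggregated across the n_model weak models, e.g. by majority vote.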
an object of class "cccd_classifier", which includes:
i_dominant_list |
dominant sample indexes. |
x_dominant_list |
dominant samples from feature matrix, x |
radii_dominant_list |
Radii of the balls for dominant samples. |
class_names |
class names |
k_class |
number of classes |
proportions |
proportion of samples covered in each class. |
Fatih Saglam, [email protected]
n <- 1000
predict.pcccd_classifier
makes prediction using pcccd_classifier
object.
## S3 method for class 'pcccd_classifier'
predict(object, newdata, type = "pred", ...)
object |
a pcccd_classifier object. |
newdata |
newdata as matrix or dataframe. |
type |
"pred" or "prob". Default is "pred". "pred" is class estimations,
"prob" is class probabilities. |
... |
not used. |
Estimations are based on the nearest dominant neighbor in radius units.
For detail, please refer to Priebe et al. (2001), Priebe et al. (2003), and Manukyan and Ceyhan (2016).
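A minimal sketch of this radius-scaled nearest-neighbor rule, assuming Euclidean distance and the dominant points and radii stored in the fitted object; predict_sketch is a hypothetical name, and the package's predict method may differ in detail.

```r
# Sketch of nearest-dominant-neighbor prediction in radius units
# (hypothetical name; illustration of the rule, not the package code).
predict_sketch <- function(x_dominant_list, radii_dominant_list,
                           class_names, newdata) {
  k <- length(class_names)
  # for each class: distance of each new sample to its nearest
  # dominant ball, measured in units of that ball's radius
  scores <- sapply(seq_len(k), function(j) {
    centers <- x_dominant_list[[j]]
    radii   <- radii_dominant_list[[j]]
    apply(newdata, 1, function(p) {
      d <- sqrt(colSums((t(centers) - p)^2))  # Euclidean distances
      min(d / radii)                          # scaled by ball radius
    })
  })
  # predict the class whose dominant cover is nearest in radius units
  factor(class_names[max.col(-scores)], levels = class_names)
}
```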
a vector of class predictions (if type is "pred") or a
matrix of class probabilities (if type is "prob").
Fatih Saglam, [email protected]
Priebe, C. E., DeVinney, J., & Marchette, D. J. (2001). On the distribution of the domination number for random class cover catch digraphs. Statistics & Probability Letters, 55(3), 239–246. https://doi.org/10.1016/s0167-7152(01)00129-8
Priebe, C. E., Marchette, D. J., DeVinney, J., & Socolinsky, D. A. (2003). Classification Using Class Cover Catch Digraphs. Journal of Classification, 20(1), 3–23. https://doi.org/10.1007/s00357-003-0003-7
Manukyan, A., & Ceyhan, E. (2016). Classification of imbalanced data with a geometric digraph family. Journal of Machine Learning Research, 17(1), 6504–6543. https://jmlr.org/papers/volume17/15-604/15-604.pdf
n <- 1000
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
x <- cbind(x1, x2)
y <- as.factor(ifelse(3 < x1 & x1 < 7 & 3 < x2 & x2 < 7, "A", "B"))

# testing the performance
i_train <- sample(1:n, round(n*0.8))
x_train <- x[i_train,]
y_train <- y[i_train]
x_test <- x[-i_train,]
y_test <- y[-i_train]

m_pcccd <- pcccd_classifier(x = x_train, y = y_train)
pred <- predict(object = m_pcccd, newdata = x_test)

# confusion matrix
table(y_test, pred)

# test accuracy
sum(y_test == pred)/nrow(x_test)
predict.pcccd_ensemble_classifier
makes prediction using
pcccd_ensemble_classifier
object.
## S3 method for class 'pcccd_ensemble_classifier'
predict(object, newdata, type = "pred", ...)
object |
a pcccd_ensemble_classifier object. |
newdata |
newdata as matrix or dataframe. |
type |
"pred" or "prob". Default is "pred". "pred" is class estimations,
"prob" is class probabilities. |
... |
not used. |
a vector of class predictions (if type is "pred") or a
matrix of class probabilities (if type is "prob").
Fatih Saglam, [email protected]
n <- 1000
predict.rwcccd_classifier
makes prediction using
rwcccd_classifier
object.
## S3 method for class 'rwcccd_classifier'
predict(object, newdata, type = "pred", e = 0, ...)
object |
a rwcccd_classifier object. |
newdata |
newdata as matrix or dataframe. |
type |
"pred" or "prob". Default is "pred". "pred" is class estimations,
"prob" is class probabilities. |
e |
0 or 1. Default is 0. Penalty based on scores stored in the rwcccd_classifier object. |
... |
not used. |
Estimations are based on the nearest dominant neighbor in radius units. The e
argument is used to penalize estimations based on scores stored in the
rwcccd_classifier object.
For detail, please refer to Priebe et al. (2001), Priebe et al. (2003), and Manukyan and Ceyhan (2016).
a vector of class predictions (if type is "pred") or a
matrix of class probabilities (if type is "prob").
Fatih Saglam, [email protected]
Priebe, C. E., DeVinney, J., & Marchette, D. J. (2001). On the distribution of the domination number for random class cover catch digraphs. Statistics & Probability Letters, 55(3), 239–246. https://doi.org/10.1016/s0167-7152(01)00129-8
Priebe, C. E., Marchette, D. J., DeVinney, J., & Socolinsky, D. A. (2003). Classification Using Class Cover Catch Digraphs. Journal of Classification, 20(1), 3–23. https://doi.org/10.1007/s00357-003-0003-7
Manukyan, A., & Ceyhan, E. (2016). Classification of imbalanced data with a geometric digraph family. Journal of Machine Learning Research, 17(1), 6504–6543. https://jmlr.org/papers/volume17/15-604/15-604.pdf
n <- 1000
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
x <- cbind(x1, x2)
y <- as.factor(ifelse(3 < x1 & x1 < 7 & 3 < x2 & x2 < 7, "A", "B"))

# testing the performance
i_train <- sample(1:n, round(n*0.8))
x_train <- x[i_train,]
y_train <- y[i_train]
x_test <- x[-i_train,]
y_test <- y[-i_train]

m_rwcccd <- rwcccd_classifier(x = x_train, y = y_train)
pred <- predict(object = m_rwcccd, newdata = x_test, e = 0)

# confusion matrix
table(y_test, pred)

# test accuracy
sum(y_test == pred)/nrow(x_test)
rwcccd_classifier
and rwcccd_classifier_2 fit a
Random Walk Class Cover Catch Digraph (RWCCCD) classification model.
rwcccd_classifier uses C++ for speed and rwcccd_classifier_2
uses R to determine balls.
rwcccd_classifier(x, y, method = "default", m = 1, proportion = 0.99)

rwcccd_classifier_2(
  x,
  y,
  method = "default",
  m = 1,
  proportion = 0.99,
  partial_ordering = FALSE
)
x |
feature matrix or dataframe. |
y |
class factor variable. |
method |
"default" or "balanced". |
m |
penalization parameter. Setting m = 0 removes the penalty. |
proportion |
proportion of covered samples. A real number between 0 and 1. |
partial_ordering |
TRUE or FALSE. Default is FALSE. Used only by rwcccd_classifier_2. |
Random Walk Class Cover Catch Digraphs (RWCCD) are determined by calculating a
score T for each target sample, taking each class in turn as the target class.
The score rewards target-class samples covered by the ball around a target
sample and penalizes covered non-target samples; an additional penalty term is
scaled by m, and m = 0 removes the penalty. The radius r of each ball is the
one that maximizes T for that target sample. The class weights in the score are
1 for the "default" method and are adjusted for class sizes for the "balanced"
method, and sample counts are taken over the samples still uncovered in the
current iteration.
This method is more robust to noise compared to PCCCD. However, the balls may
cover classes improperly, and impure balls can be selected.
For detail, please refer to Priebe et al. (2001), Priebe et al. (2003), and Manukyan and Ceyhan (2016).
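The score-based radius choice can be illustrated for a single target sample as below. This is an interpretation sketch under assumed weights and penalty form (a cumulative walk over samples ordered by distance, penalized by m times the distance), not the package's exact formulas; rw_radius_sketch is a hypothetical name.

```r
# Rough sketch of choosing one ball radius by a random-walk style score
# (interpretation only; see the references for the exact definitions).
rw_radius_sketch <- function(x0, x_all, is_target, w_target = 1,
                             w_nontarget = 1, m = 1) {
  d <- sqrt(colSums((t(x_all) - x0)^2))  # distances from the target sample
  o <- order(d)
  # step up when the ball absorbs a target sample, down for a non-target one
  steps <- ifelse(is_target[o], w_target, -w_nontarget)
  score <- cumsum(steps) - m * d[o]      # penalized cumulative score
  d[o][which.max(score)]                 # radius maximizing the score
}
```

Under the "balanced" method, w_target and w_nontarget would be adjusted for class sizes instead of both being 1.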
a rwcccd_classifier object which includes:
i_dominant_list |
dominant sample indexes. |
x_dominant_list |
dominant samples from feature matrix, x |
radii_dominant_list |
Radii of the balls for dominant samples. |
class_names |
class names |
k_class |
number of classes |
proportions |
proportion of samples covered in each class. |
Fatih Saglam, [email protected]
Priebe, C. E., DeVinney, J., & Marchette, D. J. (2001). On the distribution of the domination number for random class cover catch digraphs. Statistics & Probability Letters, 55(3), 239–246. https://doi.org/10.1016/s0167-7152(01)00129-8
Priebe, C. E., Marchette, D. J., DeVinney, J., & Socolinsky, D. A. (2003). Classification Using Class Cover Catch Digraphs. Journal of Classification, 20(1), 3–23. https://doi.org/10.1007/s00357-003-0003-7
Manukyan, A., & Ceyhan, E. (2016). Classification of imbalanced data with a geometric digraph family. Journal of Machine Learning Research, 17(1), 6504–6543. https://jmlr.org/papers/volume17/15-604/15-604.pdf
n <- 500
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
x <- cbind(x1, x2)
y <- as.factor(ifelse(3 < x1 & x1 < 7 & 3 < x2 & x2 < 7, "A", "B"))

# dataset
m_rwcccd_1 <- rwcccd_classifier(x = x, y = y, method = "default", m = 1)
plot(x, col = y, asp = 1, main = "default")

# dominant samples of second class
x_center <- m_rwcccd_1$x_dominant_list[[2]]

# radii of balls for second class
radii <- m_rwcccd_1$radii_dominant_list[[2]]

# balls
for (i in 1:nrow(x_center)) {
  xx <- x_center[i, 1]
  yy <- x_center[i, 2]
  r <- radii[i]
  theta <- seq(0, 2*pi, length.out = 100)
  xx <- xx + r*cos(theta)
  yy <- yy + r*sin(theta)
  lines(xx, yy, type = "l", col = "green")
}

# dataset
m_rwcccd_2 <- rwcccd_classifier_2(x = x, y = y, method = "default", m = 1,
                                  partial_ordering = TRUE)
plot(x, col = y, asp = 1, main = "default, partial_ordering = TRUE")

# dominant samples of second class
x_center <- m_rwcccd_2$x_dominant_list[[2]]

# radii of balls for second class
radii <- m_rwcccd_2$radii_dominant_list[[2]]

# balls
for (i in 1:nrow(x_center)) {
  xx <- x_center[i, 1]
  yy <- x_center[i, 2]
  r <- radii[i]
  theta <- seq(0, 2*pi, length.out = 100)
  xx <- xx + r*cos(theta)
  yy <- yy + r*sin(theta)
  lines(xx, yy, type = "l", col = "green")
}

# dataset
m_rwcccd_3 <- rwcccd_classifier(x = x, y = y, method = "balanced", m = 1,
                                proportion = 0.5)
plot(x, col = y, asp = 1, main = "balanced, proportion = 0.5")

# dominant samples of second class
x_center <- m_rwcccd_3$x_dominant_list[[2]]

# radii of balls for second class
radii <- m_rwcccd_3$radii_dominant_list[[2]]

# balls
for (i in 1:nrow(x_center)) {
  xx <- x_center[i, 1]
  yy <- x_center[i, 2]
  r <- radii[i]
  theta <- seq(0, 2*pi, length.out = 100)
  xx <- xx + r*cos(theta)
  yy <- yy + r*sin(theta)
  lines(xx, yy, type = "l", col = "green")
}

# testing the performance
i_train <- sample(1:n, round(n*0.8))
x_train <- x[i_train,]
y_train <- y[i_train]
x_test <- x[-i_train,]
y_test <- y[-i_train]

m_rwcccd <- rwcccd_classifier(x = x_train, y = y_train, method = "balanced")
pred <- predict(object = m_rwcccd, newdata = x_test)

# confusion matrix
table(y_test, pred)

# accuracy
sum(y_test == pred)/nrow(x_test)