Package 'rcccd'

Title: Class Cover Catch Digraph Classification
Description: Fit Class Cover Catch Digraph Classification models that can be used in machine learning. Pure and proper (PCCCD) and random walk (RWCCCD) approaches are available. Methods are explained in Priebe et al. (2001) <doi:10.1016/S0167-7152(01)00129-8>, Priebe et al. (2003) <doi:10.1007/s00357-003-0003-7>, and Manukyan and Ceyhan (2016) <doi:10.48550/arXiv.1904.04564>.
Authors: Fatih Saglam [aut, cre]
Maintainer: Fatih Saglam <[email protected]>
License: MIT + file LICENSE
Version: 0.3.3
Built: 2025-03-08 05:16:42 UTC
Source: https://github.com/fatihsaglam/rcccd

Pure and Proper Class Cover Catch Digraph Classifier

Description

pcccd_classifier fits a Pure and Proper Class Cover Catch Digraph (PCCCD) classification model.

Usage

pcccd_classifier(x, y, proportion = 1)

Arguments

x

feature matrix or dataframe.

y

class factor variable.

proportion

proportion of covered samples. A real number in $(0, 1]$; 1 by default. Smaller values result in fewer dominant samples.

Details

Multiclass framework for PCCCD. For each target class, PCCCD determines a set $S$ of dominant points and their covering balls $B(x^{\text{target}}, r_i)$ with radii $r_i$, using the minimum number of dominant points such that $X^{\text{non-target}} \cap \bigcup_i B_i = \varnothing$ (pure) and $X^{\text{target}} \subset \bigcup_i B_i$ (proper).

This guarantees that the balls of the target class never cover a non-target sample (pure) and that together they cover all target samples (proper).
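The two conditions can be checked directly for any candidate cover. The sketch below is a base R illustration and not part of rcccd: `is_pure_proper()` is a hypothetical helper, and it assumes Euclidean distance and open balls (a sample at distance exactly $r_i$ from a center is treated as not covered).

```r
# Hypothetical helper (not part of rcccd): checks whether the balls with
# centers `centers` (one per row) and radii `radii` form a pure and proper
# cover of the target class. Assumes Euclidean distance and open balls.
is_pure_proper <- function(x, is_target, centers, radii) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  covered <- rep(FALSE, nrow(x))
  for (i in seq_len(nrow(centers))) {
    # distance from every sample to the i-th center
    d <- sqrt(rowSums(sweep(x, 2, centers[i, ])^2))
    covered <- covered | (d < radii[i])
  }
  c(
    pure = !any(covered & !is_target),  # no non-target sample inside a ball
    proper = all(covered[is_target])    # every target sample inside a ball
  )
}

x_toy <- rbind(c(0, 0), c(1, 0), c(5, 0))
target_toy <- c(TRUE, TRUE, FALSE)
# radius 5 stops exactly at the non-target sample, so both are TRUE
is_pure_proper(x_toy, target_toy, centers = x_toy[1, , drop = FALSE], radii = 5)
```

Enlarging the radius beyond 5 breaks purity, and shrinking it below 1 breaks properness, which is the trade-off the PCCCD construction resolves.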

For details, please refer to Priebe et al. (2001), Priebe et al. (2003), and Manukyan and Ceyhan (2016).

Note: much faster than the cccd package.

Value

an object of class "pcccd_classifier" which includes:

i_dominant_list

dominant sample indexes.

x_dominant_list

dominant samples from feature matrix, x

radii_dominant_list

radii of the balls centered at the dominant samples

class_names

class names

k_class

number of classes

proportions

proportion of each class covered

Author(s)

Fatih Saglam, [email protected]

References

Priebe, C. E., DeVinney, J., & Marchette, D. J. (2001). On the distribution of the domination number for random class cover catch digraphs. Statistics & Probability Letters, 55(3), 239–246. https://doi.org/10.1016/s0167-7152(01)00129-8

Priebe, C. E., Marchette, D. J., DeVinney, J., & Socolinsky, D. A. (2003). Classification Using Class Cover Catch Digraphs. Journal of Classification, 20(1), 3–23. https://doi.org/10.1007/s00357-003-0003-7

Manukyan, A., & Ceyhan, E. (2016). Classification of imbalanced data with a geometric digraph family. Journal of Machine Learning Research, 17(1), 6504–6543. https://jmlr.org/papers/volume17/15-604/15-604.pdf

Examples

n <- 1000
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
x <- cbind(x1, x2)
y <- as.factor(ifelse(3 < x1 & x1 < 7 & 3 < x2 & x2 < 7, "A", "B"))

m_pcccd <- pcccd_classifier(x = x, y = y)

# dataset
plot(x, col = y, asp = 1)

# dominant samples of first class
x_center <- m_pcccd$x_dominant_list[[1]]

# radii of balls for first class
radii <- m_pcccd$radii_dominant_list[[1]]

# balls
for (i in seq_len(nrow(x_center))) {
  xx <- x_center[i, 1]
  yy <- x_center[i, 2]
  r <- radii[i]
  theta <- seq(0, 2*pi, length.out = 100)
  xx <- xx + r*cos(theta)
  yy <- yy + r*sin(theta)
  lines(xx, yy, type = "l", col = "green")
}

# testing the performance
i_train <- sample(1:n, round(n*0.8))

x_train <- x[i_train,]
y_train <- y[i_train]

x_test <- x[-i_train,]
y_test <- y[-i_train]

m_pcccd <- pcccd_classifier(x = x_train, y = y_train)
pred <- predict(object = m_pcccd, newdata = x_test)

# confusion matrix
table(y_test, pred)

# test accuracy
sum(y_test == pred)/nrow(x_test)
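The ball-drawing loop in this example is repeated in several places in this manual. A possible refactoring, sketched here as a hypothetical helper `ball_outlines()` that is not part of rcccd, computes each ball's outline once so that drawing reduces to one `lines()` call per ball.

```r
# Hypothetical helper (not part of rcccd): returns the outline of each ball as
# a two-column matrix of points, one list element per dominant sample.
ball_outlines <- function(centers, radii, n_points = 100) {
  theta <- seq(0, 2 * pi, length.out = n_points)
  lapply(seq_len(nrow(centers)), function(i) {
    cbind(
      x = centers[i, 1] + radii[i] * cos(theta),
      y = centers[i, 2] + radii[i] * sin(theta)
    )
  })
}

# Usage with the model fitted above (requires rcccd and an open plot):
# for (b in ball_outlines(m_pcccd$x_dominant_list[[1]],
#                         m_pcccd$radii_dominant_list[[1]])) {
#   lines(b, col = "green")
# }
```

`lines()` accepts a two-column matrix directly, so no separate x and y vectors are needed.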

Pure and Proper Class Cover Catch Digraph Ensemble Classifier

Description

pcccd_ensemble_classifier fits an ensemble Pure and Proper Class Cover Catch Digraph (PCCCD) classification model.

Usage

pcccd_ensemble_classifier(
  x,
  y,
  n_model = 30,
  n_var = ncol(x),
  replace = FALSE,
  prop_sample = ifelse(replace, 1, 0.67),
  min_proportion = 0.7,
  max_proportion = 1,
  verbose = TRUE
)

Arguments

x

feature matrix or dataframe.

y

class factor variable.

n_model

an integer. Number of weak classifiers.

n_var

an integer. Number of variables used in each weak classifier.

replace

a logical. Whether data sampling is done with replacement.

prop_sample

a value between 0 and 1. Ratio of the number of resampled observations to the number of observations in x.

min_proportion

Minimum cover proportion used in the weak classifiers.

max_proportion

Maximum cover proportion used in the weak classifiers.

verbose

a logical. If TRUE, progress is printed while the weak classifiers are fitted.

Details

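Bagging framework for PCCCD: n_model weak PCCCD classifiers are fitted, each on a resample of x (controlled by replace and prop_sample) using n_var variables and a cover proportion between min_proportion and max_proportion.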
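The exact resampling scheme is not spelled out beyond the arguments above, so the following is only a sketch of how those arguments could combine into one plan per weak classifier. `bagging_plan()` is hypothetical, and drawing each weak model's cover proportion uniformly from [min_proportion, max_proportion] is an assumption, not documented rcccd behavior.

```r
# Hypothetical sketch (not the rcccd implementation): one resampling plan per
# weak classifier. Assumes the cover proportion of each weak model is drawn
# uniformly from [min_proportion, max_proportion].
bagging_plan <- function(n, p, n_model = 30, n_var = p, replace = FALSE,
                         prop_sample = ifelse(replace, 1, 0.67),
                         min_proportion = 0.7, max_proportion = 1) {
  size <- round(n * prop_sample)  # resampled observations per weak model
  lapply(seq_len(n_model), function(b) {
    list(
      rows = sample.int(n, size, replace = replace),  # row indices of x
      cols = sort(sample.int(p, n_var)),              # variables used
      proportion = runif(1, min_proportion, max_proportion)
    )
  })
}

set.seed(1)
plan <- bagging_plan(n = 100, p = 5, n_model = 3, n_var = 2)
str(plan[[1]])
```

Each plan element b could then be fitted with pcccd_classifier(x[b$rows, b$cols, drop = FALSE], y[b$rows], proportion = b$proportion).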

Value

an object of class "pcccd_ensemble_classifier" which includes:

i_dominant_list

dominant sample indexes.

x_dominant_list

dominant samples from feature matrix, x

radii_dominant_list

radii of the balls centered at the dominant samples

class_names

class names

k_class

number of classes

proportions

proportion of each class covered

Author(s)

Fatih Saglam, [email protected]

Examples

n <- 1000

Pure and Proper Class Cover Catch Digraph Prediction

Description

predict.pcccd_classifier makes predictions using a pcccd_classifier object.

Usage

## S3 method for class 'pcccd_classifier'
predict(object, newdata, type = "pred", ...)

Arguments

object

a pcccd_classifier object

newdata

new data as a matrix or data frame.

type

"pred" or "prob". Default is "pred". "pred" returns class estimations; "prob" returns an $n \times k$ matrix of class probabilities, where $n$ is the number of rows of newdata and $k$ the number of classes.

...

not used.

Details

Predictions are based on the nearest dominant point, with the distance to each dominant point measured in units of its ball radius.

For details, please refer to Priebe et al. (2001), Priebe et al. (2003), and Manukyan and Ceyhan (2016).
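Measuring distance "in radius units" means comparing $d(z, c_i)/r_i$ rather than $d(z, c_i)$. A minimal base R sketch of that rule, assuming Euclidean distance, is below; `predict_scaled()` is a hypothetical helper whose inputs mirror the x_dominant_list and radii_dominant_list components, and it only covers type = "pred" (how "prob" is computed is not shown here).

```r
# Hypothetical sketch (not the rcccd implementation) of the
# "nearest dominant point in radius units" rule.
predict_scaled <- function(newdata, centers_list, radii_list, class_names) {
  newdata <- as.matrix(newdata)
  n <- nrow(newdata)
  # score[j, k]: smallest d(z_j, c) / r over the balls (c, r) of class k
  score <- sapply(seq_along(centers_list), function(k) {
    centers <- as.matrix(centers_list[[k]])
    radii <- radii_list[[k]]
    ratios <- sapply(seq_len(nrow(centers)), function(i) {
      sqrt(rowSums(sweep(newdata, 2, centers[i, ])^2)) / radii[i]
    })
    apply(matrix(ratios, nrow = n), 1, min)
  })
  score <- matrix(score, nrow = n)
  factor(class_names[apply(score, 1, which.min)], levels = class_names)
}

centers_toy <- list(matrix(c(0, 0), ncol = 2), matrix(c(10, 0), ncol = 2))
radii_toy <- list(1, 4)
# (3, 0) is closer to the first center, but 3 / 1 > 7 / 4, so it goes to "B"
predict_scaled(rbind(c(3, 0), c(0.5, 0)), centers_toy, radii_toy, c("A", "B"))
```

The scaling lets a class with large, sparse balls claim points that are farther away than the nearest center of a class with small, tight balls.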

Value

a vector of class predictions (if type is "pred") or an $n \times k$ matrix of class probabilities (if type is "prob").

Author(s)

Fatih Saglam, [email protected]

References

Priebe, C. E., DeVinney, J., & Marchette, D. J. (2001). On the distribution of the domination number for random class cover catch digraphs. Statistics & Probability Letters, 55(3), 239–246. https://doi.org/10.1016/s0167-7152(01)00129-8

Priebe, C. E., Marchette, D. J., DeVinney, J., & Socolinsky, D. A. (2003). Classification Using Class Cover Catch Digraphs. Journal of Classification, 20(1), 3–23. https://doi.org/10.1007/s00357-003-0003-7

Manukyan, A., & Ceyhan, E. (2016). Classification of imbalanced data with a geometric digraph family. Journal of Machine Learning Research, 17(1), 6504–6543. https://jmlr.org/papers/volume17/15-604/15-604.pdf

Examples

n <- 1000
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
x <- cbind(x1, x2)
y <- as.factor(ifelse(3 < x1 & x1 < 7 & 3 < x2 & x2 < 7, "A", "B"))

# testing the performance
i_train <- sample(1:n, round(n*0.8))

x_train <- x[i_train,]
y_train <- y[i_train]

x_test <- x[-i_train,]
y_test <- y[-i_train]

m_pcccd <- pcccd_classifier(x = x_train, y = y_train)
pred <- predict(object = m_pcccd, newdata = x_test)

# confusion matrix
table(y_test, pred)

# test accuracy
sum(y_test == pred)/nrow(x_test)

Pure and Proper Class Cover Catch Digraph Ensemble Prediction

Description

predict.pcccd_ensemble_classifier makes predictions using a pcccd_ensemble_classifier object.

Usage

## S3 method for class 'pcccd_ensemble_classifier'
predict(object, newdata, type = "pred", ...)

Arguments

object

a pcccd_ensemble_classifier object

newdata

new data as a matrix or data frame.

type

"pred" or "prob". Default is "pred". "pred" returns class estimations; "prob" returns an $n \times k$ matrix of class probabilities, where $n$ is the number of rows of newdata and $k$ the number of classes.

...

not used.

Value

a vector of class predictions (if type is "pred") or an $n \times k$ matrix of class probabilities (if type is "prob").
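How the weak classifiers' outputs are combined is not stated here. As one common bagging rule, offered purely as an assumption, the sketch below turns the class predictions of the weak models into an $n \times k$ matrix of vote fractions and a majority-vote prediction. `aggregate_votes()` is hypothetical and not part of rcccd.

```r
# Hypothetical sketch: combine weak-model class predictions by majority vote.
# This is an assumed aggregation rule, not necessarily the one rcccd uses.
aggregate_votes <- function(pred_list, class_names) {
  n <- length(pred_list[[1]])
  # votes[j, b]: label predicted for sample j by weak model b
  votes <- matrix(vapply(pred_list, as.character, character(n)), nrow = n)
  # prob[j, k]: fraction of weak models voting for class k on sample j
  prob <- matrix(vapply(class_names, function(k) rowMeans(votes == k), numeric(n)),
                 nrow = n, dimnames = list(NULL, class_names))
  # majority vote; ties go to the first class in class_names
  pred <- factor(class_names[max.col(prob, ties.method = "first")],
                 levels = class_names)
  list(pred = pred, prob = prob)
}

preds_toy <- list(factor(c("A", "B", "A")),
                  factor(c("A", "A", "B")),
                  factor(c("B", "B", "B")))
aggregate_votes(preds_toy, c("A", "B"))
```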

Author(s)

Fatih Saglam, [email protected]

Examples

n <- 1000

Random Walk Class Cover Catch Digraph Prediction

Description

predict.rwcccd_classifier makes predictions using a rwcccd_classifier object.

Usage

## S3 method for class 'rwcccd_classifier'
predict(object, newdata, type = "pred", e = 0, ...)

Arguments

object

a rwcccd_classifier object

newdata

new data as a matrix or data frame.

type

"pred" or "prob". Default is "pred". "pred" returns class estimations; "prob" returns an $n \times k$ matrix of class probabilities, where $n$ is the number of rows of newdata and $k$ the number of classes.

e

0 or 1. Default is 0. Penalty based on the $T$ scores stored in the rwcccd_classifier object.

...

not used.

Details

Predictions are based on the nearest dominant point, with the distance to each dominant point measured in units of its ball radius. The e argument penalizes these estimations based on the $T$ scores stored in the rwcccd_classifier object.

For details, please refer to Priebe et al. (2001), Priebe et al. (2003), and Manukyan and Ceyhan (2016).

Value

a vector of class predictions (if type is "pred") or an $n \times k$ matrix of class probabilities (if type is "prob").

Author(s)

Fatih Saglam, [email protected]

References

Priebe, C. E., DeVinney, J., & Marchette, D. J. (2001). On the distribution of the domination number for random class cover catch digraphs. Statistics & Probability Letters, 55(3), 239–246. https://doi.org/10.1016/s0167-7152(01)00129-8

Priebe, C. E., Marchette, D. J., DeVinney, J., & Socolinsky, D. A. (2003). Classification Using Class Cover Catch Digraphs. Journal of Classification, 20(1), 3–23. https://doi.org/10.1007/s00357-003-0003-7

Manukyan, A., & Ceyhan, E. (2016). Classification of imbalanced data with a geometric digraph family. Journal of Machine Learning Research, 17(1), 6504–6543. https://jmlr.org/papers/volume17/15-604/15-604.pdf

Examples

n <- 1000
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
x <- cbind(x1, x2)
y <- as.factor(ifelse(3 < x1 & x1 < 7 & 3 < x2 & x2 < 7, "A", "B"))

# testing the performance
i_train <- sample(1:n, round(n*0.8))

x_train <- x[i_train,]
y_train <- y[i_train]

x_test <- x[-i_train,]
y_test <- y[-i_train]

m_rwcccd <- rwcccd_classifier(x = x_train, y = y_train)
pred <- predict(object = m_rwcccd, newdata = x_test, e = 0)

# confusion matrix
table(y_test, pred)

# test accuracy
sum(y_test == pred)/nrow(x_test)

Random Walk Class Cover Catch Digraph Classifier

Description

rwcccd_classifier and rwcccd_classifier_2 fit a Random Walk Class Cover Catch Digraph (RWCCCD) classification model. rwcccd_classifier determines the balls in C++ for speed, whereas rwcccd_classifier_2 does so in R.

Usage

rwcccd_classifier(x, y, method = "default", m = 1, proportion = 0.99)

rwcccd_classifier_2(
  x,
  y,
  method = "default",
  m = 1,
  proportion = 0.99,
  partial_ordering = FALSE
)

Arguments

x

feature matrix or dataframe.

y

class factor variable.

method

"default" or "balanced".

m

penalization parameter. Takes values in $[0, \infty)$.

proportion

proportion of covered samples. A real number in $(0, 1]$.

partial_ordering

TRUE or FALSE. Default is FALSE. If TRUE, partial ordering is used to determine the dominant points, which is faster but orders the candidates only incompletely. Only available in rwcccd_classifier_2.

Details

Random Walk Class Cover Catch Digraphs (RWCCCD) are determined by calculating, with each class in turn as the target class, the score $T_{\text{target}}$ for each target sample:

$$T_{\text{target}} = R_{\text{target}}(r_{\text{target}}) - \frac{r_{\text{target}}\, n_u}{2\, d_m(x)}.$$

Here, $r_{\text{target}}$ is the radius, chosen for each target sample as the value of $r$ that maximizes $R_{\text{target}}(r) - P_{\text{target}}(r)$. $R_{\text{target}}(r)$ is

$$R_{\text{target}}(r) := w_{\text{target}} \left|\{z \in X^{\text{target}}_{n_{\text{target}}} : d(x^{\text{target}}, z) \le r\}\right| - w_{\text{non-target}} \left|\{z \in X^{\text{non-target}}_{n_{\text{non-target}}} : d(x^{\text{target}}, z) \le r\}\right|$$

and $P_{\text{target}}(r)$ is

$$P_{\text{target}}(r) = m \times d(x^{\text{target}}, z)^{p}.$$

$m = 0$ removes the penalty. $w_{\text{target}} = 1$ for the default method and $w_{\text{target}} = n_{\text{target}}/n_{\text{non-target}}$ for the balanced method. $n_u$ is the number of uncovered samples in the current iteration, and $d_m(x) = \max d(x^{\text{target}}, x^{\text{uncovered}})$.
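To make these definitions concrete, the sketch below evaluates $R_{\text{target}}(r)$, $P_{\text{target}}(r)$ and $T_{\text{target}}$ for a single candidate target sample in base R. It is not the rcccd implementation: `rw_t_score()` is hypothetical, and it assumes Euclidean distance, candidate radii equal to the distances from the candidate to all samples (so $r = 0$ is possible), $p$ equal to the number of features, and that, as in the first iteration, every target sample is still uncovered. The balanced method would pass w_target = n_target / n_non_target, following the text above.

```r
# Hypothetical helper (not part of rcccd): T score of candidate sample i.
# Assumptions: Euclidean distance; candidate radii are the distances from
# x[i, ] to all samples; p = ncol(x); all target samples are still uncovered.
rw_t_score <- function(x, is_target, i, m = 1, w_target = 1, w_non_target = 1) {
  x <- as.matrix(x)
  p <- ncol(x)
  # distances from the candidate x[i, ] to every sample
  d <- sqrt(rowSums(sweep(x, 2, x[i, ])^2))
  radii <- sort(unique(d))
  # R(r): weighted target count minus weighted non-target count within r
  R <- vapply(radii, function(r) {
    w_target * sum(is_target & d <= r) - w_non_target * sum(!is_target & d <= r)
  }, numeric(1))
  P <- m * radii^p            # penalty evaluated at each candidate radius
  k <- which.max(R - P)       # first radius maximising R(r) - P(r)
  r <- radii[k]
  n_u <- sum(is_target)       # uncovered samples (first iteration)
  d_m <- max(d[is_target])    # largest distance to an uncovered sample
  list(r = r, T = R[k] - r * n_u / (2 * d_m))
}

x_toy <- matrix(c(0, 1, 2, 10), ncol = 1)
target_toy <- c(TRUE, TRUE, TRUE, FALSE)
rw_t_score(x_toy, target_toy, i = 1, m = 0)  # r = 2, T = 1.5
rw_t_score(x_toy, target_toy, i = 1, m = 1)  # r = 0, T = 1
```

With m = 1 the penalty outweighs the gain from covering more target samples, which shows how $r = 0$ can end up being selected.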

This method is more robust to noise than PCCCD. However, the resulting cover is not guaranteed to be pure or proper, and $r = 0$ can be selected.

For details, please refer to Priebe et al. (2001), Priebe et al. (2003), and Manukyan and Ceyhan (2016).

Value

an object of class "rwcccd_classifier" which includes:

i_dominant_list

dominant sample indexes.

x_dominant_list

dominant samples from feature matrix, x

radii_dominant_list

radii of the balls centered at the dominant samples

class_names

class names

k_class

number of classes

proportions

proportion of each class covered

Author(s)

Fatih Saglam, [email protected]

References

Priebe, C. E., DeVinney, J., & Marchette, D. J. (2001). On the distribution of the domination number for random class cover catch digraphs. Statistics & Probability Letters, 55(3), 239–246. https://doi.org/10.1016/s0167-7152(01)00129-8

Priebe, C. E., Marchette, D. J., DeVinney, J., & Socolinsky, D. A. (2003). Classification Using Class Cover Catch Digraphs. Journal of Classification, 20(1), 3–23. https://doi.org/10.1007/s00357-003-0003-7

Manukyan, A., & Ceyhan, E. (2016). Classification of imbalanced data with a geometric digraph family. Journal of Machine Learning Research, 17(1), 6504–6543. https://jmlr.org/papers/volume17/15-604/15-604.pdf

Examples

n <- 500
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
x <- cbind(x1, x2)
y <- as.factor(ifelse(3 < x1 & x1 < 7 & 3 < x2 & x2 < 7, "A", "B"))

# dataset
m_rwcccd_1 <- rwcccd_classifier(x = x, y = y, method = "default", m = 1)

plot(x, col = y, asp = 1, main = "default")
# dominant samples of second class
x_center <- m_rwcccd_1$x_dominant_list[[2]]
# radii of balls for second class
radii <- m_rwcccd_1$radii_dominant_list[[2]]

# balls
for (i in 1:nrow(x_center)) {
  xx <- x_center[i, 1]
  yy <- x_center[i, 2]
  r <- radii[i]
  theta <- seq(0, 2*pi, length.out = 100)
  xx <- xx + r*cos(theta)
  yy <- yy + r*sin(theta)
  lines(xx, yy, type = "l", col = "green")
}

# dataset
m_rwcccd_2 <- rwcccd_classifier_2(x = x, y = y, method = "default", m = 1, partial_ordering = TRUE)

plot(x, col = y, asp = 1, main = "default, partial_ordering = TRUE")
# dominant samples of second class
x_center <- m_rwcccd_2$x_dominant_list[[2]]
# radii of balls for second class
radii <- m_rwcccd_2$radii_dominant_list[[2]]

# balls
for (i in 1:nrow(x_center)) {
  xx <- x_center[i, 1]
  yy <- x_center[i, 2]
  r <- radii[i]
  theta <- seq(0, 2*pi, length.out = 100)
  xx <- xx + r*cos(theta)
  yy <- yy + r*sin(theta)
  lines(xx, yy, type = "l", col = "green")
}

# dataset
m_rwcccd_3 <- rwcccd_classifier(x = x, y = y, method = "balanced", m = 1, proportion = 0.5)

plot(x, col = y, asp = 1, main = "balanced, proportion = 0.5")
# dominant samples of second class
x_center <- m_rwcccd_3$x_dominant_list[[2]]
# radii of balls for second class
radii <- m_rwcccd_3$radii_dominant_list[[2]]

# balls
for (i in 1:nrow(x_center)) {
  xx <- x_center[i, 1]
  yy <- x_center[i, 2]
  r <- radii[i]
  theta <- seq(0, 2*pi, length.out = 100)
  xx <- xx + r*cos(theta)
  yy <- yy + r*sin(theta)
  lines(xx, yy, type = "l", col = "green")
}

# testing the performance
i_train <- sample(1:n, round(n*0.8))

x_train <- x[i_train,]
y_train <- y[i_train]

x_test <- x[-i_train,]
y_test <- y[-i_train]

m_rwcccd <- rwcccd_classifier(x = x_train, y = y_train, method = "balanced")
pred <- predict(object = m_rwcccd, newdata = x_test)

# confusion matrix
table(y_test, pred)

# accuracy
sum(y_test == pred)/nrow(x_test)