| Title: | Robust Categorical Data Analysis |
|---|---|
| Description: | Robust categorical data analysis based on the theory of C-estimation developed in Welz (2024) <doi:10.48550/arXiv.2403.11954>. For now, the package only implements robust estimation of polychoric correlation as proposed in Welz, Mair and Alfons (2026) <doi:10.1017/psy.2025.10066> and robust estimation of polyserial correlation (Welz, 2026 <doi:10.1017/psy.2026.10091>) with methods for printing and plotting. We will implement further models in future releases. In addition, the package is still experimental, so input arguments and class structure may change in future releases. |
| Authors: | Max Welz [aut, cre] (ORCID: <https://orcid.org/0000-0003-2945-1860>), Andreas Alfons [aut] (ORCID: <https://orcid.org/0000-0002-2513-3788>), Patrick Mair [aut] (ORCID: <https://orcid.org/0000-0003-0100-6511>) |
| Maintainer: | Max Welz <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.2 |
| Built: | 2026-05-11 08:58:51 UTC |
| Source: | https://github.com/mwelz/robcat |
Initializes starting values for numerical optimization in a neutral way. The optimization problem itself is convex, so the initialization should not matter much.
initialize_param(x, y)initialize_param(x, y)
x |
Vector of integer-valued responses to first rating variable, or contingency table (a |
y |
Vector of integer-valued responses to second rating variable; only required if |
A vector of initial values for the polychoric correlation coefficient, the X-threshold parameters, and the Y-threshold parameters
## example data set.seed(123) x <- sample(c(1,2,3), size = 100, replace = TRUE) y <- sample(c(1,2,3), size = 100, replace = TRUE) initialize_param(x, y)## example data set.seed(123) x <- sample(c(1,2,3), size = 100, replace = TRUE) y <- sample(c(1,2,3), size = 100, replace = TRUE) initialize_param(x, y)
"robpolycor" and "polycor".Plot method for classes "robpolycor" and "polycor".
## S3 method for class 'robpolycor' plot(x, cutoff = 3, ...)## S3 method for class 'robpolycor' plot(x, cutoff = 3, ...)
x |
Object of class |
cutoff |
Cutoff beyond which the color scale for Pearson residuals is truncated. |
... |
Additional parameters to be passed down. |
An object of class "ggplot".
## example data set.seed(123) x <- sample(c(1,2,3), size = 100, replace = TRUE) y <- sample(c(1,2,3), size = 100, replace = TRUE) fit <- polycor(x,y) plot(fit)## example data set.seed(123) x <- sample(c(1,2,3), size = 100, replace = TRUE) y <- sample(c(1,2,3), size = 100, replace = TRUE) fit <- polycor(x,y) plot(fit)
Implements to robust estimator of Welz, Mair and Alfons (2024, doi:10.48550/arXiv.2407.18835) for the polychoric correlation model, based on the general theory of C-estimation proposed by Welz (2024, doi:10.48550/arXiv.2403.11954).
polycor( x, y = NULL, c = 0.6, variance = TRUE, constrained = "ifneeded", method = NULL, maxcor = 0.999, tol_thresholds = 0.01, init = initialize_param(x, y) )polycor( x, y = NULL, c = 0.6, variance = TRUE, constrained = "ifneeded", method = NULL, maxcor = 0.999, tol_thresholds = 0.01, init = initialize_param(x, y) )
x |
Vector of integer-valued responses to first item, or contingency table (a |
y |
Vector of integer-valued responses to second item; only required if |
c |
Tuning constant that governs robustness; must be in |
variance |
Shall an estimated asymptotic covariance matrix be returned? Default is |
constrained |
Shall strict monotonicity of thresholds be explicitly enforced by linear constraints? This can be a logical ( |
method |
Numerical optimization method, see |
maxcor |
Maximum absolute correlation (to ensure numerical stability). Default is 0.999. |
tol_thresholds |
Minimum distance between consecutive thresholds (to enforce strict monotonicity); only relevant in case of constrained optimization. Default is 0.01. |
init |
Initialization of numerical optimization. Default is neutral. |
An object of class "robpolycor", which is a list with the following components.
thetahatA vector of estimates for the polychoric correlation coefficient (rho) as well as thresholds for x (named a1,a2,...,a_{Kx-1}) and y (named b1,b2,...,b_{Ky-1}).
stderrA vector of standard errors for each estimate in thetahat.
vcovEstimated asymptotic covariance matrix of thetahat. The matrix in the paper (asymptotic covariance matrix of ) can be obtained via vcov * N, where N is the sample size.
chisq,pval,dfCurrently NULL, will in a future release be the test statistic, p-value, and degrees of freedom of a test for bivariate normality.
objectiveValue of minimized loss function.
optimObject of class optim.
## example data set.seed(123) x <- sample(c(1,2,3), size = 100, replace = TRUE) y <- sample(c(1,2,3), size = 100, replace = TRUE) polycor(x,y) # robust polycor_mle(x,y) # non-robust MLE## example data set.seed(123) x <- sample(c(1,2,3), size = 100, replace = TRUE) y <- sample(c(1,2,3), size = 100, replace = TRUE) polycor(x,y) # robust polycor_mle(x,y) # non-robust MLE
Implements the maximum likelihood estimator of Olsson (1979, Psychometrika, doi:10.1007/BF02296207) for the polychoric correlation model.
polycor_mle( x, y = NULL, variance = TRUE, constrained = "ifneeded", twostep = FALSE, method = NULL, maxcor = 0.999, tol_thresholds = 0.01, init = initialize_param(x, y) )polycor_mle( x, y = NULL, variance = TRUE, constrained = "ifneeded", twostep = FALSE, method = NULL, maxcor = 0.999, tol_thresholds = 0.01, init = initialize_param(x, y) )
x |
Vector of integer-valued responses to first item, or contingency table (a |
y |
Vector of integer-valued responses to second item; only required if |
variance |
Shall an estimated asymptotic covariance matrix be returned? Default is |
constrained |
Shall strict monotonicity of thresholds be explicitly enforced by linear constraints? This can be a logical ( |
twostep |
Shall two-step estimation of Olsson (1979) <doi:10.1007/BF02296207> be performed? Default is |
method |
Numerical optimization method, see |
maxcor |
Maximum absolute correlation (to ensure numerical stability). Default is 0.999. |
tol_thresholds |
Minimum distance between consecutive thresholds (to enforce strict monotonicity); only relevant in case of constrained optimization. Default is 0.01. |
init |
Initialization of numerical optimization. Default is neutral. |
An object of class "robpolycor". See polycor() for details.
## example data set.seed(123) x <- sample(c(1,2,3), size = 100, replace = TRUE) y <- sample(c(1,2,3), size = 100, replace = TRUE) polycor(x,y) # robust polycor_mle(x,y) # non-robust MLE## example data set.seed(123) x <- sample(c(1,2,3), size = 100, replace = TRUE) y <- sample(c(1,2,3), size = 100, replace = TRUE) polycor(x,y) # robust polycor_mle(x,y) # non-robust MLE
A useful wrapper of polycor to robustly estimate a polychoric correlation matrix by calculating all unique pairwise polychoric correlation coefficients.
polycormat( data, c = 0.6, parallel = FALSE, num_cores = 1L, return_polycor = TRUE, variance = TRUE, constrained = "ifneeded", method = NULL, maxcor = 0.999, tol_thresholds = 0.01 )polycormat( data, c = 0.6, parallel = FALSE, num_cores = 1L, return_polycor = TRUE, variance = TRUE, constrained = "ifneeded", method = NULL, maxcor = 0.999, tol_thresholds = 0.01 )
data |
Data matrix or |
c |
Tuning constant that governs robustness; must be in |
parallel |
Logical. Shall parallelization be used? Default is |
num_cores |
Number of cores to be used, only relevant if |
return_polycor |
Logical. Shall the individual |
variance |
Shall an estimated asymptotic covariance matrix be returned? Default is |
constrained |
Shall strict monotonicity of thresholds be explicitly enforced by linear constraints? This can be a logical ( |
method |
Numerical optimization method, see |
maxcor |
Maximum absolute correlation (to ensure numerical stability). Default is 0.999. |
tol_thresholds |
Minimum distance between consecutive thresholds (to enforce strict monotonicity); only relevant in case of constrained optimization. Default is 0.01. |
If return_polycor = TRUE, returns a list with a polychoric correlation matrix and list of "polycor" objects. If return_polycor = FALSE, then only a correlation matrix is returned.
## example data set.seed(123) data <- matrix(sample(c(1,2,3), size = 3*100, replace = TRUE), nrow = 100) polycormat(data) # robust polycormat_mle(data) # non-robust MLE## example data set.seed(123) data <- matrix(sample(c(1,2,3), size = 3*100, replace = TRUE), nrow = 100) polycormat(data) # robust polycormat_mle(data) # non-robust MLE
A useful wrapper of polycor_mle to estimate a polychoric correlation matrix via maximum likelihood by calculating all unique pairwise polychoric correlation coefficients.
polycormat_mle( data, parallel = FALSE, num_cores = 1L, return_polycor = TRUE, variance = TRUE, constrained = "ifneeded", method = NULL, maxcor = 0.999, tol_thresholds = 0.01 )polycormat_mle( data, parallel = FALSE, num_cores = 1L, return_polycor = TRUE, variance = TRUE, constrained = "ifneeded", method = NULL, maxcor = 0.999, tol_thresholds = 0.01 )
data |
Data matrix or |
parallel |
Logical. Shall parallelization be used? Default is |
num_cores |
Number of cores to be used, only relevant if |
return_polycor |
Logical. Shall the individual |
variance |
Shall an estimated asymptotic covariance matrix be returned? Default is |
constrained |
Shall strict monotonicity of thresholds be explicitly enforced by linear constraints? This can be a logical ( |
method |
Numerical optimization method, see |
maxcor |
Maximum absolute correlation (to ensure numerical stability). Default is 0.999. |
tol_thresholds |
Minimum distance between consecutive thresholds (to enforce strict monotonicity); only relevant in case of constrained optimization. Default is 0.01. |
If return_polycor = TRUE, returns a list with a polychoric correlation matrix and list of "polycor" objects. If return_polycor = FALSE, then only a correlation matrix is returned.
## example data set.seed(123) data <- matrix(sample(c(1,2,3), size = 3*100, replace = TRUE), nrow = 100) polycormat(data) # robust polycormat_mle(data) # non-robust MLE## example data set.seed(123) data <- matrix(sample(c(1,2,3), size = 3*100, replace = TRUE), nrow = 100) polycormat(data) # robust polycormat_mle(data) # non-robust MLE
Implements the robust estimator of Welz (2025, doi:10.48550/arXiv.2510.15632) for the polyserial correlation model.
polyserial( x, y, alpha = 0.5, num_y = max(y), constrained = "ifneeded", method = NULL, variance = TRUE, weights = TRUE, init = polyserial_initialize_param(x = x, num_y = num_y, robust = TRUE), maxcor = 0.999, tol_thresholds = 0.01, tol_sigma2 = 0.01 )polyserial( x, y, alpha = 0.5, num_y = max(y), constrained = "ifneeded", method = NULL, variance = TRUE, weights = TRUE, init = polyserial_initialize_param(x = x, num_y = num_y, robust = TRUE), maxcor = 0.999, tol_thresholds = 0.01, tol_sigma2 = 0.01 )
x |
Vector of numeric values. |
y |
Vector of integer-valued ordinal values. |
alpha |
Tuning constant that governs robustness-efficiency tradeoff; must be in |
num_y |
Number of response categories in y; defaults to max(y) |
constrained |
Shall parameter restructions be enforced by linear constraints? This can be a logical ( |
method |
Numerical optimization method, see |
variance |
Shall an estimated asymptotic covariance matrix be returned? Default is |
weights |
Shall weights be returned? Default is |
init |
Initialization of numerical optimization. Default is neutral. |
maxcor |
Maximum absolute correlation (to ensure numerical stability). Default is 0.999. |
tol_thresholds |
Minimum distance between consecutive thresholds (to enforce strict monotonicity); only relevant in case of constrained optimization. Default is 0.01. |
tol_sigma2 |
Minimum value of sigma2 parameter (population variance of X); only relevant in case of constrained optimization. Default is 0.01. |
An object of class "polyserial", which is a list with the following components.
thetahatA vector of estimates for the polyserial correlation coefficient (rho), population mean of X (mu), population variance of Y (sigma2), as well as thresholds for y (named tau1,tau2,...,tau_{r-1}).
stderrA vector of standard errors for each estimate in thetahat.
vcovEstimated asymptotic covariance matrix of thetahat. The matrix in the paper (asymptotic covariance matrix of ) can be obtained via vcov * N, where N is the sample size.
pointpolyserialEstimated polyserial correlation coefficient, calculated with provided scoring of Y
weightsList of rescaled and raw outlyingness weights for each observation as well as maximum possible raw weight that was used for rescaling (sup).
objectiveValue of minimized loss function.
optimObject of class optim.
inputsList of provided inputs.
## example data set.seed(123) x <- rnorm(n = 100) y <- sample(c(1,2), size = 100, replace = TRUE) polyserial(x,y)## example data set.seed(123) x <- rnorm(n = 100) y <- sample(c(1,2), size = 100, replace = TRUE) polyserial(x,y)
Calculate population asymptotic variance-covariance matrix associated with a parameter vector theta, assuming that the polyserial model is correctly specified and that theta is the true model parameter. May take a few moments to compute because a relatively large number of integrals need to be numerically solved.
polyserial_efficiency(theta, alpha)polyserial_efficiency(theta, alpha)
theta |
Parameter vector of polyserial model; assumed to be the true one. First element is polyserial correlatio coefficient, second is the population mean of X, third is population variance of X, and the remaining elements are the thresholds associated with the ordnal Y (must be in increasing order) |
alpha |
Tuning constant governing robustness-efficiency tradeoff. Set to 0 for maximum likelihood. |
A numeric matrix that is the population asymptotic variance-covariance matrix associated with a parameter vector theta and tuning constant alpha.
theta <- c(rho = 0, mu = 0, sigma2 = 1, tau1 = 0) # true parameter vector polyserial_efficiency(theta, alpha = 0.5)theta <- c(rho = 0, mu = 0, sigma2 = 1, tau1 = 0) # true parameter vector polyserial_efficiency(theta, alpha = 0.5)
Initializes starting values for numerical optimization in a neutral way.
polyserial_initialize_param(x, num_y, robust = FALSE)polyserial_initialize_param(x, num_y, robust = FALSE)
x |
Vector of numeric values |
num_y |
Number of response options of ordinal variable |
robust |
Should values of |
A vector of initial values for the polyserial correlation coefficient, mu, sigma2, and Y-threshold parameters
## example data set.seed(123) x <- rnorm(100) polyserial_initialize_param(x = x, num_y = 3)## example data set.seed(123) x <- rnorm(100) polyserial_initialize_param(x = x, num_y = 3)
Implements maximum likelihood estimation of the polyserial correlation model.
polyserial_mle( x, y, num_y = max(y), constrained = "ifneeded", method = NULL, variance = TRUE, init = polyserial_initialize_param(x = x, num_y = num_y, robust = FALSE), maxcor = 0.999, tol_thresholds = 0.01, tol_sigma2 = 0.01 )polyserial_mle( x, y, num_y = max(y), constrained = "ifneeded", method = NULL, variance = TRUE, init = polyserial_initialize_param(x = x, num_y = num_y, robust = FALSE), maxcor = 0.999, tol_thresholds = 0.01, tol_sigma2 = 0.01 )
x |
Vector of numeric values. |
y |
Vector of integer-valued ordinal values. |
num_y |
Number of response categories in y; defaults to max(y) |
constrained |
Shall parameter restructions be enforced by linear constraints? This can be a logical ( |
method |
Numerical optimization method, see |
variance |
Shall an estimated asymptotic covariance matrix be returned? Default is |
init |
Initialization of numerical optimization. Default is neutral. |
maxcor |
Maximum absolute correlation (to ensure numerical stability). Default is 0.999. |
tol_thresholds |
Minimum distance between consecutive thresholds (to enforce strict monotonicity); only relevant in case of constrained optimization. Default is 0.01. |
tol_sigma2 |
Minimum value of sigma2 parameter (population variance of X); only relevant in case of constrained optimization. Default is 0.01. |
An object of class "polyserial", which is a list with the following components.
thetahatA vector of estimates for the polyserial correlation coefficient (rho), population mean of X (mu), population variance of Y (sigma2), as well as thresholds for y (named tau1,tau2,...,tau_{r-1}).
stderrA vector of standard errors for each estimate in thetahat.
vcovEstimated asymptotic covariance matrix of thetahat. The matrix in the paper (asymptotic covariance matrix of ) can be obtained via vcov * N, where N is the sample size.
pointpolyserialEstimated polyserial correlation coefficient, calculated with provided scoring of Y
objectiveValue of minimized loss function.
optimObject of class optim.
inputsList of provided inputs.
## example data set.seed(123) x <- rnorm(n = 100) y <- sample(c(1,2), size = 100, replace = TRUE) polyserial_mle(x,y)## example data set.seed(123) x <- rnorm(n = 100) y <- sample(c(1,2), size = 100, replace = TRUE) polyserial_mle(x,y)
"robpolycor" and "polycor".Print method for classes "robpolycor" and "polycor".
## S3 method for class 'robpolycor' print(x, digits = max(3L, getOption("digits") - 3L), ...)## S3 method for class 'robpolycor' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
Object of class |
digits |
Number of digits to be printed. |
... |
Additional parameters to be passed down. |
A print to the console.
set.seed(123) x <- sample(c(1,2,3), size = 100, replace = TRUE) y <- sample(c(1,2,3), size = 100, replace = TRUE) fit <- polycor(x,y) print(fit) fit # equivalentset.seed(123) x <- sample(c(1,2,3), size = 100, replace = TRUE) y <- sample(c(1,2,3), size = 100, replace = TRUE) fit <- polycor(x,y) print(fit) fit # equivalent
"robpolyserial" and "polyserial".Print method for classes "robpolyserial" and "polyserial".
## S3 method for class 'robpolyserial' print(x, digits = max(3L, getOption("digits") - 3L), ...)## S3 method for class 'robpolyserial' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
Object of class |
digits |
Number of digits to be printed. |
... |
Additional parameters to be passed down. |
A print to the console.
set.seed(123) x <- rnorm(100) y <- sample(c(1,2), size = 100, replace = TRUE) fit <- polyserial(x,y) print(fit) fit # equivalentset.seed(123) x <- rnorm(100) y <- sample(c(1,2), size = 100, replace = TRUE) fit <- polyserial(x,y) print(fit) fit # equivalent
Estimate asymptotic variance-covariance matrix of polychoric model.
## S3 method for class 'robpolycor' vcov(object, ...)## S3 method for class 'robpolycor' vcov(object, ...)
object |
Object of class |
... |
Additional parameters to be passed down. |
Method for classes "robpolycor" and "polycor". Returns the estimated asymptotic variance-covariance matrix of a point estimate thetahat. The matrix in the paper (asymptotic variance-covariance matrix of ) can be obtained via multiplying the returned matrix by the sample size.
A numeric matrix, being the estimated asymptotic covariance matrix for the model parameters
set.seed(123) x <- sample(c(1,2,3), size = 100, replace = TRUE) y <- sample(c(1,2,3), size = 100, replace = TRUE) fit <- polycor(x,y) vcov(fit)set.seed(123) x <- sample(c(1,2,3), size = 100, replace = TRUE) y <- sample(c(1,2,3), size = 100, replace = TRUE) fit <- polycor(x,y) vcov(fit)
Estimate asymptotic variance-covariance matrix of polyserial model.
## S3 method for class 'robpolyserial' vcov(object, ...)## S3 method for class 'robpolyserial' vcov(object, ...)
object |
Object of class |
... |
Additional parameters to be passed down. |
Method for classes "robpolyserial" and "polyserial". Returns the estimated asymptotic variance-covariance matrix of a point estimate thetahat. The matrix in the paper (asymptotic variance-covariance matrix of ) can be obtained via multiplying the returned matrix by the sample size.
A numeric matrix, being the estimated asymptotic covariance matrix for the model parameters
## example data set.seed(123) x <- rnorm(n = 100) y <- sample(c(1,2), size = 100, replace = TRUE) fit <- polyserial(x,y) vcov(fit)## example data set.seed(123) x <- rnorm(n = 100) y <- sample(c(1,2), size = 100, replace = TRUE) fit <- polyserial(x,y) vcov(fit)