% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Coxmos_common_functions.R
\name{deleteZeroOrNearZeroVariance}
\alias{deleteZeroOrNearZeroVariance}
\title{Remove Zero or Near-Zero Variance Variables}
\usage{
deleteZeroOrNearZeroVariance(
  X,
  remove_near_zero_variance = FALSE,
  remove_zero_variance = TRUE,
  toKeep.zv = NULL,
  freqCut = 95/5
)
}
\arguments{
\item{X}{Numeric matrix or data.frame. Explanatory variables. Qualitative variables must be
transformed into binary variables.}

\item{remove_near_zero_variance}{Logical. If \code{TRUE}, near-zero variance variables are
removed (default: FALSE).}

\item{remove_zero_variance}{Logical. If \code{TRUE}, zero variance variables are removed
(default: TRUE).}

\item{toKeep.zv}{Character vector. Names of variables in \code{X} that must not be removed by the
(near) zero variance filter (default: NULL).}

\item{freqCut}{Numeric. Cutoff for the ratio of the frequency of the most common value to that of
the second most common value, used to flag near-zero variance variables (default: 95/5).}
}
\value{
A list with two elements:
\item{X}{The filtered data.frame \code{X}.}
\item{variablesDeleted}{The variables removed by the filter.}
}
\description{
Removes variables with zero or near-zero variance from a dataset, so that uninformative columns
do not enter subsequent statistical analyses.
}
\details{
The \code{deleteZeroOrNearZeroVariance} function is intended for the preprocessing phase of
statistical modeling. In many datasets, especially high-dimensional ones, some variables take a
single value (zero variance) or are dominated by one value (near-zero variance). Such variables
carry little information and can distort subsequent models, for instance by causing numerical
problems when a resampling subset contains only the dominant value and the variable becomes
constant.

The function relies on \code{caret::nearZeroVar()} to identify these variables. Through
\code{remove_zero_variance} and \code{remove_near_zero_variance}, users can remove zero variance
variables, near-zero variance variables, or both. The \code{freqCut} argument sets the threshold for
near-zero variance as the ratio of the frequency of the most common value to that of the second most
common value. Variables that must be kept regardless of their variance can be listed in
\code{toKeep.zv}.
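
The filtering rule can be illustrated with a minimal base-R sketch. The helper \code{nzv_sketch()}
is hypothetical and only mirrors the per-variable metrics documented for \code{caret::nearZeroVar()}
(frequency ratio, percentage of unique values, zero variance); it is not the Coxmos implementation.
The argument \code{uniqueCut = 10} (caret's default) and the argument name \code{toKeep} are
assumptions made for illustration, and details such as missing-value handling may differ from the
actual function.
\preformatted{
# Hypothetical sketch, NOT the Coxmos code: mimics the metrics of
# caret::nearZeroVar(); uniqueCut = 10 is assumed (caret's default).
nzv_sketch <- function(X, freqCut = 95/5, uniqueCut = 10, toKeep = NULL) {
  X <- as.data.frame(X)
  metrics <- lapply(X, function(x) {
    n <- length(x)
    x <- x[!is.na(x)]
    counts <- sort(table(x), decreasing = TRUE)
    n_unique <- length(counts)
    # Ratio of the most frequent value to the second most frequent one
    freq_ratio <- if (n_unique <= 1) 0 else counts[[1]] / counts[[2]]
    c(freqRatio = freq_ratio,
      percentUnique = 100 * n_unique / n,
      zeroVar = n_unique <= 1)
  })
  m <- as.data.frame(do.call(rbind, metrics))
  m$zeroVar <- m$zeroVar == 1
  # Near-zero variance: dominant value AND few distinct values, or constant
  m$nzv <- (m$freqRatio > freqCut & m$percentUnique <= uniqueCut) | m$zeroVar
  # Variables listed in toKeep are never removed
  drop <- setdiff(rownames(m)[m$nzv], toKeep)
  list(X = X[, setdiff(names(X), drop), drop = FALSE],
       variablesDeleted = drop)
}

X <- data.frame(
  constant = rep(1, 40),       # zero variance
  rare     = c(rep(0, 39), 1), # freqRatio 39 > 19, 5 percent unique
  balanced = rep(c(0, 1), 20), # freqRatio 1
  id       = 1:40              # all values distinct
)
res <- nzv_sketch(X)
res$variablesDeleted           # "constant" "rare"
nzv_sketch(X, toKeep = "rare")$variablesDeleted  # "constant"
}
Requiring both a large frequency ratio and a low percentage of unique values keeps variables that
have many distinct values even when one of them is frequent.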
}
\examples{
data("X_proteomic")
X <- X_proteomic
filter <- deleteZeroOrNearZeroVariance(X, remove_near_zero_variance = TRUE)
}
\author{
Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es
}
