leakr: Data Leakage Detection Tools for Machine Learning

Provides utilities to detect common data leakage patterns, including train/test contamination, temporal leakage, and data duplication, with the aim of improving model reliability and reproducibility in machine learning workflows. Generates diagnostic reports and visual summaries to support data validation. Methods are based on best practices from Hastie, Tibshirani, and Friedman (2009, ISBN:978-0387848570).
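
To make the three leakage patterns named above concrete, the following is a minimal base R sketch of what each check amounts to. It deliberately does not call any leakr function, since the package's API is not shown on this page. The toy data, the helper `row_key()`, and the names in `report` are all assumptions made for illustration.

```r
# Hypothetical toy data (not from leakr): train row 5 repeats train row 2,
# and test row 1 is an exact copy of train row 3.
train <- data.frame(
  id   = c(1L, 2L, 3L, 4L, 2L),
  x    = c(0.5, 1.2, 3.3, 4.1, 1.2),
  date = as.Date(c("2024-01-05", "2024-02-10", "2024-03-15",
                   "2024-06-01", "2024-02-10"))
)
test <- data.frame(
  id   = c(3L, 5L),
  x    = c(3.3, 2.8),
  date = as.Date(c("2024-03-15", "2024-05-20"))
)

# Hypothetical helper: collapse each row into one string key so that rows
# can be compared across data frames with identical column order.
row_key <- function(df) do.call(paste, c(df, sep = "|"))

# 1. Train/test contamination: test rows that also occur in the training set.
contaminated <- row_key(test) %in% row_key(train)

# 2. Data duplication: training rows that repeat an earlier training row.
dup_in_train <- duplicated(row_key(train))

# 3. Temporal leakage: training rows dated after the earliest test row.
temporal <- train$date > min(test$date)

# Summary counts (names are illustrative, not leakr output fields).
report <- c(
  contaminated_test_rows = sum(contaminated),
  duplicated_train_rows  = sum(dup_in_train),
  future_train_rows      = sum(temporal)
)
print(report)
```

Exact string matching on whole rows is the simplest possible criterion. It misses near-duplicates and is sensitive to numeric formatting, so it should be read as a baseline illustration rather than a description of how leakr works.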

Version: 0.1.0
Imports: ggplot2, arrow, data.table, digest, htmltools, openxlsx, readxl, stringr, workflows, jsonlite
Suggests: testthat (≥ 3.0.0), caret, mlr3, tidymodels, knitr, rmarkdown
Published: 2025-10-26
DOI: 10.32614/CRAN.package.leakr (may not be active yet)
Author: Cheryl Isabella Lim [aut, cre]
Maintainer: Cheryl Isabella Lim <cheryl.academic at gmail.com>
License: MIT + file LICENSE
NeedsCompilation: no
Materials: README, NEWS
CRAN checks: leakr results

Documentation:

Reference manual: leakr.html, leakr.pdf
Vignettes: Advanced Leakage Detection with leakr (source, R code), Framework Integration with leakr (source, R code), Getting Started with leakr (source, R code)

Downloads:

Package source: leakr_0.1.0.tar.gz
Windows binaries: r-devel: not available, r-release: not available, r-oldrel: leakr_0.1.0.zip
macOS binaries: r-release (arm64): leakr_0.1.0.tgz, r-oldrel (arm64): leakr_0.1.0.tgz, r-release (x86_64): leakr_0.1.0.tgz, r-oldrel (x86_64): leakr_0.1.0.tgz

Linking:

Please use the canonical form https://CRAN.R-project.org/package=leakr to link to this page.