| Title: | Assessing Predisposition Between Phenotypes using Polygenic Scores | 
| Version: | 1.0.0 | 
| Description: | Using polygenic scores (PGS, or PRS/GRS for binary outcomes), this package lets users investigate shared predisposition between different conditions, run fast association analyses, and export plots and views of the PGS distribution as 'ggplot2' objects. | 
| Depends: | R (≥ 3.5.0) | 
| License: | GPL (≥ 3) | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.3 | 
| Imports: | ggplot2, stats, utils, MASS, nnet, parallel, ivreg | 
| LazyData: | true | 
| Suggests: | testthat (≥ 3.0.0) | 
| Config/testthat/edition: | 3 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-07-15 14:26:52 UTC; vincentp | 
| Author: | Vincent Pascat | 
| Maintainer: | Vincent Pascat <vincent.pascat@univ-lille.fr> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-07-15 14:40:02 UTC | 
Association of a PGS distribution with a Phenotype
Description
assoc() takes a distribution of PGS, a Phenotype and optional confounders.
Returns a data frame showing the association of the PGS with the Phenotype
Usage
assoc(
  df = NULL,
  prs_col = "SCORESUM",
  phenotype_col = "Phenotype",
  scale = TRUE,
  covar_col = NA,
  verbose = TRUE,
  log = ""
)
Arguments
| df | a dataframe with individuals on each row, and at least the following columns: 
 | 
| prs_col | a character specifying the PGS column name | 
| phenotype_col | a character specifying the Phenotype column name | 
| scale | a boolean specifying if scaling of PGS should be done before testing | 
| covar_col | a character vector specifying the covariate column names (optional) | 
| verbose | a boolean (TRUE by default) to write in the console/log messages. | 
| log | a connection, or a character string naming the file to print to. If "" (by default), it prints to the standard output connection, the console unless redirected by sink. | 
Value
return a data frame showing the association of the PGS on the Phenotype with the following columns:
- PGS: the name of the PGS 
- Phenotype: the name of Phenotype 
- Phenotype_type: either 'Continuous', 'Ordered Categorical', 'Categorical' or 'Cases/Controls'
- Stat_method: the statistical method chosen by the function after detecting the Phenotype type, either 'Linear regression', 'Binary logistic regression', 'Ordinal logistic regression' or 'Multinomial logistic regression'
- Covar: list all the covariates used for this association 
- N_cases: if Phenotype_type is Cases/Controls, gives the number of cases 
- N_controls: if Phenotype_type is Cases/Controls, gives the number of controls 
- N: the number of individuals/samples 
- Effect: if Phenotype_type is Continuous, it represents the Beta coefficient of linear regression; Otherwise, it is the OR of logistic regression 
- SE: standard error of the Beta coefficient (if Phenotype_type is Continuous) 
- lower_CI: lower confidence interval of the related Effect (Beta or OR) 
- upper_CI: upper confidence interval of the related Effect (Beta or OR) 
- P_value: associated P-value 
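The Effect, lower_CI and upper_CI columns described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the data are simulated, the column names (pgs, age, continuous, binary) are hypothetical, and the Wald interval used for the odds ratio is an assumption about how such an interval could be obtained.

```r
# Simulated data; every column name here is illustrative only.
set.seed(1)
n <- 1000
sim <- data.frame(
  pgs = rnorm(n, mean = 10, sd = 3),
  age = runif(n, 30, 70)
)
# Standardise the PGS so effects are expressed per standard deviation.
sim$pgs_scaled <- as.numeric(scale(sim$pgs))
sim$continuous <- 0.3 * sim$pgs_scaled + 0.01 * sim$age + rnorm(n)
sim$binary <- rbinom(n, 1, plogis(-1 + 0.5 * sim$pgs_scaled))

# Continuous phenotype: linear regression, Effect is the Beta coefficient.
fit_lm <- lm(continuous ~ pgs_scaled + age, data = sim)
beta <- unname(coef(fit_lm)["pgs_scaled"])
beta_ci <- confint(fit_lm, "pgs_scaled")

# Cases/Controls phenotype: binary logistic regression, Effect is the OR.
fit_glm <- glm(binary ~ pgs_scaled + age, data = sim, family = binomial())
or <- unname(exp(coef(fit_glm)["pgs_scaled"]))
# Wald interval on the log-odds scale, then exponentiated (assumed choice).
or_ci <- exp(confint.default(fit_glm, "pgs_scaled"))

print(c(Beta = beta, lower_CI = beta_ci[1, 1], upper_CI = beta_ci[1, 2]))
print(c(OR = or, lower_CI = or_ci[1, 1], upper_CI = or_ci[1, 2]))
```

An OR above 1 (or a positive Beta) indicates that a higher score goes with higher odds of being a case (or a higher phenotype value).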
Examples
results <- assoc(
  df = comorbidData,
  prs_col = "ldl_PGS",
  phenotype_col = "log_ldl",
  scale = TRUE,
  covar_col = c("age", "sex", "gen_array")
)
print(results)
Multiple PGS Associations Plot
Description
assocplot() takes a data frame of association results, such as the output of assoc().
Returns a plot of these associations (a ggplot2 object or a list of ggplot2 objects)
Usage
assocplot(score_table = NULL, axis = "vertical", pval = FALSE)
Arguments
| score_table | a dataframe with association results with at least the following columns: 
 | 
| axis | a character,  | 
| pval | a parameter specifying information on how to display P-value 
 | 
Value
return either:
- a ggplot object representing the association results. 
- a list of two ggplot objects, accessible by $continuous_phenotype and $discrete_phenotype, if there are both Continuous Phenotypes and Discrete Phenotypes (i.e. "Categorical" or "Cases/Controls") 
Centiles Plot from a PGS Association
Description
centileplot() takes a distribution of PGS, a Phenotype and optional Confounders.
Returns a plot (ggplot2 object) with centiles (or deciles if there are not enough individuals)
of PGS on the x-axis and Prevalence/Median/Mean of the Phenotype on the y-axis
Usage
centileplot(
  df = NULL,
  prs_col = "SCORESUM",
  phenotype_col = "Phenotype",
  decile = FALSE,
  continuous_metric = NA
)
Arguments
| df | a dataframe with individuals on each row, and at least the following columns: 
 | 
| prs_col | a character specifying the PGS column name | 
| phenotype_col | a character specifying the Phenotype column name | 
| decile | a boolean specifying if centiles or deciles should be used | 
| continuous_metric | an optional character specifying which metric to
use for a continuous Phenotype, only three options:  | 
Value
return a ggplot2 object showing the results
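The quantity plotted for a Cases/Controls Phenotype (prevalence per PGS quantile) can be sketched in base R as follows. This is an illustration on simulated data using deciles, not the code behind centileplot(); the variable names and the logistic model used to simulate cases are assumptions.

```r
set.seed(2)
n <- 2000
pgs <- rnorm(n)
# Case probability rises with the score (logistic model assumed for the demo).
case <- rbinom(n, 1, plogis(-2 + 0.8 * pgs))

# Assign each individual to a decile of the PGS distribution.
breaks <- quantile(pgs, probs = seq(0, 1, by = 0.1))
decile <- cut(pgs, breaks = breaks, include.lowest = TRUE, labels = 1:10)

# Prevalence (proportion of cases) within each decile.
prevalence <- tapply(case, decile, mean)
print(round(prevalence, 3))
```

Replacing `by = 0.1` with `by = 0.01` and `labels = 1:10` with `labels = 1:100` gives centiles, which need more individuals per bin to yield stable estimates.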
Mock dataset for comorbidPGS package
Description
A dataset with sets of PGSs, Phenotypes and Covariates to demo the comorbidPGS package
Usage
comorbidData
Format
A data frame with 10,000 rows (individuals) and 16 columns:
- ID: Individual's identifier, characters
- sex: Sex of the individuals, binary numeric values
- age: Age of the individuals, numeric values
- gen_array: The genotyping array used for those individuals, factor values
- ethnicity: The ethnicity of individuals, can also be used as a Categorical Phenotype, factor values
- brc_PGS, t2d_PGS, ldl_PGS: Three distributions of PGS for Breast Cancer, Type 2 Diabetes and low-density lipoprotein (LDL) respectively; numeric values
- brc, t2d, hypertension: Three Cases/Controls Phenotypes, representing Breast Cancer, Type 2 Diabetes and Hypertension respectively; binary values
- ldl, bmi, sbp: Three Continuous Phenotypes, representing low-density lipoprotein, body-mass index, and systolic blood pressure respectively; numeric values
- log_ldl: A Continuous Phenotype, based on log(ldl) to have a normal distribution; numeric values
- sbp_cat: An Ordered Categorical Phenotype, with 3 possible outcomes: low, normal or high systolic blood pressure; factor values
Source
https://github.com/VP-biostat/comorbidPGS
Deciles BoxPlot from a PGS Association with a Continuous Phenotype
Description
decileboxplot() takes a distribution of PGS and a Continuous Phenotype.
Returns a plot with deciles of PGS on the x-axis and a boxplot of the Phenotype on the y-axis
Usage
decileboxplot(df = NULL, prs_col = "SCORESUM", phenotype_col = "Phenotype")
Arguments
| df | a dataframe with individuals on each row, and at least the following columns: 
 | 
| prs_col | a character specifying the PGS column name | 
| phenotype_col | a character specifying the Continuous Phenotype column name | 
Value
return a ggplot object (ggplot2)
Density Plot from a PGS Association
Description
densityplot() takes a distribution of PGS, a Phenotype and optional Confounders.
Returns a plot with the density of PGS on the x-axis, by categories of the Phenotype
Usage
densityplot(
  df = NULL,
  prs_col = "SCORESUM",
  phenotype_col = "Phenotype",
  scale = TRUE,
  threshold = NA
)
Arguments
| df | a dataframe with individuals on each row, and at least the following columns: 
 | 
| prs_col | a character specifying the PGS column name | 
| phenotype_col | a character specifying the Phenotype column name | 
| scale | a boolean specifying if scaling of PGS should be done before plotting | 
| threshold | an optional numeric specifying, for a Continuous Phenotype, the threshold used to classify individuals as Cases/Controls, as follows: 
 | 
Value
return a ggplot object (ggplot2)
Mendelian Randomization Two-Stage Least Squares (2SLS) method with external PGS
Description
mr_2sls() takes a distribution of PGS, an Exposure (Phenotype) and an Outcome (Phenotype).
Returns a data frame with the result of the Mendelian Randomization 2SLS method using the PGS
Usage
mr_2sls(
  df = NULL,
  prs_col = "SCORESUM",
  exposure_col = NA,
  outcome_col = NA,
  scale = TRUE,
  verbose = TRUE,
  log = ""
)
Arguments
| df | a dataframe with individuals on each row, and at least the following columns: 
 | 
| prs_col | a character specifying the PGS column name | 
| exposure_col | a character specifying the Exposure (Phenotype) column name | 
| outcome_col | a character specifying the Outcome (Phenotype) column name | 
| scale | a boolean specifying if scaling of PGS should be done before testing | 
| verbose | a boolean (TRUE by default) to write in the console/log messages. | 
| log | a connection, or a character string naming the file to print to. If "" (by default), it prints to the standard output connection, the console unless redirected by sink. | 
Value
return a data frame with the Mendelian Randomization association result using 2SLS method with the following columns:
- PGS: the name of the PGS used 
- Exposure: the name of Phenotype used as Exposure 
- Outcome: the name of Phenotype used as Outcome 
- Method: the MR method used (here 2SLS) 
- N_cases: if Phenotype_type is Cases/Controls, the number of cases 
- N_controls: if Phenotype_type is Cases/Controls, the number of controls 
- N: the number of individuals/samples 
- MR_estimate: the MR estimate (beta) using the 2SLS method 
- SE: the associated standard error (second order) 
- F_stat: the F-statistic of the Exposure ~ PGS association 
Examples
result <- mr_2sls(
  df = comorbidData,
  prs_col = "ldl_PGS",
  exposure_col = "log_ldl",
  outcome_col = "bmi",
  scale = TRUE
)
print(result)
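For readers unfamiliar with 2SLS, the two stages can be written out by hand in base R. This sketch runs on simulated data with a known causal effect of 0.4; it is not the internals of mr_2sls(), and all variable names are hypothetical. The standard error reported by a plain second-stage lm() is not valid for inference, which is one reason dedicated routines (the package lists 'ivreg' in its Imports) are preferred in practice.

```r
set.seed(3)
n <- 5000
pgs <- rnorm(n)
u <- rnorm(n)                               # unmeasured confounder
exposure <- 0.5 * pgs + u + rnorm(n)
outcome  <- 0.4 * exposure + u + rnorm(n)   # true causal effect: 0.4

# Stage 1: regress the exposure on the instrument (the PGS).
stage1 <- lm(exposure ~ pgs)
f_stat <- unname(summary(stage1)$fstatistic["value"])

# Stage 2: regress the outcome on the exposure predicted by stage 1.
exposure_hat <- fitted(stage1)
stage2 <- lm(outcome ~ exposure_hat)
mr_estimate <- unname(coef(stage2)["exposure_hat"])

# Naive regression for comparison; it absorbs the confounding by u.
naive <- unname(coef(lm(outcome ~ exposure))["exposure"])

print(c(MR_estimate = mr_estimate, naive = naive, F_stat = f_stat))
```

The first-stage F-statistic measures instrument strength; values above 10 are a commonly used rule of thumb for a usable instrument.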
Mendelian Randomization ratio method with external PGS
Description
mr_ratio() takes a distribution of PGS, an Exposure (Phenotype) and an Outcome (Phenotype).
Returns a data frame with the result of the Mendelian Randomization ratio method using the PGS
Usage
mr_ratio(
  df = NULL,
  prs_col = "SCORESUM",
  exposure_col = NA,
  outcome_col = NA,
  scale = TRUE,
  verbose = TRUE,
  log = ""
)
Arguments
| df | a dataframe with individuals on each row, and at least the following columns: 
 | 
| prs_col | a character specifying the PGS column name | 
| exposure_col | a character specifying the Exposure (Phenotype) column name | 
| outcome_col | a character specifying the Outcome (Phenotype) column name | 
| scale | a boolean specifying if scaling of PGS should be done before testing | 
| verbose | a boolean (TRUE by default) to write in the console/log messages. | 
| log | a connection, or a character string naming the file to print to. If "" (by default), it prints to the standard output connection, the console unless redirected by sink. | 
Value
return a data frame with the Mendelian Randomization association result using the ratio method with the following columns:
- PGS: the name of the PGS used 
- Exposure: the name of Phenotype used as Exposure 
- Outcome: the name of Phenotype used as Outcome 
- Method: the MR method used (here Ratio) 
- N_cases: if Phenotype_type is Cases/Controls, the number of cases 
- N_controls: if Phenotype_type is Cases/Controls, the number of controls 
- N: the number of individuals/samples 
- MR_estimate: the MR estimate (beta) using the ratio method 
- SE: the associated standard error (second order) 
- F_stat: the F-statistic of the Exposure ~ PGS association 
Examples
result <- mr_ratio(
  df = comorbidData,
  prs_col = "ldl_PGS",
  exposure_col = "log_ldl",
  outcome_col = "bmi",
  scale = TRUE
)
print(result)
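The ratio (Wald) estimate divides the PGS-outcome coefficient by the PGS-exposure coefficient. The sketch below shows this on simulated data, with a second-order delta-method standard error that assumes zero covariance between the two coefficients. Whether mr_ratio() uses exactly this formula is an assumption; the variable names are hypothetical.

```r
set.seed(4)
n <- 5000
pgs <- rnorm(n)
u <- rnorm(n)                               # unmeasured confounder
exposure <- 0.5 * pgs + u + rnorm(n)
outcome  <- 0.4 * exposure + u + rnorm(n)   # true causal effect: 0.4

# PGS -> exposure and PGS -> outcome regressions.
fit_x <- summary(lm(exposure ~ pgs))
fit_y <- summary(lm(outcome ~ pgs))
bx   <- fit_x$coefficients["pgs", "Estimate"]
se_x <- fit_x$coefficients["pgs", "Std. Error"]
by   <- fit_y$coefficients["pgs", "Estimate"]
se_y <- fit_y$coefficients["pgs", "Std. Error"]

# Ratio estimate and second-order delta-method standard error
# (covariance between bx and by taken as zero, an assumption).
ratio_estimate <- by / bx
ratio_se <- sqrt(se_y^2 / bx^2 + by^2 * se_x^2 / bx^4)

# Strength of the instrument: F-statistic of exposure ~ PGS.
f_stat <- unname(fit_x$fstatistic["value"])

print(c(MR_estimate = ratio_estimate, SE = ratio_se, F_stat = f_stat))
```

With a single instrument and no covariates, the ratio point estimate coincides with the 2SLS point estimate; the two methods differ in how the standard error is obtained.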
Multiple PGS Associations from a Data Frame
Description
multiassoc() takes a data frame with distribution(s) of PGS and Phenotype(s),
and a table of associations to make from this data frame.
Returns a data frame showing the association results
Usage
multiassoc(
  df = NULL,
  assoc_table = NULL,
  scale = TRUE,
  covar_col = NA,
  verbose = TRUE,
  log = "",
  parallel = FALSE,
  num_cores = NA
)
Arguments
| df | a dataframe with individuals on each row, and at least the following columns: 
 | 
| assoc_table | a dataframe or matrix specifying the associations to make from df, with 2 columns: PGS and Phenotype (in this order) | 
| scale | a boolean specifying if scaling of PGS should be done before testing | 
| covar_col | a character vector specifying the covariate column names (optional) | 
| verbose | a boolean (TRUE by default) to write in the console/log messages. | 
| log | a connection, or a character string naming the file to print to. If "" (by default), it prints to the standard output connection, the console unless redirected by sink. If parallel = TRUE, the log will be incomplete | 
| parallel | a boolean, if TRUE,  | 
| num_cores | an integer, if parallel = TRUE (default),  | 
Value
return a data frame showing the association of the PGS(s) on the Phenotype(s) with the following columns:
- PGS: the name of the PGS 
- Phenotype: the name of Phenotype 
- Phenotype_type: either 'Continuous', 'Ordered Categorical', 'Categorical' or 'Cases/Controls'
- Stat_method: the statistical method chosen by the function after detecting the Phenotype type, either 'Linear regression', 'Binary logistic regression', 'Ordinal logistic regression' or 'Multinomial logistic regression'
- Covar: list all the covariates used for this association 
- N_cases: if Phenotype_type is Cases/Controls, gives the number of cases 
- N_controls: if Phenotype_type is Cases/Controls, gives the number of controls 
- N: the number of individuals/samples 
- Effect: if Phenotype_type is Continuous, it represents the Beta coefficient of linear regression, OR of logistic regression otherwise 
- SE: standard error of the related Effect (Beta or OR) 
- lower_CI: lower confidence interval of the related Effect (Beta or OR) 
- upper_CI: upper confidence interval of the related Effect (Beta or OR) 
- P_value: associated P-value 
Examples
assoc_table <- expand.grid(
  c("t2d_PGS", "ldl_PGS"),
  c("ethnicity","brc","t2d","log_ldl","sbp_cat")
)
results <- multiassoc(
  df = comorbidData,
  assoc_table = assoc_table,
  covar_col = c("age", "sex", "gen_array"),
  parallel = FALSE,
  verbose = FALSE
)
print(results)
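The association table is an ordinary two-column data frame, so it can be inspected or trimmed before being passed on. The sketch below only uses base R; naming the columns PGS and Phenotype and setting stringsAsFactors = FALSE are illustrative choices, not requirements stated by the package.

```r
# Every PGS crossed with every Phenotype: 2 x 5 = 10 pairs.
assoc_table <- expand.grid(
  PGS = c("t2d_PGS", "ldl_PGS"),
  Phenotype = c("ethnicity", "brc", "t2d", "log_ldl", "sbp_cat"),
  stringsAsFactors = FALSE
)
print(dim(assoc_table))

# Drop one pair, for example to skip a combination that is not of interest.
assoc_table <- subset(assoc_table, !(PGS == "ldl_PGS" & Phenotype == "brc"))
print(assoc_table)
```

The column order matters here, since the assoc_table argument is documented as PGS first and Phenotype second.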
Multiple PGS Associations from different Phenotypes
Description
multiphenassoc() takes a distribution of PGS, multiple Phenotypes and optional confounders.
Returns a data frame showing the association results
Usage
multiphenassoc(
  df = NULL,
  prs_col = "SCORESUM",
  phenotype_col = "Phenotype",
  scale = TRUE,
  covar_col = NA,
  verbose = TRUE,
  log = ""
)
Arguments
| df | a dataframe with individuals on each row, and at least the following columns: 
 | 
| prs_col | a character specifying the PGS column name | 
| phenotype_col | a character vector specifying the Phenotype column names | 
| scale | a boolean specifying if scaling of PGS should be done before testing | 
| covar_col | a character vector specifying the covariate column names (optional) | 
| verbose | a boolean (TRUE by default) to write in the console/log messages. | 
| log | a connection, or a character string naming the file to print to. If "" (by default), it prints to the standard output connection, the console unless redirected by sink. | 
Value
return a data frame showing the association of the PGS on the Phenotypes with the following columns:
- PGS: the name of the PGS 
- Phenotype: the name of Phenotype 
- Phenotype_type: either 'Continuous', 'Ordered Categorical', 'Categorical' or 'Cases/Controls'
- Stat_method: the statistical method chosen by the function after detecting the Phenotype type, either 'Linear regression', 'Binary logistic regression', 'Ordinal logistic regression' or 'Multinomial logistic regression'
- Covar: list all the covariates used for this association 
- N_cases: if Phenotype_type is Cases/Controls, gives the number of cases 
- N_controls: if Phenotype_type is Cases/Controls, gives the number of controls 
- N: the number of individuals/samples 
- Effect: if Phenotype_type is Continuous, it represents the Beta coefficient of linear regression; Otherwise, it is the OR of logistic regression 
- SE: standard error of the Beta coefficient (if Phenotype_type is Continuous) 
- lower_CI: lower confidence interval of the related Effect (Beta or OR) 
- upper_CI: upper confidence interval of the related Effect (Beta or OR) 
- P_value: associated P-value
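Testing one PGS against several Phenotypes amounts to repeating the same regression and stacking the rows. The base R sketch below shows that pattern for continuous traits only, on simulated data; the helper one_assoc() and the column names are hypothetical and do not reproduce multiphenassoc() itself.

```r
set.seed(5)
n <- 800
sim <- data.frame(pgs = as.numeric(scale(rnorm(n))))
sim$trait_a <- 0.4 * sim$pgs + rnorm(n)
sim$trait_b <- rnorm(n)                     # unrelated to the score
sim$trait_c <- -0.2 * sim$pgs + rnorm(n)

phenotypes <- c("trait_a", "trait_b", "trait_c")

# Hypothetical helper: one linear regression, returned as a one-row data frame.
one_assoc <- function(pheno) {
  fit <- lm(reformulate("pgs", response = pheno), data = sim)
  est <- summary(fit)$coefficients["pgs", ]
  data.frame(
    PGS = "pgs",
    Phenotype = pheno,
    Effect = est[["Estimate"]],
    SE = est[["Std. Error"]],
    P_value = est[["Pr(>|t|)"]],
    N = nobs(fit)
  )
}

# One row per Phenotype, stacked into a single results table.
results <- do.call(rbind, lapply(phenotypes, one_assoc))
print(results)
```

When many Phenotypes are tested against the same score, the P-values in such a table usually call for a multiple-testing correction, for example with p.adjust().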