--- title: Shiny for Genetic Analysis Package (gap) Designs subtitle: "Jing Hua Zhao" output: bookdown::html_document2: toc: true toc_float: true fontsize: 11pt lot: true lof: true author: - Department of Public Health and Primary Care - University of Cambridge - Cambridge, UK - https://jinghuazhao.github.io/ date: '`r Sys.Date()`' bibliography: '`r system.file("REFERENCES.bib", package="gap")`' csl: nature-genetics.csl vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Shiny for Genetic Analysis Package (gap) Designs} %\VignetteEncoding{UTF-8} --- This is an initial attempt to enable easy calculation/visualization of study designs from R/gap which benchmarked relevant publications and eventually the app can produce more generic results. One can run the app with R/gap installation as follows, ```r setwd(file.path(find.package("gap"),"shinygap")) library(shiny) runApp() ``` Alternatively, one can run the app from source using `gap/inst/shinygap`. In fact, these are conveniently wrapped up as `runshinygap()` function. To set the default parameters, some compromises need to be made, e.g., Kp=[1e-5, 0.4], MAF=[1e-3, 0.8], alpha=[1e-8, 0.05], beta=[0.01, 0.4]. The slider inputs provide upper bounds of parameters. # Family-based study This is a call to `fbsize()`. # Population-based study This is a call to `pbsize()`. # Case-cohort study This is a call to `ccsize()` whose `power` argument indcates power (TRUE) or sample size (FALSE) calculation. # Two-stage case-control design We implement it in function \texttt{tscc} whose format is ``` tscc(model, GRR, p1, n1, n2, M, alpha.genome, pi.samples, pi.markers, K) ``` which requires specification of disease model (multiplicative, additive, dominant, recessive), genotypic relative risk (GRR), the estimated risk allele frequency in cases ($p_1$), total number of cases ($n_1$) total number of controls ($n_2$), total number of markers ($M$), the false positive rate at genome level ($\alpha_\mathit{genome}$), the proportion of markers to be selected ($\pi_\mathit{markers}$, also used as the false positive rate at stage 1) and the population prevalence ($K$). --- # Appendix: Theory {-} ## A. Family-based and population-based designs {-} This is detailed in the package vignettes gap, , or jss @zhao07. ## B. Case-cohort design {-} Our implemention is with respect to two aspects @cai04. ### B.1 Power {-} $$\Phi\left(Z_\alpha+\tilde{n}^\frac{1}{2}\theta\sqrt{\frac{p_1p_2p_D}{q+(1-q)p_D}}\right)$$ where $\alpha$ is the significance level, $\theta$ is the log-hazard ratio for two groups, $p_j, j = 1, 2$, are the proportion of the two groups in the population ($p_1 + p_2 = 1$), $\tilde{n}$ is the total number of subjects in the subcohort, $p_D$ is the proportion of the failures in the full cohort, and $q$ is the sampling fraction of the subcohort. ### B.2 Sample size {-} $$\tilde{n}=\frac{nBp_D}{n-B(1-p_D)}$$ where $B=\frac{Z_{1-\alpha}+Z_\beta}{\theta^2p_1p_2p_D}$ and $n$ is the whole cohort size. ## C. Two-stage case-control design {-} Tests of allele frequency differences between cases and controls in a two-stage design are described here @skol06. The usual test of proportions can be written as $$z(p_1,p_2,n_1,n_2,\pi_{samples})=\frac{p_1-p_2}{\sqrt{\frac{p_1(1-p_1)}{2n_1\pi_{sample}}+\frac{p_2(1-p_2)}{2n_2\pi_{sample}}}}$$ where $p_1$ and $p_2$ are the allele frequencies, $n_1$ and $n_2$ are the sample sizes, $\pi_{samples}$ is the proportion of samples to be genotyped at stage 1. The test statistics for stage 1, for stage 2 as replication and for stages 1 and 2 in a joint analysis are then $z_1 = z(\hat p_1,\hat p_2,n_1,n_2,\pi_{samples})$, $z_2 = z(\hat p_1,\hat p_2,n_1,n_2,1-\pi_{samples})$, $z_j = \sqrt{\pi_{samples}}z_1+\sqrt{1-\pi_{samples}}z_2$, respectively. Let $C_1$, $C_2$, and $C_j$ be the thresholds for these statistics, the false positive rates can be obtained according to $P(|z_1|>C_1)P(|z_2|>C_2,sign(z_1)=sign(z_2))$ and $P(|z_1|>C_1)P(|z_j|>C_j||z_1|>C_1)$ for replication-based and joint analyses, respectively. # References {-}