PRS evaluation
PGS performance was evaluated using two main metrics:
- PGS Correlation: Pearson's correlation between PGS derived from imputed SNP array data and PGS from whole-genome sequencing (WGS).
- ADPR (Absolute Difference in Percentile Ranking): The absolute difference in percentile ranking between PGS from array-imputed data and the WGS-derived gold standard.
Both metrics were computed across multiple p-value thresholds to ensure an unbiased comparison, following the method of Nguyen et al. (2022).
PRS formula
For an individual \(i\), the polygenic score at p-value threshold \(P_{T}\) is calculated as:
\[ \text{PGS}_i(P_T) = \sum_{j=1}^{M} 1_{\{P_j < P_T\}} \, x_{ij} \, \hat{\beta}_j \]
where:
- \(P_T\) : p-value threshold
- \(M\) : number of SNPs after clumping
- \(x_{ij}\) : allele count for SNP \(j\) in individual \(i\)
- \(\hat{\beta}_j\) : marginal effect size from GWAS for SNP \(j\)
- \(1_{\{P_j < P_T\}}\) : indicator function for p-value filtering, equal to 1 when the GWAS p-value \(P_j\) of SNP \(j\) is below \(P_T\) and 0 otherwise
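The definitions above can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline's actual implementation: the function name `polygenic_score` and the toy arrays are hypothetical, and the input is assumed to be already restricted to the \(M\) clumped SNPs.

```python
import numpy as np

def polygenic_score(dosages, betas, pvalues, p_threshold):
    """Score every individual at one p-value threshold P_T.

    dosages     : (N, M) allele counts x_ij for the M clumped SNPs
    betas       : (M,) GWAS marginal effect sizes
    pvalues     : (M,) GWAS p-values P_j
    p_threshold : the threshold P_T
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    pvalues = np.asarray(pvalues, dtype=float)
    keep = pvalues < p_threshold          # indicator 1{P_j < P_T}
    # Sum of x_ij * beta_j over the SNPs that pass the threshold
    return dosages[:, keep] @ betas[keep]

# Toy data (hypothetical): 2 individuals, 3 SNPs
dosages = np.array([[0, 1, 2],
                    [2, 0, 1]])
betas = np.array([0.5, -0.2, 0.1])
pvalues = np.array([1e-8, 0.03, 0.4])

scores = polygenic_score(dosages, betas, pvalues, p_threshold=0.05)
```

With \(P_T = 0.05\) only the first two SNPs pass the filter, so the third SNP contributes nothing to either score. Repeating the call over a list of thresholds gives one score vector per threshold.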
PGS correlation
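- PGS correlation is the Pearson correlation coefficient between the imputed and true (WGS-derived) raw PGS values computed for the same individuals.
- Interpretation: Measures how closely the imputed PGS values track the true values in a linear sense. Because Pearson's correlation is unchanged by shifting or rescaling either set of scores, it reflects agreement in relative position rather than in absolute scale; values close to 1 indicate strong agreement.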
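As a rough sketch of this metric, the correlation can be computed with `numpy.corrcoef`. The function name `pgs_correlation` and the example values are hypothetical; the only assumption is that both vectors list the same individuals in the same order.

```python
import numpy as np

def pgs_correlation(pgs_imputed, pgs_wgs):
    """Pearson correlation between two PGS vectors for the same individuals."""
    a = np.asarray(pgs_imputed, dtype=float)
    b = np.asarray(pgs_wgs, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("both inputs must be 1-D arrays of equal length")
    # corrcoef returns the 2x2 correlation matrix; [0, 1] is the cross term
    return float(np.corrcoef(a, b)[0, 1])

# Toy example (hypothetical values)
pgs_wgs = np.array([0.1, 0.5, 0.9, 1.3, 2.0])
pgs_shifted = 2.0 * pgs_wgs + 1.0      # rescaled and shifted copy
pgs_reversed = -pgs_wgs                # same values, opposite ordering

r_same = pgs_correlation(pgs_shifted, pgs_wgs)
r_opposite = pgs_correlation(pgs_reversed, pgs_wgs)
```

The rescaled copy still correlates perfectly with the original, which shows that this metric does not detect a systematic difference in scale between the two sets of scores.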
Absolute Difference in Percentile Ranking (ADPR)
- ADPR is the absolute difference in each individual's percentile rank between the imputed and true PGS distributions, averaged over all individuals.
- Formula:
\[ \text{ADPR} = \frac{1}{N} \sum_{i=1}^{N} \left| \text{percentile}_i^{(A)} - \text{percentile}_i^{(B)} \right| \]
- \(N\) is the number of individuals
- \(\text{percentile}_i^{(A)}\) and \(\text{percentile}_i^{(B)}\) are the percentile ranks of individual \(i\) in the imputed and true PGS distributions, respectively
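The ADPR formula can be illustrated with the following sketch. The text does not specify how percentile ranks are derived, so this example assumes a simple convention: each individual's percentile is 100 × rank / N within its own distribution, with ties broken by input order. The function names `percentile_ranks` and `adpr` are hypothetical.

```python
import numpy as np

def percentile_ranks(scores):
    """Percentile rank (0-100] of each score within its own distribution.

    Assumed convention: 100 * rank / N, ranks start at 1,
    ties are broken by input order.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="stable")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    return 100.0 * ranks / len(scores)

def adpr(pgs_imputed, pgs_wgs):
    """Mean absolute difference in percentile rank across individuals."""
    pct_a = percentile_ranks(pgs_imputed)
    pct_b = percentile_ranks(pgs_wgs)
    return float(np.mean(np.abs(pct_a - pct_b)))

# Toy example (hypothetical values): the last two individuals swap ranks
pgs_wgs = np.array([0.1, 0.5, 0.9, 1.3])
pgs_imputed = np.array([0.2, 0.4, 1.4, 1.0])

value = adpr(pgs_imputed, pgs_wgs)
```

In the toy data two of the four individuals each move by one rank (25 percentile points), so the mean difference is 12.5. An ADPR of 0 would mean every individual keeps the same percentile in both distributions.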
Dat Thanh Nguyen, Trang TH Tran, Mai Hoang Tran, Khai Tran, Duy Pham, Nguyen Thuy Duong, Quan Nguyen, and Nam S Vo. A comprehensive evaluation of polygenic score and genotype imputation performances of human SNP arrays in diverse populations. Scientific Reports, 12(1):17556, 2022.