LPS evaluation
Evaluation methods
To evaluate imputation performance, we employed two primary metrics: Imputation Accuracy and Imputation Coverage.
Imputation accuracy was quantified as the SNP-wise squared Pearson correlation (\(r^2\)) between imputed and true genotypes, whereas imputation coverage was defined as the proportion of variants within each minor allele frequency (MAF) bin achieving \(r^2 \geq 0.8\).
Together, these metrics assess both the reliability and the completeness of the imputed genetic data. They were calculated per chromosome across all autosomes. Our evaluation framework follows the methodology described in Nguyen et al., 2022 [1].
| Metric | Description | Purpose |
|---|---|---|
| Imputation Accuracy | Mean \(r^2\) of sites within a MAF bin | Measures how well imputed values match true genotypes |
| Imputation Coverage | Proportion of variants with \(r^2 \geq 0.8\) in a bin | Assesses the proportion of high-confidence imputations |
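The two metrics in the table can be illustrated with a minimal Python sketch. It is not the code in run_evaluate.py: the function names, the MAF bin edges in `MAF_BINS`, and the input layout (one `(maf, r2)` pair per SNP) are assumptions made for illustration only.

```python
# Hypothetical MAF bin edges, chosen only for this example (low < MAF <= high).
MAF_BINS = [(0.0, 0.01), (0.01, 0.05), (0.05, 0.5)]


def pearson_r2(true_dosages, imputed_dosages):
    """Squared Pearson correlation between true and imputed dosages of one SNP."""
    n = len(true_dosages)
    mean_t = sum(true_dosages) / n
    mean_i = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(true_dosages, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_dosages)
    var_i = sum((i - mean_i) ** 2 for i in imputed_dosages)
    if var_t == 0 or var_i == 0:
        return None  # correlation is undefined when either vector is constant
    return cov * cov / (var_t * var_i)


def summarize_by_maf_bin(snps, bins=MAF_BINS, threshold=0.8):
    """Return {(low, high): (accuracy, coverage)} from (maf, r2) pairs.

    accuracy = mean r2 of the SNPs in the bin
    coverage = fraction of SNPs in the bin with r2 >= threshold
    """
    summary = {}
    for low, high in bins:
        r2_values = [r2 for maf, r2 in snps
                     if r2 is not None and low < maf <= high]
        if not r2_values:
            continue  # skip bins that contain no usable SNP
        accuracy = sum(r2_values) / len(r2_values)
        coverage = sum(r2 >= threshold for r2 in r2_values) / len(r2_values)
        summary[(low, high)] = (accuracy, coverage)
    return summary
```

In this sketch, SNPs whose \(r^2\) is undefined (for example, sites that are monomorphic in the true data) are excluded from both metrics. Whether the actual pipeline treats them the same way is not stated here.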
Evaluation process
Input data
- Restructured lpWGS VCFs
- Restructured SNP-array VCFs
- True VCFs
Code
- compute_MAF.sh: Retrieve MAF values from true VCF files
- run_evaluate.py: Run the evaluation using the SNP-wise matrix
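The contents of compute_MAF.sh are not shown here, so the following is only a sketch of the underlying calculation, not the script itself. It assumes biallelic sites and genotype strings in VCF `GT` notation (`0|1`, `1/1`, `./.`). The function name `minor_allele_frequency` is hypothetical.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency of one biallelic site from VCF-style GT strings."""
    alt_count = 0
    called = 0
    for gt in genotypes:
        # Treat phased ("|") and unphased ("/") separators the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing allele calls are not counted
            called += 1
            if allele == "1":
                alt_count += 1
    if called == 0:
        return None  # no called alleles at this site
    alt_freq = alt_count / called
    # The minor allele is whichever of REF/ALT is rarer.
    return min(alt_freq, 1 - alt_freq)
```

For example, `minor_allele_frequency(["0|0", "0|1", "0/0", "0|0"])` counts one alternate allele out of eight called alleles and yields 0.125.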
Output
Files produced by the evaluation process:
| Metric | LPS | Pseudo array |
|---|---|---|
| SNP-wise accuracy | lps_all_acc.txt | array_all_acc.txt |
| Imputation coverage | lps_all_cov.txt | array_all_cov.txt |
1. Dat Thanh Nguyen, Trang TH Tran, Mai Hoang Tran, Khai Tran, Duy Pham, Nguyen Thuy Duong, Quan Nguyen, and Nam S Vo. A comprehensive evaluation of polygenic score and genotype imputation performances of human SNP arrays in diverse populations. Scientific Reports, 12(1):17556, 2022.
