LPS evaluation

Evaluation methods

To evaluate imputation performance, we employed two primary metrics: Imputation Accuracy and Imputation Coverage.

Imputation accuracy was quantified as the SNP-wise squared Pearson correlation (\(r^2\)) between imputed and true genotypes, whereas imputation coverage was defined as the proportion of variants within each minor allele frequency (MAF) bin achieving \(r^2 \ge 0.8\).
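Written out, the two metrics take the following form. This is a sketch under the usual dosage convention, which is an assumption here because the source does not spell out the formula: \(g_{ij}\) is the true genotype and \(d_{ij}\) the imputed dosage of sample \(i\) at variant \(j\), \(n\) is the number of samples, and \(B\) is the set of variants in one MAF bin.

\[
r_j^2 = \frac{\left[\sum_{i=1}^{n}\left(g_{ij}-\bar{g}_j\right)\left(d_{ij}-\bar{d}_j\right)\right]^2}{\sum_{i=1}^{n}\left(g_{ij}-\bar{g}_j\right)^2 \, \sum_{i=1}^{n}\left(d_{ij}-\bar{d}_j\right)^2}
\]

\[
\text{Accuracy}(B) = \frac{1}{\lvert B \rvert}\sum_{j \in B} r_j^2,
\qquad
\text{Coverage}(B) = \frac{\lvert \{\, j \in B : r_j^2 \ge 0.8 \,\} \rvert}{\lvert B \rvert}
\]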

These metrics collectively assess both the reliability and completeness of imputed genetic data and were calculated on a per-chromosome basis across all autosomes. Our evaluation framework follows the methodology described in Nguyen et al., 2022 [1].

| Metric | Description | Purpose |
|---|---|---|
| Imputation Accuracy | Mean \(r^2\) of sites within a MAF bin | Measures how well imputed values match true genotypes |
| Imputation Coverage | Proportion of variants with \(r^2 \geq 0.8\) in a bin | Assesses the proportion of high-confidence imputations |
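To make the two metrics concrete, the following minimal Python sketch computes a SNP-wise \(r^2\) and then summarizes one MAF bin. It is an illustration only, not the project's `run_evaluate.py`: the function names `pearson_r2` and `summarize_bin` are hypothetical, the toy data are made up, and treating monomorphic sites as undefined (NaN) is an assumption.

```python
from math import fsum


def pearson_r2(true_gt, imputed):
    """Squared Pearson correlation between true genotypes and imputed dosages for one SNP."""
    n = len(true_gt)
    if n != len(imputed) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_t = fsum(true_gt) / n
    mean_i = fsum(imputed) / n
    cov = fsum((t - mean_t) * (d - mean_i) for t, d in zip(true_gt, imputed))
    var_t = fsum((t - mean_t) ** 2 for t in true_gt)
    var_i = fsum((d - mean_i) ** 2 for d in imputed)
    if var_t == 0 or var_i == 0:
        # Assumption: r2 is left undefined when either vector has no variance.
        return float("nan")
    return cov * cov / (var_t * var_i)


def summarize_bin(r2_values, threshold=0.8):
    """Return (mean r2, coverage) for the SNPs of one MAF bin, ignoring undefined r2."""
    valid = [r for r in r2_values if r == r]  # NaN is the only value not equal to itself
    if not valid:
        return float("nan"), float("nan")
    accuracy = fsum(valid) / len(valid)
    coverage = sum(r >= threshold for r in valid) / len(valid)
    return accuracy, coverage


# Toy example: one SNP scored from six samples, then pooled with three
# made-up r2 values that stand in for other SNPs of the same MAF bin.
true_gt = [0, 1, 2, 1, 0, 2]
dosages = [0.1, 0.9, 1.8, 1.2, 0.0, 2.0]
r2 = pearson_r2(true_gt, dosages)
accuracy, coverage = summarize_bin([r2, 0.95, 0.60, 0.85])
print(f"r2={r2:.3f} accuracy={accuracy:.3f} coverage={coverage:.2f}")
```

In this toy bin, three of the four SNPs reach the 0.8 threshold, so coverage is 0.75 while accuracy is the plain mean of the four \(r^2\) values.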

Evaluation process

Input data

  • Restructured lpWGS VCFs
  • Restructured SNP-array VCFs
  • True VCFs

Code

    # Compute per-variant MAF from the true genotypes
    compute_MAF.sh chr${i}_${pop}_true.vcf.gz maf.txt

    # Compute SNP-wise accuracy of the imputed VCF against the true VCF
    run_evaluate.py --true_vcf chr${i}_${pop}_true.vcf.gz \
                    --imputed_vcf ${imputed_vcf} \
                    --af maf.txt \
                    --out_snp_wise chr${i}_${lps_cov}_${pop}_snp_wise.acc
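The internals of `compute_MAF.sh` are not shown here, so the following Python sketch only illustrates the quantity that `maf.txt` presumably holds: the minor allele frequency of a biallelic site, derived from VCF `GT` strings. The function name `minor_allele_frequency` is hypothetical, and counting every non-reference allele as the alternate allele is an assumption.

```python
def minor_allele_frequency(genotypes):
    """MAF of one biallelic site from VCF GT strings such as '0|1' or '1/1'."""
    alt = 0
    called = 0
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') separators the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":  # missing allele: not counted
                continue
            called += 1
            # Assumption: any non-reference allele counts as the ALT allele.
            alt += allele != "0"
    if called == 0:
        return float("nan")  # no called alleles at this site
    af = alt / called
    return min(af, 1.0 - af)


# One ALT allele among eight called alleles.
site = ["0|0", "0|1", "0|0", "0|0"]
print(minor_allele_frequency(site))  # 0.125
```

Taking the smaller of the alternate and reference frequencies keeps the value in the range 0 to 0.5, which is what MAF binning expects.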

Code

    # Summarize SNP-wise results per MAF bin: coverage and mean r^2
    get_coverage.py --input ${res_snp_wise} \
                    --cov perbin_${res_snp_wise}_cov.txt \
                    --acc perbin_${res_snp_wise}_mean_r2.txt
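The per-bin step can be pictured as grouping SNP-wise results by MAF and then reducing each group. The sketch below is not `get_coverage.py` itself. The bin edges in `BIN_EDGES` are hypothetical (the bins actually used are not listed here), as are the function names and the `(maf, r2)` record layout.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical upper edges of the MAF bins; the real bins may differ.
BIN_EDGES = [0.01, 0.05, 0.1, 0.2, 0.5]


def bin_label(maf, edges=BIN_EDGES):
    """Return a label such as '(0.01, 0.05]' for the bin containing maf, or None."""
    idx = bisect_left(edges, maf)  # first edge that is >= maf
    if maf <= 0 or idx == len(edges):
        return None  # monomorphic site or MAF above the last edge
    lower = 0.0 if idx == 0 else edges[idx - 1]
    return f"({lower:g}, {edges[idx]:g}]"


def per_bin_summary(records, threshold=0.8):
    """records: iterable of (maf, r2). Returns {bin: (n_snps, mean_r2, coverage)}."""
    groups = defaultdict(list)
    for maf, r2 in records:
        label = bin_label(maf)
        if label is not None and r2 == r2:  # skip unbinned sites and NaN r2
            groups[label].append(r2)
    return {
        label: (len(v), sum(v) / len(v), sum(x >= threshold for x in v) / len(v))
        for label, v in groups.items()
    }


# Made-up SNP-wise results as (MAF, r2) pairs.
records = [(0.005, 0.40), (0.03, 0.70), (0.04, 0.90), (0.30, 0.95), (0.45, 0.99)]
summary = per_bin_summary(records)
for label, (n, mean_r2, cov) in summary.items():
    print(f"{label}\tn={n}\tmean_r2={mean_r2:.3f}\tcoverage={cov:.2f}")
```

Half-open bins of the form (lower, upper] mean that a variant sitting exactly on an edge is counted once, in the lower bin.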

Output

Evaluation process output:

| | LPS | Pseudo array |
|---|---|---|
| SNP-wise accuracy | lps_all_acc.txt | array_all_acc.txt |
| Imputation coverage | lps_all_cov.txt | array_all_cov.txt |

  1. Dat Thanh Nguyen, Trang TH Tran, Mai Hoang Tran, Khai Tran, Duy Pham, Nguyen Thuy Duong, Quan Nguyen, and Nam S Vo. A comprehensive evaluation of polygenic score and genotype imputation performances of human SNP arrays in diverse populations. Scientific Reports, 12(1):17556, 2022.