Available data

License

This dataset is released under the CC0 1.0 Universal Public Domain Dedication.

You are free to copy, modify, distribute, and use the data for any purpose, even commercially, without asking permission.

No attribution is required, but citation is appreciated if you find this dataset useful.

Disclosed data

| Process | Step | Data (inputs, then outputs) |
| --- | --- | --- |
| Processing data | Cross-Validation Framework | Samples list of batch; 2504 samples list; Population meta |
| Processing data | Variant Filtering | 3202 samples 1KGP [1]; 2504 samples list; Raw imputation panel |
| Processing data | Data Simulation | SNP-array pos data [2]; Samples list of batch; Raw imputation panel; GRCh38/hg38; URL metadata; Pseudo-array VCFs; Downsampled BAM |
| Genotype Imputation | lpWGS imputation | Samples list of batch; Phasing reference; Raw imputation panel; Downsampled BAM; lpWGS VCF files |
| Genotype Imputation | SNP arrays imputation | Samples list of batch; Phasing reference; Raw imputation panel; Pseudo-array VCFs; SNP-array VCF files |
| Evaluation | Restructure imputed data | lpWGS VCF files; SNP-array VCF files; Population meta; Raw imputation panel; Restructured lpWGS VCFs; Restructured SNP-array VCFs; True VCFs |
| Evaluation | lpWGS performance | Restructured lpWGS VCFs; Restructured SNP-array VCFs; True VCFs; LPS-arrays evaluation output; LPS visualizing figures; LPS visualizing tables |
| Evaluation | PRS performance | Restructured lpWGS VCFs; Restructured SNP-array VCFs; True VCFs; Base sumstats; Raw PRS scores; Percentile PRS scores; PRS visualizing figures; PRS visualizing tables |
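
The table above can also be read as a mapping from each step to the data items it touches. The following is a minimal sketch of that reading, useful for seeing which items are shared between steps (for example, the raw imputation panel). The names `STEP_ITEMS`, `steps_using`, and `shared_items` are illustrative assumptions and are not part of the project's pipeline; each list combines a step's inputs and outputs in the order shown in the table.

```python
from collections import defaultdict

# Step -> data items, transcribed from the table (inputs and outputs combined).
# This dictionary is a hand-written illustration, not a file shipped with the dataset.
STEP_ITEMS = {
    "Cross-Validation Framework": [
        "Samples list of batch", "2504 samples list", "Population meta"],
    "Variant Filtering": [
        "3202 samples 1KGP", "2504 samples list", "Raw imputation panel"],
    "Data Simulation": [
        "SNP-array pos data", "Samples list of batch", "Raw imputation panel",
        "GRCh38/hg38", "URL metadata", "Pseudo-array VCFs", "Downsampled BAM"],
    "lpWGS imputation": [
        "Samples list of batch", "Phasing reference", "Raw imputation panel",
        "Downsampled BAM", "lpWGS VCF files"],
    "SNP arrays imputation": [
        "Samples list of batch", "Phasing reference", "Raw imputation panel",
        "Pseudo-array VCFs", "SNP-array VCF files"],
    "Restructure imputed data": [
        "lpWGS VCF files", "SNP-array VCF files", "Population meta",
        "Raw imputation panel", "Restructured lpWGS VCFs",
        "Restructured SNP-array VCFs", "True VCFs"],
    "lpWGS performance": [
        "Restructured lpWGS VCFs", "Restructured SNP-array VCFs", "True VCFs",
        "LPS-arrays evaluation output", "LPS visualizing figures",
        "LPS visualizing tables"],
    "PRS performance": [
        "Restructured lpWGS VCFs", "Restructured SNP-array VCFs", "True VCFs",
        "Base sumstats", "Raw PRS scores", "Percentile PRS scores",
        "PRS visualizing figures", "PRS visualizing tables"],
}


def steps_using(step_items):
    """Invert the mapping: data item -> list of steps that mention it."""
    index = defaultdict(list)
    for step, items in step_items.items():
        for item in items:
            index[item].append(step)
    return dict(index)


def shared_items(step_items):
    """Return items mentioned by more than one step, most widely used first."""
    index = steps_using(step_items)
    shared = {item: steps for item, steps in index.items() if len(steps) > 1}
    return dict(sorted(shared.items(), key=lambda kv: -len(kv[1])))


if __name__ == "__main__":
    for item, steps in shared_items(STEP_ITEMS).items():
        print(f"{item}: {', '.join(steps)}")
```

Running the script prints one line per shared item followed by the steps that mention it, which gives a quick view of which files are needed by several stages.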

  1. Marta Byrska-Bishop, Uday S Evani, Xuefang Zhao, Anna O Basile, Haley J Abel, Allison A Regier, André Corvelo, Wayne E Clarke, Rajeeva Musunuri, Kshithija Nagulapalli, and others. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell, 185(18):3426–3440, 2022.

  2. Dat Thanh Nguyen, Trang TH Tran, Mai Hoang Tran, Khai Tran, Duy Pham, Nguyen Thuy Duong, Quan Nguyen, and Nam S Vo. A comprehensive evaluation of polygenic score and genotype imputation performances of human SNP arrays in diverse populations. Scientific Reports, 12(1):17556, 2022.