Available data
License
This dataset is released under CC0 1.0 Universal (Public Domain Dedication).
You are free to copy, modify, distribute, and use the data for any purpose, even commercially, without asking permission.
No attribution is required, but citation is appreciated if you find this dataset useful.
Disclosing data
| Process | Step | Input | Output |
|---|---|---|---|
| Processing data | Cross-Validation Framework | Samples list of batch; 2504 samples list; Population meta | |
| Processing data | Variant Filtering | 3202 samples 1KGP [1]; 2504 samples list | Raw imputation panel |
| Processing data | Data Simulation | SNP-array pos data [2]; Samples list of batch; Raw imputation panel; GRCh38/hg38; URL metadata | Pseudo-array VCFs; Downsampled BAM |
| Genotype Imputation | lpWGS imputation | Samples list of batch; Phasing reference; Raw imputation panel; Downsampled BAM | lpWGS VCF files |
| Genotype Imputation | SNP arrays imputation | Samples list of batch; Phasing reference; Raw imputation panel; Pseudo-array VCFs | SNP-array VCF files |
| Evaluation | Restructure imputed data | lpWGS VCF files; SNP-array VCF files; Population meta; Raw imputation panel | Restructured lpWGS VCFs; Restructured SNP-array VCFs; True VCFs |
| Evaluation | lpWGS performance | Restructured lpWGS VCFs; Restructured SNP-array VCFs; True VCFs | LPS-arrays evaluation output; LPS visualizing figures; LPS visualizing tables |
| Evaluation | PRS performance | Restructured lpWGS VCFs; Restructured SNP-array VCFs; True VCFs; Base sumstats | Raw PRS scores; Percentile PRS scores; PRS visualizing figures; PRS visualizing tables |
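
The data flow in the table above can be read as a dependency graph. The following is a minimal Python sketch of that reading, not part of the released pipeline: the names `PIPELINE`, `producers`, `external_inputs` and `steps_needed` are hypothetical, and the inputs and outputs are transcribed as they appear in the table (for Cross-Validation Framework, the listed items are assumed to be inputs, as laid out there).

```python
# Hypothetical sketch: each step maps to (inputs, outputs), transcribed from the table.
# Assumption: the Cross-Validation Framework items are inputs, as the table shows them.
COMMON = ["Restructured lpWGS VCFs", "Restructured SNP-array VCFs", "True VCFs"]

PIPELINE = {
    "Cross-Validation Framework": (
        ["Samples list of batch", "2504 samples list", "Population meta"], []),
    "Variant Filtering": (
        ["3202 samples 1KGP", "2504 samples list"], ["Raw imputation panel"]),
    "Data Simulation": (
        ["SNP-array pos data", "Samples list of batch", "Raw imputation panel",
         "GRCh38/hg38", "URL metadata"],
        ["Pseudo-array VCFs", "Downsampled BAM"]),
    "lpWGS imputation": (
        ["Samples list of batch", "Phasing reference", "Raw imputation panel",
         "Downsampled BAM"],
        ["lpWGS VCF files"]),
    "SNP arrays imputation": (
        ["Samples list of batch", "Phasing reference", "Raw imputation panel",
         "Pseudo-array VCFs"],
        ["SNP-array VCF files"]),
    "Restructure imputed data": (
        ["lpWGS VCF files", "SNP-array VCF files", "Population meta",
         "Raw imputation panel"],
        list(COMMON)),
    "lpWGS performance": (
        list(COMMON),
        ["LPS-arrays evaluation output", "LPS visualizing figures",
         "LPS visualizing tables"]),
    "PRS performance": (
        COMMON + ["Base sumstats"],
        ["Raw PRS scores", "Percentile PRS scores", "PRS visualizing figures",
         "PRS visualizing tables"]),
}


def producers():
    """Map each output artifact to the step that produces it."""
    return {out: step for step, (_, outs) in PIPELINE.items() for out in outs}


def external_inputs():
    """Artifacts consumed by some step but produced by none of them."""
    produced = producers()
    return sorted({item for ins, _ in PIPELINE.values()
                   for item in ins if item not in produced})


def steps_needed(artifact):
    """Steps that must run, directly or transitively, to obtain an artifact."""
    produced = producers()
    needed, stack = set(), [artifact]
    while stack:
        step = produced.get(stack.pop())
        if step is None or step in needed:
            continue  # external input, or step already visited
        needed.add(step)
        stack.extend(PIPELINE[step][0])
    return needed


if __name__ == "__main__":
    print(external_inputs())
    print(sorted(steps_needed("Raw PRS scores")))
```

Under this transcription, `external_inputs()` lists the files that have to be supplied from outside the pipeline, and `steps_needed("Raw PRS scores")` lists the steps a user would have to rerun to regenerate the PRS scores.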
[1] Marta Byrska-Bishop, Uday S Evani, Xuefang Zhao, Anna O Basile, Haley J Abel, Allison A Regier, André Corvelo, Wayne E Clarke, Rajeeva Musunuri, Kshithija Nagulapalli, et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell, 185(18):3426–3440, 2022.
[2] Dat Thanh Nguyen, Trang TH Tran, Mai Hoang Tran, Khai Tran, Duy Pham, Nguyen Thuy Duong, Quan Nguyen, and Nam S Vo. A comprehensive evaluation of polygenic score and genotype imputation performances of human SNP arrays in diverse populations. Scientific Reports, 12(1):17556, 2022.