From the 1000 Genomes Project, we download the high-coverage (30X) VCF files containing 3202 samples (folder link).
Warning
Be sure to verify the MD5 checksums of the VCF files. Due to their large size, file transfers may be prone to interruption or corruption during transmission.
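One way to run this check is sketched below. It assumes GNU `md5sum` is installed (macOS ships `md5` instead) and that you have the expected digest for each file from the download folder. The helper name `verify_md5` is hypothetical and is not part of the original pipeline.

```shell
# verify_md5 FILE EXPECTED_MD5  (hypothetical helper, not from the original pipeline)
# Returns 0 when the MD5 digest of FILE equals EXPECTED_MD5, 1 otherwise.
verify_md5() {
    local file=$1
    local expected=$2
    local actual
    # md5sum prints "<digest>  <filename>"; keep only the digest field.
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got $actual)" >&2
        return 1
    fi
}
```

If the expected digests are already stored in a file in `md5sum` format (one `<digest>  <filename>` pair per line), `md5sum -c` on that file performs the same comparison for all files at once.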
Code
Code used to process the downloaded VCF files containing 3202 samples.
set -ue

RAW_VCF=$1
SAMPLE_LIST=$2
FILTERED_VCF=$3

# Get samples in each batch and filtering to get biallelic variants
bcftools view \
    -S "$SAMPLE_LIST" "$RAW_VCF" \
    -m2 -M2 \
    -v snps |
    bcftools view \
        --exclude 'AC<=2' \
        -Oz -o "$FILTERED_VCF"

# Indexing the filtered VCF
bcftools index -f "$FILTERED_VCF"
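The filtering script takes three positional arguments (raw VCF, sample list, output VCF), so it can be applied to a folder of VCF files in a loop. The sketch below is one possible driver, not the authors' own. It assumes the script is saved as `filter_vcf.sh`, that the raw files end in `.vcf.gz`, and that outputs get a `.filtered.vcf.gz` suffix. All three are illustrative choices.

```shell
# Path to the filtering script; the file name is an assumption.
FILTER_SCRIPT=${FILTER_SCRIPT:-./filter_vcf.sh}

# filtered_name RAW_VCF OUT_DIR
# Derive the output path, e.g. chr22.vcf.gz -> OUT_DIR/chr22.filtered.vcf.gz
filtered_name() {
    local raw=$1
    local out_dir=$2
    local base
    base=$(basename "$raw" .vcf.gz)
    echo "$out_dir/$base.filtered.vcf.gz"
}

# run_all RAW_DIR SAMPLE_LIST OUT_DIR
# Run the filtering script once per *.vcf.gz file found in RAW_DIR.
run_all() {
    local raw_dir=$1
    local sample_list=$2
    local out_dir=$3
    local raw
    mkdir -p "$out_dir"
    for raw in "$raw_dir"/*.vcf.gz; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$raw" ] || continue
        bash "$FILTER_SCRIPT" "$raw" "$sample_list" "$(filtered_name "$raw" "$out_dir")"
    done
}
```

A call such as `run_all raw_vcfs samples.txt filtered_vcfs` (directory and file names hypothetical) would then produce one filtered, indexed VCF per input file.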
Marta Byrska-Bishop, Uday S Evani, Xuefang Zhao, Anna O Basile, Haley J Abel, Allison A Regier, André Corvelo, Wayne E Clarke, Rajeeva Musunuri, Kshithija Nagulapalli, and others. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell, 185(18):3426–3440, 2022.