Cross-Validation Framework
- 10-fold cross-validation is performed on the 2504 selected samples.
- Samples are split into 10 batches, stratified by superpopulation (EAS, EUR, SAS, AFR, AMR) to ensure balanced representation:
  - 4 batches of 251 samples
  - 6 batches of 250 samples
- In each fold:
  - The other nine batches (about 90% of the samples) serve as the reference panel.
  - One batch (about 10% of the samples) serves as the target set for imputation and is used to prepare the true VCFs and the downsampled/pseudo-array inputs.
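
The batching scheme above can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline's actual code: the function names `assign_folds` and `split_for_fold`, the sample IDs, the random seed, and the per-superpopulation counts are all assumptions made up for the example (the counts are placeholders chosen only so that they sum to 2504). Samples are shuffled within each superpopulation and then dealt round-robin into 10 folds, so every fold receives a near-equal share of each superpopulation.

```python
import random
from collections import Counter, defaultdict


def assign_folds(sample_to_pop, n_folds=10, seed=0):
    """Return {sample_id: fold_index}, stratified by superpopulation."""
    rng = random.Random(seed)

    # Group sample IDs by their superpopulation label.
    by_pop = defaultdict(list)
    for sample, pop in sample_to_pop.items():
        by_pop[pop].append(sample)

    folds = {}
    next_fold = 0  # running counter carried across groups keeps fold sizes even
    for pop in sorted(by_pop):
        members = sorted(by_pop[pop])
        rng.shuffle(members)
        # Deal this superpopulation's samples round-robin into the folds.
        for sample in members:
            folds[sample] = next_fold % n_folds
            next_fold += 1
    return folds


def split_for_fold(folds, k):
    """Target set = fold k; reference panel = all remaining folds."""
    target = [s for s, f in folds.items() if f == k]
    reference = [s for s, f in folds.items() if f != k]
    return reference, target


# Hypothetical cohort: placeholder counts per superpopulation (sum = 2504).
pop_counts = {"AFR": 661, "AMR": 347, "EAS": 504, "EUR": 503, "SAS": 489}
samples = {
    f"{pop}_{i:04d}": pop
    for pop, n in pop_counts.items()
    for i in range(n)
}

folds = assign_folds(samples)
fold_sizes = Counter(folds.values())
reference, target = split_for_fold(folds, 0)
```

With 2504 samples, this dealing scheme produces four folds of 251 samples and six of 250, and the number of samples from any one superpopulation differs by at most one between folds. A library routine such as scikit-learn's `StratifiedKFold` could be used instead; the hand-rolled version is shown only to make the arithmetic visible.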