Cross-Validation Framework

  • 10-fold cross-validation is performed on the 2504 selected samples.
  • Samples are split into 10 batches, stratified by superpopulation (EAS, EUR, SAS, AFR, AMR) to ensure balanced representation:
    • 4 batches of 251 samples
    • 6 batches of 250 samples
  • In each fold:
    • 90% of data serves as the reference panel.
    • 10% of data serves as the target set for imputation (used to prepare the true VCFs and the downsampled/pseudo-array inputs).
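
The batching and fold construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline's actual code: the function names `stratified_batches` and `folds` are hypothetical, and the per-superpopulation counts are placeholder values chosen to total 2504, which is what the stated batch sizes imply (4 × 251 + 6 × 250). Samples are shuffled within each superpopulation and then dealt round-robin across the batches, so each batch receives a near-equal share of every superpopulation.

```python
import random
from collections import defaultdict


def stratified_batches(labels, n_batches=10, seed=0):
    """Split sample indices into n_batches, stratified by label (hypothetical helper)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    batches = [[] for _ in range(n_batches)]
    position = 0  # keeps counting across labels so batch sizes differ by at most 1
    for label in sorted(by_label):
        members = by_label[label]
        rng.shuffle(members)
        for idx in members:
            batches[position % n_batches].append(idx)
            position += 1
    return batches


def folds(batches):
    """Yield (reference, target) index lists; each batch is the target exactly once."""
    for k, target in enumerate(batches):
        reference = [i for j, batch in enumerate(batches) if j != k for i in batch]
        yield reference, target


# Assumed, illustrative counts per superpopulation (not given in the text above).
counts = {"AFR": 661, "AMR": 347, "EAS": 504, "EUR": 503, "SAS": 489}
labels = [pop for pop, n in counts.items() for _ in range(n)]

batches = stratified_batches(labels)
batch_sizes = sorted((len(b) for b in batches), reverse=True)
print(batch_sizes)
```

In each fold, the `reference` indices would select the samples kept in the reference panel, and the `target` indices the samples held out for imputation. Fixing `seed` keeps the batch assignment reproducible across runs.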