Pseudo-array imputation

Requirements

  • Ubuntu 22.04 (8 CPUs, 32 GB)
  • bcftools (version==1.13)
  • SHAPEIT5 (version==5.1.1)
  • Minimac3 (version==2.0.1)
  • Minimac4 (version==1.0.3)

Input data

Array imputation workflow

The reference panel (VCF) was used directly as the phasing reference for pseudo-array genotype data (VCF) and was additionally indexed using Minimac3 to produce the required m3vcf files for imputation. Phasing was performed prior to imputation using SHAPEIT5, with intermediate results stored in BCF format. Imputation was then carried out using Minimac4, generating the final imputed VCF outputs.
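
As a rough illustration of how the two steps chain together, the sketch below prints the commands for each chromosome without running them. The script names `prepare_reference.sh` and `impute.sh` and every `/path/to/...` argument are assumptions made for this example; the source does not name its scripts or paths.

```shell
set -ue

# Hypothetical script names: the source does not name its two scripts.
PREPARE_SCRIPT=prepare_reference.sh
IMPUTE_SCRIPT=impute.sh

# Print (do not run) the two commands for one array ($1) and chromosome ($2).
# All /path/to/... arguments are placeholders.
plan_chromosome() {
    echo "bash ${PREPARE_SCRIPT} $2 $1 /path/to/batch_sample_list /path/to/pseudo_array.vcf.gz /path/to/reference.vcf.gz /path/to/phasing_reference"
    echo "bash ${IMPUTE_SCRIPT} $1 $2"
}

# Dry run for two chromosomes of one pseudo-array.
for chr in 21 22; do
    plan_chromosome array1 "${chr}"
done
```

Note that the argument order differs between the two scripts: the preparation script takes the chromosome first and the array name second, while the imputation script takes the array first.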

Prepare imputation reference

Code

This script extracts a reference panel, phases pseudo SNP array data using SHAPEIT5, and prepares the reference for imputation by indexing it in Minimac3 format.

set -ue


CHR=$1                             # Chromosome number (e.g., 1, 2, ..., 22)
ARRAY_NAME=$2                      # Name of the pseudo-array (e.g., array1, array2, ...)    
BATCH_SAMPLE_LIST=$3               # /path/to/batch_sample_list_file
PSEUDO_ARRAY_VCF=$4                # /path/to/pseudo_array_vcf_file
REFERENCE_VCF_FILE=$5              # /path/to/reference_vcf_file.vcf.gz
PHASING_REFERENCE=$6               # /path/to/genetic_map_file (passed to SHAPEIT5 --map)

## Extract reference
# Drop the batch samples (^ = exclude) and rename chromosomes to numeric format
bcftools view        -S ^"${BATCH_SAMPLE_LIST}" "${REFERENCE_VCF_FILE}" |\
bcftools annotate    --rename-chrs rename_chr.txt \
                     -Oz -o ref_chr${CHR}.vcf.gz

bcftools index -f ref_chr${CHR}.vcf.gz

## Phasing
# --region follows the chromosome given as $1; --map takes a genetic map file
shapeit5_phase_common_static --input "${PSEUDO_ARRAY_VCF}"   \
                             --reference ref_chr${CHR}.vcf.gz         \
                             --region ${CHR} --map "${PHASING_REFERENCE}"   \
                             --thread 8                               \
                             --output phased_${ARRAY_NAME}_chr${CHR}.bcf

## Indexing by Minimac3
Minimac3 --refHaps ref_chr${CHR}.vcf.gz   \
         --processReference               \
         --prefix m3vcf_ref_chr${CHR}     \
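         --cpus 8

rename_chr.txt maps the original chromosome names to numeric format for bcftools annotate --rename-chrs, with one whitespace-separated "old_name new_name" pair per line.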
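
A minimal sketch of how such a mapping file could be generated is shown below. It assumes the reference VCF names its autosomes `chr1` to `chr22`; the source does not state the original naming, so adjust the prefix to match the actual contig names.

```shell
# Assumption: the input VCF uses chr1..chr22 as contig names.
# Each line holds "old_name<TAB>new_name".
i=1
while [ "$i" -le 22 ]; do
    printf 'chr%s\t%s\n' "$i" "$i"
    i=$((i + 1))
done > rename_chr.txt
```

The contig names actually present in a VCF can be listed from its header before choosing the prefix.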

Imputation process

Code

Genotype imputation is performed using Minimac4. The phased BCF file is converted and indexed, imputed against a reference panel, and temporary files are removed upon completion.

set -ue

ARRAY=$1
CHR=$2

## Input
PHASED_BCF=phased_${ARRAY}_chr${CHR}.bcf
MINIMAC3_INDEX_VCF=m3vcf_ref_chr${CHR}.m3vcf.gz

## Imputation
bcftools view -Oz -o tem_${ARRAY}_chr${CHR}.vcf.gz ${PHASED_BCF}
bcftools index -f tem_${ARRAY}_chr${CHR}.vcf.gz


minimac4 --refHaps ${MINIMAC3_INDEX_VCF}         \
         --ChunkLengthMb 50                      \
         --ChunkOverlapMb 5                      \
         --haps tem_${ARRAY}_chr${CHR}.vcf.gz    \
         --format GT,DS,GP                       \
         --prefix imputed_${ARRAY}_chr${CHR}     \
         --ignoreDuplicates                      \
         --cpus 8                                \
         --vcfBuffer 1100

rm tem_${ARRAY}_chr${CHR}.vcf.*
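
Because the script runs under `set -e`, a failing step stops execution before the final `rm`, which leaves the temporary VCF and its index behind. One possible alternative, sketched here rather than taken from the source, is to register the cleanup with `trap` so it runs on any exit. The function name `run_with_cleanup` is hypothetical, and `touch` and `exit 1` stand in for the bcftools and minimac4 calls.

```shell
set -ue

# Runs in a subshell (note the parentheses) so the EXIT trap fires
# when this function finishes, whether it succeeds or fails.
run_with_cleanup() (
    tmp_vcf=$1
    trap 'rm -f "${tmp_vcf}" "${tmp_vcf}".*' EXIT

    touch "${tmp_vcf}" "${tmp_vcf}.csi"   # stand-in for bcftools view/index
    exit 1                                # stand-in for a failing minimac4 run
)

run_with_cleanup tem_array1_chr22.vcf.gz || echo "imputation step failed; temporary files removed"
```

The same pattern applies when the step succeeds, so the explicit `rm` at the end of the script becomes unnecessary.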

Output data

  • Imputed SNP-array VCF files (Minimac4 output written with the prefix imputed_${ARRAY}_chr${CHR})