
Pseudo-array imputation

Requirements

  • Ubuntu 22.04 (8 CPUs, 32 GB RAM)
  • bcftools (version==1.13)
  • shapeit5 (version==5.1.1)
  • Minimac3 (version==2.0.1)
  • Minimac4 (version==1.0.3)
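Before running the scripts, it can help to confirm that the executables they call are on `PATH`. The sketch below is a minimal POSIX shell helper that assumes the binaries keep the names used in the scripts (`bcftools`, `shapeit5_phase_common_static`, `Minimac3`, `minimac4`). `check_tools` is a hypothetical name, and the helper only checks that the commands exist; it does not verify the version numbers listed above.

```shell
# check_tools (hypothetical helper): report every command that is not on PATH.
# Returns 0 if all tools are found, 1 otherwise. Versions are NOT checked.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools bcftools shapeit5_phase_common_static Minimac3 minimac4 && bcftools --version | head -n 1` reports any missing tool and, if all are present, prints the installed bcftools version.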

Input data

Array imputation workflow

Prepare imputation reference

Code

This script builds a reference panel by excluding the batch samples from the reference VCF and renaming its chromosomes, phases the pseudo SNP array data with SHAPEIT5 against that panel, and converts the panel to Minimac3's M3VCF format for imputation.
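The extraction step in the script drops every sample listed in `BATCH_SAMPLE_LIST` from the reference (`bcftools view -S ^...`), presumably so that the pseudo-array samples are not imputed against their own haplotypes. A minimal sketch for confirming that the extracted reference shares no samples with the batch: `count_shared_samples` is a hypothetical helper that compares two files with one sample ID per line, such as the output of `bcftools query -l`.

```shell
# count_shared_samples (hypothetical helper): print how many sample IDs appear
# in both files. Each file holds one sample ID per line.
count_shared_samples() {
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    sort -u "$1" > "$tmp_a"
    sort -u "$2" > "$tmp_b"
    # comm -12 keeps only lines present in both sorted inputs
    comm -12 "$tmp_a" "$tmp_b" | wc -l | tr -d ' '
    rm -f "$tmp_a" "$tmp_b"
}
```

After the script has run for a chromosome, `bcftools query -l ref_chr1.vcf.gz > ref_samples.txt` followed by `count_shared_samples ref_samples.txt "$BATCH_SAMPLE_LIST"` should print `0`.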

#!/usr/bin/env bash
set -euo pipefail   # stop on errors, unset variables, and failures inside pipes (bash)


CHR=$1                             # Chromosome number (e.g., 1, 2, ..., 22)
ARRAY_NAME=$2                      # Name of the pseudo-array (e.g., array1, array2, ...)    
BATCH_SAMPLE_LIST=$3               # /path/to/batch_sample_list_file
PSEUDO_ARRAY_VCF=$4                # /path/to/pseudo_array_vcf_file
REFERENCE_VCF_FILE=$5              # /path/to/reference_vcf_file.vcf.gz
PHASING_REFERENCE=$6               # /path/to/genetic_map_file (passed to SHAPEIT5 --map)

## Extract reference: drop the batch samples and rename chromosomes to numeric format
bcftools view        -S "^${BATCH_SAMPLE_LIST}" -Ou "${REFERENCE_VCF_FILE}" |\
bcftools annotate    --rename-chrs rename_chr.txt \
                     -Oz -o ref_chr${CHR}.vcf.gz -

bcftools index -f ref_chr${CHR}.vcf.gz

## Phasing (--region must match the chromosome being processed)
shapeit5_phase_common_static --input "${PSEUDO_ARRAY_VCF}"             \
                             --reference ref_chr${CHR}.vcf.gz          \
                             --region ${CHR}                           \
                             --map "${PHASING_REFERENCE}"              \
                             --thread 8                                \
                             --output phased_${ARRAY_NAME}_chr${CHR}.bcf

## Convert the reference panel to M3VCF format with Minimac3 (used by Minimac4 in the imputation step)
Minimac3 --refHaps ref_chr${CHR}.vcf.gz   \
         --processReference               \
         --prefix m3vcf_ref_chr${CHR}     \
         --cpus 8 

rename_chr.txt, passed to bcftools annotate --rename-chrs, converts the reference chromosome names to numeric format (e.g., chr1 to 1).
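`bcftools annotate --rename-chrs` reads a two-column file with the old and new chromosome name on each line. A minimal sketch for generating rename_chr.txt, assuming the reference VCF names autosomes `chr1` to `chr22`; `write_rename_chr` is a hypothetical helper, and sex chromosomes or other contigs would need extra lines.

```shell
# write_rename_chr (hypothetical helper): write a bcftools --rename-chrs map from
# "chr"-prefixed names to numeric names, one tab-separated "old new" pair per line.
# Assumes autosomes are named chr1..chr22 in the reference VCF.
write_rename_chr() {
    out=${1:-rename_chr.txt}
    : > "$out"                     # truncate or create the output file
    i=1
    while [ "$i" -le 22 ]; do
        printf 'chr%s\t%s\n' "$i" "$i" >> "$out"
        i=$((i + 1))
    done
}
```

Running `write_rename_chr` with no argument writes rename_chr.txt in the current directory, which is where the script above expects it.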

Imputation process

Code

Genotype imputation is performed with Minimac4. The phased BCF produced by SHAPEIT5 is converted to a bgzipped VCF and indexed, imputed against the M3VCF reference panel, and the temporary VCF and its index are removed afterward.

set -ue

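ARRAY=$1    # Name of the pseudo-array (same as ARRAY_NAME in the preparation script)
CHR=$2      # Chromosome number (note: ARRAY comes first here, unlike the preparation script)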

## Input
PHASED_BCF=phased_${ARRAY}_chr${CHR}.bcf
MINIMAC3_INDEX_VCF=m3vcf_ref_chr${CHR}.m3vcf.gz

## Convert the phased BCF to a bgzipped VCF and index it
bcftools view -Oz -o tem_${ARRAY}_chr${CHR}.vcf.gz ${PHASED_BCF}
bcftools index -f tem_${ARRAY}_chr${CHR}.vcf.gz

## Imputation
minimac4 --refHaps ${MINIMAC3_INDEX_VCF}         \
         --ChunkLengthMb 50                      \
         --ChunkOverlapMb 5                      \
         --haps tem_${ARRAY}_chr${CHR}.vcf.gz    \
         --format GT,DS,GP                       \
         --prefix imputed_${ARRAY}_chr${CHR}     \
         --ignoreDuplicates                      \
         --cpus 8                                \
         --vcfBuffer 1100

## Remove the temporary VCF and its index
rm tem_${ARRAY}_chr${CHR}.vcf.*
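The two scripts are run once per chromosome and pseudo-array. A minimal driver sketch, assuming they are saved as `prepare_reference.sh` and `run_imputation.sh` and that per-chromosome inputs follow the naming patterns shown; the script names, the path patterns, and the `run_array`/`run` helpers are all hypothetical. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# run (hypothetical helper): print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# run_array (hypothetical driver): both steps for chromosomes 1-22 of one array.
# Input file patterns below are assumptions; adapt them to the real layout.
run_array() {
    array=$1        # e.g. array1
    batch_list=$2   # samples to exclude from the reference panel
    chr=1
    while [ "$chr" -le 22 ]; do
        # Step 1 arguments: CHR ARRAY_NAME BATCH_SAMPLE_LIST PSEUDO_ARRAY_VCF REFERENCE_VCF_FILE PHASING_REFERENCE
        run bash prepare_reference.sh "$chr" "$array" "$batch_list" \
            "pseudo_${array}_chr${chr}.vcf.gz" "reference_chr${chr}.vcf.gz" \
            "chr${chr}.b38.gmap.gz" || return 1
        # Step 2 arguments: ARRAY CHR (array first)
        run bash run_imputation.sh "$array" "$chr" || return 1
        chr=$((chr + 1))
    done
}
```

`DRY_RUN=1 run_array array1 batch_samples.txt` lists all 44 commands for review; dropping `DRY_RUN` runs them in order and stops at the first failure. The scripts are invoked with `bash` because the preparation script may rely on bash-only options such as `pipefail`.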

Output data

  • Imputed VCF files for each pseudo SNP array and chromosome (Minimac4 output with prefix imputed_${ARRAY}_chr${CHR})
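When many arrays and chromosomes are processed, a quick completeness check can flag failed runs. A minimal sketch, assuming Minimac4 1.0.x writes the imputed dosages to `<prefix>.dose.vcf.gz` (alongside `<prefix>.info`) with the prefix used in the imputation script; `missing_outputs` is a hypothetical helper.

```shell
# missing_outputs (hypothetical helper): print each chromosome (1-22) whose
# imputed dose VCF for one array is absent or empty in a directory.
# Assumes Minimac4 1.0.x naming: imputed_<ARRAY>_chr<CHR>.dose.vcf.gz
missing_outputs() {
    array=$1
    dir=${2:-.}
    chr=1
    while [ "$chr" -le 22 ]; do
        f="$dir/imputed_${array}_chr${chr}.dose.vcf.gz"
        [ -s "$f" ] || echo "$chr"
        chr=$((chr + 1))
    done
}
```

Any chromosome printed by `missing_outputs array1` can be rerun through both scripts before downstream analysis.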