09 July 2019

Recently (07/09/2019), dbSNP have been updated dbSNP153 from dbSNP152. However, NCBI only provided dbSNP153 in hg38 (GRCH38) version without any source for hg19 version. Here, I prepared a approach to generate dbSNP153 in hg19..

dbSNP153 in hg19, GRCH37

wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.25.gz
wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.25.gz.tbi
wget https://raw.githubusercontent.com/Shicheng-Guo/AnnotationDatabase/master/GCF_000001405.25_GRCh37.p13_assembly_report.txt
awk -v RS="(\r)?\n" 'BEGIN { FS="\t" } !/^#/ { if ($10 != "na") print $7,$10; else print $7,$5 }' GCF_000001405.25_GRCh37.p13_assembly_report.txt > dbSNP-to-UCSC-GRCh37.p13.map
perl -p -i -e '{s/chr//}' dbSNP-to-UCSC-GRCh37.p13.map
bcftools annotate --rename-chrs dbSNP-to-UCSC-GRCh37.p13.map GCF_000001405.25.gz -Oz -o dbSNP153.hg19.vcf.gz
bcftools view dbSNP153.hg19.vcf.gz -Ov -o dbSNP153.hg19.vcf

dbSNP153 in hg38, GRCH38

Bash, Perl, Python and (GATK or CrossMap), awk and wget are required in this approach. Crossmap is Python based method which I don’t recommend to use since the version problem may waste tons of your time. I prefer to use GATK4 LiftoverVcf.

wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.38.gz
wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.38.gz.tbi
wget https://raw.githubusercontent.com/Shicheng-Guo/AnnotationDatabase/master/GCF_000001405.38_GRCh38.p12_assembly_report.txt
awk -v RS="(\r)?\n" 'BEGIN { FS="\t" } !/^#/ { if ($10 != "na") print $7,$10; else print $7,$5 }' GCF_000001405.38_GRCh38.p12_assembly_report.txt > dbSNP-to-UCSC-GRCh38.p12.map
bcftools annotate --rename-chrs dbSNP-to-UCSC-GRCh38.p12.map GCF_000001405.38.gz | gawk '/^#/ && !/^##contig=/ { print } !/^#/ { if( $1!="na" ) print }' | bgzip -c > GCF_000001405.38.dbSNP153.GRCh38p12b.GATK.vcf.gz
gatk LiftoverVcf -I GCF_000001405.38.dbSNP152.GRCh38p12b.GATK.vcf.gz -O dbSNP153.hg19.vcf -C hg38ToHg19.over.chain.gz --REJECT rejected.vcf -R ~/hpc/db/hg19/hg19.fa

liftOver with python CrossMap.py

python CrossMap.py vcf hg38Tohg19.over.chain.gz GCF_000001405.38.dbSNP153.GRCh38p12b.GATK.vcf.gz ~/hpc/db/hg19/hg19.fa  GCF_000001405.38.dbSNP153.hg19.gz
perl -p -i -e '{s/chr//}' dbSNP-to-UCSC-GRCh38.p12.map

02/16/2020

wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.25.gz
wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.25.gz.tbi
wget https://raw.githubusercontent.com/Shicheng-Guo/AnnotationDatabase/master/GCF_000001405.25_GRCh37.p13_assembly_report.txt
awk -v RS="(\r)?\n" 'BEGIN { FS="\t" } !/^#/ { if ($10 != "na") print $7,$10; else print $7,$5 }' GCF_000001405.25_GRCh37.p13_assembly_report.txt > dbSNP-to-UCSC-GRCh37.p13.map
perl -p -i -e '{s/chr//}' dbSNP-to-UCSC-GRCh37.p13.map
bcftools annotate --rename-chrs dbSNP-to-UCSC-GRCh37.p13.map GCF_000001405.25.gz | gawk '/^#/ && !/^##contig=/ { print } !/^#/ { if( $1!="na" ) print }' | bgzip -c > dbSNP153.hg19.vcf.gz

03/11/2020 (again, I moved to UW-Madison, department of Medical Genetics, I need to rebuild the dbSNP153 in deepThought server)

### Script to check plink .bim files against HRC/1000G for strand, id names, positions, alleles, 
### ref/alt assignment, William Rayner 2015, wrayner@well.ox.ac.uk, Version 4.2.7
cd /mnt/sas0/AD/sguo234/asa
wget http://www.well.ox.ac.uk/~wrayner/tools/HRC-1000G-check-bim-v4.2.7.zip
wget http://ngs.sanger.ac.uk/production/hrc/HRC.r1-1/HRC.r1-1.GRCh37.wgs.mac5.sites.tab.gz
wget https://www.well.ox.ac.uk/~wrayner/tools/1000GP_Phase3_combined.legend.gz
gunzip 1000GP_Phase3_combined.legend.gz
unzip HRC-1000G-check-bim-v4.2.7.zip
gunzip HRC.r1-1.GRCh37.wgs.mac5.sites.tab.gz
wget http://qbrc.swmed.edu/zhanxw/software/checkVCF/checkVCF-20140116.tar.gz
tar xzvf checkVCF-20140116.tar.gz
## compare bim to HRC European Population(HEP)
perl HRC-1000G-check-bim.pl -b RA3000.R3.bim -f RA3000.R3.frq -r HRC.r1-1.GRCh37.wgs.mac5.sites.tab -h
sh Run-plink.sh
## compare bim to 1000G East-Asian Population(EAP)
perl HRC-1000G-check-bim.pl -b RA3000.R3.bim -f RA3000.R3.frq -r 1000GP_Phase3_combined.legend -g -p EAS
sh Run-plink.sh
wget https://raw.githubusercontent.com/Shicheng-Guo/Gscutility/master/addfakegenotype.pl
perl addfakegenotype.pl > dbSNP153.hg19.plink.vcf
plink --vcf dbSNP153.hg19.plink.vcf --make-bed --allow-extra-chr --out dbSNP153.hg19

How to share dbSNP153(hg19,GRCH37) and how to download dbSNP153(hg19,GRCH37)

In order to share dbSNP153(hg19,GRCH37), I uploaded dbSNP153(hg19,GRCH37) to . You can download dbSNP153(hg19,GRCH37) with the follow link. Good luck for your analysis. Finally, Thanks to Dr. Raony Guimarães for the help on the ideas of CrossMap to liftover vcf files..



blog comments powered by Disqus