Genome to Function 3. Import VCF Files & Filter Variant Features
(A) Preparation - Download the following files by clicking on these links.
-
https://www.jalview.org/tutorial/ENSG00000113924.hg19.vcf.gz
-
https://www.jalview.org/tutorial/ENSG00000113924.hg19.vcf.gz.tbi
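If you prefer to fetch the two files from a script rather than the browser, the following is a minimal sketch using only the Python standard library. The helper names local_name and download_all are hypothetical and not part of Jalview; the URLs are the two listed above.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# The two URLs listed in this step.
URLS = [
    "https://www.jalview.org/tutorial/ENSG00000113924.hg19.vcf.gz",
    "https://www.jalview.org/tutorial/ENSG00000113924.hg19.vcf.gz.tbi",
]


def local_name(url: str) -> str:
    """Return the file name at the end of a URL path."""
    return Path(urlparse(url).path).name


def download_all(dest_dir: str = ".") -> list:
    """Download both files into dest_dir (hypothetical helper, not part of Jalview)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in URLS:
        target = dest / local_name(url)
        # Keep the original file names so the .tbi index stays paired with the VCF.
        urllib.request.urlretrieve(url, str(target))
        saved.append(target)
    return saved
```

Calling download_all() with no arguments saves both files into the current folder under their original names.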
(B) Open a locus view for a gene and its linked CDS/Protein view
-
Import ‘ENSG00000113924’ into Jalview from the Ensembl database via File ⇒ Fetch Sequences ⇒ ENSEMBL. Transcripts are shown aligned to the reference genome locus.
-
Open the Protein-CDS split-screen view by selecting Calculate ⇒ Get Cross-References ⇒ UNIPROT.
-
To save screen space, hide the annotation rows by un-ticking the option Show annotations in the Annotations menu. Do this for the protein and CDS panels.
(C) Overlay DNA alignment’s features on the protein sequences
-
Open the Sequence Feature Settings dialog box by clicking View ⇒ Feature Settings. Select the Protein tab, then tick the option Show CDS Features (below the colour transparency slider).
-
Return to the CDS tab (centre top). In the Show column, deselect the exon option, then select OK. This leaves only sequence_variant selected.
(D) Import variants from VCF files
-
In the upper DNA panel of the CDS/Protein alignment that is entitled ‘retrieved from Ensembl’, select File ⇒ Load VCF File.
-
Use the file browser to locate the ENSG00000113924.hg19.vcf.gz file downloaded earlier (keep its .tbi index file in the same folder). Load the file by selecting Open. This can take a few seconds. The alignment status bar reports how many variants were added.
-
If the alignment isn’t coloured, select View ⇒ Show Sequence Features; this toggles the display of features on the alignment on and off.
-
Select View ⇒ Feature Settings to open the Sequence Feature Settings dialog box.
-
A VCF group is listed in the upper region of the dialog box in the CDS tab. Deselect the other database groups that are listed (dbSNP, ensembl_havana, HGMD-PUBLIC, havana), but leave the VCF group ticked.
-
In the Feature Type column of the Sequence Feature Settings dialog box, right-click the sequence_variant name. In the context menu that opens, select the option that hides all columns that do not contain a variant.
-
Adjust the Colour transparency slider so that the features are almost transparent.
-
Click OK to close the Sequence Feature Settings dialog box.
-
Open a Feature details table: place the mouse cursor on a variant in the alignment, right-click, and select Feature details ⇒ the name of the feature.
Question: How do the VCF features differ from the previous features table?
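As a rough cross-check outside Jalview, the number of variant records in the downloaded .vcf.gz file can be counted with a few lines of Python. This is only a sketch: count_vcf_records is a hypothetical helper, and its result need not match the number shown in the status bar, since Jalview may add only those variants that map onto the loaded sequences.

```python
import gzip


def count_vcf_records(path: str) -> int:
    """Count data lines (lines not starting with '#') in a gzip-compressed VCF."""
    count = 0
    # bgzip output is a series of gzip members, which gzip.open reads transparently.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                count += 1
    return count
```

For example, count_vcf_records("ENSG00000113924.hg19.vcf.gz") returns the number of data lines in the tutorial file, assuming it has been saved in the current folder.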
(E) Select Variants from VCF files using Filters
-
Place the mouse cursor on a blue triangle in the alignment ruler and right-click. In the context menu that opens, select Reveal All to show the hidden columns again.
-
Re-open the Sequence Feature Settings dialog box and select the CDS tab. In the Configuration column, click the blank sequence_variant cell. This opens the Display settings for sequence_variant features dialog box.
-
In the Filters panel (lower third), open the Label drop-down menu and select AF_fin (allele frequency in the Finnish population)
-
In the adjacent drop-down menu, select the greater than (>) option
-
In the next cell enter the number 0.4
-
Click OK
-
Scroll across the alignment, place the mouse cursor on a coloured variant, right-click to open the context menu, and select Feature details ⇒ the name of the feature.
-
Scroll down the Feature details table to the AF_Fin entry and view the associated number.
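The filter set up in this section keeps a variant only when its allele-frequency attribute exceeds 0.4. The sketch below mimics that logic on VCF-style INFO strings. It is not Jalview's implementation: parse_info and passes_threshold are hypothetical helpers, the INFO strings are invented for illustration, and the key is spelled AF_fin as it appears in the Label menu above (the exact capitalisation in the file is an assumption).

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column ('KEY=VALUE;KEY=VALUE;FLAG') into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def passes_threshold(info: str, key: str, threshold: float) -> bool:
    """True if INFO holds `key` with a single numeric value greater than threshold."""
    raw = parse_info(info).get(key)
    if raw is None:
        return False  # attribute absent, so the variant is not kept
    try:
        return float(raw) > threshold
    except ValueError:
        return False  # non-numeric or multi-valued entries are skipped in this sketch


# Made-up INFO strings for illustration only; not taken from the tutorial VCF.
examples = [
    "AC=12;AF_fin=0.55",
    "AC=3;AF_fin=0.02",
    "AC=7",
]
kept = [info for info in examples if passes_threshold(info, "AF_fin", 0.4)]
print(kept)  # ['AC=12;AF_fin=0.55']
```

Variants without the attribute are dropped here, which is one reasonable reading of a "greater than" filter; check the behaviour in Jalview if this matters for your data.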
(F) Select Features using Filters
-
Open the Sequence Feature Settings dialog box and select the CDS tab, then tick the dbSNP and HGMD-PUBLIC database groups to enable them.
-
In the Configuration column, click on the sequence_variant cell. This re-opens the Display settings dialog box.
-
In the Filters panel, remove the previous filter by clicking the cross next to it
-
Click on the Label drop-down menu and select consequence_type
-
Select the Contains option from the adjacent drop-down menu
-
In the blank cell, enter the text stop_gained (with an underscore)
-
Click OK
-
View the coloured variants in the alignment. Only variants whose consequence_type contains stop_gained are highlighted.
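The Contains condition used in this section is a substring test on a feature attribute. A minimal sketch of that idea follows. The feature records are invented for illustration, contains is a hypothetical helper, and the case-insensitive comparison is an assumption rather than a statement of how Jalview matches text.

```python
from typing import Optional


def contains(value: Optional[str], text: str) -> bool:
    """Substring test on an attribute value; a missing attribute never matches."""
    if value is None:
        return False
    # Case-insensitive comparison is an assumption made for this sketch.
    return text.lower() in value.lower()


# Hypothetical features; identifiers and attribute values are invented.
features = [
    {"id": "variant_a", "consequence_type": "missense_variant"},
    {"id": "variant_b", "consequence_type": "stop_gained,splice_region_variant"},
    {"id": "variant_c"},  # no consequence_type attribute
]

highlighted = [
    f["id"] for f in features if contains(f.get("consequence_type"), "stop_gained")
]
print(highlighted)  # ['variant_b']
```

Because the test is a substring match, a variant annotated with several consequences is still selected as long as stop_gained is one of them.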
(G) Explore human population variants in the context of orthologs and 3D structure
-
In the protein panel, right-click the sequence name ENSP00000283871.
-
In the context menu that opens, select 3D Structure Data.
-
In the Structure Chooser dialog box select the structure 1ey2, then select New View to open the 3D structure.
-
Examine the structural locations of the stop_gained features. (If the sequence is green, go to the Sequence Feature Settings dialog box and, in the Protein tab, untick the Show box for RESNUM and click OK.)