View Ensembl Features on Protein-CDS Alignments

Genome to Function 2. View Ensembl Features on Protein-CDS Alignments

(A) Import gene from Ensembl database

Select File ⇒ Fetch Sequences and select ENSEMBL from the list of databases in the New Sequence Fetcher dialog box. Enter the accession number ENSG00000113924 and click OK. Progress is indicated in the status bar at the bottom of the window.
Transcripts are shown aligned to the reference genome locus. Select Calculate ⇒ Get Cross-References ⇒ UNIPROT to open the Protein-CDS split-screen view.
Hide the annotation rows by deselecting the option in Annotations ⇒ Show annotations in the protein and CDS panels.
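
For readers who want to inspect the same gene record outside the GUI, the following is a minimal Python sketch that queries the public Ensembl REST service for the accession used above. It is an illustration only and makes no claim about how Jalview retrieves the data internally. The helper names (lookup_url, fetch_gene, transcript_ids) are hypothetical, and the sketch assumes that the expanded lookup response lists transcripts under a "Transcript" key.

```python
import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(stable_id: str, expand: bool = True) -> str:
    """Build the REST URL for looking up an Ensembl stable ID."""
    query = urllib.parse.urlencode({
        "content-type": "application/json",
        "expand": 1 if expand else 0,
    })
    return f"{ENSEMBL_REST}/lookup/id/{urllib.parse.quote(stable_id)}?{query}"


def fetch_gene(stable_id: str) -> dict:
    """Download the gene record as a dictionary (requires network access)."""
    with urllib.request.urlopen(lookup_url(stable_id), timeout=30) as resp:
        return json.load(resp)


def transcript_ids(gene: dict) -> list:
    """List transcript IDs, assuming they sit under the 'Transcript' key."""
    return [t["id"] for t in gene.get("Transcript", [])]


if __name__ == "__main__":
    gene = fetch_gene("ENSG00000113924")
    print(transcript_ids(gene))
```

Running the script prints the transcript identifiers returned for the gene, which can be compared with the transcripts shown aligned to the locus in the alignment window.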

(B) Adjust the CDS panel's feature display

In the CDS panel, select View ⇒ Feature Settings to open the Sequence Feature Settings for CDS and Protein dialog box. Move this dialog box to one side so it doesn’t conceal the alignment.
Untick the databases (e.g. dbSNP, ensembl_havana, HGMD-PUBLIC and havana) and view the effect in the alignment panel. Then reverse the process so that they are all selected.
Click the Optimise Order button to ensure ‘exon’ features are below ‘sequence_variant’ features. Alternatively, place the mouse cursor on the sequence_variant name in the Feature Type column and click-and-drag it to the top of the list.

With the CDS tab (centre at the top) selected in the Sequence Feature Settings dialog box, deselect the exon feature in the Show column, leaving only sequence_variant selected.
Select the Protein tab (centre at the top), then tick the option Show CDS features (below the colour transparency slider).
Move the slider on the Colour transparency scale to the right, and observe the effect in the protein panel.
Place the mouse cursor on the amino acid residues in the protein alignment that are coloured red (denoting the location of variants), and compare the tooltip information shown for differently coloured residues.
Move the slider on the Colour transparency scale back to the left.

(D) Select and colour variants that contain a clinical_significance field

Return to the Sequence Feature Settings dialog box and select the CDS tab (centre at the top).
In the Configuration column, click on the blank cell associated with the sequence_variant feature.
This opens the Display settings for sequence_variant features dialog box.
- In the Colour panel, select By text of:
- In the Label drop-down menu, select clinical_significance
- Click OK to apply the new CDS alignment feature colouring
Mouse over the coloured residues or bases in the alignment and examine the text in the associated tooltip. Note that each clinical significance type produces a different residue colour.
For more information about these features, place the mouse cursor on a coloured residue, right-click to open a context menu, and select Feature details ⇒ the name of the feature to open a Feature details table.
Compare the Feature details tables from different coloured variants in the alignment.
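
The idea behind colouring By text of a field can be sketched in a few lines of Python: every distinct text value is mapped to its own reproducible colour, so all variants sharing a clinical_significance value look the same. This is a hypothetical illustration. The function colour_for_text and its hashing scheme are assumptions made for the example and are not taken from Jalview, whose actual colour assignment is not described in this exercise.

```python
import colorsys
import zlib


def colour_for_text(text: str) -> str:
    """Map a text value to a stable hex colour by hashing it to a hue."""
    # crc32 is deterministic across runs, unlike Python's built-in hash().
    hue = (zlib.crc32(text.encode("utf-8")) % 360) / 360.0
    r, g, b = colorsys.hsv_to_rgb(hue, 0.7, 0.9)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


if __name__ == "__main__":
    for label in ["pathogenic", "likely pathogenic", "benign"]:
        print(label, colour_for_text(label))
```

Because the colour depends only on the text, the same label always receives the same colour wherever it occurs in the alignment.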

(E) Select and colour variants that contain a pathogenic label

Return to the Sequence Feature Settings dialog box and select the CDS tab (centre at the top).
In the Configuration column, click on the blank cell associated with the sequence_variant feature to re-open the Display settings for sequence_variant features dialog box.
- In the Filters panel (in the lower third), click on the Label drop-down menu and select clinical_significance
- Select the Contains option from the adjacent drop-down menu
- In the blank cell, enter the text pathogenic
- Click OK
In the alignment, mouse over the coloured residues and examine the text in the associated tooltip. The filter has selected variants whose clinical_significance is either ‘pathogenic’ or ‘likely pathogenic’, because both values contain the text entered.
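
The behaviour of the Contains filter can be sketched as a simple substring test. The Python below is an illustration with made-up variant records (the identifiers var1 to var4 are hypothetical), and it assumes the match is case-insensitive, which this exercise does not state.

```python
def contains_filter(features, field, text):
    """Keep features whose `field` value contains `text` (case-insensitive)."""
    needle = text.lower()
    return [f for f in features if needle in str(f.get(field, "")).lower()]


# Hypothetical variant records, for illustration only.
variants = [
    {"id": "var1", "clinical_significance": "pathogenic"},
    {"id": "var2", "clinical_significance": "likely pathogenic"},
    {"id": "var3", "clinical_significance": "benign"},
    {"id": "var4"},  # no clinical_significance field at all
]

selected = contains_filter(variants, "clinical_significance", "pathogenic")
print([v["id"] for v in selected])  # ['var1', 'var2']
```

A substring match keeps ‘likely pathogenic’ along with ‘pathogenic’, while variants with another value, or with no clinical_significance field, are dropped.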

(F) Import Uniprot Database Features

In the protein panel of the split-screen view, select Web Service ⇒ Fetch DB References ⇒ From EMBL to UNIPROT ⇒ UNIPROT.
Re-open the Sequence Feature Settings dialog box from the protein panel by selecting View ⇒ Feature Settings.
View the protein features (Protein tab, centre at the top).

Note: If only one or two features are listed, it is probably because a single sequence was selected. Press the [ESC] key to clear the selection and fetch the DB References again.

Click the Optimise Order button. This re-orders the features based on the average length of each feature type.
Feature colours can be changed. Click the ‘binding site’ Colour box and, in the Select colour dialog box that opens, select cyan and click OK.
Scroll across the alignment to locate the position of the cyan binding site features.
Return to the Sequence Feature Settings dialog box. In the Protein tab, ensure the option Show CDS features (below the colour transparency slider) is ticked, then tick the box on top and click OK. This places the CDS features above the protein features in the protein alignment.
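
The Optimise Order rule mentioned above, ordering feature types by their average length, can be sketched as follows. This is a minimal Python illustration with made-up coordinates, not Jalview's implementation. It assumes that types with shorter features are listed first (drawn on top), which matches the earlier step where ‘exon’ features end up below ‘sequence_variant’ features.

```python
from collections import defaultdict


def optimise_order(features):
    """Return feature types from top to bottom, shortest average length first."""
    lengths = defaultdict(list)
    for ftype, start, end in features:
        lengths[ftype].append(end - start + 1)  # inclusive coordinates
    averages = {t: sum(v) / len(v) for t, v in lengths.items()}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(averages, key=lambda t: (averages[t], t))


# Hypothetical (type, start, end) features, for illustration only.
features = [
    ("exon", 1, 120),
    ("exon", 200, 350),
    ("sequence_variant", 15, 15),
    ("sequence_variant", 240, 241),
]

print(optimise_order(features))  # ['sequence_variant', 'exon']
```

Placing short features above long ones keeps single-residue annotations visible instead of hiding them under features that span many columns.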

(G) View Features on 3D Structure

In the protein panel, select the sequence name ENSP00000283871 and right-click the mouse.
This opens a context menu. Select 3D Structure Data. In the Structure Chooser dialog box, select the structure 1ey2, then select New View to open the 3D structure.
Examine the structural locations of some of the features, such as the binding site. (If the sequence is green, go to the Sequence Feature Settings dialog box and, in the Protein tab, untick the Show box for RESNUM and click OK.)

If the Jmol window disappears, go to Window in the desktop window menu and select Jmol view for ENSP00000283871:1EY2. This brings the window to the front.