The widespread use of high-throughput sequencing techniques is leading to the identification of a rapidly increasing number of potentially disease-associated genes and pathogenic variants. Pathogenicity assessment of new variants can be supported by publicly available databases and scores. However, these data sources may be difficult to exploit. Here, we present aRgus (https://argus.urz.uni-heidelberg.de/), a stand-alone R/shiny web server application for user-friendly compilation and visualization of gene, protein, variant, and functional impact prediction data. Our application provides a lightweight tool to access multilevel data sources (ENSEMBL, dbNSFP, gnomAD, UniProt, as well as Simple ClinVar), and enables visualization of exon-intron structure and UniProt protein domain annotation, together with ClinVar and gnomAD variant data. aRgus automatically determines the canonical transcript based on the user-supplied HGNC gene symbol and gathers all relevant data. The user can choose from a panel of six visualizations: 1.) unspliced transcript plot; 2.) protein plot; 3.) and 4.) the mutational constraint plots of pathogenic and likely pathogenic ClinVar variants, as well as tolerated gnomAD variants, respectively; 5.) a polynomial regression model with position-coded heatmap depiction of all annotated prediction score values; and 6.) groupwise statistical comparison of scores as violin plots. An interactive table is available including all ClinVar variants and all annotated non-synonymous single nucleotide variants with color-coded prediction score values. All plots and tables can be exported separately. aRgus enables gene- and position-specific prediction score modeling to assess proteins and identify regions susceptible to missense variation up to single amino acid resolution. It is a powerful tool for enhanced variant interpretation.
This website is free and open to all users and there is no login requirement.
University Hospital Heidelberg
Center for Pediatrics and Adolescent Medicine
Division of Pediatric Epileptology
Im Neuenheimer Feld 430
D-69120 Heidelberg, Germany
Julian Schröter, MD
Steffen Syrbe, MD
University Hospital Heidelberg
Center for Pediatrics and Adolescent Medicine
Division of Neuropediatrics and Metabolic Medicine
Im Neuenheimer Feld 430
D-69120 Heidelberg, Germany
Heiko Brennenstuhl, MD, MBA
Tal Dattner
Dominic Lenz, MD
Prof. Stefan Kölker, MD
Thomas Opladen, MD, MHBA
Christian Staufner, MD
Prof. Georg F. Hoffmann, MD
University Hospital Heidelberg
Institute of Human Genetics
Im Neuenheimer Feld 366
D-69120 Heidelberg, Germany
Prof. Christian P. Schaaf, MD
Interdisciplinary Center for Scientific Computing (IWR)
Engineering Mathematics and Computing Lab (EMCL)
Im Neuenheimer Feld 205
D-69120 Heidelberg, Germany
Prof. Vincent Heuveline, PhD (Head of EMCL)
Alejandra Jayme, MSc
German Cancer Research Center (DKFZ)
National Center for Tumor Diseases (NCT) Heidelberg
Molecular Precision Oncology Program
Computational Oncology
Im Neuenheimer Feld 460
D-69120 Heidelberg, Germany
Daniel Hübschmann, MD, PhD
Jennifer Hüllein, PhD
Sebastian Uhrig, PhD
University Medical Center Leipzig
Institute of Human Genetics
Philipp-Rosenthal-Str. 55, Building W
D-04103 Leipzig, Germany
Bernt Popp, MD
Physician-Scientist-Program
Medical Faculty of the University of Heidelberg
Heiko Brennenstuhl, MD, MBA
Julian Schröter, MD
Dietmar Hopp Foundation
Grant 1DH1813319
Steffen Syrbe, MD
Julian Schröter, MD
Deutsche Forschungsgemeinschaft (DFG)
Grant PO2366/2-1
Bernt Popp, MD
Howe KL, Achuthan P, Allen J, Allen J, Alvarez-Jarreta J, Amode MR, Armean IM, Azov AG, Bennett R, Bhai J, Billis K, Boddu S, Charkhchi M, Cummins C, Da Rin Fioretto L, Davidson C, Dodiya K, El Houdaigui B, Fatima R, Gall A, Garcia Giron C, Grego T, Guijarro-Clarke C, Haggerty L, Hemrom A, Hourlier T, Izuogu OG, Juettemann T, Kaikala V, Kay M, Lavidas I, Le T, Lemos D, Gonzalez Martinez J, Marugán JC, Maurel T, McMahon AC, Mohanan S, Moore B, Muffato M, Oheh DN, Paraschas D, Parker A, Parton A, Prosovetskaia I, Sakthivel MP, Salam AIA, Schmitt BM, Schuilenburg H, Sheppard D, Steed E, Szpak M, Szuba M, Taylor K, Thormann A, Threadgold G, Walts B, Winterbottom A, Chakiachvili M, Chaubal A, De Silva N, Flint B, Frankish A, Hunt SE, IIsley GR, Langridge N, Loveland JE, Martin FJ, Mudge JM, Morales J, Perry E, Ruffier M, Tate J, Thybert D, Trevanion SJ, Cunningham F, Yates AD, Zerbino DR, Flicek P.
Ensembl 2021.
Nucleic Acids Res. 2021 Jan 8;49(D1):D884-D891. doi: 10.1093/nar/gkaa942. PMID: 33137190; PMCID: PMC7778975.
Landrum MJ, Chitipiralla S, Brown GR, Chen C, Gu B, Hart J, Hoffman D, Jang W, Kaur K, Liu C, Lyoshin V, Maddipatla Z, Maiti R, Mitchell J, O'Leary N, Riley GR, Shi W, Zhou G, Schneider V, Maglott D, Holmes JB, Kattman BL.
ClinVar: improvements to accessing data.
Nucleic Acids Res. 2020 Jan 8;48(D1):D835-D844. doi: 10.1093/nar/gkz972. PMID: 31777943; PMCID: PMC6943040.
Pérez-Palma E, Gramm M, Nürnberg P, May P, Lal D.
Simple ClinVar: an interactive web server to explore and retrieve gene and disease variants aggregated in ClinVar database.
Nucleic Acids Res. 2019 Jul 2;47(W1):W99-W105. doi: 10.1093/nar/gkz411. PMID: 31114901; PMCID: PMC6602488.
UniProt Consortium.
UniProt: the universal protein knowledgebase in 2021.
Nucleic Acids Res. 2021 Jan 8;49(D1):D480-D489. doi: 10.1093/nar/gkaa1100. PMID: 33237286; PMCID: PMC7778908.
Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP, Gauthier LD, Brand H, Solomonson M, Watts NA, Rhodes D, Singer-Berk M, England EM, Seaby EG, Kosmicki JA, Walters RK, Tashman K, Farjoun Y, Banks E, Poterba T, Wang A, Seed C, Whiffin N, Chong JX, Samocha KE, Pierce-Hoffman E, Zappala Z, O'Donnell-Luria AH, Minikel EV, Weisburd B, Lek M, Ware JS, Vittal C, Armean IM, Bergelson L, Cibulskis K, Connolly KM, Covarrubias M, Donnelly S, Ferriera S, Gabriel S, Gentry J, Gupta N, Jeandet T, Kaplan D, Llanwarne C, Munshi R, Novod S, Petrillo N, Roazen D, Ruano-Rubio V, Saltzman A, Schleicher M, Soto J, Tibbetts K, Tolonen C, Wade G, Talkowski ME; Genome Aggregation Database Consortium, Neale BM, Daly MJ, MacArthur DG.
The mutational constraint spectrum quantified from variation in 141,456 humans.
Nature. 2020 May;581(7809):434-443. doi: 10.1038/s41586-020-2308-7. Epub 2020 May 27. Erratum in: Nature. 2021 Feb;590(7846):E53. PMID: 32461654; PMCID: PMC7334197.
Liu X, Li C, Mou C, Dong Y, Tu Y.
dbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs.
Genome Med. 2020 Dec 2;12(1):103. doi: 10.1186/s13073-020-00803-9. PMID: 33261662; PMCID: PMC7709417.
Rodriguez JM, Pozo F, Cerdán-Vélez D, Di Domenico T, Vázquez J, Tress ML.
APPRIS: selecting functionally important isoforms.
Nucleic Acids Res. 2021 Nov 10:gkab1058. doi: 10.1093/nar/gkab1058. Epub ahead of print. PMID: 34755885.
RStudio Team (2020).
RStudio: Integrated Development for R.
RStudio, PBC, Boston, MA URL http://www.rstudio.com/ .
Winston Chang, Joe Cheng, JJ Allaire, Carson Sievert, Barret Schloerke, Yihui Xie, Jeff Allen, Jonathan McPherson, Alan Dipert and Barbara Borges (2021).
shiny: Web Application Framework for R.
R package version 1.7.1. https://CRAN.R-project.org/package=shiny
The initial and sole mandatory input for an aRgus query is the HGNC symbol of your gene of interest, which can either be typed directly into the search bar or chosen from the suggestions prompted after entering single letters contained in the desired gene name (> Select gene). Selecting a suggestion from the drop-down list initiates the query. To harmonize variant data, aRgus then selects the corresponding Ensembl feature ID and RefSeq ID of the MANE-curated canonical transcript. If no MANE transcript is available, the highest-curated APPRIS transcript is selected automatically (Fig. 1A).
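The fallback order described above (MANE first, then APPRIS) can be sketched as follows. This is only an illustration in Python, not the aRgus implementation: the record fields `mane` and `appris_rank`, the convention that a lower rank means a better-curated isoform, and the placeholder identifiers are all assumptions made for the example.

```python
def pick_canonical_transcript(transcripts):
    """Return the MANE transcript if one exists, else the best-ranked APPRIS one.

    Each transcript is a dict with the hypothetical keys:
      "id"          - placeholder transcript identifier
      "mane"        - True if the transcript is MANE-curated
      "appris_rank" - integer rank (assumed: lower is better), or None
    """
    mane = [t for t in transcripts if t.get("mane")]
    if mane:
        return mane[0]
    # Fall back to APPRIS when no MANE transcript is available.
    appris = [t for t in transcripts if t.get("appris_rank") is not None]
    if not appris:
        return None
    return min(appris, key=lambda t: t["appris_rank"])


# Usage with made-up records: no MANE entry, so the APPRIS rank decides.
candidates = [
    {"id": "tx_a", "mane": False, "appris_rank": 2},
    {"id": "tx_c", "mane": False, "appris_rank": 1},
]
chosen = pick_canonical_transcript(candidates)
```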
Select plots
If no pre-selection has been made, all plot types are displayed on the main panel directly after the query is entered in the search bar, including the Gene, Protein, ClinVar, gnomAD, and in-silico scores visualizations (see plot details below). The desired compilation of individual plots can be selected using the check boxes under Select plots (Fig. 1B).
Fig. 1: Select gene (A) and Select plots (B) functions.
Here, the display of all ClinVar variants is selected by default. Alternatively, the ClinVar plot can be restricted to pathogenic (classified as pathogenic or likely pathogenic) or benign variants (benign, likely benign, uncertain significance, conflicting interpretations) (Fig. 2).
Fig. 2: Selection of both pathogenic and benign (A) or only pathogenic ClinVar variants (B) using the ClinVar significance function.
Under Select scores, a personal synopsis of in-silico variant effect prediction scores can be compiled. Up to three scores can be plotted simultaneously in an arbitrary order. Additionally, the values of all scores available on aRgus are listed in the In-silico scores tab of the bottom table (see section Table).
Fig. 3: Score selection using the drop-down list (A) for simultaneous plotting of up to three different scores (B).
Each plot generated by aRgus can be exported separately. Here, the desired file format can be set to PNG or SVG (Download format). Below, the plot dimensions can be adjusted by Width and Height (unit = inches).
The desired compilation of plots can be set using the checkboxes under Select plots (see Toolbar). All plots can be individually exported as PNG or SVG files using the download buttons on the top right of each plot (see Toolbar > Figure download options).
Transcript
In this plot, the canonical, unspliced transcript is shown as a linear representation according to its chromosomal coordinates (x-axis), with the untranslated regions as a gray backbone and the exons as green rectangles from left to right in ascending order, regardless of the gene's position on the genomic forward or reverse strand. The underlying Ensembl and RefSeq transcript identifiers are shown in the title. The chromosomal positions of pathogenic and benign ClinVar variants are indicated as lollipop segments on top. The variant description for a lollipop can be displayed by selecting the variant in the ClinVar table (see Table > ClinVar) (Fig. 4).
Fig. 4: The Transcript plot with a pathogenic variant selected in the ClinVar table (red).
Here, the translated protein is shown in a linear representation with amino acid positions on the x-axis. Protein domains and regions annotated in the UniProt database are shown as colored rectangles and listed in the figure legend. The corresponding UniProt identifier is displayed in the plot title. As in the Gene plot, amino acid positions of pathogenic and benign ClinVar variants are indicated as lollipop segments, and variant descriptions can be displayed by selecting variants in the ClinVar table (Fig. 5).
Fig. 5: The Protein plot with a pathogenic variant selected in the ClinVar table (red).
In this plot, pathogenic and benign ClinVar variants are color-coded in red and blue, respectively, and can be plotted either together or separately according to their clinical significance using the ClinVar significance selection bar (see Toolbar > ClinVar significance). At the bottom, the amino acid positions of the respective variants are shown as colored segments. On top, a density plot displays the variant distribution with respect to the variant count as recorded in the ClinVar database (see Fig. 2).
gnomAD
This plot displays variant distributions and frequencies of healthy individuals gathered in the comprehensive gnomAD database. gnomAD variants are shown as bars according to their amino acid position and their allele count, with logarithmic (log10) scaling of the y-axis. Variants are grouped in two plots depending on their identification within the scope of clinical testing using whole-exome (top, green) or whole-genome sequencing (bottom, blue; Fig. 6).
Remark: In genetic disorders with recessive inheritance, pathogenic variants are naturally present in healthy heterozygous carriers within the general population. Variants observed in gnomAD should therefore not generally be classified as "benign".
Fig. 6: The gnomAD plot with variants from the exomes (green) and genomes (blue) datasets.
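As a rough illustration of the data preparation behind such a plot, the sketch below splits variants into the exomes and genomes groups and log10-transforms their allele counts. It is written in Python for illustration only; the field names `aa_pos`, `exomes_ac`, and `genomes_ac` and the example records are hypothetical and not taken from aRgus.

```python
import math


def log10_allele_counts(variants):
    """Group variants by gnomAD dataset as (amino acid position, log10(AC)) pairs."""
    grouped = {"exomes": [], "genomes": []}
    for variant in variants:
        for dataset in grouped:
            allele_count = variant.get(f"{dataset}_ac")
            # Skip missing or zero counts, since log10(0) is undefined.
            if allele_count:
                grouped[dataset].append((variant["aa_pos"], math.log10(allele_count)))
    return grouped


# Made-up example records (not real gnomAD data).
example = [
    {"aa_pos": 10, "exomes_ac": 100, "genomes_ac": None},
    {"aa_pos": 25, "exomes_ac": 1, "genomes_ac": 1000},
]
bars = log10_allele_counts(example)
```

On a log10 axis, a variant seen once sits at 0, which is why singletons and very common variants can share one panel.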
This plot visualizes the value distributions of in-silico scores that predict the variant effect on the structure and function of the translated protein. The score values are pre-calculated for all biologically possible non-synonymous single nucleotide variants (nsSNVs), covering the three possible nucleotide substitutions at every base position throughout the transcript. Depending on the position in the base triplet, up to nine different resulting amino acid substitutions are possible; in the case of a missense variant, each substitution is assigned a distinct in-silico score value. These detailed scores are listed in the In-silico scores table (see Table > In-silico scores). For intuitive interpretation of these tabular data, this plot comprises two different visualizations of the in-silico score value distributions throughout the amino acid sequence. For both approaches, the arithmetic mean of the different score values at each amino acid position is calculated. On top, the resulting data are shown as a smoothed curve based on a polynomial regression model. At the bottom, the data are visualized as a heatmap in which score values are color-coded according to whether they predict a rather damaging or non-damaging effect on the protein. The score range is shown in the legend below and color-coded with respect to the recommended cut-off value. Using the Score selection bar, the desired score can be chosen from a list of 26 different scores, and up to three scores can be displayed simultaneously for comparison (see Scores). By default, the REVEL and CADD_phred scores are shown.
Fig. 7: The in-silico scores plot with two different visualizations including a smoothed curve (top) and heatmap (bottom).
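The two steps behind the smoothed curve, averaging score values per amino acid position and fitting a polynomial to the means, can be sketched as below. This is a minimal Python illustration under stated assumptions: the source does not give the polynomial degree or say whether the regression is global or local, so a plain global least-squares fit is used here, and the input values are invented.

```python
from collections import defaultdict
from statistics import mean


def mean_score_per_position(rows):
    """rows: iterable of (aa_position, score) pairs, one per possible nsSNV."""
    by_position = defaultdict(list)
    for position, score in rows:
        by_position[position].append(score)
    return {pos: mean(vals) for pos, vals in sorted(by_position.items())}


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c such that y ~ c[0] + c[1]*x + ... + c[degree]*x**degree.
    """
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - known) / m[i][i]
    return coef


# Invented scores: two substitutions at position 1, one at position 2.
means = mean_score_per_position([(1, 0.2), (1, 0.4), (2, 0.9)])
# Points lying exactly on y = 3 - 2x + x^2, to check the fit.
coefficients = polyfit([1, 2, 3, 4], [2, 3, 6, 11], degree=2)
```

For long proteins, high polynomial degrees make the normal equations numerically unstable, so a production tool would more likely rely on an established regression routine than on this hand-written solver.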
Here, score values are assigned to the corresponding pathogenic and benign ClinVar variants as well as to gnomAD variants, and each group is compared to the score values of all biologically possible nsSNVs shown in the In-silico scores table using a t-test. These groupwise comparisons are visualized as violin plots with embedded boxplots indicating the median as well as the first and third quartiles (Fig. 8). Significant differences are indicated with * (p-value < 0.05), ** (p-value < 0.01), and *** (p-value < 0.001). This allows the user to assess how well a specific score discriminates between pathogenic and benign variants.
Fig. 8: The Score statistics plot statistically comparing score values of pathogenic and benign ClinVar as well as gnomAD variants with all biologically possible nsSNVs.
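The groupwise comparison and the star annotation can be illustrated with the short Python sketch below. The source only says "t-test", so the unequal-variance (Welch) form used here is an assumption. Only the test statistic and degrees of freedom are computed; turning them into a p-value needs the t-distribution, which the Python standard library does not provide. The sample values are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(group_a) / len(group_a)  # squared standard error of mean A
    vb = variance(group_b) / len(group_b)  # squared standard error of mean B
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df


def significance_stars(p_value):
    """Map a p-value to the star notation used in the plot."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# Invented score values for two groups.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
label = significance_stars(0.004)
```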
The bottom table can be completely displayed by scrolling down. Data presented in the table can be filtered using the filter bars on top of each column. The number of displayed rows can be chosen from 25 to 100 using the Show […] entries bar on the bottom left. Additionally, specific terms in the table can be searched using the search bar on the top right. Data shown in both tabs can be copied to the clipboard and exported as CSV and Excel files using the buttons Copy, CSV, and Excel in the top part of the table (Fig. 8).
Here, all ClinVar variants of the queried gene are listed, including their positions relative to chromosome and transcript as well as the associated phenotype as specified in the ClinVar entries. ClinVar variant details were compiled using the Simple ClinVar filtering algorithm. If the user is interested in a specific variant or a subset of variants, single or multiple rows can be selected; the respective variants are then displayed in the Gene and Protein plots, highlighted according to their clinical significance (red = pathogenic, blue = benign). ClinVar variant distribution data are additionally visualized in the ClinVar plot (see Plots > ClinVar).
In-silico scores
In this tab, all biologically possible nsSNVs are listed, including their corresponding in-silico prediction score values (see Plots > In-silico scores). If the simulated variant is present in the gnomAD exomes or genomes database, the respective allele count and frequency are shown in the last four columns gnomAD_exomes_AC, gnomAD_exomes_AF, gnomAD_genomes_AC, and gnomAD_genomes_AF. These tabular data are additionally visualized in the gnomAD and in-silico scores plots (see Plots). Table cells containing score values are colored in orange if a damaging variant effect is predicted according to the value ranges recommended by the score creators. As the MVP score provides two different cut-off values for "constraint" and "non-constraint genes", cells are color-coded in red and orange, respectively. In addition to the search and filter functions, score ranges can be selected using a scrollbar under the column title.
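To make "all biologically possible nsSNVs" concrete, the following Python sketch enumerates the nine single-nucleotide substitutions of one codon and keeps those that change the encoded amino acid. It uses the standard genetic code. Counting stop-gain changes (`*`) as non-synonymous is an assumption of this sketch, and the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, third base in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def nonsynonymous_snvs(codon):
    """Return (codon_position, alt_base, alt_amino_acid) for each SNV that
    changes the encoded amino acid (stop-gain included, as an assumption)."""
    ref_aa = CODON_TABLE[codon]
    changes = []
    for pos in range(3):
        for alt in BASES:
            if alt == codon[pos]:
                continue  # not a substitution
            mutant = codon[:pos] + alt + codon[pos + 1:]
            alt_aa = CODON_TABLE[mutant]
            if alt_aa != ref_aa:
                changes.append((pos, alt, alt_aa))
    return changes


# Methionine has a single codon, so every substitution is non-synonymous.
met_changes = nonsynonymous_snvs("ATG")
# Leucine (CTG) tolerates all third-position changes and one first-position change.
leu_changes = nonsynonymous_snvs("CTG")
```

Because the genetic code is degenerate, the number of non-synonymous substitutions per codon varies, which is why the number of table rows per amino acid position is not constant.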
The subset of 43 pre-calculated variant effect prediction scores provided on aRgus is derived from dbNSFP, an integrated database of functional annotations from multiple sources for the comprehensive collection of 84,013,490 human nsSNVs. A list of the scores found on aRgus with their corresponding source is shown below. The first 26 scores can additionally be visualized using the In-silico scores and Score statistics plots, and all 43 scores can be found in the In-silico scores table. For further details on score ranges, please refer to the dbNSFP v.4.1a Readme file, the dbNSFP homepage, and the score web references listed below:
aRgus was tested on different browsers and operating systems which are shown in the table below:
Linux (5.15.10): Chrome 96.0.4664.110, Edge n/a, Firefox 87.0, Safari n/a
macOS (12.0.1): Chrome 96.0.4664.110, Edge 96.0.1054.62, Firefox 95.0.1, Safari 15.1
iOS (15.1): Chrome 96.0.4664.110, Edge 96.1054.49, Firefox 40.1, Safari 15.0
Windows (10): Chrome 96.0.4664.110, Edge 96.0.1054.62, Firefox 95.0.1, Safari n/a
Heiko Brennenstuhl, MD, MBA
Center for Pediatrics and Adolescent Medicine
University Hospital Heidelberg
Im Neuenheimer Feld 430
D-69120 Heidelberg, Germany
For individual queries on aRgus, please refer to:
Heiko.Brennenstuhl@med.uni-heidelberg.de
IT resources have been kindly provided by heiCLOUD, a service of University Computing Center Heidelberg.
University Computing Centre Heidelberg (URZ)
University of Heidelberg
Im Neuenheimer Feld 330
D-69120 Heidelberg, Germany