Institute of Bioinformatics WWU Münster


The uORFdb provides a manually curated, browsable and convenient literature survey based on all uORF-related references collected by the National Center for Biotechnology Information (NCBI) and available from the PubMed web page. We apply an ongoing Boolean search for "upstream open reading" OR "uORF" OR "uORFs" OR "upstream initiation" OR "uAUG" OR "small open reading" OR "sORF" OR "upORF" OR "ribosome profiling". After manual removal of unrelated accidental hits and prokaryotic data, only publications investigating eukaryotic or viral transcripts/uORFs were selected to generate the uORFdb. During the curation process, we defined multiple numerical, structural, sequence-related, and co-factor-related uORF properties extracted from the cited literature.
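The Boolean search described above can be assembled programmatically. The following is a minimal sketch, not the pipeline actually used for uORFdb: it joins the listed terms with OR and builds a request URL for the NCBI E-utilities ESearch service, which is assumed here as one possible retrieval route. The function names and the `retmax` value are illustrative choices, and no request is sent.

```python
from urllib.parse import urlencode

# Search terms exactly as listed in the text above.
TERMS = [
    "upstream open reading",
    "uORF",
    "uORFs",
    "upstream initiation",
    "uAUG",
    "small open reading",
    "sORF",
    "upORF",
    "ribosome profiling",
]


def build_query(terms):
    """Join the terms into a quoted Boolean OR expression."""
    return " OR ".join(f'"{term}"' for term in terms)


def build_esearch_url(query, retmax=100):
    """Assemble an ESearch URL for PubMed (assumed route; nothing is fetched)."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{base}?{urlencode(params)}"


query = build_query(TERMS)
url = build_esearch_url(query)
```

The hits returned by such a search would still require the manual curation step described above, since the terms also match prokaryotic and unrelated publications.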

Users may

Query uORFdb by gene or taxon.
The respective query field on the QUERY page accepts flexible input terms, including gene name, gene symbol, alias, NCBI Gene/GenBank ID, taxon, common name, or NCBI Taxonomy ID.

Query uORFdb by uORF-related property.
By checking the corresponding box, users can choose whether the query returns only references that address all selected properties or every reference that addresses at least one of them: 'My query should return references addressing all/any properties selected.'

Query uORFdb by manuscript category.
Users may limit returned references to the indicated manuscript categories.
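The difference between the "all" and "any" settings of the property query can be illustrated with a small sketch. The record layout, reference identifiers, and property labels below are hypothetical and do not reflect the internal uORFdb schema; the sketch only shows the set logic behind the two modes.

```python
def filter_references(references, selected, mode="all"):
    """Return the references whose property set matches the selection.

    mode="all": every selected property must be addressed by the reference.
    mode="any": at least one selected property must be addressed.
    """
    if mode not in ("all", "any"):
        raise ValueError("mode must be 'all' or 'any'")
    selected = set(selected)
    if mode == "all":
        # Subset test: the selection is fully contained in the reference.
        return [r for r in references if selected <= r["properties"]]
    # Intersection test: selection and reference share at least one property.
    return [r for r in references if selected & r["properties"]]


# Hypothetical records; identifiers and property labels are illustrative only.
references = [
    {"id": "ref1", "properties": {"length", "number"}},
    {"id": "ref2", "properties": {"length"}},
    {"id": "ref3", "properties": {"kozak"}},
]

all_hits = [r["id"] for r in filter_references(references, ["length", "number"], "all")]
any_hits = [r["id"] for r in filter_references(references, ["length", "number"], "any")]
```

With the example records, the "all" mode returns only the reference covering both properties, whereas the "any" mode also returns the reference covering only one of them.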

Find below a description of the filter categories provided by uORFdb.

Determinants of uORF presence or absence

Alternative promoters / Alternative splicing / Tissue-specific uORFs

Sequence analyses of the human transcriptome revealed that about 50% of mRNAs contain one or more upstream AUGs (uAUGs) between the 5'-cap structure and the CDS. The overall prevalence of uAUGs, although higher than initially anticipated, is still lower than expected from a random distribution, arguing for negative evolutionary selection. For stochastic reasons, the prevalence of uORFs increases with the length of the 5'-regulatory region. In specific cases, however, the presence or absence of one or several uORFs depends on the transcript variant produced by transcription initiation from alternative promoters or by alternative splicing. Some of these variants are specific to particular tissues.


In a recent study using global translational initiation sequencing, 54% of human transcripts displayed one or more translational initiation sites preceding the CDS. Surprisingly, about three-fourths of upstream translation was initiated at near-cognate, non-AUG initiation codons, further qualifying the classical 'first-AUG' rule. Nevertheless, uAUG codons appeared to be functionally most effective in repressing CDS translation.
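To make the distinction between uAUG and near-cognate initiation codons concrete, the sketch below lists all candidate triplets in a 5'-leader sequence. It assumes the common definition of a near-cognate codon as one that differs from AUG at exactly one position; the function names and the example sequence are illustrative. A listed triplet is only a sequence-level candidate, and whether it is actually used for initiation has to be established experimentally.

```python
BASES = "ACGU"


def near_cognate_codons(start="AUG"):
    """Return all codons that differ from the start codon at exactly one position."""
    return {
        start[:i] + base + start[i + 1:]
        for i in range(len(start))
        for base in BASES
        if base != start[i]
    }


def upstream_start_sites(leader):
    """List (position, codon, kind) for AUG and near-cognate triplets in a 5' leader.

    Positions are 0-based offsets from the first nucleotide of the leader.
    DNA input is accepted; T is converted to U.
    """
    leader = leader.upper().replace("T", "U")
    near = near_cognate_codons()
    sites = []
    for i in range(len(leader) - 2):
        codon = leader[i:i + 3]
        if codon == "AUG":
            sites.append((i, codon, "AUG"))
        elif codon in near:
            sites.append((i, codon, "near-cognate"))
    return sites


sites = upstream_start_sites("GGCUGAAUGCC")
```

For the short example leader, the scan reports one near-cognate CUG and one uAUG.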

Structural and sequence-dependent uORF properties

Number / Length / Distance from 5'-cap / Distance from uORF-STOP to CDS / CDS-overlap

Many publications have investigated the importance of structural and sequence-dependent uORF properties in mediating translational regulation. The repression of downstream translation appears to be positively correlated with the number of uORFs per transcript, the length of the uORF, and the distance between the 5'-cap structure and the uORF initiation codon. Furthermore, translational repression correlates negatively with the distance between the uORF-STOP and the CDS initiation site and is even more pronounced when the uORF overlaps the CDS initiation codon. Taken together, current data suggest a dynamic regulatory model in which indispensable initiation cofactors detach gradually from ribosomes during the elongation phase of uORF translation but may be reloaded to allow reinitiation at the CDS.
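The structural properties named in this category can be derived directly from a transcript sequence once the CDS start position is known. The following sketch is a simplified illustration under stated assumptions: only AUG-initiated uORFs are considered, the transcript is taken to begin at the 5'-cap, and `cds_start` is the 0-based index of the first base of the CDS start codon. The function name and the dictionary keys are hypothetical.

```python
STOPS = {"UAA", "UAG", "UGA"}


def find_uorfs(transcript, cds_start):
    """Describe AUG-initiated uORFs that start upstream of the CDS.

    transcript: mRNA sequence beginning at the 5' cap (DNA input is accepted).
    cds_start: 0-based index of the A of the CDS start codon.
    """
    seq = transcript.upper().replace("T", "U")
    uorfs = []
    for start in range(cds_start):
        if seq[start:start + 3] != "AUG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOPS:
                stop_end = pos + 3  # index just past the stop codon
                uorfs.append({
                    "cap_distance": start,                 # nt from 5' end to uAUG
                    "length_nt": stop_end - start,         # includes start and stop codon
                    "overlaps_cds": stop_end > cds_start,  # uORF runs past the CDS start
                    "stop_to_cds": max(cds_start - stop_end, 0),
                })
                break
    return uorfs


# Illustrative sequences: one uORF ending before the CDS, one overlapping it.
separate = find_uorfs("GGAUGCCCUAAGGGGAUGAAAUAA", cds_start=15)
overlapping = find_uorfs("AUGGAUGCCUAAUAGA", cds_start=4)
```

The number of uORFs per transcript is simply the length of the returned list. An upstream AUG without an in-frame stop codon in the supplied sequence is not reported by this sketch.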

RNA-secondary structure

In eukaryotes, long GC-rich transcript leader sequences tend to form stable secondary structures that inhibit ribosome progression and CDS translation. Similarly, specific secondary structures within or in the vicinity of uORFs may affect translation efficiency.
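As a rough first indicator of how structure-prone a transcript leader is, its GC content can be computed, overall and in sliding windows. This is only a crude proxy assumed here for illustration; it does not replace a thermodynamic folding prediction with dedicated RNA structure software. The function names, window size, and example sequence are illustrative.

```python
def gc_fraction(seq):
    """Return the fraction of G and C nucleotides in a sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq, window=4):
    """Return the GC fraction of every full-length window along the sequence."""
    return [gc_fraction(seq[i:i + window]) for i in range(len(seq) - window + 1)]


leader = "GGCCAUAU"
overall = gc_fraction(leader)
profile = windowed_gc(leader, window=4)
```

In the example, the windowed profile shows that the GC-rich stretch is confined to the 5' end of the sequence, which the overall value alone would not reveal.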

Functional consequences of uORF-mediated translational control

CDS repression / CDS induction / Start site selection

Most uORFs analyzed to date repress translation from subsequent initiation sites and thereby inhibit or diminish translation of the main protein. Post-uORF initiation at the CDS initiation codon may result from leaky scanning of ribosomes across the uORF start codon or, if the uORF stop codon precedes the CDS, from reinitiation. Despite this generally repressive effect on downstream translation, several exceptions have been described in which translation of specific uORFs or a certain arrangement of consecutive uORFs enhances CDS initiation. Furthermore, uORF-directed start site selection can result in the production of N-terminally distinct protein isoforms that harbor unique biological functions.

Nonsense-mediated decay / mRNA destabilization

Nonsense-mediated decay (NMD) of mRNA is activated when specific cellular surveillance mechanisms detect premature termination of protein translation. Such premature termination events may result from the use of nonsense codons that arise in mature transcripts due to mutations, incorrect splicing or aberrant initiation site selection. uORFs have been suggested to induce NMD by conferring additional termination codons to the 5'-leader sequence of certain transcripts. Similarly, another mode of termination-dependent RNA destabilization that is distinct and independent of the common NMD pathway has been reported in yeast.

Ribosome load / Ribosome stalling / Ribosome pausing / Ribosome shunting

Artificial or mutational deletion of a uORF may result in an increased ribosome load on a given transcript, associated with increased translational activity. Conversely, ribosome stalling at the uORF termination codon or pausing of ribosomes on inhibitory uORF structures may hamper CDS translation. Underlining the multiplicity of uORF-mediated translational control mechanisms, certain uORFs enhance CDS translation by supporting a ribosome shunt across a highly structured and inhibitory 5'-transcript leader sequence.

Co-regulatory events affecting uORF functions

Kozak consensus sequence

Whether or not the ternary preinitiation complex recognizes an AUG or non-AUG triplet as a translational initiation codon is strongly influenced by the surrounding nucleotide context. The optimal context for initiation is the Kozak consensus sequence. If the AUG codon lies in a strong context, virtually all scanning ribosomes recognize the start codon and initiate translation. In an adequate or weak context, a proportion of ribosomes scan through the initiation site and remain competent to recognize an initiation site located further downstream. Since the quality of the Kozak context is not the only determinant of translation initiation efficiency, evaluating the surrounding nucleotides alone does not permit a precise prediction of initiation.
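The strong/adequate/weak distinction can be approximated from two flanking positions. The sketch below assumes a widely used simplification, which is not necessarily the definition applied in the cited literature: with the first base of the start codon numbered +1, a purine at -3 and a G at +4 together count as strong, one of the two as adequate, and neither as weak. The function name and example sequences are illustrative.

```python
def kozak_strength(seq, start_index):
    """Classify the context of the start codon whose first base is at start_index.

    Simplified rule (an assumption of this sketch): a purine at position -3 and
    a G at position +4 are each counted as one match.
    """
    seq = seq.upper().replace("T", "U")
    if start_index < 3 or start_index + 3 >= len(seq):
        return "undetermined"  # flanking positions are not available
    purine_at_minus3 = seq[start_index - 3] in "AG"
    g_at_plus4 = seq[start_index + 3] == "G"
    matches = int(purine_at_minus3) + int(g_at_plus4)
    return {2: "strong", 1: "adequate", 0: "weak"}[matches]


strong = kozak_strength("GCCACCAUGG", 6)
adequate = kozak_strength("GCCUCCAUGG", 6)
weak = kozak_strength("GCCUCCAUGC", 6)
```

As stated above, such a classification describes only the sequence context and does not by itself predict how efficiently a given site is used.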

Translational status

Regulation through uORFs allows rapid integration of the overall translational status of a cell to adjust the translation rates of important regulatory proteins. The translational status depends on extracellular signals, environmental conditions, and nutrient supply and is mainly reflected by the abundance of initiation co-factors required to form a functional preinitiation complex (ternary complex). A number of studies on yeast and human transcripts have analyzed uORF-mediated regulation under changing translational conditions in detail.

Termination (context)

The sequence context surrounding a uORF termination codon may determine the reinitiation efficiency at downstream initiation sites. In particular, stable interactions between the terminating ribosome and the RNA, or stable base pairing of the RNA alone, may cause ribosomal pausing or mediate premature mRNA decay.

uORF RNA/peptide sequence / Regulatory sequence motif / Co-factor/ribosome interaction

Altering the RNA- or peptide-sequence of a uORF frequently affects downstream translation. This suggests that either the uORF-encoded peptide or a specific RNA sequence mediates interaction with a co-factor or the translation machinery to regulate translation, or that specific secondary structure is functionally important.

Medical impact

Disease-related uORFs / Acquired mutations/SNPs

A defect in uORF-mediated translational control can be associated with the development of human disease. Although only a few unequivocal cases are known at this time, it is evident that uORF mutations may be involved in a wide variety of diseases, including malignancies, metabolic or neurologic disorders, and inherited syndromes. Considering that many important regulatory proteins, including cell surface receptors, tyrosine kinases, and transcription factors, act in a dose-dependent fashion and possess uORFs, a substantial number of as yet unexplained pathologies might be traced back to uORF mutations altering expression levels of such key regulatory genes.

Manuscript categories

Mouse models / Ribosome profiling / Bioinformatics/arrays/screens / Proteomics

The pathophysiological importance of uORFs has been demonstrated in genetically altered mouse models. Recent progress in computational and sequencing-based technologies, the development of the ribosome profiling method, and mass spectrometry approaches allow genome-wide studies of uORF function.

Methods / Review

Rather than describing individual transcripts, part of the bibliography on uORFs focuses on methods for their study or reviews particular aspects of the field of uORF research.