Structure of U1 RNA bound to U1A RRM domain

Gene Expression Regulation

pre-mRNA Processing

Prokaryotic and Eukaryotic protein expression processes compared

Eubacterial systems | Eukaryotic systems
Translation is concurrent with transcription. | Transcript must be processed: capping, splicing, polyadenylation, export.
No barrier restricts movement of transcript to translational apparatus. | mRNA is sequestered as RNP in the nucleus, and must be transported to the cytoplasm through the nuclear pore complex.
No barrier restricts access of polypeptides to transcriptional and translational machinery. | Both functional and regulatory factors controlling pre-mRNA production and processing must be imported into the nucleus.

    Post-transcriptional processing of pre-mRNA in eukaryotic systems

    1) m7G-capping at 5' end
    2) splicing to remove introns
    3) polyadenylation at 3'-end
    4) sequestration as RNP

    5'-end capping

    5'-Capping occurs shortly after transcription initiation, when nascent transcripts are 20-25 nt in length. The Pol II C-terminal domain (CTD) becomes hyperphosphorylated during the transition into productive elongation. The unphosphorylated CTD is closely associated with the Pol II core, as well as with associated factors such as TFIIE. Phosphorylation dissociates the CTD so that it tails out behind the Pol II elongation complex. The end of the CTD tail binds two enzymes, capping enzyme (CE) and N7G methyltransferase (MT) (Shatkin and Manley, 2000).

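    Mammalian CE catalyzes two reactions: an RNA triphosphatase activity removes the γ-phosphate from the 5'-triphosphate end of the nascent transcript, and a guanylyltransferase activity transfers GMP from GTP onto the resulting diphosphate end.

    m7G-cap structure

    The mammalian CE is a single bifunctional enzyme, whereas yeast CE exists as two separate subunits. Appropriate fragments of the mammalian enzyme can complement defects in the respective yeast subunits. G becomes linked to the triphosphate through its 5'-O, so it has the opposite orientation to the other nucleotides in the chain.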

    MT methylates the attached G on N7. Methylation in this position gives rise to a quaternary N+, changing G into its enolate form. In many cases, the initiating nucleotide is A, which can be N6 methylated as well.

    Capping protects the nascent pre-mRNA against degradation; failure to cap or loss of cap leads to rapid breakdown of the RNA. The m7G-cap is bound by two nuclear Cap Binding Proteins, CBP20 and CBP80. Only CBP20 appears to be in direct contact with the cap. These proteins play a role in initiating subsequent splicing reactions, and anti-CBP antibodies inhibit subsequent splicing reactions. Overexpression of short capped transcripts with no splicing sites can also inhibit splicing by competing for splicing factors.

    RNA Splicing

    genomic sequence of ovalbumin mapped onto its mRNA
    Metazoan genes commonly contain intervening sequences inserted into the open reading frame. The insertions are called introns (gray in the figure above) while the expressed fragments are called exons (coloured in the figure above). Introns must be removed by RNA splicing, which occurs concurrently with transcription. The long upper box in the figure represents the genomic DNA of ovalbumin, 7700 bp, mapped onto the mRNA, about 1900 nt, shown in the lower box. Introns occur less frequently and are much smaller in lower eukaryotes such as yeast.

    Removal of introns from pre-mRNA transcripts involves cleavage at the 5' end of the intron by attack of a specific 2'-OH group at the branch site. This forms a phosphodiester bond with the 5'-phosphate of the intron, creating a lariat structure.
    The intron lariat is then removed, proceeding by attack of 3'-OH on Exon 1 to displace the intron from the 5'-phosphate of Exon 2.
    During the whole process, the number of phosphodiester bonds remains constant, so this is not an endonuclease cleavage and ligation process as occurs in tRNA processing, but an ATP independent transesterification.

    At step 1, the phosphodiester bond between Exon1 and intron is converted into the 2'-branch site phosphodiester.

    At step 2, the phosphodiester bond between intron and Exon 2 is converted into the Exon 1 - Exon 2 phosphodiester.
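    The bond bookkeeping in steps 1 and 2 can be checked with a little arithmetic. The sketch below is illustrative only: the function name and the exon and intron lengths are assumptions, not values from these notes. It relies on the fact that a linear chain of n nucleotides has n - 1 phosphodiester bonds, and that the lariat branch contributes one extra 2'-5' bond.

```python
def phosphodiester_counts(exon1_len, intron_len, exon2_len):
    """Return total phosphodiester bonds before splicing, after step 1, after step 2.

    Hypothetical helper for illustration: lengths are in nucleotides.
    """
    # Linear pre-mRNA: n nucleotides joined by n - 1 bonds.
    pre_mrna = exon1_len + intron_len + exon2_len - 1

    # Step 1: the exon 1-intron bond is exchanged for the 2'-5' branch bond.
    free_exon1 = exon1_len - 1
    lariat_intermediate = (intron_len + exon2_len - 1) + 1  # linear bonds + branch
    after_step1 = free_exon1 + lariat_intermediate

    # Step 2: the intron-exon 2 bond is exchanged for the exon 1-exon 2 bond.
    spliced_mrna = exon1_len + exon2_len - 1
    lariat_intron = (intron_len - 1) + 1  # linear bonds + branch
    after_step2 = spliced_mrna + lariat_intron

    return pre_mrna, after_step1, after_step2


# Example lengths are arbitrary.
print(phosphodiester_counts(120, 900, 200))  # (1219, 1219, 1219)
```

    The three totals are equal for any choice of lengths, which is the arithmetic signature of transesterification: each step breaks one bond and makes one bond.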

    Some single-celled eukaryotes, e.g. the ciliate Tetrahymena, produce RNA precursors with self-splicing introns (the classic Tetrahymena example lies in the pre-rRNA). In these cases the intron forms a unique tertiary structure promoting self catalysis. The catalytic action is embodied in the RNA itself. In some examples, catalysis involves attack by the 3'-O of a separately bound guanosine nucleotide, and in other cases the 2'-O of an in-chain A produces the lariat structure.
    Autocatalysis is mediated via a metal ion, Mg2+ or Ca2+, bound to a specific site formed by the tertiary structure of the intron. Self-splicing introns have been used as the models for ribozymes, or catalytic RNA. For additional information, see Ribosomes and Ribozymes.

    In most eukaryotes, splicing is mediated by a large ribonucleoprotein complex comparable in size to the ribosome or polymerase II holoenzyme called the spliceosome.

    The spliceosome contains a specific set of U-rich small nuclear ribonucleoproteins or snRNPs.

    U1: 5' site recognition
    U2: branch site recognition
    U4: forms base paired complex and acts with U6
    U5: 3' junction binding of U4-U6 complex
    U6: complex with U4 makes up the spliceosome transesterase
    The common spliceosome recognizes introns starting with 5'-GU and ending in AG-3'. More recently, a subclass of spliceosome has been found to recognize introns with 5'-AU and AC-3' ends. Since we typically read sequences in DNA, these have been called AT-AC introns. Site recognition involves snRNPs U11 and U12 as analogs of U1 and U2, and the transesterase consists of variant snRNPs U4atac-U6atac bound by the conventional U5 (Tarn and Steitz, 1997).
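    The terminal-dinucleotide rule above can be written as a short classifier. This is a minimal sketch: the function name and return strings are invented for illustration, and real intron classification depends on much more than the two end dinucleotides.

```python
def classify_intron(intron):
    """Classify an intron by its first and last two nucleotides.

    Accepts RNA or DNA letters; DNA T is read as RNA U.
    """
    seq = intron.upper().replace("T", "U")
    if len(seq) < 4:
        raise ValueError("intron too short to carry both terminal dinucleotides")
    ends = (seq[:2], seq[-2:])
    if ends == ("GU", "AG"):
        return "GU-AG: common spliceosome (U1, U2, U4-U6, U5)"
    if ends == ("AU", "AC"):
        return "AU-AC: variant spliceosome (U11, U12, U4atac-U6atac, U5)"
    return "unclassified"


# Both test sequences are made up.
print(classify_intron("GUAAGUCCUUUCAG"))
print(classify_intron("ATATCCTTTTTAC"))  # DNA spelling of an AT-AC intron
```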

    Incidence of introns increases with developmental complexity of the organism:

    Yeast has few introns, with well conserved and consistent splice and branch site sequences.
    Higher organisms often have multiple introns within a single pre-mRNA, poorly defined splice and branch site sequences, and a more complex regulatory system controlling their selection.
    Splicing may be constitutive, meaning that the same introns are always identified and spliced out from a pre-mRNA, resulting in translation to yield a single protein product. However, higher organisms make extensive use of alternative splicing to generate functionally different isoforms of a protein, which are expressed in particular states of differentiation or development. Regulatory mechanisms must determine which splice sites are selected.

    The number of genes mapped in the human genome (30-40 thousand) turned out to be significantly lower than prior estimates; however about 30% of genes appear to be expressed in multiple isoforms as a result of alternative splicing.

    More rarely, trans-splicing can also generate unique mRNAs by association and linking of exons from different pre-mRNA transcripts.

    Mechanisms for splice site selection - constitutive splicing

    The splicing reaction takes place in two distinct stages. The first is the assembly of a complex containing the U1 and U2 snRNPs, which bind to and identify the 5' site, branchpoint and 3' splice site. This is followed by recruitment of the catalytic unit consisting of U5 and the U4-U6 complex (Hastings and Krainer, 2001; Will and Lührmann, 2001).

    In addition to the snRNPs (which consist of RNA and specific associated proteins) a number of accessory protein factors are involved in various stages of the splicing reaction.

    Common structural elements in accessory splicing factors

    RNA recognition motif, (RRM)

    RNA recognition motif present in U1A protein
    U1A protein (PDB 1URN)

    (also known as RBD, RNA binding domain) consists of 4 antiparallel β-sheet strands interspersed with two α-helices, in the pattern S1-H1-S2-S3-H2-S4. The middle pair of strands in the structure carry the conserved sequences RNP-1, (K/R)G(F/Y)(G/A)FVx(F/Y), and RNP-2, (L/I)(F/Y)(V/I)(G/K)(N/G)(L/M).
    This pattern binds open loops of RNA, in which the core base stack is interrupted, allowing for many specific contacts between bases and the polypeptide. The conserved Phe or Tyr residues are positioned on the surface of the β-sheet structure, where they can stack with the bases in the loop.
    (For more details, see RNA-protein interactions.) In addition to splicing factors, CBP20 also carries the RRM domain.
    RS domains: domains rich in the dipeptide repeat Arg-Ser (RS in single letter code).
    RS domains are of major importance in protein-protein contact among splicing factors. The RS domain is a target for phosphorylation, and phosphorylation controls entry of splicing factors into the splicing reaction cycle. Positive Arg will ion pair with negative Ser-phosphate, and vice-versa.
    DExD/H box: also known as the DEAD box (based on the amino acid motif Asp-Glu-x-Asp/His), this is the signature for RNA helicase or unwindase, which is required for several RNA base pairing rearrangements that take place in splicing reactions. Although the transesterification reactions of splicing are ATP independent, the helicase reactions do consume ATP.
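    The three sequence signatures above (RNP-1 and RNP-2 of the RRM, the Arg-Ser dipeptide, and the DExD/H motif) translate directly into regular expressions. The sketch below is a naive pattern scan, not a domain predictor; the function name and the test protein are hypothetical, and a short motif such as DExD/H will also match by chance in unrelated proteins.

```python
import re

# Consensus patterns as quoted in the notes, written as regular expressions.
RNP1 = re.compile(r"[KR]G[FY][GA]FV.[FY]")   # (K/R)G(F/Y)(G/A)FVx(F/Y)
RNP2 = re.compile(r"[LI][FY][VI][GK][NG][LM]")  # (L/I)(F/Y)(V/I)(G/K)(N/G)(L/M)
DEXDH = re.compile(r"DE.[DH]")               # Asp-Glu-x-Asp/His


def scan_elements(protein):
    """Return 0-based motif start positions and the RS dipeptide count."""
    seq = protein.upper()
    return {
        "RNP-1": [m.start() for m in RNP1.finditer(seq)],
        "RNP-2": [m.start() for m in RNP2.finditer(seq)],
        "DExD/H": [m.start() for m in DEXDH.finditer(seq)],
        "RS dipeptides": len(re.findall("RS", seq)),
    }


# Made-up sequence built to contain one copy of each motif and four RS repeats.
toy_protein = "MLFVGNLAAAKGFGFVEFTTDEADRSRSRSRS"
print(scan_elements(toy_protein))
```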

    Accessory protein factors

    SR proteins are a family of proteins in metazoans recognised by a specific mAb raised against spliceosomes, and contain many RS domains: the family includes SRp20, SRp30, SRp40, SRp55, SRp75, ASF (Alternative Splicing Factor) and SC35. In many cases, the N-terminal end carries a typical RRM (RNA recognition motif) common to many RNP associated proteins. The C-terminus contains multiple RS dipeptides, which may be highly phosphorylated. The enzyme SRPK-1 (SR-protein kinase) is a specific protein kinase, activated during mitosis, that causes redistribution of SR proteins and spliceosomes. The general role of SR proteins seems to be to bridge between other splicing components. This bridging process may then allow binding of factors to sites that are too weak for effective binding of the basal factors U1 and U2.
    Mammalian U2AF consists of a 65 kDa binding factor for the polypyrimidine tract, containing RS and RRM domains, plus a 35 kDa SR type subunit that binds to other spliceosomal components and helps determine the 3' site by acting as a bridging factor between the exons. The 35 kDa subunit is highly conserved in higher organisms, suggesting that its importance is in selection of weaker splice sites. A single protein, Mud2p, carries out a similar role in yeast.
    U1A, U1C and U1 70k are structural components of U1 snRNP. U1A and U1 70k are classic RRM stem-loop binding proteins, and U1 70k also contains RS domains needed for protein-protein interaction.
    Sm proteins are a set of seven proteins that form the common structural core of snRNPs and bind to a conserved sequence RAUUUUUUGR in U1, U2, U4 and U5.
    SF proteins: splicing factors SF1 and SF3a/b are associated with U2. SF2/ASF (alternative splicing factor) should really be classified as an SR protein. It plays a role in exon selection during alternative splicing, and binds to U1 70k.

    Sequence of reactions in the splicing process

    Spliceosome E complex

    1a) Formation of the commitment or E complex involves binding of U1 snRNP (a complex of U1 RNA, U1A RRM protein, U1C and U1 70k protein) to the 5' GU site of the intron. Recognition is by base pairing of the 5' end of U1 RNA with the consensus sequence AG|GUAGGU (vertical bar is the exon-intron junction), and ATP is consumed in the base pairing process.
    SR accessory factors associate with the exon towards the 5' direction, and facilitate binding of U1.
    splicing complex A

    1b) This is generally followed by binding of U2 auxiliary factor U2AF (Mud2 in yeast) to the pyrimidine rich tract between the branch site and the 3' end of the intron. U2, plus the associated SF3a/b, can then base pair with the metazoan branch point sequence YNYURAY to give the A complex. An additional protein factor, BBP (branch binding protein), binds in the region of the branchpoint A. In yeast, the branch point is more conserved, UACUAAC.

    base pairing of branchpoint sequence and U2

    The branchpoint sequence (BPS) is identified by base pairing with a section of the U2 snRNA bearing the sequence 5'-GUAGUA-3'. The BPS is mismatched at a single A, which becomes looped out, exposing its 2'-OH. This exposed ribose OH acts as the nucleophile attacking the 5'-splice site.
    When the sequence at the branchpoint deviates from the consensus, associated protein factors such as U2AF are needed to promote complex formation.
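    The looped-out A can be located by asking which single A, when removed from the BPS, leaves a perfect antiparallel Watson-Crick match to the U2 segment 5'-GUAGUA-3'. The sketch below is a simplified model under stated assumptions: the function names are invented, only strict Watson-Crick pairs are allowed (no G-U wobble), and when two adjacent A residues give equivalent pairings both are reported, since base pairing alone cannot tell them apart.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the antiparallel Watson-Crick partner of an RNA sequence."""
    return "".join(PAIR[base] for base in reversed(rna))


def bulge_candidates(bps, u2_segment="GUAGUA"):
    """List 0-based positions of A residues in the BPS that must loop out
    for the rest of the BPS to pair perfectly with the U2 segment."""
    target = reverse_complement(u2_segment)
    return [
        i
        for i, base in enumerate(bps)
        if base == "A" and bps[:i] + bps[i + 1:] == target
    ]


# Yeast branch point sequence from the notes.
print(bulge_candidates("UACUAAC"))  # [4, 5]
```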
    splicing complex B

    2a) The U4-U6, U5 tri-snRNP is then recruited to give the B complex. There is some evidence that U4/U6 recruitment to the 5'-splice site can precede U2 assembly at the branchpoint.

    2b) Finally, some radical ATP-dependent base pair rearrangements occur to organize the catalytically competent C complex. Two tri-snRNP factors, U5 100p and U5 200p, have been shown to contain DExD/H box domains.

    splicing complex C; post-splice complex

    U5 first base pairs to the upstream exon and the 5'-splice site, a process that requires RNA unwindase activity to displace U1 from the exon.
    U6 base-pairs to U2, resulting in displacement of U4. Finally, U5 base pairs to exon 2 near the 3'-splice site on the same stem loop that already holds exon 1, bringing the 3'-OH of exon 1 into close proximity to the 5'-phosphate of exon 2.

    Nucleophilic attack of the phosphodiester bond completes the splicing process, releasing the intron as a lariat carrying the various splicing factors.

    Fidelity in splice site determination.

    Yeast splice sites tend to adhere closely to consensus sequences, and introns are few and small, so that the basic spliceosomal machinery can recognize sites effectively in most cases. In metazoans, there is more diversity in the splice site environment, introns are large, and multiple splicing of a single pre-mRNA is common. In these conditions, accessory factors are of greater importance to ensure correct splicing (Reed, 2000; Will and Lührmann, 2001).

    array of SR proteins on exon enhancers

    Exons contain elements called exonic enhancers, which are targets for binding SR and related RRM containing proteins. The organization that lays out the splicing pattern starts with the cap binding complex of CBP20 and CBP80, and possibly even with the CTD of RNA Pol II (Zeng and Berget, 2000). An array of protein factors, e.g. SC35, bind in a cooperative manner between cap and first splice site to define its location. Other SR proteins bridge the intron gap from U1 70k to facilitate U2AF binding, and establish the branch point and 3' splice site. Once the U2 complex is in place, SR proteins link up to the next 5' splice site, to continue the process. Thus the pattern of splice sites is established progressively from the 5' cap towards the 3' end, and the spliceosome does not select intron targets for splicing at random.

    In metazoans, certain members of the hnRNP (heterogeneous nuclear ribonucleoprotein) class bind to sites located in particular in the introns. These include hnRNP A1, which binds indiscriminately to pre-mRNA and has a negative effect on spliceosome assembly. The function of the enhancers and SR proteins seems to be to exclude hnRNP A1 from the exons, and a gap in the chain of enhancers and SR proteins allows hnRNP A1 to act as a splicing repressor.

    Splice site specificity is reasonably conserved across species, allowing expression of transgenes. Occasionally splice sites may be misread; for example, when wild type Green Fluorescent Protein is expressed in higher plants, the polypeptide may be disrupted by misinterpretation of a coding sequence as a plant specific splicing site.

    Alternative splicing

    Variation in splice site selection can result in expression of different, developmentally specific isoforms of a polypeptide from a single gene. This type of tissue specific variation is widespread in metazoans. A significant percentage of genetic diseases that result from defective protein in a particular tissue involve misreading of a splicing signal due to mutation.


    Alternative splicing factor/Splicing factor 2 (ASF/SF2) is an SR protein involved in 5' splice site selection, shown to bind directly to U1 70k protein through protein-protein interaction involving the RS domains in each protein. Presence or absence of ASF in a splicing system can determine which 5' splice site is chosen, and ASF/SF2 activity and site selection can be regulated by phosphorylation and dephosphorylation.

    A 5'-splice site that conforms to the ideal sequence binds U1 without ASF, and may be considered a strong site. If the sequence deviates from the consensus, the intrinsic affinity for U1 is weak, and the assistance of active ASF/SF2 is needed for U1 binding. ASF/SF2 acts as an antagonist of hnRNP A1, which is an indiscriminate splicing repressor and can cause 5' splice sites to be skipped (Eperon et al., 2000). This results in intron retention, which may further control polypeptide expression due to inclusion of premature stop codons in the mRNA.
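    The strong-site versus weak-site distinction can be illustrated by counting matches to the 5' splice site consensus AG|GUAGGU quoted earlier in these notes. This is a toy scoring scheme: the function names and the cut-off of 7 matches are arbitrary assumptions for illustration, not measured values, and real U1 affinity depends on which positions mismatch, not only on how many.

```python
CONSENSUS_5SS = "AGGUAGGU"  # AG|GUAGGU, exon-intron junction after position 2


def site_score(site):
    """Count positions of an 8-nt candidate site that match the consensus."""
    seq = site.upper().replace("T", "U")
    if len(seq) != len(CONSENSUS_5SS):
        raise ValueError("site must span 2 exon nt plus 6 intron nt")
    return sum(a == b for a, b in zip(seq, CONSENSUS_5SS))


def needs_asf(site, threshold=7):
    """Hypothetical rule: sites scoring below the threshold count as weak
    and are assumed to need ASF/SF2 to recruit U1."""
    return site_score(site) < threshold


print(site_score("AGGUAGGU"), needs_asf("AGGUAGGU"))  # 8 False
print(site_score("CAGUAUGC"), needs_asf("CAGUAUGC"))  # 4 True
```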

    PTB and SXL

    Polypyrimidine tract binding protein (PTB), which lacks any RS domain, acts as a specific splicing repressor by competing with U2AF. The U2AF is then available to bind at a branch point and 3' splice site further downstream, causing exon skipping as the proximal branch point and 3' splice site are bypassed. PTB itself exists in several alternatively spliced isoforms. SXL (sex lethal) is a Drosophila hnRNP-like protein inducing female specific expression patterns in fruit flies by alternative splicing. It also acts by competition for U2AF sites.

    In many cases, splicing regulation controls a pair of mutually exclusive exons:

    e.g. variants of the muscle proteins tropomyosin and α-actinin, where the different sequences confer different regulatory properties.

    mutually exclusive splicing

    Non-muscle tropomyosin skips exon 2 due to the repression by PTB. For α-actinin, however, PTB causes skipping of exon 3 (Southby et al., 1999). The different behaviours are due to different strengths of the branch point site. For tropomyosin, the presence of higher levels of PTB in non-muscle cells represses the branchpoint for exon 2, so exon 3 is selected. In smooth muscle cells, there is no repression and exon 2 is selected, but its close proximity to exon 3 prevents exon 3 from being used, so the splicing machinery skips over to exon 4. As a result, exon 2 and exon 3 are mutually exclusive.

    The same activity of PTB has the opposite effect on α-actinin. In this case, an intrinsically strong branchpoint site at exon 3 is repressed by PTB in non-muscle cells. In smooth muscle, where exon 3 is accessible, it appears to outcompete exon 2 for branch-site factors.
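    The two cases reduce to the same decision rule with the exon roles swapped, which the following sketch makes explicit. The class and field names are invented for illustration, and PTB level is collapsed to a simple high/low flag, which is a deliberate simplification of what is really a graded competition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExclusivePair:
    """A pair of mutually exclusive exons regulated by PTB (toy model)."""
    gene: str
    ptb_repressed: str  # exon whose branch point PTB represses
    alternative: str    # exon used when the repressed exon is unavailable

    def selected_exon(self, ptb_high):
        # High PTB (non-muscle cells) blocks the repressed exon;
        # without repression (smooth muscle) that exon wins instead.
        return self.alternative if ptb_high else self.ptb_repressed


TROPOMYOSIN = ExclusivePair("tropomyosin", ptb_repressed="exon 2", alternative="exon 3")
ALPHA_ACTININ = ExclusivePair("alpha-actinin", ptb_repressed="exon 3", alternative="exon 2")

for pair in (TROPOMYOSIN, ALPHA_ACTININ):
    print(pair.gene,
          "| non-muscle:", pair.selected_exon(ptb_high=True),
          "| smooth muscle:", pair.selected_exon(ptb_high=False))
```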


    Polyadenylation

    With the exception of histone pre-mRNAs, which lack introns and are processed by a different method, eukaryotic mRNAs are terminated at the 3' end by post-transcriptional addition of a poly-A tail (Shatkin and Manley, 2000). The site of poly-A addition is specified by an encoded polyadenylation signal sequence, AAUAAA, found near the 3' end of the transcript.

    The process involves two steps:

    Cleavage occurs 10-30 nt downstream of the signal, and a non-DNA-encoded poly-A tail is added by poly-A polymerase:

    The length of the poly-A tail on nuclear pre-mRNA ranges from ~70 A units in yeast to ~240 A units in mammals. Cytoplasmic enzymes may also cause the poly-A to shorten with age, and occasionally lengthen (more next week). Not all transcripts are polyadenylated, e.g. histone mRNA is poly-A minus. For poly-A minus transcripts, the 3' end may be protected or sequestered by association with other RNP factors in lieu of poly-A.
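    The signal-then-cleavage geometry described above (an AAUAAA signal with cleavage 10-30 nt downstream) can be sketched as a simple sequence scan. The function name is hypothetical, and measuring the 10-30 nt offset from the last base of the signal is an assumption made here for definiteness; the notes do not say which base the distance is counted from.

```python
def cleavage_windows(transcript, signal="AAUAAA", min_offset=10, max_offset=30):
    """Find each poly-A signal and the window where cleavage would be expected.

    Returns (signal_start, window_start, window_end) tuples, 0-based,
    with window_end exclusive and clipped to the transcript length.
    """
    seq = transcript.upper().replace("T", "U")
    windows = []
    start = seq.find(signal)
    while start != -1:
        signal_end = start + len(signal)
        window_start = signal_end + min_offset
        window_end = min(signal_end + max_offset, len(seq))
        if window_start <= len(seq):
            windows.append((start, window_start, window_end))
        start = seq.find(signal, start + 1)
    return windows


# Made-up transcript: 20 nt, the signal, then 40 nt of downstream sequence.
toy_transcript = "G" * 20 + "AAUAAA" + "C" * 40
print(cleavage_windows(toy_transcript))  # [(20, 36, 56)]
```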

    Both higher eukaryotes and yeast control polyadenylation with a surprising array of protein factors:

    Polyadenylation in metazoans:

    metazoan polyadenylation factors

    Cleavage and Polyadenylation Specificity Factor (CPSF):

    - a protein with 160 kDa and 30 kDa (zinc finger) RNA binding subunits, which binds to the AAUAAA signal upstream of the cleavage site. Additional 73 kDa and 100 kDa subunits do not contact RNA and their function is unknown.

    Cleavage stimulating factor (CstF):

    - binds downstream of the cleavage site at a GU-rich sequence, but associates with CPSF, implying a looped structure of the RNA target. CstF contains 77, 64 and 50 kDa subunits, with classic RNP1-containing RNA binding domains in the 64 kDa unit, and protein interaction domains recognizing CPSF in the 77 kDa subunit. The 50 kDa subunit has a protein interaction domain and binds to the RNA Pol II CTD.

    Cleavage factors Im and IIm (CF I, CF II):

    - four or more polypeptides responsible for the actual cleavage reaction, associating with CstF. CFI contains RS domains similar to splicing factors. Interactions with the Cap Binding Complex CBP20/CBP80 stimulate cleavage by CFII. This acts as a quality control - only pre-mRNA carrying the m7G cap gets polyadenylated, ensuring the integrity of the mRNA structure after splicing.

    PolyA polymerase (PAP):

    - the enzyme that adds the poly-A after the cleavage reaction; it interacts with the 160 kDa subunit of CPSF, and has low activity and little specificity in the absence of CPSF. Many isoforms of 77-82 kDa occur as a result of alternative splicing of the C-terminal domain. The Ser/Thr rich C-terminal domain is a target of phosphorylation by cell cycle regulated, cyclin-dependent protein kinases (CDKs), and PAP is deactivated when phosphorylated.

    PolyA binding protein (PAB II):

    - speeds up PAP by making the reaction processive, i.e. proceeding without dissociation of enzyme from the nascent chain. PAB II binds once about 10 A's have been added, and appears to act by tethering PAP in place. Once the distance between CPSF and PAP exceeds 250 A's, processive extension ceases, suggesting that the tether has reached its limit, thereby determining polyA tail length.

    Polyadenylation in yeast

    yeast polyadenylation complex

    Yeast lacks a distinct AAUAAA sequence, and polyadenylation is associated with less well defined A/U rich and A rich sites. The CPSF equivalents are Cft1, Cft2 and Yth1. A factor comparable to CstF, Rna15/Rna14/Pfs2, binds upstream of CPSF. No factors appear to be involved in binding downstream of the cleavage site as for mammalian CstF.

    Cleavage factors PfsIA, IB are involved in cleavage, but functions do not correspond exactly to the mammalian counterparts.

    Pap1p is the poly-A polymerase. Two-hybrid screens have shown it to interact with the protein Fip1p, which in turn bridges to the CstF-like factor Rna14/Rna15, which then binds to a protein complex associated with an additional upstream A/U rich sequence marker. An hnRNP-like protein, Hrp1, binds in between the CstF and CPSF equivalent factors.

    Sequestration of mature mRNA as hnRNP

    Fully processed nuclear pre-mRNA can be distinguished by lack of association with splicing factors, and functional spliceosome associated RNA appears to be strictly retained in the nucleus.

    hnRNP A1 RNA binding domains

    As splicing progresses from the cap site towards the polyadenylation site, the exposed single-stranded RNA binds the protein factor hnRNP A1. The absence of bound splicing factors appears to mark fully mature pre-mRNA molecules which are ready to leave the nucleus.

    Sequences on these polypeptides act as nuclear export signals (NES) or nuclear localization signals (NLS), which are required for transfer across the nuclear pore complex.

    hnRNP A1 protein shuttles rapidly between nuclear and cytoplasmic compartments, acting as a carrier of the mature pre-mRNA. It contains a sequence identified as M9, that acts as both NES and NLS.

    More to come about nuclear transport next week.


    Eperon, I.C. et al. (2000). Selection of alternative 5' splice sites: Role of U1 snRNP and models for the antagonistic effects of SF2/ASF and hnRNP A1. Molecular and Cellular Biology 20: 8303-8318.

    Hastings, M.L. and Krainer, A.R. (2001). Pre-mRNA splicing in the new millennium. Current Opinion in Cell Biology 13: 302-309.

    Reed, R. (2000). Mechanisms of fidelity in pre-mRNA splicing. Current Opinion in Cell Biology 12: 340-345.

    Shatkin, A.J. and Manley, J.L. (2000). The ends of the affair: Capping and polyadenylation. Nature Structural Biology 7: 838-842.

    Southby, J., Gooding, C. and Smith, C.W.J. (1999). Polypyrimidine tract binding protein functions as a repressor to regulate alternative splicing of α-actinin mutually exclusive exons. Molecular and Cellular Biology 19: 2699-2700.

    Tarn, W-Y. and Steitz, J. (1997). Pre-mRNA splicing: the discovery of a new spliceosome doubles the challenge. Trends in Biochemical Sciences 22: 132-137.

    Will, C.L. and Lührmann, R. (2001). Spliceosomal UsnRNP biogenesis, structure and function. Current Opinion in Cell Biology 13: 290-301.

    Zeng, C. and Berget, S.M. (2000). Participation of C-terminal domain of RNA Pol II in exon definition during pre-mRNA splicing. Molecular and Cellular Biology 20: 8290-8301.