Summary: LSM domain
Crystal structure of the Sm-related protein of P. abyssi; the biological unit is a heptamer.
In molecular biology, LSm proteins are a family of RNA-binding proteins found in virtually every cellular organism. LSm is a contraction of 'like Sm', because the first identified members of the LSm protein family were the Sm proteins. LSm proteins are defined by a characteristic three-dimensional structure and by their assembly into rings of six or seven individual LSm protein molecules, and they play many roles in mRNA processing and regulation.
The Sm proteins were first discovered as antigens targeted by so-called anti-Sm antibodies in a patient with a form of systemic lupus erythematosus (SLE), a debilitating autoimmune disease. They were named Sm proteins in honor of that patient, Stephanie Smith. Other proteins with very similar structures were subsequently discovered and named LSm proteins. New members of the LSm protein family continue to be identified and reported.
Proteins with similar structures are grouped into a hierarchy of protein families, superfamilies, and folds. The LSm protein structure is an example of a small beta sheet folded into a short barrel. Individual LSm proteins assemble into a six or seven member doughnut ring (more properly termed a torus), which usually binds to a small RNA molecule to form a ribonucleoprotein complex. The LSm torus assists the RNA molecule to assume and maintain its proper three-dimensional structure. Depending on which LSm proteins and RNA molecule are involved, this ribonucleoprotein complex facilitates a wide variety of RNA processing including degradation, editing, splicing, and regulation.
Alternate terms for LSm family are LSm fold and Sm-like fold, and alternate capitalization styles such as lsm, LSM, and Lsm are common and equally acceptable.
Discovery of the Smith antigen
The story of the discovery of the first LSm proteins begins with a young woman, Stephanie Smith, who was diagnosed in 1959 with systemic lupus erythematosus (SLE), eventually succumbing to complications of the disease in 1969 at the age of 22. During this period, she was treated at New York's Rockefeller University Hospital, under the care of Dr. Henry Kunkel and Dr. Eng Tan. Like others with an autoimmune disease, SLE patients produce antibodies to antigens in their cells' nuclei, most frequently to their own DNA. However, Dr. Kunkel and Dr. Tan found in 1966 that Ms. Smith produced antibodies to a set of nuclear proteins, which they named the 'Smith antigen' (Sm Ag). About 30% of SLE patients produce antibodies to these proteins, as opposed to double-stranded DNA. This discovery improved diagnostic testing for SLE, but the nature and function of this antigen were unknown.
Sm proteins, snRNPs, the spliceosome and messenger RNA splicing
Research continued during the 1970s and early 1980s. The Smith antigen was found to be a complex of ribonucleic acid (RNA) molecules and multiple proteins. A set of uridine-rich small nuclear RNA (snRNA) molecules was part of this complex, and these were given the names U1, U2, U4, U5 and U6. Four of these snRNAs (U1, U2, U4 and U5) were found to be tightly bound to several small proteins, which were named SmB, SmD, SmE, SmF, and SmG in decreasing order of size. SmB has an alternatively spliced variant, SmB', and a very similar protein, SmN, replaces SmB'/B in certain (mostly neural) tissues. SmD was later discovered to be a mixture of three proteins, which were named SmD1, SmD2 and SmD3. These nine proteins (SmB, SmB', SmN, SmD1, SmD2, SmD3, SmE, SmF and SmG) became known as the Sm core proteins, or simply Sm proteins. The snRNAs are complexed with the Sm core proteins and with other proteins to form particles in the cell's nucleus called small nuclear ribonucleoproteins, or snRNPs. By the mid-1980s, it became clear that these snRNPs help form a large (4.8 MDa molecular weight) complex, called the spliceosome, around pre-mRNA, excising portions of the pre-mRNA called introns and splicing the coding portions (exons) together. After a few more modifications, the spliced pre-mRNA becomes messenger RNA (mRNA), which is then exported from the nucleus and translated into a protein by ribosomes.
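The excision of introns and joining of exons described above can be illustrated with a toy sketch. The sequence, the intron coordinates and the function name `splice` are hypothetical and chosen only for illustration; the sketch says nothing about how the spliceosome actually recognizes splice sites.

```python
def splice(pre_mrna, introns):
    """Remove introns (half-open [start, end) intervals) and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)

# Hypothetical pre-mRNA: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuuuag" + "UUCGAA" + "guacgucag" + "UAA"
introns = [(6, 17), (23, 32)]

mrna = splice(pre_mrna, introns)
print(mrna)
```

Only the upper-case exon segments remain in `mrna`, joined in their original order.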
Discovery of proteins similar to the Sm proteins
The snRNA U6 (unlike U1, U2, U4 and U5) does not associate with the Sm proteins, even though the U6 snRNP is a central component in the spliceosome. In 1999 a protein heteromer was found that binds specifically to U6 and consists of seven proteins clearly homologous to the Sm proteins. These proteins were denoted LSm (like Sm) proteins (LSm1, LSm2, LSm3, LSm4, LSm5, LSm6 and LSm7), with the similar LSm8 protein identified later. In the bacterium Escherichia coli the Sm-like protein HF-I encoded by the gene hfq was described in 1968 as an essential host factor for RNA bacteriophage Qβ replication. The genome of Saccharomyces cerevisiae (baker's yeast) was sequenced in the mid-1990s, providing a rich resource for identifying homologs of these human proteins. Subsequently, as more eukaryote genomes were sequenced, it became clear that eukaryotes, in general, share homologs to the same set of seven Sm and eight LSm proteins. Soon after, proteins homologous to these eukaryote LSm proteins were found in Archaea (Sm1 and Sm2) and Bacteria (Hfq and YlxS homologs). Interestingly, the archaeal LSm proteins are more similar to the eukaryote LSm proteins than either is to bacterial LSm proteins. The LSm proteins described thus far were rather small proteins, varying from 76 amino acids (8.7 kDa molecular weight) for human SmG to 231 amino acids (29 kDa molecular weight) for human SmB. But recently, larger proteins have been discovered that include an LSm structural domain in addition to other protein structural domains (such as LSm10, LSm11, LSm12, LSm13, LSm14, LSm15, LSm16, ataxin-2, as well as archaeal Sm3).
Discovery of the LSm fold
Around 1995, comparisons between the various LSm homologs identified two sequence motifs, 32 amino acids long and 14 amino acids long, that were very similar in each LSm homolog, and were separated by a non-conserved region of variable length. This indicated the importance of these two sequence motifs (named Sm1 and Sm2), and suggested that all LSm protein genes evolved from a single ancestral gene. In 1999, crystals of recombinant Sm proteins were prepared, allowing X-ray crystallography and determination of their atomic structure in three dimensions. This demonstrated that the LSm proteins share a similar three-dimensional fold of a short alpha helix and a five-stranded folded beta sheet, subsequently named the LSm fold. Other investigations found that LSm proteins assemble into a torus (doughnut-shaped ring) of six or seven LSm proteins, and that RNA binds to the inside of the torus, with one nucleotide bound to each LSm protein.
Uridine phosphate binds in archaeal Sm1 between the β2b/β3a loop and β4b/β5 loop. The uracil is stacked between the histidine and arginine residues, stabilized by hydrogen bonding to an asparagine residue, and hydrogen bonding between the aspartate residue and the ribose.
LSm proteins are characterized by a beta sheet (the secondary structure), folded into the LSm fold (the tertiary structure), polymerization into a six or seven member torus (the quaternary structure), and binding to RNA oligonucleotides. Classifying proteins on the basis of their structure is a currently active field, with three major approaches: SCOP (Structural Classification of Proteins), CATH (Class, Architecture, Topology, Homologous superfamily), and FSSP/DALI (Families of Structurally Similar Proteins).
The secondary structure of an LSm protein is a small five-strand anti-parallel beta sheet, with the strands identified from the N-terminal end to the C-terminal end as β1, β2, β3, β4, β5. The SCOP class of All beta proteins and the CATH class of Mainly Beta are defined as protein structures that are primarily beta sheets, thus including LSm. The Sm1 sequence motif corresponds to the β1, β2, β3 strands, and the Sm2 sequence motif corresponds to the β4 and β5 strands. The first four beta strands are adjacent to each other, but β5 is adjacent to β1, turning the overall structure into a short barrel. This structural topology is described as 51234. A short (two to four turns) N-terminal alpha helix is also present in most LSm proteins. The β3 and β4 strands are short in some LSm proteins, and are separated by an unstructured coil of variable length. The β2, β3 and β4 strands are strongly bent, by about 120°, at their midpoints. The residues at the bends in these strands are often glycine, and the side chains internal to the beta barrel are often the hydrophobic residues valine, leucine, isoleucine and methionine.
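The 51234 topology can be made concrete with a small sketch that stores the spatial order of the strands and looks up which strands lie side by side. The list and function names are illustrative assumptions, not part of any structural biology library.

```python
# Spatial order of the strands across the sheet, as given by the "51234" topology.
SHEET_ORDER = ["β5", "β1", "β2", "β3", "β4"]

# Strands covered by each conserved sequence motif, as stated in the text.
MOTIF_STRANDS = {"Sm1": ["β1", "β2", "β3"], "Sm2": ["β4", "β5"]}

def spatial_neighbors(strand):
    """Return the strands lying directly beside `strand` in the sheet."""
    i = SHEET_ORDER.index(strand)
    return [SHEET_ORDER[j] for j in (i - 1, i + 1) if 0 <= j < len(SHEET_ORDER)]

for strand in ["β1", "β2", "β3", "β4", "β5"]:
    print(strand, "is next to", spatial_neighbors(strand))
```

In this representation β5, the last strand in sequence, sits beside β1 rather than β4, which is the feature of the topology the text describes as closing the sheet into a short barrel.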
SCOP simply classifies the LSm structure as the Sm-like fold, one of 149 different Beta Protein folds, without any intermediate groupings. The LSm beta sheet is sharply bent and described as a Roll architecture in CATH (one of 20 different beta protein architectures in CATH). One of the beta strands (β5 in LSm) crosses the open edge of the roll to form a small SH3 type barrel topology (one of 33 beta roll topologies in CATH). CATH lists 23 homologous superfamilies with an SH3 type barrel topology, one of which is the LSm structure (RNA Binding Protein in the CATH system). SCOP continues its structural classification after Fold to Superfamily, Family and Domain, while CATH continues to Sequence Family, but these divisions are more appropriately described in the "Evolution and phylogeny" section.
The SH3-type barrel tertiary structure of the LSm fold is formed by the strongly bent (about 120°) β2, β3 and β4 strands, with the barrel structure closed by the β5 strand. Emphasizing the tertiary structure, each bent beta strand can be described as two shorter beta strands. The LSm fold can be viewed as an eight-strand anti-parallel beta sandwich, with five strands in one plane and three strands in a parallel plane with about a 45° pitch angle between the two halves of the beta sandwich. The short (two to four turns) N-terminal alpha helix occurs at one edge of the beta sandwich. This alpha helix and the beta strands can be labeled (from the N-terminus to the C-terminus) α, β1, β2a, β2b, β3a, β3b, β4a, β4b, β5 where the a and b refer to either the two halves of a bent strand in the five-strand description, or to the individual strands in the eight-strand description. Each strand (in the eight-strand description) is formed from five amino acid residues. Including the bends and loops between the strands, and the alpha helix, about 60 amino acid residues contribute to the LSm fold, but this varies between homologs due to variation in inter-strand loops, the alpha helix, and even the lengths of β3b and β4a strands.
LSm proteins typically assemble into an LSm ring, a six or seven member torus, about 7 nanometers in diameter with a 2 nanometer hole. The ancestral condition is a homohexamer or homoheptamer of identical LSm subunits. LSm proteins in eukaryotes form heteroheptamers of seven different LSm subunits, such as the Sm proteins. Binding between the LSm proteins is best understood with the eight-strand description of the LSm fold. The five-strand half of the beta sandwich of one subunit aligns with the three-strand half of the beta sandwich of the adjacent subunit, forming a twisted 8-strand beta sheet Aβ4a/Aβ3b/Aβ2a/Aβ1/Aβ5/Bβ4b/Bβ3a/Bβ2b, where the A and B refer to the two different subunits. In addition to hydrogen bonding between the Aβ5 and Bβ4b beta strands of the two LSm protein subunits, there are energetically favorable contacts between hydrophobic amino acid side chains in the interior of the contact area, and energetically favorable contacts between hydrophilic amino acid side chains around the periphery of the contact area.
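As a minimal sketch of the subunit interface described above, the following code assembles the eight-strand sheet from the five-strand half of one subunit and the three-strand half of its neighbor, and then lists the interfaces in a closed ring. The subunit letters and the function names are assumptions made for illustration.

```python
# Strand order across the two halves of the beta sandwich, taken from the text.
FIVE_STRAND_HALF = ["β4a", "β3b", "β2a", "β1", "β5"]
THREE_STRAND_HALF = ["β4b", "β3a", "β2b"]

def interface_sheet(subunit_a, subunit_b):
    """Eight-strand sheet formed where subunit_a meets subunit_b."""
    return ([subunit_a + s for s in FIVE_STRAND_HALF]
            + [subunit_b + s for s in THREE_STRAND_HALF])

def ring_interfaces(subunits):
    """Pair each subunit with its neighbor, wrapping around to close the ring."""
    n = len(subunits)
    return [(subunits[i], subunits[(i + 1) % n]) for i in range(n)]

print("/".join(interface_sheet("A", "B")))

heptamer = list("ABCDEFG")  # hypothetical labels for seven subunits
for a, b in ring_interfaces(heptamer):
    sheet = interface_sheet(a, b)
    # The strands at the junction are the hydrogen-bonded pair (β5 of one, β4b of the next).
    print(a, b, sheet[4], sheet[5])
```

A ring of n subunits therefore has n such interfaces, each centered on the β5/β4b strand pair.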
RNA oligonucleotide binding
LSm rings form ribonucleoprotein complexes with RNA oligonucleotides that vary in binding strength from very stable complexes (such as the Sm class snRNPs) to transient complexes. RNA oligonucleotides generally bind inside the hole (lumen) of the LSm torus, one nucleotide per LSm subunit, but additional nucleotide binding sites have been reported at the top (α helix side) of the ring. The exact chemical nature of this binding varies, but common motifs include stacking the heterocyclic base (often uracil) between planar side chains of two amino acids, hydrogen bonding to the heterocyclic base and/or the ribose, and salt bridges to the phosphate group.
The various kinds of LSm rings function as scaffolds or chaperones for RNA oligonucleotides, assisting the RNA to assume and maintain the proper three-dimensional structure. In some cases, this allows the oligonucleotide RNA to function catalytically as a ribozyme. In other cases, this facilitates modification or degradation of the RNA, or the assembly, storage, and intracellular transport of ribonucleoprotein complexes.
The Sm ring is found in the nucleus of all eukaryotes (about 2.5 × 10^6 copies per proliferating human cell), and has the best understood functions. The Sm ring is a heteroheptamer. The Sm-class snRNA molecule (in the 5' to 3' direction) enters the lumen (doughnut hole) at the SmE subunit and proceeds sequentially in a clockwise fashion (looking from the α helix side) inside the lumen to the SmG, SmD3, SmB, SmD1, SmD2 subunits, exiting at the SmF subunit. (SmB can be replaced by the splice variant SmB' and by SmN in neural tissues.) The Sm ring permanently binds to the U1, U2, U4 and U5 snRNAs, which form four of the five snRNPs that constitute the major spliceosome. The Sm ring also permanently binds to the U11, U12 and U4atac snRNAs, which, together with the U5 snRNP, form four of the five snRNPs that constitute the minor spliceosome. Both of these spliceosomes are central RNA-processing complexes in the maturation of messenger RNA from pre-mRNA. Sm proteins have also been reported to be part of the ribonucleoprotein component of telomerase.
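The path of the snRNA through the Sm ring can be written down as an ordered list, which makes the neighbor relationships in the closed ring easy to query. This is only a bookkeeping sketch; the names `SM_RING_PATH`, `ring_neighbors` and `assign_nucleotides`, and the placeholder nucleotide labels, are assumptions for illustration.

```python
# Subunits in the order the snRNA passes them (5' to 3'), as stated in the text.
SM_RING_PATH = ["SmE", "SmG", "SmD3", "SmB", "SmD1", "SmD2", "SmF"]

def ring_neighbors(subunit, ring=SM_RING_PATH):
    """Return the (previous, next) subunits in the closed ring."""
    i = ring.index(subunit)
    return ring[i - 1], ring[(i + 1) % len(ring)]

def assign_nucleotides(nucleotides, ring=SM_RING_PATH):
    """Pair one nucleotide with each subunit, in 5' to 3' order."""
    if len(nucleotides) != len(ring):
        raise ValueError("expected one nucleotide per subunit")
    return dict(zip(ring, nucleotides))

print(ring_neighbors("SmE"))  # the entry subunit sits beside the exit subunit
print(assign_nucleotides(["n1", "n2", "n3", "n4", "n5", "n6", "n7"]))
```

Because the ring is closed, the entry subunit SmE and the exit subunit SmF are neighbors even though they are at opposite ends of the RNA path.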
The two Lsm2-8 snRNPs (U6 and U6atac) have the key catalytic function in the major and minor spliceosomes. These snRNPs do not include the Sm ring, but instead use the heteroheptameric Lsm2-8 ring. The LSm rings are about 20 times less abundant than the Sm rings. The order of these seven LSm proteins in this ring is not known, but based on amino acid sequence homology with the Sm proteins, it is speculated that the snRNA (in the 5' to 3' direction) may bind first to LSm5 and proceed sequentially clockwise to LSm7, LSm4, LSm8, LSm2 and LSm3, exiting at the LSm6 subunit. Experiments with Saccharomyces cerevisiae (budding yeast) mutations suggest that the Lsm2-8 ring assists the reassociation of the U4 and U6 snRNPs into the U4/U6 di-snRNP. (After completion of intron removal and exon splicing, these two snRNPs must reassociate for the spliceosome to initiate another splicing cycle. In this role, the Lsm2-8 ring acts as an RNA chaperone instead of an RNA scaffold.) The Lsm2-8 ring also forms an snRNP with the U8 small nucleolar RNA (snoRNA), which localizes in the nucleolus. This ribonucleoprotein complex is necessary for processing ribosomal RNA and transfer RNA to their mature forms. The Lsm2-8 ring is reported to have a role in the processing of pre-P RNA into RNase P RNA. In contrast to the Sm ring, the Lsm2-8 ring does not permanently bind to its snRNA and snoRNA.
A second type of Sm ring exists where LSm10 replaces SmD1 and LSm11 replaces SmD2. LSm11 is a two-domain protein whose C-terminal domain is an LSm domain. This heteroheptamer ring binds with the U7 snRNA in the U7 snRNP. The U7 snRNP mediates processing of the 3' UTR stem-loop of the histone mRNA in the nucleus. Like the Sm ring, it is assembled in the cytoplasm onto the U7 snRNA by a specialized SMN complex.
A second type of Lsm ring is the Lsm1-7 ring, which has the same structure as the Lsm2-8 ring except that LSm1 replaces LSm8. In contrast to the Lsm2-8 ring, the Lsm1-7 ring localizes in the cytoplasm where it assists in degrading messenger RNA in ribonucleoprotein complexes. This process controls the turnover of messenger RNA so that ribosomal translation of mRNA to protein responds quickly to changes in transcription of DNA to messenger RNA by the cell.
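The relationships between the four eukaryotic rings described above amount to subunit substitutions, which can be sketched with sets. Only ring composition is modeled here, not subunit order; the variable and function names are assumptions for illustration.

```python
SM_RING = {"SmB", "SmD1", "SmD2", "SmD3", "SmE", "SmF", "SmG"}
LSM2_8_RING = {"LSm2", "LSm3", "LSm4", "LSm5", "LSm6", "LSm7", "LSm8"}

def substitute(ring, replacements):
    """Return a new ring in which selected subunits are swapped for others."""
    return {replacements.get(subunit, subunit) for subunit in ring}

# U7 snRNP ring: LSm10 replaces SmD1 and LSm11 replaces SmD2.
U7_RING = substitute(SM_RING, {"SmD1": "LSm10", "SmD2": "LSm11"})

# Cytoplasmic Lsm1-7 ring: LSm1 replaces LSm8.
LSM1_7_RING = substitute(LSM2_8_RING, {"LSm8": "LSm1"})

for name, ring in [("Sm", SM_RING), ("U7", U7_RING),
                   ("Lsm2-8", LSM2_8_RING), ("Lsm1-7", LSM1_7_RING)]:
    print(name, sorted(ring))
```

Each substitution leaves a seven-member ring, and the U7 ring still shares five subunits with the canonical Sm ring.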
Gemin6 and Gemin7
The SMN complex (described under "Biogenesis of snRNPs") is composed of the SMN protein and Gemin2-8. Two of these, Gemin6 and Gemin7, have been found to have the LSm structure and to form a heterodimer. These may have a chaperone function in the SMN complex, assisting the formation of the Sm ring on the Sm-class snRNAs. The PRMT5 complex is composed of PRMT5, pICln and WD45 (Mep50). pICln helps to form an open Sm ring on the SMN complex. The SMN complex assists in the assembly of snRNPs: the Sm ring is held in the open conformation on the SMN complex, which then loads it onto the snRNA.
LSm12-16 and other multi-domain LSm proteins
The LSm12-16 proteins have been described very recently. These are two-domain proteins with an N-terminal LSm domain and a C-terminal methyltransferase domain. Very little is known about the function of these proteins, but presumably they are members of LSm-domain rings that interact with RNA. There is some evidence that LSm12 may be involved in mRNA degradation and that LSm13-16 may have roles in the regulation of mitosis. A large protein of unknown function, ataxin-2, associated with the neurodegenerative disease spinocerebellar ataxia type 2, also has an N-terminal LSm domain.
Archaeal Sm rings
Two LSm proteins are found in a second domain of life, the Archaea. These are the Sm1 and Sm2 proteins (not to be confused with the Sm1 and Sm2 sequence motifs), and are sometimes identified as Sm-like archaeal proteins SmAP1 and SmAP2 for this reason. Sm1 and Sm2 generally form homoheptamer rings, although homohexamer rings have been observed. Sm1 rings are similar to eukaryote Lsm rings in that they form in the absence of RNA, while Sm2 rings are similar to eukaryote Sm rings in that they require uridine-rich RNA for their formation. They have been reported to associate with RNase P RNA, suggesting a role in transfer RNA processing, but their function in archaea in this process (and possibly in processing other RNA such as ribosomal RNA) is mostly unknown. One of the two main branches of archaea, the crenarchaeotes, has a third known type of archaeal LSm protein, Sm3. This is a two-domain protein with an N-terminal LSm domain that forms a homoheptamer ring. Nothing is known about the function of this LSm protein, but presumably it interacts with, and probably helps process, RNA in these organisms.
Bacterial LSm rings
Several LSm proteins have been reported in the third domain of life, the Bacteria. The Hfq protein forms homohexamer rings, and was originally discovered as necessary for infection by the bacteriophage Qβ, although this is clearly not the native function of this protein in bacteria. It is not present in all bacteria, but has been found in Proteobacteria, Firmicutes, Spirochaetes, Thermotogae, Aquificae and one species of Archaea. (This last instance is probably a case of horizontal gene transfer.) Hfq is pleiotropic with a variety of interactions, generally associated with translation regulation. These include blocking ribosome binding to mRNA, marking mRNA for degradation by binding to their poly-A tails, and association with bacterial small regulatory RNAs (such as DsrA RNA) that control translation by binding to certain mRNAs. A second bacterial LSm protein is YlxS (sometimes also called YhbC), which was first identified in the soil bacterium Bacillus subtilis. This is a two-domain protein with an N-terminal LSm domain. Its function is unknown, but amino acid sequence homologs are found in virtually every bacterial genome sequenced to date, and it may be an essential protein. The middle domain of the small conductance mechanosensitive channel MscS in Escherichia coli forms a homoheptameric ring. This LSm domain has no apparent RNA-binding function, but the homoheptameric torus is part of the central channel of this membrane protein.
Evolution and phylogeny
LSm homologs are found in all three domains of life, and may even be found in every single organism. Computational phylogenetic methods are used to infer phylogenetic relations. Alignment of the various LSm homologs is the appropriate tool for this, such as multiple sequence alignment of the primary structure (amino acid sequence), and structural alignment of the tertiary structure (three-dimensional structure). It is hypothesized that a gene for an LSm protein was present in the last universal ancestor of all life. Based on the functions of known LSm proteins, this original LSm protein may have assisted ribozymes in the processing of RNA for synthesizing proteins as part of the RNA world hypothesis of early life. According to this view, this gene was passed from ancestor to descendant, with frequent mutations, gene duplications and occasional horizontal gene transfers. In principle, this process can be summarized in a phylogenetic tree with the root in the last universal ancestor (or earlier), and with the tips representing the universe of LSm genes existing today.
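One simple quantity that distance-based phylogenetic methods can start from is the fraction of differing positions between two aligned sequences (the p-distance). The sketch below computes it for a toy alignment; the three sequences are invented placeholders, not real LSm sequences, and the function name is an assumption.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Fraction of aligned, gap-free columns at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    differences = sum(1 for a, b in columns if a != b)
    return differences / len(columns)

# Hypothetical aligned fragments ("-" marks an alignment gap).
alignment = {
    "homolog_1": "MKLVGILKGF",
    "homolog_2": "MKLIGTLKGF",
    "homolog_3": "MRV-GTLRGY",
}

for name_a, name_b in combinations(alignment, 2):
    d = p_distance(alignment[name_a], alignment[name_b])
    print(name_a, name_b, round(d, 3))
```

Smaller distances suggest a more recent common ancestor; real analyses would use a full multiple sequence alignment and a substitution model rather than raw mismatch counts.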
Homomeric LSm rings in bacteria and archaea
Based on structure, the known LSm proteins divide into a group consisting of the bacterial LSm proteins (Hfq, YlxS and MscS) and a second group of all other LSm proteins, in accordance with the most recently published phylogenetic trees. The three archaeal LSm proteins (Sm1, Sm2 and Sm3) also cluster as a group, distinct from the eukaryote LSm proteins. Both the bacterial and archaeal LSm proteins polymerize to homomeric rings, which is the ancestral condition.
Heteromeric LSm rings in eukaryotes
A series of gene duplications of a single eukaryote LSm gene resulted in most (if not all) of the known eukaryote LSm genes. Each of the seven Sm proteins has greater amino acid sequence homology to a corresponding LSm protein than to the other Sm proteins. This suggests that an ancestral LSm gene duplicated several times, resulting in seven paralogs. These subsequently diverged from each other so that the ancestral homoheptamer LSm ring became a heteroheptamer ring. Based on the known functions of LSm proteins in eukaryotes and archaea, the ancestral function may have been processing of pre-ribosomal RNA, pre-transfer RNA, and pre-RNase P. Then, according to this hypothesis, the seven ancestral eukaryote LSm genes duplicated again to give seven pairs of Sm/LSm paralogs: LSm1/SmB, LSm2/SmD1, LSm3/SmD2, LSm4/SmD3, LSm5/SmE, LSm6/SmF and LSm7/SmG. These two groups of seven LSm genes (and the corresponding two kinds of LSm rings) evolved into an Sm ring (requiring RNA) and an LSm ring (which forms without RNA). The LSm1/LSm8 paralog pair also seems to have originated prior to the last common eukaryote ancestor, for a total of at least 15 LSm protein genes. The SmD1/LSm10 paralog pair and the SmD2/LSm11 paralog pair exist only in animals, fungi, and the amoebozoa (sometimes identified as the unikont clade) and appear to be absent in the bikont clade (chromalveolates, excavates, plants and rhizaria). Therefore, these two gene duplications most likely occurred after this fundamental split in the eukaryote lineage, within the unikont branch. The SmB/SmN paralog pair is seen only in the placental mammals, which dates this LSm gene duplication.
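The paralog pairs listed above can be combined with the known RNA path through the Sm ring (SmE, SmG, SmD3, SmB, SmD1, SmD2, SmF) to reproduce the homology-based guess at the Lsm2-8 ring order mentioned earlier. The sketch below is only this bookkeeping step; the variable names are assumptions, and the resulting order remains speculative, as the text notes.

```python
# RNA path through the Sm ring (5' to 3'), as given in the description of the Sm ring.
SM_RING_PATH = ["SmE", "SmG", "SmD3", "SmB", "SmD1", "SmD2", "SmF"]

# Sm/LSm paralog pairs from the text, keyed by the Sm protein.
SM_TO_LSM = {
    "SmB": "LSm1", "SmD1": "LSm2", "SmD2": "LSm3", "SmD3": "LSm4",
    "SmE": "LSm5", "SmF": "LSm6", "SmG": "LSm7",
}

# Replace each Sm subunit by its LSm paralog to get a predicted Lsm1-7 order.
lsm1_7_path = [SM_TO_LSM[sm] for sm in SM_RING_PATH]

# In the Lsm2-8 ring, LSm8 takes the place of its paralog LSm1.
lsm2_8_path = ["LSm8" if lsm == "LSm1" else lsm for lsm in lsm1_7_path]

print(lsm1_7_path)
print(lsm2_8_path)
```

The second list matches the speculated Lsm2-8 order quoted in the discussion of the U6 snRNP, which shows that the speculation follows directly from the paralog pairing.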
Biogenesis of snRNPs
- Pfam entry LSM. Pfam is the Sanger Institute database, which is a collection of protein families and domains.
This tab holds the annotation information that is stored in the Pfam database. As we move to using Wikipedia as our main source of annotation, the contents of this tab will be gradually replaced by the Wikipedia tab.
LSM domain
The LSM domain contains Sm proteins as well as other related LSM (Like Sm) proteins. The U1, U2, U4/U6, and U5 small nuclear ribonucleoprotein particles (snRNPs) involved in pre-mRNA splicing contain seven Sm proteins (B/B', D1, D2, D3, E, F and G) in common, which assemble around the Sm site present in four of the major spliceosomal small nuclear RNAs. The U6 snRNP binds to the LSM (Like Sm) proteins. Sm proteins are also found in archaebacteria, which do not have any splicing apparatus, suggesting a more general role for Sm proteins. All Sm proteins contain a common sequence motif in two segments, Sm1 and Sm2, separated by a short variable linker. This family also includes the bacterial Hfq (host factor Q) proteins. Hfq proteins are also RNA-binding proteins and form hexameric rings.
Hermann H, Fabrizio P, Raker VA, Foulaki K, Hornig H, Brahms H, Luhrmann R; EMBO J 1995;14:2076-2088. snRNP Sm proteins share two evolutionarily conserved sequence motifs which are involved in Sm protein-protein interactions. PUBMED:7744013 EPMC:7744013
Kambach C, Walke S, Young R, Avis JM, de la Fortelle E, Raker VA, Luhrmann R, Li J, Nagai K; Cell 1999;96:375-387. Crystal structures of two Sm protein complexes and their implications for the assembly of the spliceosomal snRNPs. PUBMED:10025403 EPMC:10025403
Hajnsdorf E, Regnier P; Proc Natl Acad Sci U S A 2000;97:1501-1505. Host factor Hfq of Escherichia coli stimulates elongation of poly(A) tails by poly(A) polymerase I. PUBMED:10677490 EPMC:10677490
Kufel J, Allmang C, Petfalski E, Beggs J, Tollervey D; J Biol Chem 2003;278:2147-2156. Lsm proteins are required for normal processing and stability of ribosomal RNAs. PUBMED:12438310 EPMC:12438310
External database links
This tab holds annotation information from the InterPro database.
InterPro entry IPR001163
This family is found in Lsm (like-Sm) proteins and in bacterial Lsm-related Hfq proteins. In each case, the domain adopts a core structure consisting of an open beta-barrel with an SH3-like topology.
Lsm (like-Sm) proteins have diverse functions, and are thought to be important modulators of RNA biogenesis and function [PUBMED:10801455, PUBMED:12438310]. The Sm proteins form part of specific small nuclear ribonucleoproteins (snRNPs) that are involved in the processing of pre-mRNAs to mature mRNAs, and are a major component of the eukaryotic spliceosome. Most snRNPs consist of seven Sm proteins (B/B', D1, D2, D3, E, F and G) arranged in a ring on a uridine-rich sequence (Sm site), plus a small nuclear RNA (snRNA) (either U1, U2, U5 or U4/U6) [PUBMED:15130578]. All Sm proteins contain a common sequence motif in two segments, Sm1 and Sm2, separated by a short variable linker [PUBMED:7744013]. In other snRNPs, certain Sm proteins are replaced with different Lsm proteins. In U7 snRNPs, for example, the D1 and D2 Sm proteins are replaced with the U7-specific Lsm10 and Lsm11 proteins, where Lsm11 plays a role in histone U7-specific RNA processing [PUBMED:15526162]. Lsm proteins are also found in archaebacteria, which do not have any splicing apparatus, suggesting a more general role for Lsm proteins.
The pleiotropic translational regulator Hfq (host factor Q) is a bacterial Lsm-like protein, which modulates the structure of numerous RNA molecules by binding preferentially to A/U-rich sequences in RNA [PUBMED:15561140]. Hfq adopts an Lsm-like fold; however, unlike the heptameric Sm proteins, it forms a homo-hexameric ring.
Below is a listing of the unique domain organisations or architectures in which this domain is found.
The graphic that is shown by default represents the longest sequence with a given architecture. Each row contains the following information:
- the number of sequences which exhibit this architecture
- a textual description of the architecture, e.g. Gla, EGF x 2, Trypsin. This example describes an architecture with one Gla domain, followed by two consecutive EGF domains, and finally a single Trypsin domain
- a link to the page in the Pfam site showing information about the sequence that the graphic describes
- the UniProt description of the protein sequence
- the number of residues in the sequence
- the Pfam graphic itself.
Note that you can see the family page for a particular domain by clicking on the graphic. You can also choose to see all sequences which have a given architecture by clicking on the Show link in each row.
Finally, because some families can be found in a very large number of architectures, we load only the first fifty architectures by default. If you want to see more architectures, click the button at the bottom of the page to load the next set.
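The textual architecture descriptions follow a simple pattern that can be read programmatically. The following is a minimal sketch, assuming that domains are separated by commas and that repeats are written as `NAME x N`, as in the example above. The function names `parse_architecture` and `expand_architecture` are hypothetical and are not part of any Pfam tool.

```python
import re
from typing import List, Tuple


def parse_architecture(description: str) -> List[Tuple[str, int]]:
    """Split a description such as 'Gla, EGF x 2, Trypsin' into
    (domain name, number of consecutive copies) pairs."""
    domains = []
    for part in description.split(","):
        part = part.strip()
        if not part:
            continue
        # Assumed repeat notation: "<name> x <count>"
        match = re.fullmatch(r"(.+?)\s+x\s+(\d+)", part)
        if match:
            domains.append((match.group(1), int(match.group(2))))
        else:
            domains.append((part, 1))
    return domains


def expand_architecture(description: str) -> List[str]:
    """Return one entry per domain occurrence, in order along the sequence."""
    return [name
            for name, count in parse_architecture(description)
            for _ in range(count)]
```

For the example given in the list above, `expand_architecture("Gla, EGF x 2, Trypsin")` yields one Gla, two EGF and one Trypsin entry, in that order.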
We store a range of different sequence alignments for families. As well as the seed alignment from which the family is built, we provide the full alignment, generated by searching the sequence database (reference proteomes) using the family HMM. We also generate alignments using four representative proteome (RP) sets, the UniProtKB sequence database, the NCBI sequence database, and our metagenomics sequence database.
There are various ways to view or download the sequence alignments that we store. We provide several sequence viewers and a plain-text Stockholm-format file for download.
We make a range of alignments for each Pfam-A family:
- Seed: the curated alignment from which the HMM for the family is built
- Full: the alignment generated by searching the sequence database using the HMM
- Representative Proteomes (RPs): alignments at 15%, 35%, 55% and 75% co-membership thresholds
- UniProtKB: the alignment generated by searching the UniProtKB sequence database using the family HMM
- NCBI: the alignment generated by searching the NCBI sequence database using the family HMM
- Metagenomics: the alignment generated by searching the metagenomics sequence database using the family HMM
You can see the alignments as HTML or in three different sequence viewers:
- jalview: a Java applet developed at the University of Dundee. You will need Java installed before running jalview
- HTML: an HTML page showing the whole alignment. Please note: full Pfam alignments can be very large, and the HTML views often cause problems for browsers. Please use either jalview or the Pfam viewer if you have trouble viewing the HTML version
- heatmap: an HTML-based representation of the alignment, coloured according to the posterior-probability (PP) values from the HMM. As for the standard HTML view, heatmap alignments can also be very large and slow to render.
You can download (or view in your browser) a text representation of a Pfam alignment in various formats:
You can also change the order in which sequences are listed in the alignment, change how insertions are represented, alter the characters that are used to represent gaps in sequences and, finally, choose whether to download the alignment or to view it in your browser directly.
You may find that large alignments cause problems for the viewers and the reformatting tool, so we also provide all alignments in Stockholm format. You can download either the plain text alignment, or a gzipped version of it.
We make a range of alignments for each Pfam-A family. You can see a description of each above. You can view these alignments in various ways but please note that some types of alignment are never generated while others may not be available for all families, most commonly because the alignments are too large to handle.
Note 1: PP/heatmap alignments cannot be generated for seed alignments, because no PP data are available.
Format an alignment
We make all of our alignments available in Stockholm format. You can download them here as raw, plain text files or as gzip-compressed files.
You can also download a FASTA format file containing the full-length sequences for all sequences in the full alignment.
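Once downloaded, a Stockholm file can be read with a few lines of code. The sketch below assumes the standard Stockholm layout (a `# STOCKHOLM 1.0` header, annotation lines starting with `#`, one `name sequence` pair per line, and `//` as terminator) and handles both the plain-text and the gzip-compressed download. The function names and the sample records are made up for illustration; a dedicated library is preferable for real work.

```python
import gzip
import io
from typing import Dict, IO


def open_alignment(path: str) -> IO[str]:
    """Open a plain-text or gzip-compressed alignment file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_stockholm(handle: IO[str]) -> Dict[str, str]:
    """Return {sequence name: aligned sequence} for the first alignment."""
    sequences: Dict[str, str] = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue              # blank line, header or annotation
        if line.startswith("//"):
            break                 # end of the alignment
        name, aligned = line.split(None, 1)
        # Interleaved files repeat the name in each block, so concatenate.
        sequences[name] = sequences.get(name, "") + aligned.strip()
    return sequences


# Made-up sample records, used only to show the expected layout.
SAMPLE = """# STOCKHOLM 1.0
#=GF ID LSM
SEQ_A/1-6   MKL.VE
SEQ_B/3-8   MRLAVE
//
"""

alignment = read_stockholm(io.StringIO(SAMPLE))
```

For a downloaded file, the same parser can be applied to the handle returned by `open_alignment`.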
HMM logos are one way of visualising profile HMMs. Logos provide a quick overview of the properties of an HMM in a graphical form. A more detailed description of HMM logos and how to interpret them is also available.
If you find these logos useful in your own work, please consider citing the following article:
This page displays the phylogenetic tree for this family's seed alignment. We use FastTree to calculate neighbour-joining trees with a local bootstrap based on 100 resamples (shown next to the tree nodes). FastTree calculates approximately-maximum-likelihood phylogenetic trees from our seed alignment.
Note: You can also download the data file for the tree.
Curation and family details
This section shows the detailed information about the Pfam family. You can see the definitions of many of the terms in this section in the glossary and a fuller explanation of the scoring system that we use in the scores section of the help pages.
|Seed source:||Psiblast SMD1_HUMAN|
|Number in seed:||112|
|Number in full:||7036|
|Average length of the domain:||69.50 aa|
|Average identity of full alignment:||24 %|
|Average coverage of the sequence by the domain:||54.64 %|
|HMM build commands:||build method: hmmbuild -o /dev/null HMM SEED; search method: hmmsearch -Z 11927849 -E 1000 --cpu 4 HMM pfamseq|
|Family (HMM) version:||19|
|Download:||download the raw HMM for this family|
This visualisation provides a simple graphical representation of the distribution of this family across species. The original interactive tree is also available.
This chart is a modified "sunburst" visualisation of the species tree for this family. It shows each node in the tree as a separate arc, arranged radially with the superkingdoms at the centre and the species arrayed around the outermost ring.
How the sunburst is generated
The tree is built by considering the taxonomic lineage of each sequence that has a match to this family. For each node in the resulting tree, we draw an arc in the sunburst. The radius of the arc, its distance from the root node at the centre of the sunburst, shows the taxonomic level ("superkingdom", "kingdom", etc). The length of the arc represents either the number of sequences represented at a given level, or the number of species that are found beneath the node in the tree. The weighting scheme can be changed using the sunburst controls.
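The construction described above can be sketched as follows. This is an illustrative outline only, not the code used by the site: the function names are hypothetical, each lineage is assumed to be a root-first tuple ending in the species, and the shortened three-level lineages are used only to keep the example small.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Lineage = Tuple[str, ...]


def build_sunburst_nodes(lineages: Iterable[Lineage]) -> Dict[Lineage, dict]:
    """Create one node per lineage prefix; each node becomes one arc.

    'depth' gives the ring (taxonomic level), while 'sequences' and
    'species' are the two alternative weights for the arc length."""
    seq_counts = defaultdict(int)
    species_sets = defaultdict(set)
    for lineage in lineages:              # one lineage per matching sequence
        for depth in range(1, len(lineage) + 1):
            node = lineage[:depth]
            seq_counts[node] += 1
            species_sets[node].add(lineage)
    return {
        node: {
            "depth": len(node),
            "sequences": seq_counts[node],
            "species": len(species_sets[node]),
        }
        for node in seq_counts
    }


def arc_fraction(nodes: Dict[Lineage, dict], node: Lineage,
                 weight: str = "sequences") -> float:
    """Fraction of the full circle covered by a node under a weighting scheme."""
    total = sum(n[weight] for n in nodes.values() if n["depth"] == 1)
    return nodes[node][weight] / total


# Shortened example lineages (real lineages would have more levels).
example = [
    ("Eukaryota", "Metazoa", "Homo sapiens"),
    ("Eukaryota", "Metazoa", "Homo sapiens"),
    ("Eukaryota", "Fungi", "Saccharomyces cerevisiae"),
    ("Bacteria", "Proteobacteria", "Escherichia coli"),
]
nodes = build_sunburst_nodes(example)
```

Switching `weight` between `"sequences"` and `"species"` mirrors the two weighting schemes offered by the sunburst controls: in this example the Eukaryota arc covers three quarters of the circle when weighted by sequences, but two thirds when weighted by species.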
In order to reduce the complexity of the representation, we reduce the number of taxonomic levels that we show. We consider only the following eight major taxonomic levels:
Colouring and labels
Segments of the tree are coloured approximately according to their superkingdom. For example, archaeal branches are coloured with shades of orange, eukaryotes in shades of purple, etc. The colour assignments are shown under the sunburst controls. Where space allows, the name of the taxonomic level will be written on the arc itself.
As you move your mouse across the sunburst, the current node will be highlighted. In the top section of the controls panel we show a summary of the lineage of the currently highlighted node. If you pause over an arc, a tooltip will be shown, giving the name of the taxonomic level in the title and a summary of the number of sequences and species below that node in the tree.
Anomalies in the taxonomy tree
There are some situations that the sunburst tree cannot easily handle and for which we have work-arounds in place.
Missing taxonomic levels
Some species in the taxonomic tree may not have one or more of the main eight levels that we display. For example, Bos taurus is not assigned an order in the NCBI taxonomic tree. In such cases we mark the omitted level with, for example, "No order", in both the tooltip and the lineage summary.
Unmapped species names
The tree is built by looking at each sequence in the full alignment for the family. We take the name of the species given by UniProt and try to map that to the full taxonomic tree from NCBI. In some cases, the name chosen by UniProt does not map to any node in the NCBI tree, perhaps because the chosen name is listed as a synonym or a misspelling in the NCBI taxonomy.
So that these nodes are not simply omitted from the sunburst tree, we group them together in a separate branch (or segment of the sunburst tree). Since we cannot determine the lineage for these unmapped species, we show all levels between the superkingdom and the species as "uncategorised".
Since we reduce the species tree to only the eight main taxonomic levels, sequences that are mapped to the sub-species level in the tree would not normally be shown. Rather than leave out these species, we map them instead to their parent species. So, for example, for sequences belonging to one of the Vibrio cholerae sub-species in the NCBI taxonomy, we show them instead as belonging to the species Vibrio cholerae.
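The three work-arounds above can be summarised in a short sketch. The list of eight levels used here is an assumption (the usual major ranks of the NCBI taxonomy), and the function names and input format are hypothetical: a mapped species is represented as a dictionary from rank to taxon name.

```python
from typing import Dict, List

# Assumed set of eight major levels; labelled as an assumption.
LEVELS = ("superkingdom", "kingdom", "phylum", "class",
          "order", "family", "genus", "species")


def normalise_lineage(ranks: Dict[str, str]) -> List[str]:
    """Reduce a mapped lineage to the eight displayed levels.

    Missing levels are marked as 'No <level>'. Ranks outside LEVELS,
    such as a sub-species, are ignored, so the sequence is reported
    under its parent species."""
    return [ranks.get(level, f"No {level}") for level in LEVELS]


def unmapped_lineage(superkingdom: str, species_name: str) -> List[str]:
    """Lineage for a species name that could not be mapped to the tree:
    every level between superkingdom and species is 'uncategorised'."""
    middle = ["uncategorised"] * (len(LEVELS) - 2)
    return [superkingdom] + middle + [species_name]


# Bos taurus has no order assigned, as described above.
cow = normalise_lineage({
    "superkingdom": "Eukaryota", "kingdom": "Metazoa", "phylum": "Chordata",
    "class": "Mammalia", "family": "Bovidae", "genus": "Bos",
    "species": "Bos taurus",
})

# A sub-species entry collapses to its parent species (placeholder name).
vibrio = normalise_lineage({
    "superkingdom": "Bacteria", "genus": "Vibrio",
    "species": "Vibrio cholerae",
    "subspecies": "Vibrio cholerae (example sub-species)",
})
```

Here `cow[4]` is `"No order"` and the last element of `vibrio` is `"Vibrio cholerae"`, matching the behaviour described in the text.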
Too many species/sequences
For large species trees, you may see blank regions in the outer layers of the sunburst. These occur when there are large numbers of arcs to be drawn in a small space. If an arc is less than approximately one pixel wide, it will not be drawn and the space will be left blank. You may still be able to get some information about the species in that region by moving your mouse across the area, but since each arc will be very small, it will be difficult to accurately locate a particular species.
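A rough calculation shows why blank regions appear. The sketch below assumes that the width of an arc is its length along the ring, that is the ring radius multiplied by the angle the arc spans, and that the drawing threshold is exactly one pixel. The 300-pixel radius and the function names are hypothetical values chosen for illustration.

```python
import math


def arc_width_px(fraction: float, radius_px: float) -> float:
    """Length of an arc, in pixels, that spans `fraction` of a full ring."""
    return 2 * math.pi * radius_px * fraction


def is_drawn(fraction: float, radius_px: float,
             min_width_px: float = 1.0) -> bool:
    """Arcs narrower than the threshold are skipped, leaving a blank gap."""
    return arc_width_px(fraction, radius_px) >= min_width_px


# One sequence out of the 7036 in the full alignment, on a 300 px ring
# (hypothetical radius), is well below one pixel wide.
single_sequence_visible = is_drawn(1 / 7036, 300)
```

Under these assumptions a species contributing a single sequence would span about a quarter of a pixel, so `single_sequence_visible` is `False`.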
The tree shows the occurrence of this domain across different species.
We show the species tree in one of two ways. For smaller trees we try to show an interactive representation, which allows you to select specific nodes in the tree and view them as an alignment or as a set of Pfam domain graphics.
Unfortunately we have found that there are problems viewing the interactive tree when it becomes larger than a certain limit. Furthermore, we have found that Internet Explorer can become unresponsive when viewing some trees, regardless of their size. We therefore show a text representation of the species tree when the size is above a certain limit or if you are using Internet Explorer to view the site.
If you are using IE you can still load the interactive tree by clicking the "Generate interactive tree" button, but please be aware of the potential problems that the interactive species tree can cause.
For all of the domain matches in a full alignment, we count the number that are found on all sequences in the alignment. This total is shown in the purple box.
We also count the number of unique sequences on which each domain is found, which is shown in green. Note that a domain may appear multiple times on the same sequence, leading to the difference between these two numbers.
Finally, we group sequences from the same organism according to the NCBI code that is assigned by UniProt, allowing us to count the number of distinct sequences on which the domain is found. This value is shown in the pink boxes.
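The three counts can be illustrated with a small sketch. It assumes that each domain match is available as a `(sequence identifier, NCBI taxonomy identifier)` pair; the function name and the sequence identifiers are hypothetical, and the per-organism grouping is one reading of the description above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarise_matches(matches: Iterable[Tuple[str, int]]) -> Dict[str, object]:
    """Count domain matches, unique sequences, and sequences per organism."""
    total_matches = 0
    unique_sequences = set()
    sequences_per_taxon = defaultdict(set)
    for sequence_id, taxon_id in matches:
        total_matches += 1                       # every match counts
        unique_sequences.add(sequence_id)        # each sequence counted once
        sequences_per_taxon[taxon_id].add(sequence_id)
    return {
        "domain_matches": total_matches,
        "unique_sequences": len(unique_sequences),
        "sequences_per_taxon": {
            taxon: len(ids) for taxon, ids in sequences_per_taxon.items()
        },
    }


# Hypothetical sequence identifiers; 9606 and 562 are the NCBI taxonomy
# identifiers for Homo sapiens and Escherichia coli.
summary = summarise_matches([
    ("SEQ1", 9606), ("SEQ1", 9606),   # the domain occurs twice on SEQ1
    ("SEQ2", 9606),
    ("SEQ3", 562),
])
```

In this example there are four domain matches on three unique sequences, which shows how a domain that occurs more than once on a sequence makes the first two numbers differ.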
We use the NCBI species tree to group organisms according to their taxonomy and this forms the structure of the displayed tree. Note that in some cases the trees are too large (have too many nodes) to allow us to build an interactive tree, but in most cases you can still view the tree in a plain text, non-interactive representation. Those species which are represented in the seed alignment for this domain are highlighted.
You can use the tree controls to manipulate how the interactive tree is displayed:
- show/hide the summary boxes
- highlight species that are represented in the seed alignment
- expand/collapse the tree or expand it to a given depth
- select a sub-tree or a set of species within the tree and view them graphically or as an alignment
- save a plain text representation of the tree
Please note: for large trees this can take some time. While the tree is loading, you can safely switch away from this tab but if you browse away from the family page entirely, the tree will not be loaded.
There are 7 interactions for this family.
We determine these interactions using iPfam, which considers the interactions between residues in three-dimensional protein structures and maps those interactions back to Pfam families. You can find more information about the iPfam algorithm in the journal article that accompanies the website.
For those sequences which have a structure in the Protein Data Bank, we use the mapping between UniProt, PDB and Pfam coordinate systems from the PDBe group, to allow us to map Pfam domains onto UniProt sequences and three-dimensional protein structures. The table below shows the structures on which the LSM domain has been found. There are 622 instances of this domain found in the PDB. Note that there may be multiple copies of the domain in a single PDB structure, since many structures contain multiple copies of the same protein sequence.