
Protein: CO1A2_MOUSE (Q01149)

Summary

This is the summary of UniProt entry CO1A2_MOUSE (Q01149).

Description: Collagen alpha-2(I) chain
Source organism: Mus musculus (Mouse) (NCBI taxonomy ID 10090)
Length: 1372 amino acids
Reference Proteome: ✓

Please note: at the start of each new Pfam data release, we take a copy of the UniProt sequence database. This snapshot of UniProt forms the basis of the overview shown here. If a UniProt entry is removed after a Pfam release, it will remain in Pfam until the next Pfam data release.

Pfam domains

Source Domain Start End
sig_p n/a 1 27
disorder n/a 25 1172
Pfam Collagen 30 91
low_complexity n/a 33 78
Pfam Collagen 88 155
low_complexity n/a 95 109
low_complexity n/a 103 137
low_complexity n/a 129 148
low_complexity n/a 151 169
low_complexity n/a 225 241
low_complexity n/a 241 271
low_complexity n/a 274 292
low_complexity n/a 291 313
low_complexity n/a 316 352
low_complexity n/a 387 416
Pfam Collagen 475 535
low_complexity n/a 475 496
low_complexity n/a 501 514
low_complexity n/a 517 559
Pfam Collagen 521 588
low_complexity n/a 598 625
low_complexity n/a 628 655
low_complexity n/a 679 700
low_complexity n/a 699 722
low_complexity n/a 724 745
low_complexity n/a 741 757
low_complexity n/a 759 784
low_complexity n/a 792 808
low_complexity n/a 802 820
low_complexity n/a 853 883
Pfam Collagen 895 969
low_complexity n/a 904 928
low_complexity n/a 957 979
low_complexity n/a 987 1007
low_complexity n/a 1048 1069
Pfam Collagen 1051 1120
low_complexity n/a 1087 1113
Pfam COLFI 1137 1371
disorder n/a 1214 1216
disorder n/a 1220 1221
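
The rows above are whitespace-separated, so they can be loaded into simple records for filtering. The following is a minimal sketch, not part of the Pfam site: the `Region` type and `parse_regions` function are hypothetical names, and the sample contains only a few rows copied from the table.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """One row of the domain table (hypothetical record type)."""
    source: str
    name: str
    start: int
    end: int


def parse_regions(text: str) -> List[Region]:
    """Parse 'Source Domain Start End' rows, skipping the header line."""
    regions = []
    for line in text.strip().splitlines():
        parts = line.split()
        # Data rows have exactly four fields with numeric coordinates.
        if len(parts) != 4 or not (parts[2].isdigit() and parts[3].isdigit()):
            continue
        source, name, start, end = parts
        regions.append(Region(source, name, int(start), int(end)))
    return regions


# A few rows copied from the table above.
SAMPLE = """\
Source Domain Start End
sig_p n/a 1 27
Pfam Collagen 30 91
Pfam Collagen 88 155
Pfam COLFI 1137 1371
"""

pfam = [r for r in parse_regions(SAMPLE) if r.source == "Pfam"]
for r in pfam:
    # Coordinates are inclusive, so the span is end - start + 1 residues.
    print(r.name, r.start, r.end, r.end - r.start + 1)
```

The same function should work on the full table, since every data row follows the same four-field layout.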

Sequence information

This is the amino acid sequence of the UniProt sequence database entry with the accession Q01149, as stored in the Pfam database. The stored copy is updated only with each new Pfam release, so it may differ from the sequence currently held by UniProt.

Sequence:
   1  MLSFVDTRTL LLLAVTSCLA TCQYLQSGSV RKGPTGDRGP RGQRGPAGPR  50
  51  GRDGVDGPMG PPGPPGSPGP PGSPAPPGLT GNFAAQYSDK GVSSGPGPMG  100
 101  LMGPRGPPGA VGAPGPQGFQ GPAGEPGEPG QTGPAGPRGP AGSPGKAGED  150
 151  GHPGKPGRPG ERGVVGPQGA RGFPGTPGLP GFKGVKGHSG MDGLKGQPGA  200
 201  QGVKGEPGAP GENGTPGQAG ARGLPGERGR VGAPGPAGAR GSDGSVGPVG  250
 251  PAGPIGSAGP PGFPGAPGPK GELGPVGNPG PAGPAGPRGE VGLPGLSGPV  300
 301  GPPGNPGTNG LTGAKGATGL PGVAGAPGLP GPRGIPGPAG AAGATGARGL  350
 351  VGEPGPAGSK GESGNKGEPG SVGAQGPPGP SGEEGKRGSP GEAGSAGPAG  400
 401  PPGLRGSPGS RGLPGADGRA GVMGPPGNRG STGPAGIRGP NGDAGRPGEP  450
 451  GLMGPRGLPG SPGNVGPSGK EGPVGLPGID GRPGPIGPAG PRGEAGNIGF  500
 501  PGPKGPSGDP GKPGERGHPG LAGARGAPGP DGNNGAQGPP GPQGVQGGKG  550
 551  EQGPAGPPGF QGLPGPSGTT GEVGKPGERG LPGEFGLPGP AGPRGERGTP  600
 601  GESGAAGPSG PIGSRGPSGA PGPDGNKGEA GAVGAPGSAG ASGPGGLPGE  650
 651  RGAAGIPGGK GEKGETGLRG DTGNTGRDGA RGIPGAVGAP GPAGASGDRG  700
 701  EAGAAGPSGP AGPRGSPGER GEVGPAGPNG FAGPAGAAGQ PGAKGEKGTK  750
 751  GPKGENGIVG PTGSVGAAGP SGPNGPPGPV GSRGDGGPPG MTGFPGAAGR  800
 801  TGPPGPSGIA GPPGPPGAAG KEGIRGPRGD QGPVGRTGET GASGPPGFVG  850
 851  EKGPSGEPGT AGAPGTAGPQ GLLGAPGILG LPGSRGERGL PGIAGALGEP  900
 901  GPLGISGPPG ARGPPGAVGS PGVNGAPGEA GRDGNPGSDG PPGRDGQPGH  950
 951  KGERGYPGSI GPTGAAGAPG PHGSVGPAGK HGNRGEPGPA GSVGPVGAVG  1000
1001  PRGPSGPQGI RGDKGEPGDK GHRGLPGLKG YSGLQGLPGL AGLHGDQGAP  1050
1051  GPVGPAGPRG PAGPSGPVGK DGRSGQPGPV GPAGVRGSQG SQGPAGPPGP  1100
1101  PGPPGPPGVS GGGYDFGFEG DFYRADQPRS QPSLRPKDYE VDATLKSLNN  1150
1151  QIETLLTPEG SRKNPARTCR DLRLSHPEWN SDYYWIDPNQ GCTMDAIKVY  1200
1201  CDFSTGETCI QAQPVNTPAK NSYSRAQANK HVWLGETING GSQFEYNVEG  1250
1251  VSSKEMATQL AFMRLLANRA SQNITYHCKN SIAYLDEETG SLNKAVLLQG  1300
1301  SNDVELVAEG NSRFTYSVLV DGCSKKTNEW GKTIIEYKTN KPSRLPFLDI  1350
1351  APLDIGGADQ EFRVEVGPVC FK                                1372

Checksums:
CRC64:0D17DF5D6C1452D1
MD5:d3bbbe497cc5f434abfcc720bf5e24b1
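
To compare a local copy of the sequence against the MD5 checksum above, the position numbers and spacing of the formatted display have to be removed first. The sketch below shows one way to do that. The function names are hypothetical, and it assumes the checksum is computed over the uppercase residue letters with no whitespace, which this page does not state. The CRC64 value uses a separate algorithm that is not covered here.

```python
import hashlib


def unformat_sequence(block: str) -> str:
    """Keep only residue letters, dropping position numbers and whitespace."""
    return "".join(ch for ch in block if ch.isalpha()).upper()


def sequence_md5(sequence: str) -> str:
    """Hex MD5 digest of the sequence (assumed: ASCII, no separators)."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


# The first 50 residues as displayed above, with their position markers.
block = "1 MLSFVDTRTL LLLAVTSCLA TCQYLQSGSV RKGPTGDRGP RGQRGPAGPR 50"

seq = unformat_sequence(block)
print(len(seq))           # 50
print(sequence_md5(seq))  # digest of this fragment only, not of the full entry
```

Passing the complete 1372-residue block to `unformat_sequence` and comparing `sequence_md5` of the result with the listed value would show whether a local copy agrees with the Pfam snapshot, under the assumption stated above.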

TreeFam

Below is a phylogenetic tree of animal genes, with ortholog and paralog assignments, from TreeFam.

AlphaFold Structure Prediction

The protein structure below has been predicted by DeepMind with AlphaFold. For more information, please visit the AlphaFold page for this protein.

Model confidence scale

  Very High (pLDDT > 90)
  Confident (90 > pLDDT > 70)
  Low (70 > pLDDT > 50)
  Very Low (pLDDT < 50)

Highly accurate protein structure prediction with AlphaFold. John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli & Demis Hassabis. Nature, 2021-07-15. DOI: 10.1038/s41586-021-03819-2
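
The model confidence scale above maps a per-residue pLDDT score to one of four bands. A minimal sketch of that mapping follows. The function name `plddt_band` is hypothetical, and because the scale uses strict inequalities, the handling of the exact boundary values (90, 70 and 50) is an assumption: here each boundary value falls into the lower band.

```python
def plddt_band(plddt: float) -> str:
    """Return the confidence band for a pLDDT score in the range 0 to 100."""
    if not 0.0 <= plddt <= 100.0:
        raise ValueError(f"pLDDT must be between 0 and 100, got {plddt}")
    if plddt > 90:
        return "Very High"
    if plddt > 70:
        return "Confident"
    if plddt > 50:
        return "Low"
    # Assumption: a score of exactly 50 is grouped with "Very Low".
    return "Very Low"


for score in (95.2, 82.0, 61.5, 34.0):
    print(score, plddt_band(score))
```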