Protein: CO1A1_HUMAN (P02452)

Summary

This is the summary of UniProt entry CO1A1_HUMAN (P02452).

Description: Collagen alpha-1(I) chain
Source organism: Homo sapiens (Human) (NCBI taxonomy ID 9606)
Length: 1464 amino acids
Reference Proteome: ✓

Please note: when we start each new Pfam data release, we take a copy of the UniProt sequence database. This snapshot of UniProt forms the basis of the overview that you see here. It is important to note that, although some UniProt entries may be removed after a Pfam release, these entries will not be removed from Pfam until the next Pfam data release.

Pfam domains

Source Domain Start End
sig_p n/a 1 22
low_complexity n/a 6 19
low_complexity n/a 21 30
disorder n/a 22 35
disorder n/a 39 40
Pfam VWC 40 95
disorder n/a 92 1258
Pfam Collagen 107 163
low_complexity n/a 111 135
low_complexity n/a 134 156
Pfam Collagen 177 238
low_complexity n/a 177 218
low_complexity n/a 210 229
low_complexity n/a 232 256
Pfam Collagen 236 295
low_complexity n/a 271 295
Pfam Collagen 296 355
low_complexity n/a 301 326
low_complexity n/a 325 352
Pfam Collagen 356 415
low_complexity n/a 364 395
low_complexity n/a 391 419
Pfam Collagen 407 476
low_complexity n/a 411 436
low_complexity n/a 447 476
low_complexity n/a 471 496
low_complexity n/a 511 526
low_complexity n/a 537 559
low_complexity n/a 553 577
low_complexity n/a 582 596
low_complexity n/a 598 610
low_complexity n/a 613 637
low_complexity n/a 639 661
low_complexity n/a 685 724
low_complexity n/a 766 778
Pfam Collagen 779 838
low_complexity n/a 781 802
low_complexity n/a 793 826
low_complexity n/a 822 860
Pfam Collagen 835 898
low_complexity n/a 865 880
low_complexity n/a 874 889
low_complexity n/a 883 916
low_complexity n/a 913 938
low_complexity n/a 934 955
low_complexity n/a 991 1009
Pfam Collagen 1013 1080
low_complexity n/a 1039 1060
low_complexity n/a 1065 1093
Pfam Collagen 1076 1138
low_complexity n/a 1117 1150
Pfam Collagen 1133 1195
low_complexity n/a 1152 1175
low_complexity n/a 1174 1191
low_complexity n/a 1216 1227
Pfam COLFI 1227 1463
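
The rows above can be read programmatically. The following is a minimal sketch, assuming each row is whitespace-separated in the order Source, Domain, Start, End and that coordinates are 1-based and inclusive; the names `Region`, `parse_regions` and `overlap` are hypothetical helpers, and only a few rows from the table are embedded for illustration.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    source: str   # e.g. "Pfam", "low_complexity", "disorder", "sig_p"
    domain: str   # Pfam family name, or "n/a" for non-Pfam regions
    start: int    # first residue (assumed 1-based, inclusive)
    end: int      # last residue (assumed inclusive)


def parse_regions(text: str) -> List[Region]:
    """Parse whitespace-separated 'Source Domain Start End' rows."""
    regions = []
    for line in text.strip().splitlines():
        source, domain, start, end = line.split()
        regions.append(Region(source, domain, int(start), int(end)))
    return regions


def overlap(a: Region, b: Region) -> int:
    """Number of residues shared by two regions (0 if they are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


# A small subset of the rows listed in the table above.
ROWS = """
sig_p n/a 1 22
Pfam VWC 40 95
Pfam Collagen 107 163
Pfam Collagen 177 238
Pfam Collagen 236 295
Pfam COLFI 1227 1463
"""

regions = parse_regions(ROWS)
pfam = [r for r in regions if r.source == "Pfam"]
for r in pfam:
    print(r.domain, r.start, r.end, "length:", r.end - r.start + 1)
```

Adjacent Pfam matches can share a few residues at their boundaries; for example, `overlap` applied to the Collagen matches at 177-238 and 236-295 counts the residues they have in common.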

Sequence information

This is the amino acid sequence of the UniProt sequence database entry with the accession P02452. The sequence is stored in the Pfam database and updated only with each new Pfam release, so the copy shown here may differ from the one currently stored by UniProt.

Sequence:
1 MFSFVDLRLL LLLAATALLT HGQEEGQVEG QDEDIPPITC VQNGLRYHDR 50
51 DVWKPEPCRI CVCDNGKVLC DDVICDETKN CPGAEVPEGE CCPVCPDGSE 100
101 SPTDQETTGV EGPKGDTGPR GPRGPAGPPG RDGIPGQPGL PGPPGPPGPP 150
151 GPPGLGGNFA PQLSYGYDEK STGGISVPGP MGPSGPRGLP GPPGAPGPQG 200
201 FQGPPGEPGE PGASGPMGPR GPPGPPGKNG DDGEAGKPGR PGERGPPGPQ 250
251 GARGLPGTAG LPGMKGHRGF SGLDGAKGDA GPAGPKGEPG SPGENGAPGQ 300
301 MGPRGLPGER GRPGAPGPAG ARGNDGATGA AGPPGPTGPA GPPGFPGAVG 350
351 AKGEAGPQGP RGSEGPQGVR GEPGPPGPAG AAGPAGNPGA DGQPGAKGAN 400
401 GAPGIAGAPG FPGARGPSGP QGPGGPPGPK GNSGEPGAPG SKGDTGAKGE 450
451 PGPVGVQGPP GPAGEEGKRG ARGEPGPTGL PGPPGERGGP GSRGFPGADG 500
501 VAGPKGPAGE RGSPGPAGPK GSPGEAGRPG EAGLPGAKGL TGSPGSPGPD 550
551 GKTGPPGPAG QDGRPGPPGP PGARGQAGVM GFPGPKGAAG EPGKAGERGV 600
601 PGPPGAVGPA GKDGEAGAQG PPGPAGPAGE RGEQGPAGSP GFQGLPGPAG 650
651 PPGEAGKPGE QGVPGDLGAP GPSGARGERG FPGERGVQGP PGPAGPRGAN 700
701 GAPGNDGAKG DAGAPGAPGS QGAPGLQGMP GERGAAGLPG PKGDRGDAGP 750
751 KGADGSPGKD GVRGLTGPIG PPGPAGAPGD KGESGPSGPA GPTGARGAPG 800
801 DRGEPGPPGP AGFAGPPGAD GQPGAKGEPG DAGAKGDAGP PGPAGPAGPP 850
851 GPIGNVGAPG AKGARGSAGP PGATGFPGAA GRVGPPGPSG NAGPPGPPGP 900
901 AGKEGGKGPR GETGPAGRPG EVGPPGPPGP AGEKGSPGAD GPAGAPGTPG 950
951 PQGIAGQRGV VGLPGQRGER GFPGLPGPSG EPGKQGPSGA SGERGPPGPM 1000
1001 GPPGLAGPPG ESGREGAPGA EGSPGRDGSP GAKGDRGETG PAGPPGAPGA 1050
1051 PGAPGPVGPA GKSGDRGETG PAGPTGPVGP VGARGPAGPQ GPRGDKGETG 1100
1101 EQGDRGIKGH RGFSGLQGPP GPPGSPGEQG PSGASGPAGP RGPPGSAGAP 1150
1151 GKDGLNGLPG PIGPPGPRGR TGDAGPVGPP GPPGPPGPPG PPSAGFDFSF 1200
1201 LPQPPQEKAH DGGRYYRADD ANVVRDRDLE VDTTLKSLSQ QIENIRSPEG 1250
1251 SRKNPARTCR DLKMCHSDWK SGEYWIDPNQ GCNLDAIKVF CNMETGETCV 1300
1301 YPTQPSVAQK NWYISKNPKD KRHVWFGESM TDGFQFEYGG QGSDPADVAI 1350
1351 QLTFLRLMST EASQNITYHC KNSVAYMDQQ TGNLKKALLL QGSNEIEIRA 1400
1401 EGNSRFTYSV TVDGCTSHTG AWGKTVIEYK TTKTSRLPII DVAPLDVGAP 1450
1451 DQEFGFDVGP VCFL 1464

Checksums:
CRC64:F0EC4DE778FFFC11
MD5:7ec87b317c34ae3a45a05503c2c8db1b
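
A checksum such as the MD5 value above can be used to check whether a locally held copy of the sequence matches the one stored here. The sketch below assumes the MD5 is taken over the bare upper-case residue string, with spaces, line breaks and position numbers removed; that convention is an assumption and is not stated on this page. The helper names `clean_sequence` and `sequence_md5` are hypothetical, and only the first 50 residues are embedded for illustration. The CRC64 value is not reproduced here because it depends on a specific CRC-64 variant.

```python
import hashlib
import re


def clean_sequence(raw: str) -> str:
    """Keep only letters and upper-case them, dropping spaces,
    line breaks and residue position numbers."""
    return re.sub(r"[^A-Za-z]", "", raw).upper()


def sequence_md5(raw: str) -> str:
    """Hex MD5 digest of the cleaned sequence (assumed checksum convention)."""
    return hashlib.md5(clean_sequence(raw).encode("ascii")).hexdigest()


# First 50 residues only, copied from the sequence listing above.
fragment = """
1
MFSFVDLRLL LLLAATALLT HGQEEGQVEG QDEDIPPITC VQNGLRYHDR
50
"""

print(len(clean_sequence(fragment)))
print(sequence_md5(fragment))
```

Under the stated assumption, passing the full 1464-residue sequence to `sequence_md5` would give a digest that can be compared with the MD5 value listed above.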

Structures

For sequences that have a structure in the Protein Data Bank (PDB), we use the mapping between the UniProt, PDB and Pfam coordinate systems from the PDBe SIFTS project to map Pfam domains onto three-dimensional structures. The table below shows the mapping between Pfam domains, this UniProt entry and the corresponding three-dimensional structures.

Pfam family   UniProt residues   PDB ID   PDB chain ID   PDB residues
COLFI         1227 - 1463        5K31     A              9 - 245
COLFI         1227 - 1463        5K31     B              9 - 245
COLFI         1227 - 1463        5K31     C              9 - 245
COLFI         1227 - 1463        5K31     D              9 - 245
COLFI         1227 - 1463        5K31     E              9 - 245
COLFI         1227 - 1463        5K31     F              9 - 245
Collagen      1172 - 1192        5OU9     C              1 - 21
Collagen      1172 - 1192        5OU9     D              1 - 21
Collagen      1172 - 1192        5OU9     E              1 - 21
Collagen      1178 - 1192        5OU8     C              1 - 15
Collagen      1178 - 1192        5OU8     D              1 - 15
Collagen      1178 - 1192        5OU8     E              1 - 15
Collagen      133 - 152          1Q7D     C              2 - 21
Collagen      133 - 153          1Q7D     A              2 - 22
Collagen      133 - 153          1Q7D     B              2 - 22
Collagen      254 - 273          3GXE     E              254 - 273
Collagen      260 - 270          3GXE     F              260 - 270
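
Each row of the table pairs a UniProt residue range with a PDB residue range of the same length, so a position can be converted between the two numbering schemes by a constant offset. The following is a minimal sketch under the assumption that the mapped segment is contiguous, with no gaps or insertion codes inside it; real SIFTS mappings can be more complex. The function name `uniprot_to_pdb` is hypothetical.

```python
def uniprot_to_pdb(pos: int, uniprot_start: int, uniprot_end: int,
                   pdb_start: int) -> int:
    """Convert a UniProt position to PDB numbering for one table row.

    Assumes a contiguous, gap-free segment, so the two numbering
    schemes differ only by a constant offset.
    """
    if not uniprot_start <= pos <= uniprot_end:
        raise ValueError(f"position {pos} is outside "
                         f"{uniprot_start}-{uniprot_end}")
    return pdb_start + (pos - uniprot_start)


# COLFI row: UniProt 1227 - 1463 maps to PDB 5K31 residues 9 - 245.
print(uniprot_to_pdb(1227, 1227, 1463, 9))
print(uniprot_to_pdb(1463, 1227, 1463, 9))

# Collagen row: UniProt 133 - 153 maps to PDB 1Q7D chain A residues 2 - 22.
print(uniprot_to_pdb(140, 133, 153, 2))
```

For the 3GXE rows the UniProt and PDB ranges are identical, so the offset is zero and positions are unchanged.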

TreeFam

Below is a phylogenetic tree of animal genes, with ortholog and paralog assignments, from TreeFam.