71 structures, 194 species, 0 interactions, 209 sequences, 9 architectures

Family: RAG2 (PF03089)

Summary: Recombination activating protein 2


Recombination activating protein 2

V(D)J recombination is the combinatorial process by which the huge range of immunoglobulin and T cell receptor binding specificities is generated from a limited amount of genetic material. This process is synergistically activated by RAG1 and RAG2 in developing lymphocytes. In humans, defects in RAG2 are a cause of B cell-negative severe combined immunodeficiency and of Omenn syndrome.

Literature references

  1. Oettinger MA, Schatz DG, Gorka C, Baltimore D. RAG-1 and RAG-2, adjacent genes that synergistically activate V(D)J recombination. Science 1990;248:1517-1523. PUBMED:2360047 EPMC:2360047

Internal database links

Annotation information from the InterPro database.

InterPro entry IPR004321

The variable portions of the genes encoding immunoglobulins and T cell receptors are assembled from component V, D, and J DNA segments by a site-specific recombination reaction termed V(D)J recombination. V(D)J recombination is targeted to specific sites on the chromosome by recombination signal sequences (RSSs) that flank antigen receptor gene segments. Each RSS consists of a conserved heptamer (consensus, 5'-CACAGTG-3') and a conserved nonamer (consensus, 5'-ACAAAAACC-3') separated by a spacer of either 12 or 23 bp. Efficient recombination occurs only between a 12-RSS and a 23-RSS, a restriction known as the 12/23 rule.
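The RSS layout and the 12/23 rule described above can be sketched in a few lines of Python. This is only an illustrative toy: the function names, the use of "N" as a placeholder spacer, and the choice to report mismatches against the consensus are assumptions made for the example, not part of the Pfam or InterPro annotation.

```python
HEPTAMER = "CACAGTG"      # consensus heptamer quoted in the text
NONAMER = "ACAAAAACC"     # consensus nonamer quoted in the text
SPACER_LENGTHS = (12, 23)  # permitted spacer lengths in bp


def parse_rss(seq):
    """Split a candidate RSS into (heptamer, spacer, nonamer).

    Returns None when the length does not fit
    heptamer + 12 or 23 bp spacer + nonamer.
    """
    seq = seq.upper()
    spacer_len = len(seq) - len(HEPTAMER) - len(NONAMER)
    if spacer_len not in SPACER_LENGTHS:
        return None
    h_end = len(HEPTAMER)
    return seq[:h_end], seq[h_end:h_end + spacer_len], seq[h_end + spacer_len:]


def consensus_mismatches(seq):
    """Count positions where heptamer and nonamer differ from the consensus."""
    parts = parse_rss(seq)
    if parts is None:
        return None
    heptamer, _, nonamer = parts
    return sum(a != b for a, b in zip(heptamer + nonamer, HEPTAMER + NONAMER))


def obeys_12_23_rule(rss_a, rss_b):
    """True when one RSS has a 12 bp spacer and the other a 23 bp spacer."""
    a, b = parse_rss(rss_a), parse_rss(rss_b)
    if a is None or b is None:
        return False
    return {len(a[1]), len(b[1])} == {12, 23}


# Hypothetical RSSs with placeholder spacers.
rss12 = HEPTAMER + "N" * 12 + NONAMER
rss23 = HEPTAMER + "N" * 23 + NONAMER
print(obeys_12_23_rule(rss12, rss23))  # True
print(obeys_12_23_rule(rss12, rss12))  # False
```

Real RSSs tolerate some deviation from the consensus, which is why the sketch counts mismatches rather than demanding an exact match.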

V(D)J recombination can be divided into two phases, DNA cleavage and DNA joining. DNA cleavage requires two lymphocyte-specific factors, the products of the recombination activating genes RAG1 and RAG2, which together recognise the RSSs and create double-strand breaks at the RSS-coding segment junctions [PUBMED:11961538]. RAG-mediated DNA cleavage occurs in a synaptic complex termed the paired complex, which is constituted from two distinct RSS-RAG complexes, a 12-SC and a 23-SC (where SC stands for signal complex). The DNA cleavage reaction involves two distinct enzymatic steps: initial nicking that creates a 3'-OH between a coding segment and its RSS, followed by hairpin formation in which the newly created 3'-OH attacks a phosphodiester bond on the opposite DNA strand. This generates a blunt, 5'-phosphorylated signal end containing all of the RSS elements, and a covalently sealed hairpin coding end.
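As a rough illustration of where the break falls, the toy Python sketch below locates an RSS on a single strand and splits the sequence at the coding segment/heptamer border. It assumes an exact match to the consensus heptamer and nonamer and models neither the second strand nor the nick and hairpin chemistry; the function names and the example sequence are hypothetical.

```python
HEPTAMER = "CACAGTG"   # consensus heptamer
NONAMER = "ACAAAAACC"  # consensus nonamer


def find_rss(top_strand):
    """Return (start, spacer_len) of the first exact-consensus RSS, or None."""
    seq = top_strand.upper()
    for start in range(len(seq) - len(HEPTAMER) + 1):
        if not seq.startswith(HEPTAMER, start):
            continue
        for spacer_len in (12, 23):
            nonamer_start = start + len(HEPTAMER) + spacer_len
            if seq.startswith(NONAMER, nonamer_start):
                return start, spacer_len
    return None


def cleave(top_strand):
    """Split a strand at the coding segment/RSS junction.

    The part upstream of the heptamer stands in for the coding end,
    the remainder (heptamer, spacer, nonamer and anything downstream)
    for the signal end.
    """
    hit = find_rss(top_strand)
    if hit is None:
        return None
    start, spacer_len = hit
    seq = top_strand.upper()
    return {
        "coding_end": seq[:start],
        "signal_end": seq[start:],
        "spacer_len": spacer_len,
    }


# Hypothetical coding segment followed by a 23-RSS.
example = "GGATCCTTAGC" + HEPTAMER + "T" * 23 + NONAMER + "GG"
print(cleave(example))
```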

The second phase of V(D)J recombination, in which broken DNA fragments are processed and joined, is less well characterised. Signal ends are typically joined precisely to form a signal joint, whereas joining of the coding ends requires the hairpin structure to be opened and typically involves nucleotide addition and deletion before formation of the coding joint. The factors involved in these processes include ubiquitously expressed proteins involved in the repair of DNA double strand breaks by nonhomologous end joining, terminal deoxynucleotidyl transferase, and Artemis protein.

In addition to their critical roles in RSS recognition and DNA cleavage, the RAG proteins may perform two distinct types of function in the postcleavage phase of V(D)J recombination. A structural function has been inferred from the finding that, after DNA cleavage in vitro, the DNA ends remain associated with the RAG proteins in a "four end" complex known as the cleaved signal complex. After release of the coding ends in vitro, and after coding joint formation in vivo, the RAG proteins remain in a stable signal end complex (SEC) containing the two signal ends. These postcleavage complexes may serve as essential scaffolds for the second phase of the reaction, with the RAG proteins acting to organise the DNA processing and joining events.

The second type of RAG protein-mediated postcleavage activity is the catalysis of phosphodiester bond hydrolysis and strand transfer reactions. The RAG proteins are capable of opening hairpin coding ends in vitro. The RAG proteins also show 3' flap endonuclease activity that may contribute to coding end processing/joining and can utilise the 3' OH group on the signal ends to attack hairpin coding ends (forming hybrid or open/shut joints) or virtually any DNA duplex (forming a transposition product).

Gene Ontology

The mapping between Pfam and Gene Ontology is provided by InterPro. If you use this data please cite InterPro.

Domain organisation

Below is a listing of the unique domain organisations or architectures in which this domain is found.

Pfam Clan

This family is a member of clan Beta_propeller (CL0186), which has the following description:

This large clan contains proteins that contain beta propellers. These are composed of between 6 and 8 repeats, each of which forms a four-stranded sheet. The clan includes families such as WD40 (Pfam:PF00400), where the individual repeats are modelled. It also includes families where the entire propeller is modelled, such as Pfam:PF02239, usually because the individual repeats are not discernible. These proteins carry out a very wide diversity of functions, including catalysis.

The clan contains the following 88 members:

ANAPC4_WD40 APMAP_N Arylesterase Arylsulfotran_2 Arylsulfotrans B_lectin BBS2_Mid Beta_propel Coatomer_WDAD CPSF_A CyRPA Cytochrom_D1 Dpp_8_9_N DPPIV_N DPPIV_rep DUF1513 DUF1668 DUF2415 DUF3748 DUF4221 DUF4933 DUF4934 DUF5046 DUF5050 DUF5122 DUF5128 DUF5711 DUF839 eIF2A FG-GAP FG-GAP_2 FG-GAP_3 Frtz Ge1_WD40 Glu_cyclase_2 Gmad1 GSDH Hyd_WA IKI3 Itfg2 Kelch_1 Kelch_2 Kelch_3 Kelch_4 Kelch_5 Kelch_6 Lactonase Ldl_recept_b LGFP Lgl_C LVIVD Me-amine-dh_H MgpC MRJP Nbas_N Neisseria_PilC NHL nos_propeller nos_propeller_2 Nucleoporin_N Nup160 PALB2_WD40 PD40 Pectate_lyase22 Peptidase_S9_N PHTB1_N Phytase-like PQQ PQQ_2 PQQ_3 RAG2 RCC1 RCC1_2 Reg_prop SBBP SBP56 SdiA-regulated SGL Str_synth TcdB_toxin_midN Tectonin TolB_like VID27 WD40 WD40_2 WD40_3 WD40_4 WD40_like


We store a range of different sequence alignments for families. As well as the seed alignment from which the family is built, we provide the full alignment, generated by searching the sequence database (reference proteomes) using the family HMM. We also generate alignments using four representative proteome (RP) sets and the UniProtKB sequence database.

HMM logo

HMM logos are one way of visualising profile HMMs. Logos provide a quick overview of the properties of an HMM in a graphical form.


This page displays the phylogenetic tree for this family's seed alignment. We use FastTree to calculate neighbour-joining trees with a local bootstrap based on 100 resamples (shown next to the tree nodes). FastTree calculates approximately-maximum-likelihood phylogenetic trees from our seed alignment.


Curation and family details

This section shows the detailed information about the Pfam family. You can see the definitions of many of the terms in this section in the glossary and a fuller explanation of the scoring system that we use in the scores section of the help pages.

Curation

Seed source: Pfam-B_4702 (release 6.5)
Previous IDs: none
Type: Repeat
Sequence Ontology: SO:0001068
Author: Griffiths-Jones SR
Number in seed: 11
Number in full: 209
Average length of the domain: 318.90 aa
Average identity of full alignment: 61 %
Average coverage of the sequence by the domain: 61.07 %

HMM information

HMM build commands:
build method: hmmbuild -o /dev/null HMM SEED
search method: hmmsearch -Z 57096847 -E 1000 --cpu 4 HMM pfamseq
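
For readers who want to reproduce these two steps from a script, the sketch below assembles the same command lines in Python without running them. The helper names and keyword arguments are hypothetical; the flags themselves are standard HMMER options: -o redirects hmmbuild's summary output (here to /dev/null), -Z fixes the database size used for E-value calculation, -E sets the E-value reporting threshold, and --cpu sets the number of worker threads.

```python
import shlex


def hmmbuild_cmd(hmm_out="HMM", seed_alignment="SEED"):
    """Argument list for building a profile HMM from the seed alignment."""
    return ["hmmbuild", "-o", "/dev/null", hmm_out, seed_alignment]


def hmmsearch_cmd(hmm="HMM", seqdb="pfamseq", z=57096847, e_value=1000, cpus=4):
    """Argument list for searching the sequence database with the HMM."""
    return [
        "hmmsearch",
        "-Z", str(z),          # database size assumed for E-values
        "-E", str(e_value),    # report matches up to this E-value
        "--cpu", str(cpus),    # number of worker threads
        hmm,
        seqdb,
    ]


print(shlex.join(hmmbuild_cmd()))
print(shlex.join(hmmsearch_cmd()))
```

Passing either list to subprocess.run would execute the command, provided HMMER is installed and the HMM, SEED and pfamseq files exist in the working directory.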
Model details:
Parameter          Sequence  Domain
Gathering cut-off  19.4      19.4
Trusted cut-off    19.4      19.4
Noise cut-off      19.3      19.1
Model length: 339
Family (HMM) version: 16
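
The cut-offs listed under "Model details" can be read as a simple filter on bit scores. The sketch below assumes the usual Pfam convention: a match joins the full alignment only when both its sequence score and its domain score reach the gathering cut-off, the trusted cut-off records the lowest scores among included matches, and the noise cut-off records the highest scores among excluded ones. The hit names and scores are made up for illustration.

```python
from typing import NamedTuple


class Cutoff(NamedTuple):
    sequence: float  # bit-score threshold for the whole sequence
    domain: float    # bit-score threshold for the individual domain


# Values taken from the model details for this family.
GATHERING = Cutoff(19.4, 19.4)
TRUSTED = Cutoff(19.4, 19.4)
NOISE = Cutoff(19.3, 19.1)


def passes(seq_score, dom_score, cutoff=GATHERING):
    """True when both scores reach the given cut-off."""
    return seq_score >= cutoff.sequence and dom_score >= cutoff.domain


# Hypothetical hits: (sequence bit score, domain bit score).
hits = {
    "hit_a": (25.0, 24.1),
    "hit_b": (19.4, 19.4),
    "hit_c": (19.3, 19.1),
}
for name, (seq_score, dom_score) in hits.items():
    status = "included" if passes(seq_score, dom_score) else "excluded"
    print(name, status)
```

With these numbers, a hit scoring exactly at the trusted cut-off is included and one scoring at the noise cut-off is excluded, which matches the narrow gap between the two rows.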

Species distribution

For those sequences which have a structure in the Protein Data Bank, we use the mapping between UniProt, PDB and Pfam coordinate systems from the PDBe group to map Pfam domains onto UniProt sequences and three-dimensional protein structures. The table below shows the structures on which the RAG2 domain has been found. There are 71 instances of this domain found in the PDB. Note that there may be multiple copies of the domain in a single PDB structure, since many structures contain multiple copies of the same protein sequence.

