TFBScluster - Genome-wide.


What is TFBScluster - Genome-wide?

This web tool is designed to identify clusters of transcription factor binding sites (TFBSs) conserved in mammalian genomes. It offers a simple user interface for selecting a range of TFBSs and retrieving a list of SWISS-PROT characterised genes to which the clusters are localised. This information may be used directly in the experimental validation of a region.

Raw data.
The raw data for this analysis are mouse/human BLASTZ/CHAINNET genome alignments held at Genome Bioinformatics (UCSC). Genome-wide TFBSs are identified using TFBSsearch (available on our web site) via a script that converts the downloaded data format to the FASTA format.

The currently implemented alignments include:

  • March 2005 mouse assembly (also known as build 34 and mm6).
  • May 2004 human assembly (also known as build 35 and hg17).

It is also possible to use TFBSs that are additionally conserved between mouse and dog (canFam1, July 2004), or between mouse and opossum (monDom1, Oct. 2004).

The result is a set of libraries containing all the putative sites for different transcription factors. For each TFBS (e.g., EBOX) one library is created for the core sequence 'CANNTG'. The IUPAC letter 'N' is allowed to differ between genomes. Libraries are also created to extend the 'core' binding site one to three nucleotides 5' and 3', i.e., NCANNTGN, NNCANNTGNN or NNNCANNTGNNN. In these libraries the IUPAC letter N must be the same in both genomes. By extending the degree of conservation between the aligned genomes, a more specific and reduced set of TFBSs is created.
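
The library construction described above can be illustrated with a minimal Python sketch. It is not the TFBSsearch implementation: the function names are hypothetical, the two aligned sequences are assumed to be ungapped, upper-case and of equal length, and the extended libraries are read as requiring identical flanking bases while the N positions inside the core may still differ.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(pattern):
    """Translate an IUPAC pattern such as 'CANNTG' into a regular expression."""
    return "".join(IUPAC[base] for base in pattern.upper())

def conserved_sites(mouse, human, core, flank=0):
    """Return 0-based start positions where `core` matches both sequences.

    Assumption: `flank` bases on each side of the core must be identical
    in the two sequences (the 'N must be the same' rule for extended sites).
    """
    core_re = re.compile(iupac_to_regex(core))
    scan_re = re.compile("(?=%s)" % core_re.pattern)  # lookahead finds overlapping hits
    hits = []
    for match in scan_re.finditer(mouse):
        start = match.start()
        end = start + len(core)
        if not core_re.fullmatch(human[start:end]):
            continue  # core not present at the aligned human position
        if start - flank < 0 or end + flank > len(mouse):
            continue  # not enough sequence to check the flanks
        if mouse[start - flank:start] != human[start - flank:start]:
            continue  # 5' flank differs between genomes
        if mouse[end:end + flank] != human[end:end + flank]:
            continue  # 3' flank differs between genomes
        hits.append(start)
    return hits
```

Calling `conserved_sites(mouse, human, "CANNTG", flank=2)` would correspond to the NNCANNTGNN library under these assumptions; increasing `flank` can only shrink the set of hits.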

Information for each TFBS cluster is stored in the GFF format. The start and end sites are coordinates of the mouse genome. The start and end positions for each TFBS relate to the 'core' sequence, for example NNGATANN - start = 3 and end = 6. Clusters are all reported on the sense strand, as individual TFBSs may be on either the sense or the complement strand. TFBSs from selected libraries are formed into clusters of a specified size. The final length of each cluster may be greater than the specified range, as overlapping TFBSs are combined to highlight the TFBS-rich region.
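
Two of the points above, the core coordinates and the combining of overlapping regions, can be sketched in a few lines of Python. The function names are hypothetical and the merge step is only one plausible reading of how overlapping TFBS regions are combined; coordinates are 1-based and inclusive, as in GFF.

```python
def core_coords(site_start, site_length, flank):
    """Return the 1-based inclusive start and end of the core within an extended site.

    `flank` is the number of N positions added on each side of the core.
    """
    return site_start + flank, site_start + site_length - flank - 1

def merge_overlapping(regions):
    """Combine overlapping (start, end) regions into longer regions.

    Assumption: two regions are combined whenever they share at least one base.
    """
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it if necessary.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]
```

For the NNGATANN example, `core_coords(1, 8, 2)` gives start 3 and end 6. Merging also shows why a reported cluster can exceed the requested size: two 200 bp windows that overlap are returned as a single longer region.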

Identifying candidate genes controlled by the clusters.
The UCSC genome assemblies ('builds') are also used by the Ensembl project; this connection allows annotated genes to be localised to the final TFBS clusters.

The version of Ensembl used by this site is 32.34 and is accessed via the Ensembl API.

All Ensembl annotated genes (transcripts clustered into transcript 'footprints') are localised to each cluster when a cluster is contained in a gene, or a gene is located within 100 kb of a cluster. As a cluster may be localised to many genes, the list is processed to identify one of two scenarios for each cluster:

  1. A cluster is situated in the intron of a gene.
  2. A cluster is situated 5' to a gene and/or 3' to a gene. The nearest gene is selected in both situations.
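
The two scenarios can be approximated with the following Python sketch. It does not use the Ensembl API; genes are plain hypothetical tuples of (name, start, end, strand), a cluster contained in a gene footprint is reported as intragenic without checking exon boundaries, and partial overlaps are ignored for brevity.

```python
MAX_DISTANCE = 100_000  # the 100 kb limit described above

def assign_genes(cluster, genes):
    """Assign genes to a cluster given as (start, end).

    Returns {"intragenic": [names]} when the cluster lies inside a gene,
    otherwise the nearest gene for which the cluster is 5' and/or 3'.
    """
    c_start, c_end = cluster
    inside = [name for name, g_start, g_end, _ in genes
              if g_start <= c_start and c_end <= g_end]
    if inside:
        return {"intragenic": inside}

    nearest = {}
    for name, g_start, g_end, strand in genes:
        if g_start > c_end:      # gene lies at higher coordinates than the cluster
            distance = g_start - c_end
            side = "5prime" if strand == "+" else "3prime"
        elif g_end < c_start:    # gene lies at lower coordinates than the cluster
            distance = c_start - g_end
            side = "3prime" if strand == "+" else "5prime"
        else:
            continue             # partial overlap: not handled in this sketch
        if distance > MAX_DISTANCE:
            continue
        if side not in nearest or distance < nearest[side][1]:
            nearest[side] = (name, distance)
    return {side: name for side, (name, _) in nearest.items()}
```

Here "5prime" means the cluster is situated 5' to the named gene, taking the gene's strand into account, so a cluster at lower coordinates than a forward-strand gene is 5' to it.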

In order to identify the function of transcripts localised to clusters, the SWISS-PROT identifier and Entrez Gene identifier (formerly LocusLink) in the Ensembl annotation are used (where available) to identify genes with characterised gene products. Anecdotally, more Ensembl genes have Entrez Gene IDs, but these genes may not have well-defined functions.

The version of UniProt/SWISS-PROT used by this tool is v48.0 of 13 Sept. 2005.

The version of EntrezGene used by this tool is of 14 Sept. 2005.

The regulatory elements predicted by this tool may be responsible for the tissue specific expression of the candidate genes localised to them. If there is prior knowledge that a cluster is responsible for driving expression in a particular cell/tissue, it is possible to filter the final set of candidate genes to those expressed in tissues included in the Gene Expression Atlas 2.

Expression data were downloaded from the SymAtlas portal page as duplicate values. The median expression value for each gene probe was determined over all tissues. The fold over median was calculated by finding the mean value for each tissue and dividing this value by the overall median value.
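
As a worked illustration of the fold-over-median calculation, the Python sketch below processes one probe. The data layout (a dictionary mapping each tissue to its duplicate values) is hypothetical, and the overall median is assumed to be taken over all individual replicate values rather than over per-tissue means.

```python
from statistics import mean, median

def fold_over_median(probe):
    """Return the fold over median for each tissue of one gene probe.

    `probe` maps a tissue name to its duplicate expression values.
    """
    # Median of the probe over all tissues (all replicate values pooled).
    overall_median = median(value for replicates in probe.values()
                            for value in replicates)
    # Mean of the duplicates for each tissue, divided by the overall median.
    return {tissue: mean(replicates) / overall_median
            for tissue, replicates in probe.items()}
```

For a probe with values liver (100, 120), heart (20, 20) and brain (10, 30), the pooled median is 25, so liver has a fold over median of 110 / 25 = 4.4 and the other two tissues 20 / 25 = 0.8.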


Information on TFBS IUPAC codes.

Contents

Haematopoietic TFBS:
[AML1] [AP1] [CEBP] [CP2] [EBF] [EBOX] [EBOX-GATA] [EBOX (c-Myc)] [ETS] [GATA] [HMG] [Ikaros] [Iroquois] [MEF2] [MEIS1] [MYB] [Nanog] [NBOX] [NFAT] [NFAT-AP1] [NFKB] [Nkx2.5] [OCT3/4] [OTX] [PAX5] [SOX2] [SP1]

Liver study TFBS (based on work by Krivan and Wasserman, 2001):
[HNF1] [HNF3] [HNF4] [CEBP]

Muscle study TFBS (based on work by Wasserman and Fickett, 1998):
[MEF2] [SP1] [SRF] [EBOX (MyoD)] [TEF]

Other TFBS of interest:
[CRE] [FOXI1] [GLI1] [p53] [RE1] [STAT5]

[Table of conserved TFBS numbers]

[References]


Haematopoietic TFBS:

TFBS name: AML1
IUPAC code: TGYGGT
Bound by: AML1 (Acute Myeloid Leukemia-1) a.k.a. RUNX1.
Function: Transcription factor showing homology to the Drosophila pair rule gene Runt (Meyers and coworkers 1993). The gene was identified on the basis of its involvement in a leukemia associated translocation (Miyoshi and coworkers 1991). Knockout mouse studies have identified a role for AML1 in definitive, but not primitive, haematopoiesis (Wang and coworkers 1996; Okuda and coworkers 1996).
Ref: Based upon consensus sequence from Gisler and coworkers (2002) and TRANSFAC(v6) accession M00261.

TFBS name: AP1
IUPAC code: NNNSTCA
Bound by: AP1 (Activating Protein-1).
Function: A leucine-zipper transcription factor, which is a heterodimer formed by c-Jun and c-Fos. AP1 acts synergistically with NFAT family proteins on composite regulatory elements involved in the regulation of the immune system (Macian and coworkers 2001).
Ref: Based on a consensus sequence detailed by Kel and coworkers (1999).

TFBS name: CEBP
IUPAC code: SYAAY
Bound by: C/EBP (CCAAT - enhancer binding protein).
Function: A family of basic region-leucine zipper (bZip) transcription factors that are exclusively expressed in myelomonocytic cells in the haematopoietic system, with different family members exhibiting different roles (Scott and coworkers 1992). CEBP-alpha and AML1 have been shown to work synergistically to regulate a critical monocytic lineage growth factor, macrophage-colony stimulating factor receptor (M-CSF) (Zhang and coworkers 1996).
Ref: Based upon consensus sequence from Osada and coworkers (1996) and TRANSFAC(v6) accession M00116.

TFBS name: CP2
IUPAC code: CNRG*5-6*CNRG
Bound by: Alpha-globin transcription factor CP2 aka Transcription factor (LSF) and SAA3 enhancer factor.
Function: CP2 binds as a homo-dimer to the above motif. CP2 was originally identified as an important factor in the transcription of the alpha-globin gene. It has also been involved in foetal erythroid expression of the gamma-globin gene through heterodimer formation with NF-E4. It has recently been shown to interact with GATA-1 in the regulation of erythroid promoters (Francesca and coworkers 2006).
Ref: Based upon a consensus sequence detailed by Bose and coworkers (1997).

TFBS name: EBF
IUPAC code: CCCNNGRG
Bound by: EBF (Early B-cell factor) a.k.a. OLF1.
Function: A basic helix-loop-helix transcription factor required in early B-cell development (Johnson and Calame 2003).
Ref: Based upon consensus sequence from Gisler and coworkers (2002) and TRANSFAC(v6) accession M00261.

TFBS name: EBOX
IUPAC code: CANNTG
Bound by: bHLH transcription factors, including TAL1 (a.k.a. SCL).
Function: A basic helix-loop-helix transcription factor crucial in the development of haematopoietic stem cell lineages as knockout mice fail to produce any haematopoietic cells (Begley and Green 1999).
Ref: Based upon consensus sequence from Murre and coworkers (1989) and the core sequence of TRANSFAC(v6) accessions M00065, M00066, M00070, M00277 and M00278.

TFBS name: EBOX-GATA
IUPAC code:
CANNTG*8-10*GATA,
NCANNTGN*6-8*NGATAN,
NNCANNTGNN*4-6*NNGATANN,
NNNCANNTGNNN*2-4*NNNGATANNN
Bound by: Lmo2, Ldb1/NLI, TAL1, GATA-1 and E2A protein complex.
Function: Involved in erythroid gene expression and haematopoietic cell differentiation (Osada and coworkers 1995; Wadman and coworkers 1997; Xu and coworkers 2003).
Ref: Based upon a consensus sequence detailed by Wadman and coworkers (1997).
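
The composite notation used in this entry (and in the CP2, NFAT-AP1 and p53 entries) can be converted to a regular expression with a short Python sketch. It assumes that `*a-b*` denotes a spacer of between a and b unconstrained nucleotides separating the two half sites; the function name is hypothetical.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def composite_to_regex(pattern):
    """Convert a pattern such as 'CANNTG*8-10*GATA' into a regular expression."""
    # Splitting on the spacer yields [motif, min, max, motif, ...].
    tokens = re.split(r"\*(\d+)-(\d+)\*", pattern)
    regex = "".join(IUPAC[base] for base in tokens[0])
    for low, high, motif in zip(tokens[1::3], tokens[2::3], tokens[3::3]):
        regex += "[ACGT]{%s,%s}" % (low, high)          # variable-length spacer
        regex += "".join(IUPAC[base] for base in motif)  # next half site
    return regex
```

Under this reading, each extended pattern shortens the spacer by the number of flanking N positions it adds, so the distance between the E-box and GATA cores stays at 8 to 10 nucleotides in all four libraries.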

TFBS name: EBOX (c-Myc)
IUPAC code: CAYGYG
Bound by: c-Myc
Function: A basic helix-loop-helix transcription factor. Overexpression has been implicated in the etiology of haematopoietic tumours (MYC_HUMAN Swiss-Prot entry).
Ref: Based upon a consensus sequence detailed by Cawley and coworkers (2004).

TFBS name: ETS
IUPAC code: GGAW
Bound by: Winged helix-turn-helix transcription factor family members including Elf-1 (ETS related transcription factor-1), Fli-1 (Friend Leukemia Integration factor-1) and PU.1 (a.k.a. Spi-1).
Function: The ETS family members have important roles in haematopoiesis (Sharrocks and coworkers 1997), binding critically important regulatory elements in vitro and within haematopoietic progenitor cells (Gottgens and coworkers 2002). PU.1 is required in macrophage development and in other myeloid and lymphocytic lineages (Warren and Rothenberg 2003).
Ref: Based on the core consensus sequence detailed by Sharrocks and coworkers (1997) and TRANSFAC(v6) accessions M00032 and M00074.

TFBS name: GATA
IUPAC code: GATA
Bound by: Zinc finger transcription factors GATA1-3.
Function: GATA factors are key regulators of haematopoiesis (Weiss and Orkin 1995). GATA1 has been identified as a component of the SCL binding complex and GATA2 has been shown to contribute to a necessary and sufficient 3' enhancer of the SCL gene (Gottgens and coworkers 2002).
GATA-1 is essential in erythroid development and is thought to participate in a mutually antagonistic role with PU.1 (Warren and Rothenberg 2003).
Ref: The GATA motif is the most widely identified binding sequence of GATA-1 (TRANSFAC(v6) accessions M0278, M00348 and M0349) and GATA-2 (TRANSFAC(v6) accessions M00126, M00127, M00128, M00203, M00346 and M00347). It should be noted that Merika and Orkin (1993) identified variation in the last position of the GATA motif.

TFBS name: HMG
IUPAC code: WWCAAWG
Bound by: TCF-1 (T-cell factor 1) and LEF1 (Lymphoid enhancing factor-1).
Function: High mobility group (HMG) transcription factors. Tcf-1 is uniquely expressed in adult mammal T-cells, while Lef-1 is expressed in T-cells and early B-cells. Both are known to interact with the vertebrate Wnt effector beta-catenin (Staal and Clevers 2000).
Ref: Pattern derived by combining consensus sequences from van de Wetering and coworkers (1991), van Beest and coworkers (2000), Busslinger (1995) and TRANSFAC(v6) accession M00143.

TFBS name: Ikaros
IUPAC code: HRGGAW
Bound by: Ikaros (also by Aiolos and Helios).
Function: A zinc finger transcription factor required in the development of B, T and NK cells but not of myeloid cells (Georgopoulos 2002).
Ref: Based upon TRANSFAC(v6) accessions M00086, M00087 and M00088.

TFBS name: Iroquois
IUPAC code: ACANNTGT
Bound by: Iroquois transcription factors of the Iro/Irx gene families.
Function: Homeodomain transcription factors, differing structurally from typical homeodomain proteins by containing a 63-aa homeodomain with a 3-aa loop extension (TALE) (Bilioni and coworkers 2005). These transcription factors are essential in controlling many aspects of developmental patterning (Cavodeassi and coworkers, 2001).
Ref: Based upon a consensus sequence from the work of Bilioni and coworkers (2005).

TFBS name: MEF2
IUPAC code: CTAWWWWTAR
Bound by: Myocyte-specific enhancer factor 2.
Function: A MADS family protein predominantly expressed in skeletal and cardiac muscle and to a lesser extent in the brain (Pollock and Treisman 1991).
Ref: Based on a consensus sequence detailed by Dodou and coworkers (1995) and Krivan and Wasserman (2001).

TFBS name: MEIS1
IUPAC code: TGACAS
Bound by: MEIS1 (Myeloid ecotropic viral integration site 1).
Function: A homeobox protein belonging to the TALE ('three amino acid loop extension') family of homeodomain-containing proteins. MEIS1 has an important role in human myeloid leukaemias (Afonja and coworkers 2000) and neuroblastoma (Geerts and coworkers 2003). A recent study has shown that Meis1-deficient mouse embryos have haematopoietic, angiogenic and eye defects (Hisa and coworkers 2004).
Ref: Pattern derived by combining consensus sequences from Shen and coworkers (1997) and TRANSFAC(v6) accessions M00419, M00420 and M00421.

TFBS name: MYB
IUPAC code: YAACNG
Bound by: c-Myb.
Function: A homeodomain-like transcription factor crucial in the development and functioning of haematopoietic stem cells. c-Myb knockout mice are able to produce committed progenitor cells, but these cells are unable to expand, resulting in the loss of definitive haematopoietic cell types (Mucenski and coworkers 1991; Sumner and coworkers 2000). Levels of c-Myb have been shown to favour different cell types. Sub-optimal levels favour the formation of macrophages and megakaryocytes, whereas higher levels favour erythropoiesis and lymphopoiesis (Emambokus and coworkers 2003). The expression of c-Myb is thought to be controlled by the level of expressed GATA1 at the time of erythropoiesis (Bartunek and coworkers 2003).
Ref: Based upon TRANSFAC(v6) accessions M00004 and M00183.

TFBS name: Nanog
IUPAC code: SATTANS
Bound by: Nanog.
Function: A homeodomain transcription factor that is an essential regulator of early development and embryonic stem cell identity. It has been shown to collaborate with OCT4 and SOX2 to form the necessary regulatory circuitry and co-occupy a substantial portion of their target genes (Boyer and coworkers, 2005).
Ref: Based on the work of Mitsui and coworkers (2003).

TFBS name: NBOX
IUPAC code: CACNAG
Bound by: HES1 (Hairy and enhancer of split-1).
Function: A basic helix-loop-helix transcription factor that is a downstream factor in the Notch1 signalling system. HES1 is important in the lineage commitment of T-cells and may bind the AML1/RUNX gene (Kojika and Griffin 2001).
Ref: Based on a consensus sequence detailed by Kojika and Griffin (2001).

TFBS name: NFAT
IUPAC code: GGAAA
Bound by: NFAT (Nuclear factor of activated T-cells) family of transcription factors.
Function: Four factors of the NFAT family act synergistically with AP-1 on composite regulatory elements involved in regulation of the immune system (Macian and coworkers 2001).
Ref: Based on a consensus sequence detailed by Kel and coworkers (1999).

TFBS name: NFAT-AP1
IUPAC code (full AP1 consensus):
WGGAAA*0-7*TGASTCA,
NWGGAAAN*0-5*NTGASTCAN,
NNWGGAAANN*0-3*NNTGASTCANN,
NNNWGGAAANNN*0-1*NNNTGASTCANNN
Ref: Based upon a consensus sequence detailed by Kel and coworkers (1999).

IUPAC code (half AP1 consensus):
GGAAA*0-10*STCA,
NGGAAAN*0-8*NSTCAN,
NNGGAAANN*0-6*NNSTCANN,
NNNGGAAANNN*0-4*NNNSTCANNN
Ref: Based upon a consensus sequence detailed by Kel and coworkers (1999).

TFBS name: NFKB
IUPAC code: GGGRNNYYY
Bound by: NF-kappaB (Nuclear Factor-Kappa B)
Function: The p65 subunit has been shown to be crucial in the survival or development of an early lymphocyte precursor (Horwitz and coworkers 1997).
Ref: Based upon a consensus sequence detailed by Martone and coworkers (2003), Senger and coworkers (2004), TRANSFAC(v6) accessions M00051, M00052 and M00054.

TFBS name: Nkx2.5
IUPAC code: CAMTTNR
Bound by: NK-type homeobox 2.5
Function: Nkx2.5 is expressed in the early cardiac crescent and then continues to be expressed throughout heart development (Lyons and coworkers, 1995). It has been shown to be crucial, as a dominant-negative form of the gene blocks cardiogenesis (Grow and Krieg, 1998) and mutations in the gene cause congenital heart disease in humans (Schott and coworkers, 1998). The Drosophila orthologue Tinman has been shown to work in tandem with GATA factors to bring about cardiogenesis and haematopoiesis (Han and Olsen, 2005).
Ref: Based upon a consensus sequence detailed by Han and Olsen (2005).

TFBS name: OCT3/4
IUPAC code: ATGMWWVW
Bound by: OCT3/4 POU class of homeodomains.
Function: OCT4 interacts with other transcription factors, for example SOX2, to affect the expression of other genes in mouse ES cells (see references within Boyer and coworkers, 2005). It has been shown to collaborate with SOX2 and Nanog to form the necessary regulatory circuitry and co-occupy a substantial portion of their target genes (Boyer and coworkers, 2005).
Pattern contributed by: Davide Ambrosetti, New York University.

TFBS name: OTX
IUPAC code: TAATCY
Bound by: Otx-1 (Orthodenticle homolog-1).
Function: A bicoid class homeobox gene, recently shown to be expressed in haematopoietic pluripotent and erythroid progenitor cells (Levantini and coworkers 2003). Otx1 knockout mice show decreased levels of SCL and GATA-1, exhibiting a decreased number of blood cells. This phenotype was rescued in mice bred to constitutively express SCL (Oct-/-SCLtg), indicating that Otx-1 functions upstream of SCL (Levantini and coworkers 2003).
Ref: Based on a consensus sequence detailed by Sakamoto and coworkers (1997).

TFBS name: PAX5
IUPAC code: RNKMANBSNWGNRKRMM
Bound by: Pax-5 (Paired Box Protein-5) a.k.a. BSAP.
Function: A bipartite paired-domain transcription factor binding DNA at two points. Required in the establishment and commitment to the B-cell lineage (reviewed by Johnson and Calame 2003).
Ref: Pattern derived by combining consensus sequences from Czerny and Busslinger (1995), Pfeffer and coworkers (2000) and TRANSFAC(v6) accession M00143.

TFBS name: SOX2
IUPAC code: CWTTGTD
Bound by: SOX2
Function: A high mobility group transcription factor that is known to interact with OCT4 to affect the expression of other genes in mouse ES cells (see references within Boyer and coworkers, 2005). It has been shown to collaborate with OCT4 and Nanog to form the necessary regulatory circuitry and co-occupy a substantial portion of their target genes (Boyer and coworkers, 2005).
Pattern contributed by: Davide Ambrosetti, New York University.

TFBS name: SP1
IUPAC code: GGGHGGG
Bound by: Sp-1.
Function: A ubiquitously expressed zinc-finger transcription factor shown to be important for muscle-specific expression (Sartorelli and coworkers 1990).
Ref: Based on a consensus sequence detailed by Krivan and Wasserman (2001).

TFBS name: SP1 (Updated)
IUPAC code: GGGSWGGG
Bound by: Sp-1.
Function: A ubiquitously expressed zinc-finger transcription factor shown to be important for muscle-specific expression (Sartorelli and coworkers 1990).
Ref: Based on a consensus sequence detailed by Liu and coworkers (2004).

TFBS name: SP1 (Cell 2004)
IUPAC code: GGKGYGGG
Bound by: Sp-1.
Function: A ubiquitously expressed zinc-finger transcription factor shown to be important for muscle-specific expression (Sartorelli and coworkers 1990).
Ref: Based on a consensus sequence detailed by Cawley and coworkers (2004).

Liver study TFBS:

TFBS name: HNF1
IUPAC code: GTTAAT
Bound by: Hepatocyte nuclear factor 1.
Function: Important transcription factor in liver development, particularly in the expression of mature liver genes (Ktistaki and Taliandis 1997; Tronche and coworkers 1997; Darlington 1999).
Ref: Pattern derived by combining consensus sequence from Krivan and Wasserman (2001) and TRANSFAC(v6) accessions M00132 and M00206.

TFBS name: HNF3
IUPAC code: TRTTTRY
Bound by: Hepatocyte nuclear factor 3.
Function: Important transcription factor in liver development, particularly in the expression of early stage liver genes (Ktistaki and Taliandis 1997; Tronche and coworkers 1997; Darlington 1999).
Ref: Pattern derived by combining consensus sequence from Krivan and Wasserman (2001) and TRANSFAC(v6) accessions M00129, M00131 and M00294.

TFBS name: HNF4
IUPAC code: CAAAGK
Bound by: Hepatocyte nuclear factor 4.
Function: Important transcription factor in liver development, particularly in the expression of mature liver genes (Ktistaki and Taliandis 1997; Tronche and coworkers 1997; Darlington 1999).
Ref: Pattern derived by combining consensus sequence from Krivan and Wasserman (2001) and TRANSFAC(v6) accessions M00134, M00158 and M00411.

Muscle study TFBS:

TFBS name: MEF2
IUPAC code: CTAWWWWTAR
Bound by: Myocyte-specific enhancer factor 2.
Function: A MADS family protein predominantly expressed in skeletal and cardiac muscle and to a lesser extent in the brain (Pollock and Treisman 1991).
Ref: Based on a consensus sequence detailed by Dodou and coworkers (1995) and Wasserman and Fickett (1998).

TFBS name: SRF
IUPAC code: CCWWWWWWGG
Bound by: Serum response factor.
Function: A MADS family protein that activates muscle gene expression via the CArG motif (Vandromme and coworkers 1992). "This gene encodes a ubiquitous nuclear protein that stimulates both cell proliferation and differentiation. It is a member of the MADS (MCM1, Agamous, Deficiens, and SRF) box superfamily of transcription factors. This protein binds to the serum response element (SRE) in the promoter region of target genes. This protein regulates the activity of many immediate-early genes, for example c-fos, and thereby participates in cell cycle regulation, apoptosis, cell growth, and cell differentiation. This gene is the downstream target of many pathways, for example the mitogen-activated protein kinase pathway (MAPK) that acts through the ternary complex factors (TCFs)." (Entrez Gene ID: 6722).
Ref: Based on a consensus sequence detailed by Dodou and coworkers (1995) and Wasserman and Fickett (1998).

TFBS name: EBOX (MyoD)
IUPAC code: CANCWG
Bound by: MyoD which belongs to the myogenin (Myf) subfamily of basic helix-loop-helix transcription factors.
Function: "It is involved in muscle cell differentiation, and is essential for repair of damaged tissue. It activates its own transcription which may stabilize commitment to myogenesis." (Entrez Gene ID: 4654).
Ref: Based on a consensus sequence detailed by Wasserman and Fickett (1998).

TFBS name: TEF
IUPAC code: CATTCC
Bound by: Transcriptional enhancer factor-1 related factors (TEF-1).
Function: TEF transcription factors bind to muscle-specific CATT regulatory elements (M-CAT sites) that are responsible for the activity of many promoters in cardiac and skeletal muscle (Farrance and Ordahl, 1996). There is evidence that TEF-1 binds cooperatively to repeated M-CAT motifs under positional and spatial constraints (Jiang and coworkers, 2000).
Ref: Based on a consensus sequence detailed by Wasserman and Fickett (1998) and Jiang and coworkers (2000).

Other TFBS of interest:

TFBS name: CRE
IUPAC code: TGACGTCA (full CRE consensus)
Bound by: The cAMP-response element binding protein (CREB).
Function: The CREB family of cAMP-induced activators stimulates gene expression after phosphorylation at a conserved serine (Mayr and Montminy, 2001). The genome-wide locations of CRE motifs have been mapped, but only a small proportion of CREB target genes are induced by cAMP in any cell type (Zhang and coworkers, 2005). Their work suggests that additional CREB regulatory partners are required for recruitment of the transcriptional apparatus to a promoter.
Ref: Based on a consensus sequence detailed by Zhang and coworkers (2005).

IUPAC code: TGACG (half CRE consensus)
Ref: Based on a consensus sequence detailed by Zhang and coworkers (2005).

TFBS name: FOXI1
IUPAC code: TRTTKRY
Bound by: Forkhead family transcription factor FOXI1.
Function: "The specific function of this gene has not yet been determined; however, it is possible that this gene plays an important role in the development of the cochlea and vestibulum, as well as embryogenesis. Mutations in this gene may be associated with the common cavity phenotype. Two transcript variants encoding different isoforms have been found for this gene." (Entrez Gene ID: 2299).
Ref: Based on a consensus sequence detailed by Blomqvist and coworkers (2004) and Kurth and coworkers (2006).

TFBS name: GLI1
IUPAC code: GACCACCCA
Bound by: Kruppel-type zinc finger transcription factor GLI1.
Function: GLI1 mediates Hedgehog signalling, including Sonic Hedgehog (Yoon and coworkers 2002). Hedgehog signalling has been implicated in the induction of haematopoiesis and vasculogenesis from the mesodermal progenitor, the haemangioblast (Byrd and coworkers 2002; Baron 2003).
Ref: Based on a consensus sequence detailed by Kinzler and Vogelstein (1990) and Yoon and coworkers (2002).

TFBS name: GLI1 - multiple sites
IUPAC code: GACCACCCA,CACCACCCA,GTCCACCCA,GAACACCCA,GACCCCCCA,GACCTCCCA,GACCACCAA
Bound by: Zinc finger transcription factor GLI1.
Function: As above. Six additional sites differing by one nucleotide from the published consensus (GACCACCCA).
Ref: Based on sequences detailed by Yoon and coworkers (2002).

TFBS name: p53
IUPAC code: RCNWGYNN*0-1*NNRCAWGY
Bound by: Nuclear protein p53.
Function: "Tumor protein p53, a nuclear protein, plays an essential role in the regulation of cell cycle, specifically in the transition from G0 to G1. It is found in very low levels in normal cells, however, in a variety of transformed cell lines, it is expressed in high amounts, and believed to contribute to transformation and malignancy. p53 is a DNA-binding protein containing DNA-binding, oligomerization and transcription activation domains. It is postulated to bind as a tetramer to a p53-binding site and activate expression of downstream genes that inhibit growth and/or invasion, and thus function as a tumor suppressor." (Entrez Gene ID: 7157).
Ref: Based on the whole genome search for p53 binding sites by Wei and coworkers (2006).

TFBS name: RE1 (NRSE)
IUPAC code: NTYAGMRCCNNRGMSAG
Bound by: Kruppel-type zinc finger transcription factor REST.
Function: "The RE-1 silencing transcription factor gene encodes a transcriptional repressor which represses neuronal genes in non-neuronal tissues. It represses transcription by binding a DNA sequence element called the neuron-restrictive silencer element. The protein is also found in undifferentiated neuronal progenitor cells, and it is thought that this repressor may act as a master negative regulator of neurogenesis. Alternatively spliced transcript variants have been described; however, their full length nature has not been determined." (Entrez Gene ID: 5978).
Ref: Based on a consensus sequence detailed by Bruce and coworkers (2004).

TFBS name: RE1 (NRSE) - multiple sites
IUPAC code:
NTYAGMRCCNNRGMSAG,NNYAGMRCCNNRGMSAG,NTNAGMRCCNNRGMSAG,NTYNGMRCCNNRGMSAG,
NTYANMRCCNNRGMSAG,NTYAGNRCCNNRGMSAG,NTYAGMNCCNNRGMSAG,NTYAGMRNCNNRGMSAG,
NTYAGMRCNNNRGMSAG,NTYAGMRCCNNNGMSAG,NTYAGMRCCNNRNMSAG,NTYAGMRCCNNRGNSAG,
NTYAGMRCCNNRGMNAG,NTYAGMRCCNNRGMSNG,NTYAGMRCCNNRGMSAN
Bound by: Kruppel-type zinc finger transcription factor REST.
Function: As above. Additional sites differ by one nucleotide from the published consensus (NTYAGMRCCNNRGMSAG).
Ref: Based on a consensus sequence detailed by Bruce and coworkers (2004).
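
The list of sites in this entry is consistent with relaxing one specified position of the consensus at a time to N. The Python sketch below reproduces that construction; the function name is hypothetical and the approach is inferred from the listed patterns rather than taken from the tool's source.

```python
def single_n_variants(consensus):
    """Return the consensus followed by every variant with one position relaxed to N.

    Positions that are already N are skipped, since relaxing them changes nothing.
    """
    variants = [consensus]
    for index, base in enumerate(consensus):
        if base != "N":
            variants.append(consensus[:index] + "N" + consensus[index + 1:])
    return variants
```

For NTYAGMRCCNNRGMSAG this yields 15 patterns: the consensus itself plus one variant for each of its 14 non-N positions, in the same order as listed in the entry. Because N also matches the original base, each variant tolerates at most one difference from the consensus at the relaxed position.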

TFBS name: STAT5
IUPAC code: TTCYNRGAA
Bound by: Signal transducers and activators of transcription 5a and 5b.
Function: "In response to cytokines and growth factors, STAT family members are phosphorylated by the receptor associated kinases, and then form homo- or heterodimers that translocate to the cell nucleus where they act as transcription activators. This protein is activated by, and mediates the responses of many cell ligands, such as IL2, IL3, IL7, GM-CSF, erythropoietin, thrombopoietin, and different growth hormones. Activation of this protein in myeloma and lymphoma associated with a TEL/JAK2 gene fusion is independent of cell stimulus and has been shown to be essential for the tumorigenesis. The mouse counterpart of this gene is found to induce the expression of BCL2L1/BCL-X(L), which suggests the antiapoptotic function of this gene in cells." (Entrez Gene ID: 6776).
Ref: Based on a consensus sequence detailed by Soldaini and coworkers (2000).
