What is TFBScluster - Genome-wide?
This web tool identifies clusters of transcription
factor binding sites (TFBSs) conserved in mammalian genomes. Its
simple user interface lets you select a range
of TFBSs and retrieve a list of SWISS-PROT characterised genes to which the
clusters are localised. This information may be used directly in the
experimental validation of a region.
Raw data.
The raw data for this analysis are human/mouse BLASTZ/CHAINNET
genome alignments held at
Genome Bioinformatics (UCSC). Genome-wide TFBSs are identified using
TFBSsearch (available on our web site) via a script that converts
the downloaded data format to the FASTA format.
The currently implemented alignments include:
- May 2004 human assembly (also known as build 35 and hg17).
- May 2004 mouse assembly (also known as build 33 and mm5).
It is also possible to use TFBSs that are conserved between human
and dog (canFam1, July 2004), or between human and opossum (monDom1, Oct. 2004).
Searches can also be restricted to particular areas of the human genome.
It is possible to only use TFBSs that are present in active promoter regions
identified in the fibroblast cell line IMR90 (reported by
Kim and coworkers, 2005).
All of the TFBSs have also been screened using
Regulatory Potential scores (also see the corresponding publication at
PubMed1 and PubMed2). The scores have been computed
from alignments of human, chimpanzee, mouse, rat and dog.
For ease of use the 1 bp scores from UCSC were converted to areas covered by
RP scores > 0. This threshold was determined by analysis
of the haemoglobin beta gene locus. New TFBS library files (TFBS_filtered)
were created to only include those TFBSs present in these areas.
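The conversion of per-base scores into covered areas, and the filtering of sites against them, can be sketched as follows. This is a minimal illustration, not the tool's own script: the function names, the (position, score) input shape and the rule that a site must lie entirely inside one area are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) on one chromosome


def positive_score_regions(scores: Iterable[Tuple[int, float]],
                           threshold: float = 0.0) -> List[Interval]:
    """Collapse sorted per-base (position, score) pairs into runs with score > threshold."""
    regions: List[Interval] = []
    start = prev = None
    for pos, score in scores:
        if score > threshold:
            if start is None:
                start = pos                    # open a new run
            elif pos != prev + 1:
                regions.append((start, prev))  # gap in positions closes the run
                start = pos
            prev = pos
        elif start is not None:
            regions.append((start, prev))      # score at or below threshold closes the run
            start = prev = None
    if start is not None:
        regions.append((start, prev))
    return regions


def filter_sites(sites: List[Interval], regions: List[Interval]) -> List[Interval]:
    """Keep sites that lie entirely inside one region (an assumed containment rule)."""
    return [s for s in sites
            if any(r[0] <= s[0] and s[1] <= r[1] for r in regions)]


scores = [(1, -0.1), (2, 0.2), (3, 0.3), (4, 0.0), (5, 0.4), (7, 0.5)]
regions = positive_score_regions(scores)
print(regions)                                            # [(2, 3), (5, 5), (7, 7)]
print(filter_sites([(2, 3), (3, 5), (7, 7)], regions))    # [(2, 3), (7, 7)]
```

Storing intervals rather than one score per base keeps the filtered libraries small, at the cost of discarding the actual score values.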
The result is a set of libraries containing all the putative sites
for different transcription factors. For each TFBS (e.g., EBOX) one
library is created for the core sequence 'CANNTG'. The IUPAC letter 'N' is
allowed to differ between genomes. Libraries are also created to extend the
'core' binding site one to three nucleotides 5' and 3', i.e.,
NCANNTGN, NNCANNTGNN or NNNCANNTGNNN. In these libraries the IUPAC letter N
must be the same in both genomes. Extending the degree of conservation
between the aligned genomes creates a smaller, more specific set of TFBSs.
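The difference between the core and extended libraries can be illustrated with a short sketch. It assumes two gap-free, upper-case aligned sequences of equal length, searches the sense strand only, and reads "N must be the same in both genomes" as applying to every N in the pattern; the function names and the `n_must_match` flag are hypothetical and are not part of TFBSsearch.

```python
from typing import List

# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(pattern: str, seq: str) -> bool:
    """True if every base of seq is allowed by the IUPAC code at that position."""
    return len(pattern) == len(seq) and all(
        base in IUPAC[code] for code, base in zip(pattern, seq))


def conserved_sites(pattern: str, human: str, mouse: str,
                    n_must_match: bool) -> List[int]:
    """Return 1-based alignment columns where pattern matches both sequences."""
    hits = []
    width = len(pattern)
    for i in range(len(human) - width + 1):
        h, m = human[i:i + width], mouse[i:i + width]
        if not (matches(pattern, h) and matches(pattern, m)):
            continue
        # Extended libraries: bases under an N must be identical in both genomes.
        if n_must_match and any(
                code == "N" and a != b for code, a, b in zip(pattern, h, m)):
            continue
        hits.append(i + 1)
    return hits


human = "TTCAGCTGAA"
mouse = "TACATATGAA"
print(conserved_sites("CANNTG", human, mouse, n_must_match=False))   # [3]
print(conserved_sites("NCANNTGN", human, mouse, n_must_match=True))  # []
```

In the example the EBOX core is conserved even though the two central bases differ, while the extended pattern is rejected because the bases under N are not identical.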
Information for each TFBS cluster is stored in the
GFF format. The start and
end sites are coordinates of the human genome. The start and
end positions for each TFBS relate to the 'core' sequence, for example
NNGATANN - start = 3 and end = 6. Clusters are all reported on the sense
strand as individual TFBSs may be on sense or complement strands. TFBSs from
selected libraries are formed into clusters of a specified size.
The final length of each cluster may be greater than the specified range as
overlapping TFBSs are combined to highlight the TFBS-rich region.
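The core coordinates and the merging of overlapping regions can be sketched as below. The clustering rule is an assumption made for illustration: a candidate cluster is taken to be a window of at most `window` bp that contains at least one site from every selected library, and overlapping candidates are then merged. The function names and this rule are hypothetical and may differ from the tool's actual procedure.

```python
from typing import List, Set, Tuple

Site = Tuple[int, int, str]  # (start, end, library name), 1-based inclusive


def core_offsets(extended: str, core: str) -> Tuple[int, int]:
    """1-based start and end of the core within an extended pattern."""
    i = extended.index(core)
    return i + 1, i + len(core)


def find_clusters(sites: List[Site], required: Set[str],
                  window: int) -> List[Tuple[int, int]]:
    """Find windows holding all required libraries, then merge overlapping ones."""
    ordered = sorted(sites)
    candidates = []
    for i, (start, _, _) in enumerate(ordered):
        # Sites starting at or after this one that end inside the window.
        members = [s for s in ordered[i:] if s[1] - start + 1 <= window]
        if required <= {name for _, _, name in members}:
            candidates.append((start, max(end for _, end, _ in members)))
    merged: List[Tuple[int, int]] = []
    for start, end in candidates:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(core_offsets("NNGATANN", "GATA"))  # (3, 6)

sites = [(100, 105, "EBOX"), (120, 123, "GATA"),
         (140, 145, "EBOX"), (500, 503, "GATA")]
print(find_clusters(sites, {"EBOX", "GATA"}, window=30))  # [(100, 145)]
```

With a 30 bp window the two candidate windows (100-123 and 120-145) overlap and are merged into one 46 bp region, which shows how a reported cluster can exceed the requested size.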
Identifying candidate genes controlled by the clusters.
The UCSC genome assemblies ('builds') are also used by the
Ensembl project; this
connection allows annotated genes to be localised to the final TFBS clusters.
The version of Ensembl used by this site is 32.35e and is accessed
via the Ensembl API.
All Ensembl annotated genes (transcripts clustered into transcript 'footprints')
are localised to each cluster when a cluster is contained in a gene, or a gene is
located within 100 kb of a cluster. As a cluster may be localised to many genes
the list is processed to identify one of two scenarios for each cluster:
- A cluster is situated in the intron of a gene.
- A cluster is situated 5' to a gene and/or 3' to a gene.
The nearest gene is selected in both situations.
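A minimal sketch of this selection step is given below. It is a simplification and not the Ensembl API code used by the site: genes are plain (name, start, end) tuples, strand and exon/intron structure are ignored, a cluster overlapping a gene is given a distance of zero, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Gene = Tuple[str, int, int]  # (name, start, end), same coordinate system as clusters


def gap(cluster: Tuple[int, int], gene_start: int, gene_end: int) -> int:
    """Distance in bp between a cluster and a gene; 0 if they overlap."""
    c_start, c_end = cluster
    if c_end < gene_start:
        return gene_start - c_end
    if c_start > gene_end:
        return c_start - gene_end
    return 0


def nearest_gene(cluster: Tuple[int, int], genes: List[Gene],
                 max_distance: int = 100_000) -> Optional[str]:
    """Name of the closest gene within max_distance, or None if there is none."""
    candidates = [(gap(cluster, start, end), name) for name, start, end in genes]
    candidates = [c for c in candidates if c[0] <= max_distance]
    # Ties on distance are broken alphabetically by gene name.
    return min(candidates)[1] if candidates else None


genes = [("GENE_A", 1000, 5000), ("GENE_B", 60000, 90000), ("GENE_C", 400000, 450000)]
print(nearest_gene((2000, 2100), genes))      # GENE_A (cluster lies inside the gene)
print(nearest_gene((50000, 50200), genes))    # GENE_B (9.8 kb downstream of the cluster)
print(nearest_gene((250000, 250100), genes))  # None (no gene within 100 kb)
```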
In order to identify the function of transcripts localised to
clusters the
SWISS-PROT identifier and Entrez Gene identifier (formerly Locuslink) in the Ensembl
annotation are used (where available) to identify genes with characterised
gene products. Anecdotally, there are more Ensembl genes with Entrez Gene
IDs, but the genes may not have well defined functions.
The version of UniProt/SWISS-PROT used by this tool is v48.0 of 13 Sept. 2005.
The version of EntrezGene used by this tool is of 14 Sept. 2005.
The regulatory elements predicted by this tool may be responsible for the tissue
specific expression of the candidate genes localised to them. If there is
prior knowledge that a cluster is responsible for driving expression in
a particular cell/tissue, it is possible to filter the final set of candidate
genes to those expressed in tissues included in the
Gene Expression Atlas 2.
Expression data were downloaded from the
SymAtlas portal page
as duplicate values.
The median expression value for each gene probe was determined over all
tissues. The fold over median was calculated by finding the mean value
for each tissue and dividing this value by the overall median value.
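The calculation can be written out as a small sketch. One detail is an assumption: the overall median is taken here over the per-tissue means, whereas the text could also be read as a median over all replicate values. The function name and the input layout (tissue name mapped to its duplicate values) are hypothetical.

```python
from statistics import mean, median
from typing import Dict, Sequence


def fold_over_median(probe: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Per-tissue mean expression divided by the probe's median over all tissues."""
    tissue_means = {tissue: mean(values) for tissue, values in probe.items()}
    overall = median(tissue_means.values())  # assumed: median of the tissue means
    return {tissue: m / overall for tissue, m in tissue_means.items()}


probe = {"liver": (100.0, 120.0), "heart": (40.0, 60.0), "brain": (20.0, 30.0)}
for tissue, fold in fold_over_median(probe).items():
    print(tissue, round(fold, 2))
# liver 2.2
# heart 1.0
# brain 0.5
```

A probe whose median is zero would raise a division error in this sketch, so such probes would need to be handled separately.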
Information on TFBS IUPAC codes.
Contents
Haematopoietic TFBS:
[AML1]
[AP1]
[CEBP]
[CP2]
[EBF]
[EBOX]
[EBOX-GATA]
[EBOX (c-Myc)]
[ETS]
[GATA]
[HMG]
[Ikaros]
[Iroquois]
[MEF2]
[MEIS1]
[MYB]
[Nanog]
[NBOX]
[NFAT]
[NFAT-AP1]
[NFKB]
[Nkx2.5]
[OCT3/4]
[OTX]
[PAX5]
[SOX2]
[SP1]
Liver study TFBS (based on work by Krivan and Wasserman, 2001):
[HNF1]
[HNF3]
[HNF4]
[CEBP]
Muscle study TFBS (based on work by Wasserman and Fickett, 1998):
[MEF2]
[SP1]
[SRF]
[EBOX (MyoD)]
[TEF]
Other TFBS of interest:
[CRE]
[FOXI1]
[GLI1]
[p53]
[RE1]
[STAT5]
[Table of conserved TFBS numbers]
[References]
Haematopoietic TFBS:
TFBS name: AML1
IUPAC code: TGYGGT
Bound by: AML1 (Acute Myeloid Leukemia-1) a.k.a. RUNX1.
Function: Transcription factor showing homology to the
Drosophila pair rule gene Runt (Meyers and coworkers 1993). The gene was
identified on the basis of its involvement in a leukemia associated
translocation (Miyoshi and coworkers 1991). Knockout mouse studies have
identified a role for AML1 in definitive, but not primitive,
haematopoiesis (Wang and coworkers 1996; Okuda and coworkers 1996).
Ref: Based upon consensus sequence from Gisler and coworkers (2002)
and TRANSFAC(v6) accession M00261.
TFBS name: AP1
IUPAC code: NNNSTCA
Bound by: AP1 (Activating Protein-1).
Function: A leucine-zipper transcription factor, which is a
heterodimer formed by c-Jun and c-Fos. AP1 acts synergistically with NFAT
family proteins on composite regulatory elements involved in the regulation
of the immune system (Macian and coworkers 2001).
Ref: Based on a consensus sequence detailed by Kel and coworkers
(1999).
TFBS name: CEBP
IUPAC code: SYAAY
Bound by: C/EBP (CCAAT - enhancer binding protein).
Function: A family of basic region-leucine zipper (bZip) transcription
factors that are exclusively expressed in myelomonocytic cells in the
haematopoietic system, with different family members exhibiting different roles
(Scott and coworkers 1992).
CEBP-alpha and AML1 have been shown to work synergistically to regulate
a critical monocytic lineage growth factor, macrophage-colony stimulating
factor receptor (M-CSF) (Zhang and coworkers 1996).
Ref: Based upon consensus sequence from Osada and coworkers (1996) and
TRANSFAC(v6) accession M00116.
TFBS name: CP2
IUPAC code: CNRG*5-6*CNRG
Bound by: Alpha-globin transcription factor CP2 aka Transcription factor (LSF) and SAA3 enhancer factor.
Function: CP2 binds as a homo-dimer to the above motif. CP2 was originally identified as an important factor in the
transcription of the alpha-globin gene. It has also been implicated in foetal erythroid expression of the
gamma-globin gene through heterodimer formation with NF-E4. It has recently been shown to interact with
GATA-1 in the regulation of erythroid promoters (Francesca and coworkers 2006).
Ref: Based upon a consensus sequence detailed by Bose and
coworkers (1997).
TFBS name: EBF
IUPAC code: CCCNNGRG
Bound by: EBF (Early B-cell factor) a.k.a. OLF1.
Function: A basic helix-loop-helix transcription factor required in early
B-cell development (Johnson and Calame 2003).
Ref: Based upon consensus sequence from Gisler and coworkers (2002)
and TRANSFAC(v6) accession M00261.
TFBS name: EBOX
IUPAC code: CANNTG
Bound by: bHLH transcription factors, including TAL1 (a.k.a. SCL).
Function: A basic helix-loop-helix transcription factor crucial in the
development of haematopoietic stem cell lineages as knockout mice fail
to produce any haematopoietic cells (Begley and Green 1999).
Ref: Based upon consensus sequence from Murre and coworkers (1989)
and the core sequence of TRANSFAC(v6) accessions M00065, M00066, M00070,
M00277 and M00278.
TFBS name: EBOX-GATA
IUPAC code: CANNTG*8-10*GATA,
NCANNTGN*6-8*NGATAN,
NNCANNTGNN*4-6*NNGATANN,
NNNCANNTGNNN*2-4*NNNGATANNN
Bound by: Lmo2, Ldb1/NLI, TAL1, GATA-1 and E2A protein
complex.
Function: The complex is involved in erythroid gene expression and
haematopoietic cell differentiation (Osada and coworkers 1995; Wadman and
coworkers 1997; Xu and coworkers 2003).
Ref: Based upon a consensus sequence detailed by Wadman and
coworkers (1997).
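Composite patterns such as CANNTG*8-10*GATA can be turned into regular expressions for a quick check against a sequence. The sketch below assumes that `*m-n*` denotes a spacer of m to n unspecified nucleotides, which is consistent with the spacer shrinking by two each time both motifs gain one flanking N. It searches the sense strand only, and `to_regex` is a hypothetical helper, not part of the tool.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def code_class(code: str) -> str:
    """Regex fragment for one IUPAC code: a literal base or a character class."""
    bases = IUPAC[code]
    return bases if len(bases) == 1 else "[" + bases + "]"


def to_regex(pattern: str) -> str:
    """Convert e.g. 'CANNTG*8-10*GATA' into an equivalent regular expression."""
    # re.split with capturing groups yields: motif, min, max, motif, ...
    parts = re.split(r"\*(\d+)-(\d+)\*", pattern)
    out = []
    for i in range(0, len(parts), 3):
        out.append("".join(code_class(c) for c in parts[i]))
        if i + 2 < len(parts):
            out.append("[ACGT]{%s,%s}" % (parts[i + 1], parts[i + 2]))
    return "".join(out)


regex = to_regex("CANNTG*8-10*GATA")
print(regex)  # CA[ACGT][ACGT]TG[ACGT]{8,10}GATA

print(bool(re.search(regex, "TTCAGCTG" + "A" * 9 + "GATAAG")))  # True (9 bp spacer)
print(bool(re.search(regex, "TTCAGCTG" + "A" * 3 + "GATAAG")))  # False (3 bp spacer)
```

The same helper handles the other spaced patterns in this list, for example CNRG*5-6*CNRG.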
TFBS name: EBOX (c-Myc)
IUPAC code: CAYGYG
Bound by: c-Myc
Function: A basic helix-loop-helix transcription factor. Overexpression
has been implicated in the etiology of haematopoietic tumours (MYC_HUMAN Swiss-Prot
entry).
Ref: Based upon a consensus sequence detailed by Cawley and coworkers
(2004).
TFBS name: ETS
IUPAC code: GGAW
Bound by: Winged helix-turn-helix transcription factor family
members including Elf-1 (ETS related transcription factor-1), Fli-1
(Friend Leukemia Integration factor-1) and PU.1 (a.k.a. Spi-1).
Function: The ETS family members have important roles in haematopoiesis
(Sharrocks and coworkers 1997), binding critically important regulatory
elements in vitro and within haematopoietic progenitor cells
(Gottgens and coworkers 2002). PU.1 is required in macrophage development
and in other myeloid and lymphocytic lineages
(Warren and Rothenberg 2003).
Ref: Based on the core consensus sequence detailed by Sharrocks and
coworkers (1997) and TRANSFAC(v6) accessions M00032 and M00074.
TFBS name: GATA
IUPAC code: GATA
Bound by: Zinc finger transcription factors GATA1-3.
Function: GATA factors are key regulators of haematopoiesis
(Weiss and Orkin 1995). GATA1 has been identified as a component of the SCL
binding complex and GATA2 has been shown to contribute to a necessary and
sufficient 3' enhancer of the SCL gene (Gottgens and coworkers 2002).
GATA-1 is essential in erythroid development and is thought to participate
in a mutually antagonistic role with PU.1 (Warren and Rothenberg 2003).
Ref: The GATA motif is the most widely identified binding sequence
of GATA-1 (TRANSFAC(v6) accessions M0278, M00348 and M0349) and GATA-2
(TRANSFAC(v6) accessions M00126, M00127, M00128, M00203, M00346 and M00347).
It should be noted that Merika and Orkin (1993) identified variation in the
last position of the GATA motif.
TFBS name: HMG
IUPAC code: WWCAAWG
Bound by: TCF-1 (T-cell factor 1) and LEF1 (Lymphoid enhancing factor-1).
Function: High mobility group (HMG) transcription factors. Tcf-1
is uniquely expressed in adult mammal T-cells, while Lef-1 is expressed in
T-cells and early B-cells. Both are known to interact with the vertebrate Wnt
effector beta-catenin (Staal and Clevers 2000).
Ref: Pattern derived by combining consensus sequences from van de
Wetering and coworkers (1991) and van Beest and coworkers (2000).
Busslinger (1995) and TRANSFAC(v6) accession M00143.
TFBS name: Ikaros
IUPAC code: HRGGAW
Bound by: Ikaros (also by Aiolos and Helios).
Function: A zinc finger transcription factor required in the development
of B, T and NK cells but not for myeloid cells (Georgopulos 2002).
Ref: Based upon TRANSFAC(v6) accessions M00086, M00087 and M00088.
TFBS name: Iroquois
IUPAC code: ACANNTGT
Bound by: Iroquois transcription factors of the Iro/Irx gene families.
Function: Homeodomain transcription factors, differing structurally from
typical homeodomain proteins by containing a 63-aa homeodomain with a 3-aa
loop extension (TALE) (Bilioni and coworkers 2005). These transcription factors
are essential in controlling many aspects of developmental patterning (Cavodeassi
and coworkers, 2001).
Ref: Based upon a consensus sequence from the work of Bilioni and coworkers (2005).
TFBS name: MEF2
IUPAC code: CTAWWWWTAR
Bound by: Myocyte-specific enhancer factor 2.
Function: A MADS family protein predominantly expressed in
skeletal and cardiac muscle and to a lesser extent in the brain
(Pollock and Treisman 1991).
Ref: Based on a consensus sequence detailed by Dodou and
coworkers (1995) and Krivan and Wasserman (2001).
TFBS name: MEIS1
IUPAC code: TGACAS
Bound by: MEIS1 (Myeloid ecotropic viral integration site 1).
Function: A homeobox protein belonging to the TALE ('three amino acid
loop extension') family of homeodomain-containing proteins. MEIS1 has an
important role in human myeloid leukaemias (Afonja and coworkers 2000) and
neuroblastoma (Geerts and coworkers 2003).
A recent study has shown that Meis1-deficient mouse embryos have haematopoietic,
angiogenic and eye defects (Hisa and coworkers 2004).
Ref: Pattern derived by combining consensus sequences from
Shen and coworkers (1997) and TRANSFAC(v6) accessions M00419, M00420 and M00421.
TFBS name: MYB
IUPAC code: YAACNG
Bound by: c-Myb.
Function: A homeodomain-like transcription factor crucial in the
development and functioning of haematopoietic stem cells. c-Myb knockout mice
are able to produce committed progenitor cells, but these cells are unable to
expand, resulting in the loss of definitive haematopoietic cell types
(Mucenski and coworkers 1991; Sumner and coworkers 2000). Levels of c-Myb
have been shown to favour different cell types. Sub-optimal levels favour
the formation of macrophages and megakaryocytes, whereas higher levels favour
erythropoiesis and lymphopoiesis (Emambokus and coworkers 2003). The expression
of c-Myb is thought to be controlled by the level of expressed GATA1 at the
time of erythropoiesis (Bartunek and coworkers 2003).
Ref: Based upon TRANSFAC(v6) accessions M00004 and M00183.
TFBS name: Nanog
IUPAC code: SATTANS
Bound by: Nanog.
Function: A homeodomain transcription factor that is an essential
regulator of early development and embryonic stem cell identity. It has been
shown to collaborate with OCT4 and SOX2 to form the necessary regulatory
circuitry and co-occupy a substantial portion of their target genes
(Boyer and coworkers, 2005).
Ref: Based on the work of Mitsui and coworkers (2003).
TFBS name: NBOX
IUPAC code: CACNAG
Bound by: HES1 (Hairy and enhancer of split-1).
Function: A basic helix-loop-helix transcription factor that is a downstream
factor in the Notch1 signalling system. HES1 is important in the lineage
commitment of T-cells and may bind the AML1/RUNX gene (Kojika and Griffin
2001).
Ref: Based on a consensus sequence detailed by Kojika and Griffin
(2001).
TFBS name: NFAT
IUPAC code: GGAAA
Bound by: NFAT (Nuclear factor of activated T-cells) family of
transcription factors.
Function: Four factors of the NFAT family act synergistically with
AP-1 on composite regulatory elements involved in regulation of the
immune system (Macian and coworkers 2001).
Ref: Based on a consensus sequence detailed by Kel and coworkers
(1999).
TFBS name: NFAT-AP1
IUPAC code (full AP1 consensus):
WGGAAA*0-7*TGASTCA,
NWGGAAAN*0-5*NTGASTCAN,
NNWGGAAANN*0-3*NNTGASTCANN,
NNNWGGAAANNN*0-1*NNNTGASTCANNN
Ref: Based upon a consensus sequence detailed by Kel and coworkers
(1999).
IUPAC code (half AP1 consensus):
GGAAA*0-10*STCA,
NGGAAAN*0-8*NSTCAN,
NNGGAAANN*0-6*NNSTCANN,
NNNGGAAANNN*0-4*NNNSTCANNN
Ref: Based upon a consensus sequence detailed by Kel and coworkers
(1999).
TFBS name: NFKB
IUPAC code: GGGRNNYYY
Bound by: NF-kappaB (Nuclear Factor-Kappa B)
Function: The p65 subunit has been shown to be crucial in the survival or
development of an early lymphocyte precursor (Horwitz and coworkers 1997).
Ref: Based upon a consensus sequence detailed by Martone and coworkers
(2003), Senger and coworkers (2004), TRANSFAC(v6) accessions M00051, M00052 and
M00054.
TFBS name: Nkx2.5
IUPAC code: CAMTTNR
Bound by: NK-type homeobox 2.5
Function: Nkx2.5 is expressed in the early cardiac crescent and
then continues to be expressed throughout heart development (Lyons and coworkers,
1995). It has been shown to be crucial, as a dominant-negative form of the gene blocks
cardiogenesis (Grow and Krieg, 1998) and mutations in the gene cause congenital
heart disease in humans (Schott and coworkers, 1998). The Drosophila orthologue
Tinman has been shown to work in tandem with GATA factors to bring about
cardiogenesis and haematopoiesis (Han and Olsen, 2005).
Ref: Based upon a consensus sequence detailed by Han and Olsen (2005).
TFBS name: OCT3/4
IUPAC code: ATGMWWVW
Bound by: OCT3/4 POU class of homeodomains.
Function: OCT4 interacts with other transcription factors, for example
SOX2, to affect the expression of other genes in mouse ES cells (see references
within Boyer and coworkers 2005). It has been shown to collaborate with SOX2
and Nanog to form the necessary regulatory circuitry and co-occupy a substantial
portion of their target genes (Boyer and coworkers, 2005).
Pattern contributed by: Davide Ambrosetti, New York University.
TFBS name: OTX
IUPAC code: TAATCY
Bound by: Otx-1 (Orthodenticle homolog-1).
Function: A bicoid class homeobox gene, recently shown to be expressed
in haematopoietic pluripotent and erythroid progenitor cells (Levantini and
coworkers 2003). Otx1 knockout mice show decreased levels of SCL and GATA-1,
exhibiting a decreased number of blood cells. This phenotype was rescued
in mice bred to constitutively express SCL (Oct-/-SCLtg), indicating that Otx-1
functions upstream of SCL (Levantini and coworkers 2003).
Ref: Based on a consensus sequence detailed by Sakamoto and
coworkers (1997).
TFBS name: PAX5
IUPAC code: RNKMANBSNWGNRKRMM
Bound by: Pax-5 (Paired Box Protein-5) a.k.a. BSAP.
Function: A bipartite paired-domain transcription factor binding DNA at
two points. Required in the establishment and commitment to the
B-cell lineage (reviewed by Johnson and Calame 2003).
Ref: Pattern derived by combining consensus sequences from Czerny and
Busslinger (1995), Pfeffer and coworkers (2000) and TRANSFAC(v6) accession
M00143.
TFBS name: SOX2
IUPAC code: CWTTGTD
Bound by: SOX2
Function: A high mobility group transcription factor that is known to
interact with OCT4 to affect the expression of other genes in mouse ES cells
(see references within Boyer and coworkers 2005). It has been shown to collaborate
with OCT4 and Nanog to form the necessary regulatory circuitry and co-occupy a substantial
portion of their target genes (Boyer and coworkers, 2005).
Pattern contributed by: Davide Ambrosetti, New York University.
TFBS name: SP1
IUPAC code: GGGHGGG
Bound by: Sp-1.
Function: A ubiquitously expressed zinc-finger transcription
factor shown to be important for muscle specific expression
(Sartorelli and coworkers 1990).
Ref: Based on a consensus sequence detailed by Krivan and
Wasserman (2001).
TFBS name: SP1 (Updated)
IUPAC code: GGGSWGGG
Bound by: Sp-1.
Function: A ubiquitously expressed zinc-finger transcription
factor shown to be important for muscle specific expression
(Sartorelli and coworkers 1990).
Ref: Based on a consensus sequence detailed by Liu and
coworkers (2004).
TFBS name: SP1 (Cell 2004)
IUPAC code: GGKGYGGG
Bound by: Sp-1.
Function: A ubiquitously expressed zinc-finger transcription
factor shown to be important for muscle specific expression
(Sartorelli and coworkers 1990).
Ref: Based on a consensus sequence detailed by Cawley and
coworkers (2004).
Liver study TFBS:
TFBS name: HNF1
IUPAC code: GTTAAT
Bound by: Hepatocyte nuclear factor 1.
Function: Important transcription factor in liver
development, particularly in the expression of mature liver genes
(Ktistaki and Taliandis 1997; Tronche and coworkers 1997; Darlington 1999).
Ref: Pattern derived by combining consensus sequence from
Krivan and Wasserman (2001) and TRANSFAC(v6) accessions M00132 and
M00206.
TFBS name: HNF3
IUPAC code: TRTTTRY
Bound by: Hepatocyte nuclear factor 3.
Function: Important transcription factor in liver
development, particularly in the expression of early stage liver genes
(Ktistaki and Taliandis 1997; Tronche and coworkers 1997; Darlington 1999).
Ref: Pattern derived by combining consensus sequence from
Krivan and Wasserman (2001) and TRANSFAC(v6) accessions M00129,
M00131 and M00294.
TFBS name: HNF4
IUPAC code: CAAAGK
Bound by: Hepatocyte nuclear factor 4.
Function: Important transcription factor in liver
development, particularly in the expression of mature liver genes
(Ktistaki and Taliandis 1997; Tronche and coworkers 1997; Darlington 1999).
Ref: Pattern derived by combining consensus sequence from
Krivan and Wasserman (2001) and TRANSFAC(v6) accessions M00134,
M00158 and M00411.
Muscle study TFBS:
TFBS name: MEF2
IUPAC code: CTAWWWWTAR
Bound by: Myocyte-specific enhancer factor 2.
Function: A MADS family protein predominantly expressed in
skeletal and cardiac muscle and to a lesser extent in the brain
(Pollock and Treisman 1991).
Ref: Based on a consensus sequence detailed by Dodou and
coworkers (1995) and Wasserman and Fickett (1998).
TFBS name: SRF
IUPAC code: CCWWWWWWGG
Bound by: Serum response factor.
Function: A MADS family protein that activates muscle
gene expression via the CArG motif (Vandromme and coworkers 1992).
"This gene encodes a ubiquitous nuclear protein that stimulates both cell
proliferation and differentiation. It is a member of the MADS (MCM1, Agamous,
Deficiens, and SRF) box superfamily of transcription factors. This
protein binds to the serum response element (SRE) in the promoter region
of target genes. This protein regulates the activity of many immediate-early
genes, for example c-fos, and thereby participates in cell cycle regulation,
apoptosis, cell growth, and cell differentiation. This gene is the downstream
target of many pathways, for example the mitogen-activated protein kinase pathway
(MAPK) that acts through the ternary complex factors (TCFs)." (Entrez Gene ID:
6722).
Ref: Based on a consensus sequence detailed by Dodou and
coworkers (1995) and Wasserman and Fickett (1998).
TFBS name: EBOX (MyoD)
IUPAC code: CANCWG
Bound by: MyoD which belongs to the myogenin (Myf) subfamily of basic
helix-loop-helix transcription factors.
Function: "It is involved in muscle cell differentiation, and is
essential for repair of damaged tissue. It activates its own transcription
which may stabilize commitment to myogenesis." (Entrez Gene ID:
4654).
Ref: Based on a consensus sequence detailed by Wasserman and Fickett (1998).
TFBS name: TEF
IUPAC code: CATTCC
Bound by: Transcriptional enhancer factor-1 related factors (TEF-1).
Function: TEF transcription factors bind to muscle-specific CATT
regulatory elements (M-CAT sites) that are responsible for the activity of many
promoters in cardiac and skeletal muscle (Farrance and Ordahl, 1996). There is
evidence that TEF-1 binds cooperatively to repeated M-CAT motifs under
positional and spatial constraints (Jiang and coworkers, 2000).
Ref: Based on a consensus sequence detailed by Wasserman and Fickett (1998)
and Jiang and coworkers (2000).
Other TFBS of interest:
TFBS name: CRE
IUPAC code: TGACGTCA (full CRE consensus)
Bound by: The cAMP-response element binding protein (CREB).
Function: The CREB family of cAMP induced activators stimulate gene
expression after phosphorylation at a conserved serine (Mayr and Montminy, 2001).
The genome-wide locations of CRE motifs have been mapped, but only a small proportion
of CREB target genes are induced by cAMP in any cell type (Zhang and coworkers,
2005). Their work suggests additional CREB regulatory partners are required
for recruitment of the transcriptional apparatus to a promoter.
Ref: Based on a consensus sequence detailed by Zhang and coworkers (2005).
IUPAC code: TGACG (half CRE consensus)
Ref: Based on a consensus sequence detailed by Zhang and coworkers (2005).
TFBS name: FOXI1
IUPAC code: TRTTKRY
Bound by: Forkhead family transcription factor FOXI1.
Function: "The specific function of this gene has not yet been determined; however,
it is possible that this gene plays an important role in the development of the cochlea
and vestibulum, as well as embryogenesis. Mutations in this gene may be associated with
the common cavity phenotype. Two transcript variants encoding different isoforms have been
found for this gene." (Entrez Gene ID:
2299).
Ref: Based on a consensus sequence detailed by Blomqvist and coworkers
(2004) and Kurth and coworkers (2006).
TFBS name: GLI1
IUPAC code: GACCACCCA
Bound by: Kruppel-type zinc finger transcription factor GLI1.
Function: GLI1 mediates Hedgehog signalling including Sonic Hedgehog
(Yoon and coworkers 2002). Hedgehog signalling has been implicated in
the induction of haematopoiesis and vasculogenesis from the mesodermal
progenitor, the haemangioblast (Byrd and coworkers 2002; Baron 2003).
Ref: Based on a consensus sequence detailed by Kinzler and Vogelstein
(1990) and Yoon and coworkers (2002).
TFBS name: GLI1 - multiple sites
IUPAC code: GACCACCCA,CACCACCCA,GTCCACCCA,GAACACCCA,GACCCCCCA,
GACCTCCCA,GACCACCAA
Bound by: Zinc finger transcription factor GLI1.
Function: As above. Six additional sites differing by one nucleotide
from the published consensus (GACCACCCA).
Ref: Based on sequences detailed by Yoon and coworkers (2002).
TFBS name: p53
IUPAC code: RCNWGYNN*0-1*NNRCAWGY
Bound by: Nuclear protein p53.
Function: "Tumor protein p53, a nuclear protein, plays an essential role
in the regulation of cell cycle, specifically in the transition from G0 to G1.
It is found in very low levels in normal cells, however, in a variety of transformed
cell lines, it is expressed in high amounts, and believed to contribute to
transformation and malignancy. p53 is a DNA-binding protein containing DNA-binding,
oligomerization and transcription activation domains. It is postulated to bind
as a tetramer to a p53-binding site and activate expression of downstream genes
that inhibit growth and/or invasion, and thus function as a tumor suppressor."
(Entrez Gene ID:
7157).
Ref: Based on the whole genome search for p53 binding sites
by Wei and coworkers (2006).
TFBS name: RE1 (NRSE)
IUPAC code: NTYAGMRCCNNRGMSAG
Bound by: Kruppel-type zinc finger transcription factor REST.
Function: "The RE-1 silencing transcription factor gene encodes a
transcriptional repressor which represses neuronal genes in non-neuronal
tissues. It represses transcription by binding a DNA sequence element called
the neuron-restrictive silencer element. The protein is also found in
undifferentiated neuronal progenitor cells, and it is thought that this
repressor may act as a master negative regulator of neurogenesis.
Alternatively spliced transcript variants have been described; however, their full
length nature has not been determined." (Entrez Gene ID:
5978).
Ref: Based on a consensus sequence detailed by Bruce and coworkers
(2004).
TFBS name: RE1 (NRSE) - multiple sites
IUPAC code: NTYAGMRCCNNRGMSAG,NNYAGMRCCNNRGMSAG,NTNAGMRCCNNRGMSAG,NTYNGMRCCNNRGMSAG,
NTYANMRCCNNRGMSAG,NTYAGNRCCNNRGMSAG,NTYAGMNCCNNRGMSAG,NTYAGMRNCNNRGMSAG,
NTYAGMRCNNNRGMSAG,NTYAGMRCCNNNGMSAG,NTYAGMRCCNNRNMSAG,NTYAGMRCCNNRGNSAG,
NTYAGMRCCNNRGMNAG,NTYAGMRCCNNRGMSNG,NTYAGMRCCNNRGMSAN
Bound by: Kruppel-type zinc finger transcription factor REST.
Function: As above. Additional sites differ by one nucleotide from
the published consensus (NTYAGMRCCNNRGMSAG).
Ref: Based on a consensus sequence detailed by Bruce and coworkers
(2004).
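The additional RE1 patterns listed above follow a regular rule: each one replaces a single specified position of the consensus with N. A short sketch that regenerates them is shown below; `single_n_variants` is a hypothetical helper written for illustration and is not the code used to build the libraries.

```python
from typing import List


def single_n_variants(consensus: str) -> List[str]:
    """Patterns obtained by relaxing one non-N position of the consensus to N."""
    return [consensus[:i] + "N" + consensus[i + 1:]
            for i, code in enumerate(consensus) if code != "N"]


variants = single_n_variants("NTYAGMRCCNNRGMSAG")
print(len(variants))  # 14
print(variants[0])    # NNYAGMRCCNNRGMSAG
print(variants[-1])   # NTYAGMRCCNNRGMSAN
```

Together with the consensus itself this gives the 15 patterns in the list, since 14 of the 17 positions are not already N.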
TFBS name: STAT5
IUPAC code: TTCYNRGAA
Bound by: Signal transducers and activators of transcription 5a and 5b.
Function: "In response to cytokines and growth factors, STAT family members are
phosphorylated by the receptor associated kinases, and then form homo- or heterodimers
that translocate to the cell nucleus where they act as transcription activators. This
protein is activated by, and mediates the responses of many cell ligands, such as IL2,
IL3, IL7 GM-CSF, erythropoietin, thrombopoietin, and different growth hormones.
Activation of this protein in myeloma and lymphoma associated with a TEL/JAK2 gene
fusion is independent of cell stimulus and has been shown to be essential for the
tumorigenesis. The mouse counterpart of this gene is found to induce the expression of
BCL2L1/BCL-X(L), which suggests the antiapoptotic function of this gene in cells."
(Entrez Gene ID: 6776).
Ref: Based on a consensus sequence detailed by Soldaini and coworkers (2000).