TFBScluster - Genome-wide.

TFBScluster Instructions.

First screen: Selecting the number of TFBS to analyse.

Option 1: Decide how many DIFFERENT transcription factor binding site (TFBS) patterns you want to be present in the clusters produced by the program. For example:

ETS-GATA-EBOX equals 3
ETS-ETS-GATA equals 2
GATA-GATA-GATA equals 1
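
The count is simply the number of distinct TFBS names in a cluster. A minimal Python sketch of that rule follows; the helper name `count_distinct_tfbs` and the hyphen-separated input are assumptions made for illustration and are not part of TFBScluster itself.

```python
def count_distinct_tfbs(cluster: str) -> int:
    """Return the number of different TFBS names in a hyphen-separated cluster."""
    return len(set(cluster.split("-")))


# The three examples listed above.
for example in ("ETS-GATA-EBOX", "ETS-ETS-GATA", "GATA-GATA-GATA"):
    print(example, "equals", count_distinct_tfbs(example))
```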

Option 2: We have incorporated libraries of TFBS positions from three different sources. You can select which set of TFBSs to use in your analysis, but only one set can be used per analysis.

Second screen: Set options for TFBScluster.

Select TFBS parameters.
In this section you need to:

  1. Select TFBS from list.
    These must all be selected!
    Only unique TFBSs should be selected. Unique TFBSs are, for example, ETS, GATA and EBF. ETS and NNETSNN are the same TFBS. Choosing the same TFBS merely creates duplicates that are removed in the analysis.
  2. Specify the degree of conservation beyond the 'core' motif.
    If you want to start with a larger number of less specific TFBSs in the analysis choose the 'core' TFBS pattern. For increased sensitivity, choose those TFBSs with extended conservation. 'Non-exact' patterns allow degenerate IUPAC code positions, for example W (A or T), to be different in the aligned sequences. 'Exact' patterns must be the same in all aligned sequences.
  3. Specify the minimum number of occurrences for this motif.
    This allows you to specify the minimum number of your chosen TFBSs in the final clusters.
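
The minimum-occurrence setting can be pictured as a simple count check on each candidate cluster. The sketch below is illustrative only: the function name `meets_minimums` and the representation of a cluster as a list of TFBS names are assumptions, not the tool's internal data model.

```python
from collections import Counter


def meets_minimums(cluster_sites, minimums):
    """Return True if every chosen TFBS occurs at least its minimum number of times.

    cluster_sites: list of TFBS names found in one candidate cluster (assumed format).
    minimums: dict mapping each chosen TFBS name to its minimum number of occurrences.
    """
    counts = Counter(cluster_sites)
    return all(counts[name] >= needed for name, needed in minimums.items())


# Require at least two ETS sites and one GATA site.
requirements = {"ETS": 2, "GATA": 1}
print(meets_minimums(["ETS", "ETS", "GATA"], requirements))
print(meets_minimums(["ETS", "GATA"], requirements))
```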

Only consider TFBSs that are ALSO conserved in the following genome alignments.
By default, TFBScluster forms clusters from all TFBSs conserved between the mouse and human genomes. However, you can also choose to use only those TFBSs that are also conserved between mouse and dog, or mouse and opossum.

Select whether to include or exclude clusters containing overlapping TFBSs.
For example, if you want to find clusters containing at least two "typeA" and one "typeB" TFBSs, choosing 'exclude' means TFBScluster will not report clusters where one of the "typeA" sites is overlapped by a "typeB" site. We have included this option so that the minimum number of TFBSs in a cluster represents 'free' sites that may be bound by their corresponding transcription factors at the same time.
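
The 'exclude' behaviour can be sketched as an interval-overlap test between sites of different types. This is a simplified illustration, not TFBScluster's own code: the `Site` record, the 1-based inclusive coordinates and the rule of rejecting a cluster on any cross-type overlap are all assumptions.

```python
from itertools import combinations
from typing import NamedTuple


class Site(NamedTuple):
    tfbs_type: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # inclusive


def sites_overlap(a: Site, b: Site) -> bool:
    """Two sites overlap if they share at least one nucleotide."""
    return a.start <= b.end and b.start <= a.end


def has_cross_type_overlap(sites) -> bool:
    """True if any two sites of different types overlap each other."""
    return any(
        a.tfbs_type != b.tfbs_type and sites_overlap(a, b)
        for a, b in combinations(sites, 2)
    )


cluster = [Site("typeA", 10, 15), Site("typeA", 30, 35), Site("typeB", 33, 38)]
# With 'exclude' selected, a cluster like this one would be dropped.
print(has_cross_type_overlap(cluster))
```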

Specify a minimum cluster size.
This is the minimum range, in nucleotides, between the start of the most 5' TFBS and the end of the most 3' TFBS. When selecting this value, consider the loosest arrangement of your individual TFBSs, for example the centre of each TFBS separated from the next by 1 helical turn (10.5 nucleotides).
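
As a rough guide, the span of such an arrangement can be estimated from the motif lengths. The following Python sketch assumes every pair of neighbouring TFBS centres is exactly one helical turn apart; the function name `loosest_cluster_span` and the motif lengths used are illustrative assumptions.

```python
import math

HELICAL_TURN = 10.5  # nucleotides per helical turn, as quoted in the text


def loosest_cluster_span(motif_lengths, spacing=HELICAL_TURN):
    """Estimate the distance from the start of the first TFBS to the end of the last.

    Assumes neighbouring TFBS centres are `spacing` nucleotides apart, so the span is
    half of the first motif + (n - 1) * spacing + half of the last motif, rounded up.
    """
    if not motif_lengths:
        return 0
    centre_to_centre = spacing * (len(motif_lengths) - 1)
    half_first = motif_lengths[0] / 2
    half_last = motif_lengths[-1] / 2
    return math.ceil(half_first + centre_to_centre + half_last)


# Three motifs of 4, 4 and 6 nucleotides (hypothetical lengths).
print(loosest_cluster_span([4, 4, 6]))
```

For these hypothetical lengths the estimate comes to 26 nucleotides (2 + 21 + 3), which would be a reasonable starting value for this setting.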

At present the maximum initial cluster size is 220bp. This is a theoretical maximum for short range looping in DNA, representing the distance between nucleosomal linkers (Ringrose and coworkers, 1999, EMBO 18: 6630-6641).

Run the short or long analysis?

  • Short analysis.
    Works out all possible clusters of the specified size, containing the minimum number of TFBSs. Overlapping clusters are merged and the final list (in GFF format) is reported back via a link, sent via email.
  • Long analysis.
    The short analysis is completed, then genes are localised to each of the clusters. A gene is localised to a cluster if the cluster is present in one of its introns; clusters that are within an exon or overlapping an exon are not reported. Otherwise the closest 5' and 3' genes to the cluster (within 100kb) are reported. As this analysis can take a long time, the links to the results are returned by email. A set of output files has been annotated to describe the contents. These examples come from the human version of TFBScluster, but the format is for all intents and purposes the same.
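
The merging step of the short analysis can be sketched as a standard interval merge followed by writing one GFF line per merged cluster. This is a minimal illustration under stated assumptions: clusters are (start, end) pairs on a single chromosome, and the source and feature values ("TFBScluster", "cluster") are placeholders rather than the values found in the real output files.

```python
def merge_clusters(clusters):
    """Merge overlapping (start, end) intervals that lie on one chromosome."""
    merged = []
    for start, end in sorted(clusters):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous cluster: extend it to cover both.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def to_gff_line(chrom, start, end):
    """Format one cluster as a nine-column, tab-separated GFF line."""
    fields = [chrom, "TFBScluster", "cluster", str(start), str(end), ".", ".", ".", "."]
    return "\t".join(fields)


candidates = [(100, 180), (150, 260), (400, 450)]
for start, end in merge_clusters(candidates):
    print(to_gff_line("chr1", start, end))
```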

OPTIONAL CLUSTER CONSTRAINTS.
A) Choose to only search for clusters on a single chromosome or 'all' chromosomes.
B) Choose to retain or reject clusters containing user-specified TFBSs represented by IUPAC consensus sequences or patterns of IUPAC consensus sequences.

IUPAC consensus sequence, for example, GGAW = GGA[A or T].

Patterns of consensus sequences, for example, GGAW*1-10*GGAW*1-10*GATA = GGA[A/T] 1 to 10 spaces (nucleotides/gaps) GGA[A/T] 1 to 10 spaces GATA.
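
One way to read this notation is as a regular expression. The sketch below translates the examples above into Python regular expressions using the standard IUPAC nucleotide codes; the function name `pattern_to_regex` is hypothetical, and treating a spacer as "any character" is an assumption, so the matching rules used by the tool itself may differ.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pattern_to_regex(pattern):
    """Translate e.g. 'GGAW*1-10*GATA' into a compiled regular expression."""
    parts = []
    for token in pattern.split("*"):
        if "-" in token:
            # Spacer such as '1-10': between 1 and 10 positions of anything.
            low, high = token.split("-")
            parts.append(".{%d,%d}" % (int(low), int(high)))
        else:
            parts.append("".join(IUPAC[base] for base in token.upper()))
    return re.compile("".join(parts))


print(pattern_to_regex("GGAW").pattern)
regex = pattern_to_regex("GGAW*1-10*GGAW*1-10*GATA")
print(bool(regex.search("TTGGAACCGGATCCGATA")))
```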

Further information can be found in the instruction page of TFBSsearch.

Filter candidate genes to only show those expressed in a given tissue.
Select a tissue of interest and fold over median expression using the pop-up menus.
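
The filter can be pictured as follows. This sketch assumes that "fold over median" means a gene's expression in the selected tissue divided by that gene's median expression across all tissues; the data layout, the function name `expressed_in_tissue` and the example values are hypothetical.

```python
from statistics import median


def expressed_in_tissue(expression, tissue, min_fold):
    """Return genes whose expression in `tissue` is at least `min_fold` times their median.

    expression: dict mapping gene name -> dict of tissue name -> expression value.
    """
    selected = []
    for gene, by_tissue in expression.items():
        gene_median = median(by_tissue.values())
        if gene_median > 0 and by_tissue.get(tissue, 0) / gene_median >= min_fold:
            selected.append(gene)
    return selected


expression = {
    "geneA": {"liver": 300, "brain": 100, "heart": 100},
    "geneB": {"liver": 90, "brain": 100, "heart": 110},
}
print(expressed_in_tissue(expression, "liver", 3))
```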

Please choose how you would like to get your results (select 1 method).
We recommend that you provide an email address, as you will then receive your results automatically. Email addresses are used only to notify users of completed analyses and of updates to the tool. However, if you require anonymity you can opt to retrieve your own results. This method was originally intended only for the review procedure.
