TFBSsearch Instructions.

First screen: Masking your sequences?

TFBSsearch searches for transcription factor binding sites in areas of complete identity between two or more aligned sequences, or in single sequences. Highly conserved features, such as coding exons or repeats should be masked to reduce the number of false positive hits. A GFF file should be available for each sequence, containing features that need to be masked.

If you wish exons to be masked, select the 'Mask features' box and select the number of sequences present in your multiple sequence alignment.
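As an illustration of what masking means, the sketch below replaces every feature listed in a GFF file with a mask character. It is not TFBSsearch's own code: the function name `mask_features` and the use of 'N' as the mask character are assumptions made for this example. The only fact relied on is that GFF start and end coordinates (columns 4 and 5) are 1-based and inclusive.

```python
def mask_features(sequence, gff_lines, mask_char="N"):
    """Return `sequence` with every GFF feature interval replaced by `mask_char`.

    Hypothetical helper: GFF columns 4 and 5 hold the 1-based, inclusive
    start and end of each feature.
    """
    masked = list(sequence)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[3]), int(fields[4])
        # Convert to 0-based indices and clip to the sequence length.
        for i in range(max(start, 1) - 1, min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)


# One exon covering bases 3 to 6 of a ten-base sequence.
gff = ["human\tannotation\texon\t3\t6\t.\t+\t.\n"]
print(mask_features("ACGTACGTAC", gff))  # ACNNNNGTAC
```

Masked bases can no longer produce a hit for a specific motif, which is how masking coding exons or repeats reduces false positives.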

Second screen: Set options for TFBSsearch.

Input options:
Select strands to search.
Specifies which strand to search. Default is to search both strands.
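Searching the reverse strand amounts to searching the reverse complement of the sequence. The sketch below shows the idea in Python; the function names and the option values 'forward', 'reverse' and 'both' are hypothetical and are not the labels used on the TFBSsearch form.

```python
# Translation table for the four unambiguous bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def sequences_to_search(sequence, strands="both"):
    """Return the strand(s) to scan, keyed by '+' and '-' (hypothetical option values)."""
    forward = sequence.upper()
    if strands == "forward":
        return {"+": forward}
    if strands == "reverse":
        return {"-": reverse_complement(forward)}
    return {"+": forward, "-": reverse_complement(forward)}


print(reverse_complement("AGGAT"))   # ATCCT
print(sequences_to_search("AGGAT"))  # {'+': 'AGGAT', '-': 'ATCCT'}
```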

Search using IUPAC strings.
A search may be defined as a comma delimited list of IUPAC patterns. This is useful for a quick one-off search.

NOTE there should be no spaces in the list. Only the following IUPAC characters are allowed:

           A        Adenine
           C        Cytosine
           G        Guanine
           T        Thymine
           M        AMino (A or C)
           R        PuRine (A or G)
           W        Weak (A or T)
           S        Strong (C or G)
           Y        PYrimidine (C or T)
           K        Keto (G or T)
           V        Not T (A or C or G)
           H        Not G (A or C or T)
           D        Not C (A or G or T)
           B        Not A (C or G or T)
           N        ANy (A or C or G or T)
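The codes in the table above map directly onto regular-expression character classes. The sketch below is a minimal illustration of how a comma delimited list of IUPAC strings could be searched in a single sequence. The names `iupac_to_regex` and `find_sites` are made up for this example, and this is not necessarily how TFBSsearch itself performs the search.

```python
import re

# Each IUPAC code and the bases it stands for, written as a regex character class.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "W": "[AT]", "S": "[CG]",
    "Y": "[CT]", "K": "[GT]", "V": "[ACG]", "H": "[ACT]",
    "D": "[AGT]", "B": "[CGT]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC string into a regex; raises KeyError on a disallowed character."""
    return "".join(IUPAC[base] for base in pattern.upper())


def find_sites(pattern, sequence):
    """Return (1-based start, matched bases) for every hit, including overlapping hits."""
    # The lookahead lets the regex engine report overlapping matches.
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


# A comma delimited list with no spaces, as required above.
for pattern in "NGGAW,GATA".split(","):
    print(pattern, find_sites(pattern, "TTAGGAAGATAGGAT"))
```

A space inside the list would end up inside one of the patterns and raise a `KeyError` here, which mirrors the requirement that the list contain no spaces.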

Alternatively, searches may be defined by a file containing a carriage return separated list of IUPAC strings (one string per line). This option is useful if you periodically search for the same patterns.

NOTE the last IUPAC string should be on the last line of the file!

Select NWM accession number(s).
Nucleotide position weight matrix (NWM) accession numbers occurring in the TRANSFAC database may be defined as a comma delimited list.

TRANSFAC accession numbers may also be entered as a carriage return separated list file.

A list of TRANSFAC accession numbers and their name can be viewed via this [LINK].

Output options:

Select format of output.
The 'unaligned' output reports the location of conserved sites in unaligned numbering (i.e. gaps '-' are ignored). As the unaligned numbering will be different in each species, a reference sequence is used (see below).

Unaligned numbering is used as default. NOTE that SynPlot converts the unaligned numbering from a GFF file to plot features; unaligned numbering should therefore be selected if the GFF files are going to be used with SynPlot.

The 'aligned' output reports each feature at its position in the global alignment, i.e. gaps '-' are counted.
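The relationship between the two numbering schemes can be sketched as follows. The function `aligned_to_unaligned` is a hypothetical helper written for this illustration; it assumes 1-based positions and '-' as the gap character.

```python
def aligned_to_unaligned(aligned_seq, aligned_pos):
    """Convert a 1-based alignment column to a 1-based position in the ungapped sequence.

    Returns None when the column holds a gap in this sequence.
    """
    if aligned_seq[aligned_pos - 1] == "-":
        return None
    # Subtract the gaps that occur up to and including this column.
    return aligned_pos - aligned_seq[:aligned_pos].count("-")


human = "AC--GGAT"
print(aligned_to_unaligned(human, 5))  # 3 (the first G is the third base of ACGGAT)
print(aligned_to_unaligned(human, 3))  # None (column 3 is a gap)
```

Because each sequence has its own gaps, the same alignment column generally maps to a different unaligned position in each sequence, which is why a reference sequence has to be chosen.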

Selecting a reference sequence.
This option is used to specify which sequence will be used as the reference if unaligned numbering is in use. It must correspond to the name of the sequence as it appears in the FASTA sequence file.

Defaults to the first sequence in the multiple FASTA file.

Advanced output options:
Select a name for GFF 'source' column.
Corresponds to the source field of the GFF file.
Defaults to 'TFBSsearch'.

Select a name for GFF 'feature' column.
Corresponds to the feature field of the GFF file.
Defaults to 'CNS' (i.e. conserved non-coding sequence).

Select to ensure the same motif is found in all sequences of the alignment.
If this option is used, positions matched by ambiguous IUPAC codes (e.g. N = [ACGT] or S = [CG]) in an IUPAC string or pattern must carry the same base in every sequence of the alignment.

           For example:

           IUPAC string = NGGAW

           Alignment    = AGGAT    Pattern found
                          AGGAT

           Alignment    = AGGAA    Pattern NOT found!
                          AGGAT

Default is not set.
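The effect of this option can be sketched in Python. This is one reading of the description above, not TFBSsearch's actual code: the function `site_conserved` and its `same_motif` flag are hypothetical names, and the inputs are the aligned windows taken from each sequence at the same alignment columns.

```python
import re

# IUPAC codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "W": "[AT]", "S": "[CG]",
    "Y": "[CT]", "K": "[GT]", "V": "[ACG]", "H": "[ACT]",
    "D": "[AGT]", "B": "[CGT]", "N": "[ACGT]",
}


def site_conserved(pattern, aligned_windows, same_motif=False):
    """Report whether a site is found in every aligned window.

    With same_motif=True the windows must also be identical to each other,
    so an ambiguous code cannot match different bases in different sequences.
    """
    regex = re.compile("".join(IUPAC[code] for code in pattern))
    if not all(regex.fullmatch(window) for window in aligned_windows):
        return False
    if same_motif:
        return len(set(aligned_windows)) == 1
    return True


print(site_conserved("NGGAW", ["AGGAT", "AGGAT"], same_motif=True))  # True
print(site_conserved("NGGAW", ["AGGAA", "AGGAT"], same_motif=True))  # False
print(site_conserved("NGGAW", ["AGGAA", "AGGAT"]))                   # True
```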

Select conserved range (deviation from exact alignment).
If this option is specified, TFBSsearch will allow the sites to occur up to x bases apart in the two sequences (i.e. they will not have to be exactly aligned).

Unless you are searching for a long and not very degenerate motif (i.e. one that will not occur often by chance), it does not make much sense to set x to more than a few bases (or even to use this option at all). However, if set to 1 or 2, it will allow small misalignments to be ignored.
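A minimal sketch of the idea, assuming site positions are given as alignment columns for each of two sequences. The function `pair_sites` and its `max_offset` parameter (the x above) are hypothetical names used only for this illustration.

```python
def pair_sites(sites_a, sites_b, max_offset=0):
    """Pair up site positions from two sequences that lie within max_offset columns.

    max_offset=0 demands exactly aligned sites; larger values tolerate
    small misalignments.
    """
    return [(a, b) for a in sites_a for b in sites_b if abs(a - b) <= max_offset]


human_sites = [10, 40]
mouse_sites = [11, 55]
print(pair_sites(human_sites, mouse_sites))                # []
print(pair_sites(human_sites, mouse_sites, max_offset=1))  # [(10, 11)]
```

With a short or degenerate motif, a large offset pairs up sites that occur near each other purely by chance, which is why small values are recommended.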

Select sequences to exclude from input file, OR leave blank to select all.
The list (which must not include spaces) can contain one or more of the sequence names as they appear in the alignment file. These sequences will be ignored when looking for conserved motifs. For example, if you have a multiple alignment file of four species:

           > human .....

           > mouse .....

           > dog .....

           > rat .....

and use the option with 'mouse,dog' then TFBSsearch will only look for motifs that are conserved between human and rat. Note, however, that the gaps generated by the original 4-way alignment will be preserved and that this will likely give you a different output to a TFBSsearch search of a straight 2-way human-rat alignment.
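The sketch below illustrates the exclusion and why the gaps remain. The function `exclude_sequences` and the toy alignment are invented for this example; the alignment is assumed to be held as a mapping from sequence name to gapped sequence.

```python
def exclude_sequences(alignment, exclude_list):
    """Drop the named sequences from an alignment without realigning the rest.

    `exclude_list` is a comma delimited string with no spaces; an empty
    string keeps every sequence.
    """
    excluded = set(exclude_list.split(",")) if exclude_list else set()
    return {name: seq for name, seq in alignment.items() if name not in excluded}


# Toy 4-way alignment: the gap in column 3 exists only because of mouse.
alignment = {
    "human": "AC-GT",
    "mouse": "ACGGT",
    "dog":   "AC-GT",
    "rat":   "AC-GT",
}
print(exclude_sequences(alignment, "mouse,dog"))
# {'human': 'AC-GT', 'rat': 'AC-GT'}
```

The human and rat rows keep the gap column introduced by the 4-way alignment, whereas a direct 2-way human-rat alignment of these sequences would contain no gap.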

Search using an IUPAC pattern.
This option searches for a pattern made up of several IUPAC strings. The format of 'pattern' is a little complex. The * character is used as a delimiter between alternating comma-separated (no spaces) lists of IUPAC strings and ranges. For example:


This will search for an ETS or GATA site, then 8-12 bases, followed by a second ETS or GATA site, then either 8-12 or 18-22 bases, then an EBOX site.

Search using an NWM pattern.
This is similar to using an IUPAC pattern (above) in syntax, except that instead of the IUPAC strings, NWM accession numbers are used.

Select a threshold for an NWM search.
The threshold to use for an NWM search. Remember to supply the percent (%) sign, e.g. 90%.
