What is TFBSsearch?
TFBSsearch searches for potential transcription factor binding sites that are conserved in a multiple alignment (or simply present, if a single sequence is supplied). Binding sites can be described as IUPAC strings or as nucleotide weight matrices (NWMs).
The output is a GFF file (see www.sanger.ac.uk/Software/formats/GFF/). If there is redundancy in the IUPAC string (that is, it contains ambiguity codes) or the threshold used in the NWM search is less than 1, the motif need not be identical in the different sequences for TFBSsearch to report a find. For example, in the following short aligned sequences:
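Human: TAC--ACAGGAAGTC
Mouse: T-CTTACAGGATGCC
a search for GGAA will report no find, but a search for GGAW will report a find. This is because W stands for A or T, so GGAW matches both GGAA in human and GGAT in mouse, whereas GGAA matches only the human sequence.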
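The behaviour in the example above can be sketched in a few lines of Python. This is only an illustration, not the TFBSsearch Perl code: the function name conserved_hits is hypothetical, and the sketch assumes that a find requires the motif to match the same alignment columns in every sequence, with no gap characters inside the matched window.

```python
# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def conserved_hits(motif, aligned_seqs):
    """Return 1-based aligned start columns where the motif matches every sequence.

    Assumption (not taken from TFBSsearch itself): a window containing a gap
    never matches, because "-" is not covered by any IUPAC code.
    """
    motif = motif.upper()
    seqs = [s.upper() for s in aligned_seqs]
    width = len(motif)
    hits = []
    for start in range(len(seqs[0]) - width + 1):
        if all(
            all(seq[start + i] in IUPAC[code] for i, code in enumerate(motif))
            for seq in seqs
        ):
            hits.append(start + 1)
    return hits


human = "TAC--ACAGGAAGTC"
mouse = "T-CTTACAGGATGCC"
print(conserved_hits("GGAA", [human, mouse]))  # []
print(conserved_hits("GGAW", [human, mouse]))  # [9]
```

The hit for GGAW is reported here in aligned coordinates, that is, with gap columns counted.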
The position reported depends on the settings. If the numbering system chosen is aligned, the match above is reported as occurring at position 9 (i.e. gaps are counted). If the numbering system chosen is unaligned and the reference sequence chosen is human, the match is reported at position 7. Finally, if the numbering system chosen is unaligned and the reference sequence chosen is mouse, the match is reported at position 8. Note that if you want to plot your features in SynPlot, you should choose unaligned numbering. Other options are described below.
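The relationship between the two numbering systems is simple arithmetic: the unaligned position is the aligned position minus the number of gaps that precede it in the reference sequence. A minimal sketch follows; the function name unaligned_position is hypothetical and is not part of TFBSsearch.

```python
def unaligned_position(aligned_seq, aligned_pos):
    """Convert a 1-based aligned column to a 1-based position in the ungapped sequence."""
    if aligned_seq[aligned_pos - 1] == "-":
        raise ValueError("aligned position falls on a gap in this sequence")
    prefix = aligned_seq[:aligned_pos]
    # Every gap before (and up to) the column shifts the unaligned position left by one.
    return len(prefix) - prefix.count("-")


human = "TAC--ACAGGAAGTC"
mouse = "T-CTTACAGGATGCC"
print(unaligned_position(human, 9))  # 7 (two gaps precede the match)
print(unaligned_position(mouse, 9))  # 8 (one gap precedes the match)
```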
TFBSsearch takes as its input aligned sequences in FASTA format, with the gaps introduced by the alignment represented by "-" characters. The sequences must therefore all be the same length.
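Before submitting an alignment, it can be useful to confirm that every sequence has the same length once gaps are included. The following sketch reads FASTA-formatted lines and raises an error if the lengths differ. The function name read_aligned_fasta and the error messages are illustrative assumptions, not part of TFBSsearch.

```python
def read_aligned_fasta(lines):
    """Parse FASTA lines into {header: sequence} and check all lengths are equal."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line)
    seqs = {header: "".join(parts) for header, parts in records.items()}
    lengths = {len(seq) for seq in seqs.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences differ in length: %s" % sorted(lengths))
    return seqs


example = """>Human
TAC--ACAGGAAGTC
>Mouse
T-CTTACAGGATGCC
"""
alignment = read_aligned_fasta(example.splitlines())
print(alignment["Human"], alignment["Mouse"])
```

To check a file instead of a string, pass an open file handle, for example read_aligned_fasta(open("alignment.fa")), where alignment.fa is a placeholder file name.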
TFBSsearch implements a Perl script (of the same name) written by Dr. Mike Chapman. Searches using NWMs use Perl modules written by Boris Lenhard, which access TRANSFAC NWMs via the Transcription Element Search Software (TESS) database.