First screen: Masking your sequences?
BlockCounter searches for blocks of complete identity between all
aligned sequences. Highly conserved features, such as coding exons (CDS)
or repeats, should be masked to reduce the number of false-positive hits.
A GFF file should be available for each sequence, containing
features that need to be masked.
If you want exons to be masked, select the 'Mask features' box and
choose the number of sequences present in your multiple sequence
alignment.
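As an illustration of what a "block of complete identity" means, the following is a minimal Python sketch, not BlockCounter's own code. The function name identity_blocks is hypothetical, and it assumes that gap columns and masked positions (written here as 'N') end a block.

```python
def identity_blocks(seqs, breakers="-N"):
    """Return (start_column, length) for each maximal run of identical columns.

    Assumption: a column containing a gap ('-') or a masked base ('N')
    never counts as identical, so it ends the current block.
    """
    blocks = []
    run_start = None
    width = len(seqs[0])
    for col in range(width):
        column = {s[col] for s in seqs}
        identical = len(column) == 1 and not (column & set(breakers))
        if identical:
            if run_start is None:
                run_start = col          # a new block starts here
        elif run_start is not None:
            blocks.append((run_start, col - run_start))
            run_start = None
    if run_start is not None:            # block running to the last column
        blocks.append((run_start, width - run_start))
    return blocks


alignment = ["ACGTAC-T", "ACGAACGT", "ACGTACGT"]
print(identity_blocks(alignment))  # [(0, 3), (4, 2), (7, 1)]
```

Column numbers in this sketch are 0-based alignment columns, not positions in any unaligned sequence.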
Second screen: Set options for BlockCounter.
Sequence alignment file(s).
The aligned sequences must be input as a single FASTA-formatted file. All the
sequences must be the same length. Gaps are represented by '-'.
The header, denoted by '>', must be present on its own line
(no spaces), e.g.
Please note that the last nucleotides should be on the
last line of the file.
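Before uploading, the requirements above can be checked locally. This is a minimal sketch under stated assumptions: read_alignment and check_alignment are hypothetical helper names, and "no spaces" is read here as "the header name contains no whitespace".

```python
def read_alignment(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], []])   # header on its own line
        elif records:
            records[-1][1].append(line)      # sequence may span many lines
        else:
            raise ValueError("sequence data found before the first '>' header")
    return [(name, "".join(parts)) for name, parts in records]


def check_alignment(records):
    """Raise ValueError if the records break the rules described above."""
    for name, _ in records:
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"header {name!r} is empty or contains spaces")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        raise ValueError(f"aligned sequences differ in length: {sorted(lengths)}")


# Hypothetical two-sequence alignment; gaps are written as '-'.
records = read_alignment(">seqA\nACG-\nTT\n>seqB\nACGT\nTA\n")
check_alignment(records)
```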
The first sequence of the first alignment file is taken to be the
reference sequence. Subsequent files should contain the
reference sequence as the first sequence. Otherwise the file will
GFF feature format files.
For each sequence name extracted from the FASTA global alignment file,
BlockCounter will look for a file NAME.gff
(where NAME is the name found in the header).
The GFF format is explained at the Sanger Institute.
We recommend two methods for creating GFF files for your sequences
GFF file names should only consist of numbers, letters and the
underscore character! They should also end with the suffix '.gff'.
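The naming rule can be expressed as a regular expression. This is a small sketch, with is_valid_gff_name as a hypothetical helper; it assumes "letters" means the ASCII letters A-Z and a-z and that the suffix must be lower case.

```python
import re

# Letters, digits and underscores only, followed by the '.gff' suffix.
GFF_NAME = re.compile(r"[A-Za-z0-9_]+\.gff")


def is_valid_gff_name(filename):
    """Return True if the whole file name follows the rule above."""
    return GFF_NAME.fullmatch(filename) is not None


print(is_valid_gff_name("human_chr1.gff"))  # True
print(is_valid_gff_name("human-chr1.gff"))  # False
```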
Select GFF features to mask.
The features should be entered as a comma-delimited list, exactly as
they appear in the GFF files.
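To show how a comma-delimited feature list relates to the GFF content, here is a hedged sketch of feature selection and masking. It relies only on the standard GFF layout (tab-separated columns, feature type in column 3, 1-based inclusive start and end in columns 4 and 5). The function names, the 'N' mask character and the tolerance for spaces around commas are assumptions, not documented BlockCounter behaviour.

```python
def features_to_mask(gff_text, feature_list):
    """Return (start, end) intervals for GFF lines whose feature is listed."""
    # Matching is case-sensitive: names must appear exactly as in the file.
    wanted = {name.strip() for name in feature_list.split(",") if name.strip()}
    intervals = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < 5:
            continue                      # not a complete feature line
        feature, start, end = fields[2], int(fields[3]), int(fields[4])
        if feature in wanted:
            intervals.append((start, end))
    return intervals


def mask_sequence(seq, intervals, mask_char="N"):
    """Replace every base inside the 1-based inclusive intervals."""
    chars = list(seq)
    for start, end in intervals:
        for i in range(start - 1, min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


# Hypothetical annotation for an unaligned 8-base sequence.
gff = (
    "seq1\tsrc\tCDS\t2\t4\t.\t+\t0\tID=a\n"
    "seq1\tsrc\tintron\t5\t5\t.\t+\t.\tID=c\n"
    "seq1\tsrc\trepeat\t6\t7\t.\t+\t.\tID=b\n"
)
hits = features_to_mask(gff, "CDS,repeat")
print(mask_sequence("ACGTACGT", hits))  # ANNNANNT
```

Because matching is exact, entering "cds" would not select features labelled "CDS".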
Set minimum and maximum block sizes.
Specify the range of block sizes to be displayed.
Report the start positions of the blocks on the unaligned reference
sequence and produce a plot.
For each block length within the defined range, BlockCounter will report
the start position in the unaligned reference sequence. The reference
sequence is the first sequence of the alignment file. The graphical output
of the block size distribution is not available when this option is selected.
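Converting an alignment column into a position on the unaligned reference amounts to discounting the gaps that precede it. The sketch below illustrates the idea; unaligned_position is a hypothetical name, and the choice of 0-based columns in and 1-based positions out is an assumption.

```python
def unaligned_position(aligned_ref, column):
    """Map a 0-based alignment column to a 1-based unaligned position."""
    if aligned_ref[column] == "-":
        raise ValueError("column falls on a gap in the reference sequence")
    gaps_before = aligned_ref[:column].count("-")
    return column + 1 - gaps_before


# Column 4 holds 'G', the third base of the ungapped reference ACGTA.
print(unaligned_position("AC--GTA", 4))  # 3
```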
Name of reference GFF file.
This is the GFF file which relates to the annotation of the reference
sequence. Normally this will be the same as the GFF file used for masking
the reference sequence. However, a different GFF file may be used.
PLEASE NOTE: The annotation of the reference sequence is drawn one
feature at a time, starting from the top of the file. Please make sure
that exon is drawn before CDS and that CNS is drawn after exon and CDS.
The GFF reference file name should only consist of numbers, letters and the
underscore character. It should also end with the suffix '.gff'.
Report the number of blocks larger than the maximum range.
This option "caps" the output table at the maximum value. With a
maximum block size of 20, for example, the program will count all
blocks of 20 or more and report them in the table as >=20. Otherwise,
it will count only the blocks of exactly 20 bases
and ignore all those greater in length.
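The difference between the capped and uncapped tables can be seen in a short sketch. This only illustrates the counting rule described above; size_table and its parameters are hypothetical names, and the example uses a maximum of 4 rather than 20 to keep the table small.

```python
from collections import Counter


def size_table(block_lengths, min_size, max_size, cap):
    """Return (label, count) rows for block sizes min_size..max_size."""
    counts = Counter()
    for length in block_lengths:
        if length < min_size:
            continue                      # below the displayed range
        if length > max_size:
            if cap:
                counts[max_size] += 1     # fold larger blocks into the last row
            continue                      # uncapped: larger blocks are ignored
        counts[length] += 1
    rows = []
    for size in range(min_size, max_size + 1):
        label = f">={size}" if cap and size == max_size else str(size)
        rows.append((label, counts[size]))
    return rows


lengths = [1, 2, 2, 3, 4, 4, 5, 9]
print(size_table(lengths, 2, 4, cap=True))   # [('2', 2), ('3', 1), ('>=4', 4)]
print(size_table(lengths, 2, 4, cap=False))  # [('2', 2), ('3', 1), ('4', 2)]
```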
Output picture options (if selected):
Select format of data in graph.
The individual data points can be displayed as a histogram or as a dot plot.
Choose a title for the graph.
A title may be added to the final graph if required.
Select format of the output file.
The graph may be produced in three different formats. These include
PDF and PostScript.