BlockCounter.

BlockCounter Instructions.

First screen: Masking your sequences?
BlockCounter searches for blocks of complete identity between all aligned sequences. Highly conserved features, such as coding exons (CDS) or repeats, should be masked to reduce the number of false positive hits. A GFF file containing the features to be masked should be available for each sequence.

If you wish features such as exons to be masked, select the 'Mask features' box and then select the number of sequences present in your multiple sequence alignment.
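As a rough illustration of what "blocks of complete identity" means, the following Python sketch scans an alignment column by column and records each maximal run of columns in which every sequence carries the same base. It is not BlockCounter's own code. The function name `identical_blocks`, the `masked` parameter (a set of 0-based alignment columns to exclude), the 0-based start positions, and the decision to treat gap columns as non-identical are all assumptions made for this example.

```python
def identical_blocks(seqs, masked=frozenset()):
    """Return (start, length) for each maximal run of identical columns.

    seqs   -- aligned sequences of equal length (assumed, not checked here)
    masked -- hypothetical set of 0-based columns to treat as non-identical
    """
    blocks = []
    run_start = None
    width = len(seqs[0]) if seqs else 0
    for i in range(width):
        column = {s[i] for s in seqs}
        # Assumption: a column of gaps does not count as identity.
        same = len(column) == 1 and "-" not in column and i not in masked
        if same:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            blocks.append((run_start, i - run_start))
            run_start = None
    if run_start is not None:
        # Close a run that reaches the end of the alignment.
        blocks.append((run_start, width - run_start))
    return blocks


alignment = ["ACTGAGG", "ACTGAGG", "ACTGA--"]
print(identical_blocks(alignment))
print(identical_blocks(alignment, masked={2}))
```

Masking a column simply splits any block that would otherwise run through it, which is why masking conserved features reduces the number of long blocks reported.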

Second screen: Set options for BlockCounter.

Input files:
Sequence alignment file(s).
The aligned sequences must be input as a single FASTA-formatted file. All the sequences must be the same length. Gaps are represented by '-'. Each header, denoted by '>', must be present on its own line (no spaces), e.g.

      >header1
      ACTGAGG...etc
 
      >header2
      ACTGAGG...etc

      >header3
      ACTGA--...etc
      
Please note that the final nucleotides must be on the last line of the file, i.e. the file should not end with trailing blank lines.

The first sequence of the first alignment file is taken to be the reference sequence. Subsequent files should contain the reference sequence as their first sequence; otherwise the file will be skipped.
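Before uploading, it can help to check an alignment file against the requirements above. The Python sketch below reads FASTA text, joins multi-line sequences, and raises an error if the sequences differ in length. The function name `read_alignment` is hypothetical, and the check is more lenient than BlockCounter may be (for instance, it tolerates blank lines), so passing it does not guarantee that the file will be accepted.

```python
def read_alignment(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Raises ValueError if sequence data precedes the first header or if
    the aligned sequences are not all the same length.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between records are skipped here
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))

    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        raise ValueError("aligned sequences differ in length: %s" % sorted(lengths))
    return records


records = read_alignment(">header1\nACTGAGG\n\n>header2\nACTGAGG\n\n>header3\nACTGA--\n")
reference_name, reference_seq = records[0]  # first record is the reference
print(reference_name, len(reference_seq))
```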

GFF feature format files.
For each sequence name extracted from the FASTA global alignment file, BlockCounter will look for a file NAME.gff (where NAME is the name found in the header).

The GFF format is explained at the Sanger Institute. We recommend two methods for creating GFF files for your sequences [LINK].

GFF file names should consist only of numbers, letters and the underscore character, and should end with the suffix '.gff'.
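The naming rule can be checked with a short regular expression. This Python sketch derives the expected file name from a FASTA header and tests whether it meets the rule; the helper names are hypothetical, and treating the '.gff' suffix as case-sensitive and restricting letters to A-Z and a-z are assumptions.

```python
import re

# Letters, digits and underscores only, followed by a lowercase ".gff".
GFF_NAME = re.compile(r"[A-Za-z0-9_]+\.gff")


def expected_gff_name(header):
    """File name BlockCounter is described as looking for: NAME.gff."""
    return header + ".gff"


def is_valid_gff_name(filename):
    """True if the whole file name matches the documented naming rule."""
    return GFF_NAME.fullmatch(filename) is not None


for header in ["header1", "my-seq", "chr 2"]:
    name = expected_gff_name(header)
    print(name, is_valid_gff_name(name))
```

A header containing a hyphen or a space therefore leads to a file name that breaks the rule, so it is simplest to keep FASTA headers to the same character set.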

Select GFF features to mask.
The features should be entered as a comma-delimited list, exactly as they appear in the GFF files.
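To see which regions a given feature list would select, the Python sketch below splits a comma-delimited list and collects the start and end coordinates of matching lines in GFF text. It relies on the standard GFF layout (tab-separated columns, with the feature type in column 3 and 1-based start and end in columns 4 and 5). The function name `masked_intervals` is hypothetical, and stripping whitespace around the commas is an assumption; matching is otherwise exact and case-sensitive.

```python
def masked_intervals(gff_text, feature_list):
    """Return (start, end) pairs for GFF lines whose feature is selected.

    feature_list -- comma-delimited string, e.g. "exon,repeat"
    """
    wanted = {f.strip() for f in feature_list.split(",") if f.strip()}
    intervals = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) < 5:
            continue  # not a complete feature line
        feature, start, end = cols[2], int(cols[3]), int(cols[4])
        if feature in wanted:
            intervals.append((start, end))
    return intervals


gff = (
    "header1\tsrc\texon\t3\t5\t.\t+\t.\tid1\n"
    "header1\tsrc\tCDS\t10\t12\t.\t+\t0\tid2\n"
    "header1\tsrc\trepeat\t20\t25\t.\t+\t.\tid3\n"
)
print(masked_intervals(gff, "exon,repeat"))
```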

Set minimum and maximum block sizes.
Specify the range of block sizes to be displayed.

Report the start positions of the blocks on the unaligned reference sequence and produce a plot.
For each block length within the defined range, BlockCounter will report the start position of each block in the unaligned reference sequence. The reference sequence is the first sequence of the alignment file. The graphical output of the block size distribution is not available when this option is selected.
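Because the alignment contains gaps, a column in the alignment and a position in the unaligned reference sequence are generally different numbers. The Python sketch below shows one way to convert between them by discounting the gaps that precede the column. The function name is hypothetical, and reporting 1-based positions is an assumption about the output convention.

```python
def unaligned_position(aligned_ref, column):
    """Map a 0-based alignment column to a 1-based ungapped position.

    Returns None if the reference has a gap in that column.
    """
    if aligned_ref[column] == "-":
        return None
    # Subtract the gap characters that occur before this column.
    return column + 1 - aligned_ref[:column].count("-")


aligned_reference = "AC--GT"
for col in range(len(aligned_reference)):
    print(col, unaligned_position(aligned_reference, col))
```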

Name of reference GFF file.
This is the GFF file that holds the annotation of the reference sequence. Normally this will be the same as the GFF file used for masking the reference sequence, but a different GFF file may be used. PLEASE NOTE: The annotation of the reference sequence is drawn one feature at a time, starting from the top of the file. Please make sure that exon features are drawn before CDS features, and that CNS features are drawn after both exon and CDS features.

The GFF reference file name should only consist of numbers, letters and the underscore character. It should also end with the suffix '.gff'.

Report the number of blocks larger than the maximum range.
When selected, this option "caps" the output table at the maximum value. For example, with a maximum block size of 20, the program will count all blocks of 20 or more bases and report them in the table as >=20. Otherwise, it will count only the blocks of exactly 20 bases and ignore all longer blocks.
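The difference between the two behaviours can be shown with a small tally. In this Python sketch, written for illustration only, block lengths below the minimum are ignored, and lengths above the maximum are either folded into the top bin (when `cap` is true) or dropped. The function and parameter names are hypothetical.

```python
from collections import Counter


def count_block_sizes(lengths, min_size, max_size, cap=False):
    """Tally block lengths between min_size and max_size inclusive.

    With cap=True, blocks longer than max_size are added to the
    max_size bin (the ">=max" row); otherwise they are ignored.
    """
    counts = Counter()
    for n in lengths:
        if n < min_size:
            continue
        if n > max_size:
            if cap:
                counts[max_size] += 1
            continue
        counts[n] += 1
    return dict(counts)


lengths = [5, 20, 20, 25, 30, 3]
print(count_block_sizes(lengths, 5, 20))            # longer blocks ignored
print(count_block_sizes(lengths, 5, 20, cap=True))  # longer blocks in top bin
```

With the sample lengths above, the bin for 20 holds two blocks without capping and four with it, since the blocks of 25 and 30 bases are added to it.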

Output picture options (if selected):
Select format of data in graph.
The individual data points can be displayed as a histogram or as a dot plot.

Choose a title for the graph.
A title may be added to the final graph if required.

Select format of the output file.
The graph may be produced in three different formats. These include PDF and PostScript.
