Compare two or more sets of features.

This program takes as input several feature files (two or more), and calculates the intersection, union and difference between features. It also computes contingency tables and comparison statistics.


Jean Valery Turatsinze
Jacques van Helden <Jacques.van-Helden\>




compare-features -i inputfile_1 -i inputfile_2 [-i inputfile_3 ...]
    [-o outputfile] [-v]


The default input format is .ft (the same as for feature-map). Other formats are also supported ($supported_input_formats).

Feature format

Each feature is represented by a single line, which should provide the following information:

Input file columns:

  1. map label (e.g. gene name)
  2. feature type
  3. feature identifier (e.g. GATAbox, Abf1_site)
  4. strand (D for Direct, R for Reverse)
  5. feature start position
  6. feature end position
  7. (optional) description
  8. (optional) score

    The standard input format assumes that these fields are provided in this order, separated by tabs. Start and end positions can be positive or negative.
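
The column layout above can be read with a few lines of Python. This is a minimal sketch, not the program's own parser: the names Feature and parse_feature_line are hypothetical, the sample line is made up, and skipping lines that start with ';' is an assumption based on the comment character used for the statistics output.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    """One line of a .ft file (hypothetical container, for illustration only)."""
    map_label: str
    ftype: str
    fid: str
    strand: str
    start: int
    end: int
    description: Optional[str] = None
    score: Optional[str] = None


def parse_feature_line(line: str) -> Optional[Feature]:
    """Parse one tab-separated feature line.

    Returns None for blank lines and for lines starting with ';'
    (assumed here to be comments).
    """
    line = line.rstrip("\n")
    if not line.strip() or line.startswith(";"):
        return None
    fields = line.split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    map_label, ftype, fid, strand = fields[:4]
    start, end = int(fields[4]), int(fields[5])  # positions may be negative
    description = fields[6] if len(fields) > 6 else None  # optional column 7
    score = fields[7] if len(fields) > 7 else None        # optional column 8
    return Feature(map_label, ftype, fid, strand, start, end, description, score)


# Made-up sample line with all eight columns.
example = parse_feature_line("PHO5\tsite\tGATAbox\tD\t-250\t-245\tputative site\t4.2")
```

The score is kept as a string because this document does not say how scores are formatted.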



Intersections between features (pairwise comparisons). For each intersection between two features, a feature of type "inter" is created.

The ID of an "inter" feature indicates the files to which the intersecting features belong. For example, "f1.and.f3" means that the intersection feature was obtained from a feature of the first input file and a feature of the third input file.
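
The construction of an "inter" feature and its ID can be sketched as follows. This is an illustration under stated assumptions, not the program's actual code: features are reduced to (map label, start, end) tuples, only features sharing the same map label are compared, positions are treated as inclusive, and strand is ignored. The function name intersect is hypothetical.

```python
def intersect(a, b, index_a, index_b):
    """Return an 'inter' feature for two (map_label, start, end) tuples, or None.

    index_a and index_b are the 1-based ranks of the input files
    from which the two features come.
    """
    label_a, start_a, end_a = a
    label_b, start_b, end_b = b
    if label_a != label_b:  # assumption: features on different maps never intersect
        return None
    # Assumption: normalise so that start <= end before comparing.
    start_a, end_a = sorted((start_a, end_a))
    start_b, end_b = sorted((start_b, end_b))
    start = max(start_a, start_b)
    end = min(end_a, end_b)
    if start > end:  # the two features share no position
        return None
    return {
        "map": label_a,
        "type": "inter",
        "id": f"f{index_a}.and.f{index_b}",
        "start": start,
        "end": end,
    }


# A feature from file 1 and a feature from file 3 (made-up coordinates).
inter = intersect(("chrI", 100, 200), ("chrI", 150, 260), 1, 3)
```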


Pairwise differences between files. For each pair of files, a feature of type "diff" is created.

The ID of the "diff" feature indicates the numbers of the files containing and not containing the feature, respectively. For example, the ID "f1.not.f3" indicates a feature found in file 1 that has no intersection with any feature of file 3.
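
The "diff" logic can be sketched in the same spirit. This is a hedged illustration, not the program's implementation: features are (map label, start, end) tuples with start <= end and inclusive positions, and the function names overlaps and difference are hypothetical.

```python
def overlaps(a, b):
    """True if two (map_label, start, end) features share at least one position."""
    return a[0] == b[0] and max(a[1], b[1]) <= min(a[2], b[2])


def difference(features_a, features_b, index_a, index_b):
    """Features of file A that intersect no feature of file B, tagged as 'diff'."""
    diff_id = f"f{index_a}.not.f{index_b}"
    return [
        {"map": f[0], "type": "diff", "id": diff_id, "start": f[1], "end": f[2]}
        for f in features_a
        if not any(overlaps(f, g) for g in features_b)
    ]


# Made-up coordinates: only the second feature of file 1 has no match in file 3.
file1 = [("chrI", 100, 200), ("chrI", 500, 600)]
file3 = [("chrI", 150, 260)]
diffs = difference(file1, file3, 1, 3)
```

The quadratic scan over all pairs keeps the sketch short; sorting the features by position first would be the usual choice for large files.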


Calculate statistics about the intersections between features of each pair of input files.


The output depends on the return type(s), which can be specified with the option -return.


Intersections and differences are reported as features. Different output formats can be specified with the option -oformat (supported: $supported_output_formats).


Matching statistics are exported as tab-delimited tables. Each row starts with a comment character ';', so that the statistics are ignored when the output is used as input for feature-map.

These comment characters can easily be removed if the result has to be used by other programs. Try for example:

perl -pe 's/^;//' outfile
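
The same clean-up can be done while loading the table into another program. The following Python sketch mirrors the Perl one-liner (it removes one leading ';' per line) and then splits each row on tabs. The function name read_stats_table is hypothetical, and the sample rows are made up: the actual column names of the statistics table are not described here.

```python
def read_stats_table(text):
    """Strip one leading ';' from each line, then split the rows on tabs."""
    rows = []
    for line in text.splitlines():
        if line.startswith(";"):
            line = line[1:]      # same effect as s/^;// in the Perl one-liner
        if line.strip():         # skip lines that are empty after stripping
            rows.append(line.split("\t"))
    return rows


# Made-up sample, only to show the mechanics.
sample = ";file1\tfile2\tvalue\n;a.ft\tb.ft\t12\n"
table = read_stats_table(sample)
```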


-v #

Level of verbosity (detail in the warning messages during execution)


Display full help message


display options

-i inputfile

This option can be repeated to specify several input files. It must be used at least twice, since the comparison requires at least two feature files.

-files inputfile_1 inputfile_2 ...

Specify multiple input files. All the arguments following the option -files are considered as input files.

-ref reference_file

Specify a reference file. Only one reference file can be specified.

All the other input files (specified with -i or -files) are then compared to the reference file. When the option '-return stats' is combined with a reference file, some additional statistics are calculated (PPV, sensitivity, accuracy).
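
For orientation, the sketch below computes PPV and sensitivity using their usual definitions, with the reference file treated as the truth. It is an assumption that the program counts matches this way: here PPV is the fraction of query features that intersect at least one reference feature, and sensitivity is the fraction of reference features intersected by at least one query feature. Accuracy is left out because its definition is not given in this document. The function names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two (map_label, start, end) features share at least one position."""
    return a[0] == b[0] and max(a[1], b[1]) <= min(a[2], b[2])


def ppv_and_sensitivity(query, reference):
    """Return (PPV, sensitivity) for a query feature set against a reference set."""
    query_hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    ref_hits = sum(1 for r in reference if any(overlaps(r, q) for q in query))
    ppv = query_hits / len(query) if query else float("nan")
    sensitivity = ref_hits / len(reference) if reference else float("nan")
    return ppv, sensitivity


# Made-up data: 2 of 4 query features match, 1 of 2 reference features is found.
reference = [("chrI", 100, 200), ("chrI", 500, 600)]
query = [("chrI", 150, 260), ("chrI", 700, 800), ("chrI", 900, 950), ("chrI", 180, 190)]
ppv, sn = ppv_and_sensitivity(query, reference)
```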

-o outputfile

If no output file is specified, the standard output is used. This allows the command to be used within a pipe.

-oft outputfeaturefile

In addition to the output, export a feature file containing the type and the chromosomal location of each feature. This option is compatible with -return inter.

-iformat input_format

Input feature format (Supported: $supported_input_formats)

-oformat output_format

Output feature format (Supported: $supported_output_formats)


Also perform comparison between features in the same file (self-comparison). This can be useful to detect redundancy between annotated features.

-return output1[,output2,...]

Specify the output type(s).

Supported output types: stats,inter,diff

-lth parameter value

Specify the value of the lower threshold on some parameter.


-lth inter_len 3
-lth inter_cov 0.8

Supported parameters:

inter_len

Length (in residues) of the intersection between two features.


inter_cov

Coverage of the intersection between two features. The coverage (inter_cov) is defined as

inter_cov = inter_len / pair_len

where inter_len is the length of the intersection and pair_len is the total length covered by the pair of intersecting features.
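
A worked example of the two parameters, as a sketch under stated assumptions: positions are inclusive (so a feature from 101 to 110 is 10 residues long), start <= end, and a value equal to the lower threshold is accepted. None of these conventions is stated in this document, and the function names are hypothetical.

```python
def intersection_stats(start_a, end_a, start_b, end_b):
    """Return (inter_len, inter_cov) for two features on the same map."""
    inter_len = min(end_a, end_b) - max(start_a, start_b) + 1
    if inter_len <= 0:  # the features do not intersect
        return 0, 0.0
    # Total length covered by the pair, from the leftmost start to the rightmost end.
    pair_len = max(end_a, end_b) - min(start_a, start_b) + 1
    return inter_len, inter_len / pair_len


def passes_lower_thresholds(inter_len, inter_cov, lth):
    """Check both values against a dict of lower thresholds (inclusive, by assumption)."""
    return (inter_len >= lth.get("inter_len", 0)
            and inter_cov >= lth.get("inter_cov", 0.0))


# Thresholds as in the examples "-lth inter_len 3" and "-lth inter_cov 0.8".
lth = {"inter_len": 3, "inter_cov": 0.8}

# Features 101..110 and 103..112: inter_len = 8, pair_len = 12, inter_cov = 8/12.
inter_len, inter_cov = intersection_stats(101, 110, 103, 112)
kept = passes_lower_thresholds(inter_len, inter_cov, lth)

# Features 101..110 and 102..110: inter_len = 9, pair_len = 10, inter_cov = 0.9.
inter_len2, inter_cov2 = intersection_stats(101, 110, 102, 110)
kept2 = passes_lower_thresholds(inter_len2, inter_cov2, lth)
```

In the first case the intersection is long enough but covers only two thirds of the pair, so it fails the coverage threshold. The second case passes both.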