This program takes as input a set of scored results associated with validation labels (pos for positive, neg for negative) and computes, for each score value, the derived statistics (Sn, PPV, FPR), which can be used to draw a ROC curve.
The input file is a tab-delimited text file with at least two columns, where each row represents one prediction:
Additional labels can be specified manually with the fields Positive labels (default: pos) and Negative labels (default: neg) on the form (one label per line). For example, if your input file contains annotations of 'site' and 'non-site', you can use it directly as input with these options.
It can be useful to rename these labels, for compatibility with other programs.
For example, roc-stats is typically used as a post-analysis program after Network comparison. That program uses 3 labels for the arcs (see also the DEMO of the roc-stats form):
Arcs found in both the reference (R) and query (Q) graphs. This label is considered positive.
Arcs found in the reference (R) but not in the query (Q) graph. These arcs are considered positive (since they are in the reference graph).
Arcs found in the query (Q) but not in the reference (R) graph. These arcs are considered negative (since they are not found in the reference graph).
The TP, FP, TN and FN labels are frequently used to evaluate prediction results. In roc-stats, however, these labels are not directly appropriate, since the TRUE or FALSE status depends on the score threshold. They are therefore converted into pos/neg labels, according to the nature of the element considered.
True positive.
False positive (a FP is actually a negative, incorrectly predicted as positive).
False negative (a FN is actually a positive, incorrectly predicted as negative).
True negative.
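The conversion described above amounts to a simple mapping: TP and FN are actually positive elements, while FP and TN are actually negative elements. The following sketch illustrates this (hypothetical function names; roc-stats performs this conversion internally):

```python
# Sketch of the TP/FP/TN/FN -> pos/neg conversion described above.
# A FP is actually a negative element (incorrectly predicted as positive),
# and a FN is actually a positive element (incorrectly predicted as negative).
LABEL_MAP = {
    "TP": "pos",  # true positives are positive elements
    "FN": "pos",  # false negatives are positive elements
    "FP": "neg",  # false positives are negative elements
    "TN": "neg",  # true negatives are negative elements
}

def convert_label(label):
    """Convert a TP/FP/TN/FN validation label into pos/neg."""
    return LABEL_MAP[label.upper()]
```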
Total number of elements in the universe (neg + pos). This option allows you to specify the total number of elements manually, in case the input file does not contain the complete data set.
A typical example is when roc-stats is used to analyze the output of Network comparison: the graph comparison returns the intersection and the differences between reference and prediction, but does not return the arcs found in neither graph. However, these constitute the true negatives, and can represent an important fraction of the elements.
When the total number of elements is specified manually, the number of negative elements is corrected accordingly.
neg = total - pos
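In code, this correction could look as follows (a minimal sketch with hypothetical names; the actual program reads the labels from the input file):

```python
def corrected_neg(total, labels):
    """Given a manually specified universe size and the pos/neg labels
    found in the input file, return the corrected number of negative
    elements: neg = total - pos."""
    pos = sum(1 for label in labels if label == "pos")
    return total - pos
```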
The program returns a table with one row per score value and one column per statistic. The column contents are described in the header of the output file.
When the option graphs is checked, the program returns a series of graphs.
The program calculates the number of true positives (TP) and false positives (FP) for each score provided in the input. The inverse cumulative distributions are then computed, indicating, for each possible score, the number of TP with a score at or above it (TP_icum) and the number of FP with a score at or above it (FP_icum).
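The inverse cumulative counts can be obtained in a single pass over the distinct scores, from highest to lowest. A sketch of this computation (hypothetical function names, assuming the input has already been converted to pos/neg labels):

```python
from collections import Counter

def inverse_cumulative(scored_labels):
    """Compute, for each distinct score X, the number of positive
    (TP_icum) and negative (FP_icum) observations with score >= X.
    scored_labels: list of (score, label) pairs, label in {"pos", "neg"}.
    Returns a list of (score, tp_icum, fp_icum) rows, by decreasing score."""
    pos_counts = Counter(s for s, l in scored_labels if l == "pos")
    neg_counts = Counter(s for s, l in scored_labels if l == "neg")
    tp = fp = 0
    rows = []
    for score in sorted(set(pos_counts) | set(neg_counts), reverse=True):
        tp += pos_counts[score]  # positives with score >= current score
        fp += neg_counts[score]  # negatives with score >= current score
        rows.append((score, tp, fp))
    return rows
```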
Score (X)
Inverse cumulative occurrences, i.e. the number of observations with score >= X
Inverse cumulative frequencies, i.e. the fraction of observations with score >= X
True Positive, inverse cumulative (number of TP observations with score >= X)
False Positive, inverse cumulative (number of FP observations with score >= X)
False Negative, inverse cumulative (number of FN observations with score >= X)
True Negative, inverse cumulative (number of TN observations with score >= X)
True Positive (number of TP observations with score = X)
False Positive (number of FP observations with score = X)
False Negative (number of FN observations with score = X)
True Negative (number of TN observations with score = X)
Sensitivity (also called TPR, or Recall). Sn = TP_icum/(Total positives) = TP_icum/(TP_icum + FN_icum)
Positive Predictive Value (also called Precision). PPV = TP_icum/(TP_icum + FP_icum)
False Positive Rate. FPR = FP_icum/(Total negatives) = FP_icum/(FP_icum + TN_icum)
Accuracy (geometric mean). Acc_g = sqrt(Sn*PPV)
Accuracy (arithmetic mean). Acc_a = (Sn + PPV)/2
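The formulas above translate directly into code. The following sketch computes the derived statistics for one score threshold from the inverse cumulative counts (hypothetical function name):

```python
import math

def derived_stats(tp_icum, fp_icum, fn_icum, tn_icum):
    """Derived statistics for one score threshold, following the
    formulas above. Returns (Sn, PPV, FPR, Acc_g, Acc_a)."""
    sn = tp_icum / (tp_icum + fn_icum)    # sensitivity (TPR, recall)
    ppv = tp_icum / (tp_icum + fp_icum)   # positive predictive value (precision)
    fpr = fp_icum / (fp_icum + tn_icum)   # false positive rate
    acc_g = math.sqrt(sn * ppv)           # accuracy, geometric mean
    acc_a = (sn + ppv) / 2                # accuracy, arithmetic mean
    return sn, ppv, fpr, acc_g, acc_a
```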
Distribution of various statistics (Sn, PPV, Acc, FPR, ...) as a function of the score.
Distribution of various statistics (Sn, PPV, Acc, FPR, ...) as a function of the score, with a logarithmic scale on the X axis. This is convenient to highlight the differences between small scores.
True versus False Positives curve.
Receiver Operating Characteristic (ROC) curve. TPR vs FPR.
Precision-recall curve. PPV vs Sn.
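As an illustration of how the ROC curve is built, the (FPR, TPR) points can be derived by scanning the score thresholds from high to low. This sketch assumes the input contains the complete data set (all positive and negative elements):

```python
def roc_points(scored_labels):
    """Return one (FPR, TPR) point per distinct score threshold,
    scanning thresholds from high to low.
    scored_labels: list of (score, label) pairs, label in {"pos", "neg"}."""
    total_pos = sum(1 for _, l in scored_labels if l == "pos")
    total_neg = sum(1 for _, l in scored_labels if l == "neg")
    tp = fp = 0
    points = []
    for score in sorted({s for s, _ in scored_labels}, reverse=True):
        tp += sum(1 for s, l in scored_labels if s == score and l == "pos")
        fp += sum(1 for s, l in scored_labels if s == score and l == "neg")
        points.append((fp / total_neg, tp / total_pos))  # (FPR, TPR)
    return points
```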
There are plenty of references about ROC curves. I found the following article quite clear.
Jesse Davis and Mark Goadrich (2006). The Relationship Between Precision-Recall and ROC Curves. In the Proceedings of the 23rd International Conference on Machine Learning (ICML). http://pages.cs.wisc.edu/~jdavis/davisgoadrichcamera2.pdf