Nerval metrics

The ie-eval nerval command computes the reading-order-independent version of the soft-aligned entity Precision, Recall and F1-score (Nerval) over a whole dataset.

Metric description

The Nerval method computes Precision, Recall and F1 scores between two sequences of named entities while tolerating a configurable percentage of transcription error when aligning them.

  • True Positives (TP) are hypothesized named entities that have been correctly tagged and transcribed when compared to their ground truth counterparts.
  • False Positives (FP) are hypothesized named entities that do not find an acceptable match in the ground truth.
  • False Negatives (FN) are ground truth named entities for which no acceptable match was found in the hypothesis.

In the reading-order-independent version implemented in this toolkit, the matching is reduced to an assignment problem. For a given error threshold \(M\) (0.3 by default), the assignment cost \(\delta(x_j, y_k)\) between a ground truth entity \(x_j\) and a hypothesized entity \(y_k\) is defined as follows:
\[ \delta_{\mathrm{NERVAL-M}}(x_j, y_k) = \begin{cases} 1 & \text{if } x_j = \lambda \vee y_k = \lambda \\ 2 & \text{if } c(x_j) \neq c(y_k) \\ 2 & \text{if } \mathrm{CER}(t(x_j), t(y_k)) > M \\ 0 & \text{if } \mathrm{CER}(t(x_j), t(y_k)) \le M \end{cases} \]
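The cost function above can be sketched in Python. This is an illustrative sketch, not the toolkit's internal API: entities are assumed to be `(category, text)` tuples, `None` stands for the dummy symbol \(\lambda\), and CER is taken as the Levenshtein edit distance divided by the ground truth length.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate relative to the reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


def delta(x, y, threshold: float = 0.3) -> int:
    """NERVAL-M assignment cost between a ground truth and a hypothesis entity.

    Entities are (category, text) tuples; None plays the role of lambda.
    """
    if x is None or y is None:
        return 1  # one side is matched to the dummy symbol
    if x[0] != y[0]:
        return 2  # category mismatch
    return 2 if cer(x[1], y[1]) > threshold else 0  # transcription check


print(delta(("PER", "Smith"), ("PER", "Smth")))   # CER 1/5 <= 0.3 -> 0
print(delta(("PER", "Smith"), ("PER", "Smiht")))  # CER 2/5 > 0.3 -> 2
```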

The assignment cost \(\delta(x_j, y_k)\) determines the nature of the match.

  • If the cost is 0, the match is counted as a TP.
  • If the cost is 2, due to a category mismatch or an excessive character error, the hypothesized named entity \(y_k\) is counted as a FP and the ground truth named entity \(x_j\) as a FN.
  • If the cost is 1 because \(y_k\) is the dummy symbol \(\lambda\), then \(x_j\) is a FN.
  • Finally, if the cost is 1 because \(x_j\) is the dummy symbol \(\lambda\), then \(y_k\) is a FP.
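The four rules above can be tallied with a short helper. A hypothetical sketch: each assigned pair is represented as a `(ground_truth, hypothesis, cost)` triple, with `None` standing for the dummy symbol \(\lambda\).

```python
def tally(pairs):
    """Count TP/FP/FN from assigned (ground_truth, hypothesis, cost) triples."""
    tp = fp = fn = 0
    for x, y, cost in pairs:
        if cost == 0:
            tp += 1   # correct category and close enough transcription
        elif cost == 2:
            fp += 1   # the hypothesized entity is wrong...
            fn += 1   # ...and the ground truth entity is missed
        elif x is None:
            fp += 1   # spurious hypothesis matched to a dummy
        else:
            fn += 1   # ground truth entity left unmatched
    return tp, fp, fn


pairs = [
    (("PER", "Smith"), ("PER", "Smith"), 0),  # true positive
    (("LOC", "Paris"), ("PER", "Paris"), 2),  # category mismatch: FP + FN
    (("DAT", "1871"), None, 1),               # missed ground truth entity: FN
    (None, ("ORG", "Teklia"), 1),             # spurious hypothesis: FP
]
print(tally(pairs))  # (1, 2, 2)
```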

This principle extends to a whole corpus: for each document, the numbers of TP, FP, and FN are computed between the hypothesized sequence of named entities and the ground truth, then summed. The micro-averaged Precision, Recall, and F1 scores follow in the usual way:

\[ P = \frac{TP}{TP + FP} \\ R = \frac{TP}{TP + FN} \\ F_1 = 2 \cdot \frac{P \cdot R}{P + R} \]
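These formulas translate directly into code. The counts below are illustrative only (chosen so that the scores match the 82.14 % figure shown in the example output further down, i.e. 23 true positives out of 28 entities).

```python
def micro_scores(tp: int, fp: int, fn: int):
    """Micro-averaged precision, recall and F1 from global TP/FP/FN counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# 23 correct matches, 5 spurious hypotheses, 5 missed ground truth entities
p, r, f1 = micro_scores(tp=23, fp=5, fn=5)
print(f"{p:.2%} {r:.2%} {f1:.2%}")  # 82.14% 82.14% 82.14%
```

Note that when precision equals recall, F1 equals both, which is why the three columns coincide in the example output.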

Parameters

Here are the available parameters for these metrics:

| Parameter | Description | Type | Default |
|:----------|:------------|:-----|:--------|
| `--label-dir` | Path to the directory containing BIO label files. | `pathlib.Path` | |
| `--prediction-dir` | Path to the directory containing BIO prediction files. | `pathlib.Path` | |
| `--nerval-threshold` | Percentage of transcription error to allow, in [0, 100]. | `float` | 30.0 |

The parameters are also described when running ie-eval nerval --help.

Examples

Global evaluation

Use the following command to compute the overall Nerval Precision, Recall and F1 scores:

```shell
ie-eval nerval \
    --label-dir tests/data/labels/ \
    --prediction-dir tests/data/predictions/
```

It will output the results in Markdown format:

```
2024-01-24 12:20:26,973 INFO/bio_parser.utils: Loading labels...
2024-01-24 12:20:27,104 INFO/bio_parser.utils: Loading prediction...
2024-01-24 12:20:27,187 INFO/bio_parser.utils: The dataset is complete and valid.
| Category | Precision (%) | Recall (%) | F1 (%) | N entities | N documents |
|:---------|:-------------:|:----------:|:------:|-----------:|------------:|
| total    |     82.14     |   82.14    | 82.14  |         28 |           5 |
```