
Bag-of-Entity metric

The ie-eval boe command computes the Bag-of-Entities (BoE) recognition rate and error rate, both globally and for each semantic category.

Metric description

Recognition rate (Precision, Recall, F1)

The Bag-of-Entities (BoE) recognition rate checks whether predicted entities appear in the ground truth and whether ground truth entities appear in the prediction, regardless of their position.

  • The number of True Positives (TP) is the number of entities that appear both in the label and the prediction.
  • The number of False Positives (FP) is the number of entities that appear in the prediction, but not in the label.
  • The number of False Negatives (FN) is the number of entities that appear in the label, but not in the prediction.

From these counts, the Precision, Recall and F1-scores can be computed:

  • The Precision (P) is the fraction of predicted entities that also appear in the ground truth. It is defined by \(\frac{TP}{TP + FP}\).
  • The Recall (R) is the fraction of ground truth entities that are predicted by the automatic model. It is defined by \(\frac{TP}{TP + FN}\).
  • The F1-score is the harmonic mean of the Precision and Recall. It is defined by \(\frac{2 \times P \times R}{P + R}\).
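
As a worked illustration, here is a minimal sketch of these formulas. It is an assumption rather than the actual ie-eval implementation, and the function name boe_scores is hypothetical; it counts entities with multiset semantics, so an entity appearing twice in the label must also be predicted twice to yield two True Positives:

```python
from collections import Counter

def boe_scores(label_entities, predicted_entities):
    """Entities are (category, text) tuples; duplicates are matched with multiplicity."""
    label_bag = Counter(label_entities)
    pred_bag = Counter(predicted_entities)

    # True Positives: entities found in both bags (with multiplicity).
    tp = sum((label_bag & pred_bag).values())
    # False Positives: predicted entities that do not appear in the label.
    fp = sum(pred_bag.values()) - tp
    # False Negatives: label entities that are not predicted.
    fn = sum(label_bag.values()) - tp

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```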

Error rate (bWER)

The Bag-of-Entities (BoE) error rate is derived from the bag-of-words word error rate (bWER) proposed by Vidal et al. in End-to-end page-level assessment of handwritten text recognition. Entities are defined as the combination of a text and its semantic tag. For example:

  • Label: [("person", "Georges Washington"), ("date", "the last day of 1798"), ("date", "January 24th")]
  • Prediction: [("person", "Georges Woshington"), ("date", "the last day of 1798")

From ground truth and predicted entities, we count the number of errors and compute the error rate.

  • The number of insertions & deletions (\(N_{ID}\)) is the absolute difference between the number of ground truth entities and the number of predicted entities. In this case, ("date", "January 24th") counts as a deletion, so \(N_{ID} = 1\).
  • The number of substitutions (\(N_S\)) is defined as \((N_{SID} - N_{ID}) / 2\), where \(N_{SID}\) is the total number of errors, i.e. the number of entities that cannot be matched between the label and the prediction. In this case, ("person", "Georges Woshington") counts as a substitution, so \(N_S = 1\).
  • The error rate (\(BoE_{WER}\)) is then defined as \((N_{ID} + N_S) / |G|\), where \(|G|\) is the number of ground truth entities. In this example, \(BoE_{WER} = 2 / 3 \approx 0.67\).
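
The example above can be reproduced with the short sketch below. Again, this is an assumption rather than the ie-eval code, and boe_wer is a hypothetical helper; it derives \(N_{ID}\), \(N_{SID}\) and \(N_S\) from the two bags of entities:

```python
from collections import Counter

def boe_wer(label_entities, predicted_entities):
    """Entities are (category, text) tuples."""
    label_bag = Counter(label_entities)
    pred_bag = Counter(predicted_entities)

    # Insertions & deletions: difference in the total number of entities.
    n_id = abs(sum(label_bag.values()) - sum(pred_bag.values()))
    # Total errors: entities left unmatched on either side of the bags.
    n_sid = sum(((label_bag - pred_bag) + (pred_bag - label_bag)).values())
    # Substitutions: each one accounts for two unmatched entities.
    n_s = (n_sid - n_id) // 2

    return (n_id + n_s) / sum(label_bag.values())

label = [("person", "Georges Washington"),
         ("date", "the last day of 1798"),
         ("date", "January 24th")]
prediction = [("person", "Georges Woshington"),
              ("date", "the last day of 1798")]
print(round(boe_wer(label, prediction), 2))  # 0.67
```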

Parameters

Here are the available parameters for this metric:

| Parameter        | Description                                             | Type         | Default |
|:-----------------|:--------------------------------------------------------|:-------------|:--------|
| --label-dir      | Path to the directory containing BIO label files.       | pathlib.Path |         |
| --prediction-dir | Path to the directory containing BIO prediction files.  | pathlib.Path |         |
| --by-category    | Whether to display the metric for each category.        | bool         | False   |

The parameters are also described when running ie-eval boe --help.
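
For reference, an entity in a BIO file is the span of tokens covered by a B- tag and its following I- tags. A minimal, hypothetical reading of such a file could look like the sketch below; the actual parsing and validation is handled by the bio_parser package used by ie-eval, and the exact file layout may differ:

```python
def bio_to_entities(lines):
    """Turn "token TAG" lines into (category, text) entities (hypothetical sketch)."""
    entities, category, words = [], None, []
    for line in lines:
        token, tag = line.rsplit(" ", 1)
        if tag.startswith("B-"):                    # a new entity starts
            if category is not None:
                entities.append((category, " ".join(words)))
            category, words = tag[2:], [token]
        elif tag.startswith("I-") and category is not None:
            words.append(token)                     # continue the current entity
        else:                                       # "O" closes any open entity
            if category is not None:
                entities.append((category, " ".join(words)))
            category, words = None, []
    if category is not None:
        entities.append((category, " ".join(words)))
    return entities

print(bio_to_entities(["Georges B-person", "Washington I-person", "died O"]))
# [('person', 'Georges Washington')]
```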

Examples

Global evaluation

Use the following command to compute the overall BoE metrics:

ie-eval boe --label-dir Simara/labels/ \
            --prediction-dir Simara/predictions/

It will output the results in Markdown format:

2024-01-24 12:20:26,973 INFO/bio_parser.utils: Loading labels...
2024-01-24 12:20:27,104 INFO/bio_parser.utils: Loading prediction...
2024-01-24 12:20:27,187 INFO/bio_parser.utils: The dataset is complete and valid.
| Category | bWER (%) | Precision (%) | Recall (%) | F1 (%) | N words | N documents |
|:---------|:--------:|:-------------:|:----------:|:------:|:-------:|:-----------:|
| total    |  23.23   |     77.06     |   77.34    | 77.20  |   4430  |     804     |

Evaluation for each category

Use the following command to compute the BoE metrics for each semantic category:

ie-eval boe --label-dir Simara/labels/ \
            --prediction-dir Simara/predictions/ \
            --by-category

It will output the results in Markdown format:

2024-01-24 12:20:48,096 INFO/bio_parser.utils: Loading labels...
2024-01-24 12:20:48,232 INFO/bio_parser.utils: Loading prediction...
2024-01-24 12:20:48,315 INFO/bio_parser.utils: The dataset is complete and valid.
| Category            | bWER (%) | Precision (%) | Recall (%) | F1 (%) | N words | N documents |
|:--------------------|:--------:|:-------------:|:----------:|:------:|:-------:|:-----------:|
| total               |  23.23   |     77.06     |   77.34    | 77.20  |   4430  |     804     |
| cote_article        |   2.81   |     97.21     |   97.78    | 97.49  |   676   |     676     |
| cote_serie          |   2.81   |     97.64     |   97.78    | 97.71  |   676   |     676     |
| precisions_sur_cote |  11.85   |     88.28     |   88.15    | 88.21  |   675   |     675     |
| intitule            |  56.09   |     43.91     |   43.91    | 43.91  |   804   |     804     |
| date                |   5.73   |     94.65     |   94.27    | 94.46  |   751   |     751     |
| analyse_compl       |  50.45   |     50.85     |   50.71    | 50.78  |   771   |     771     |
| classement          |  25.97   |     74.03     |   74.03    | 74.03  |    77   |      77     |