Bag-of-Entity metric
The ie-eval boe command can be used to compute the bag-of-entity recognition and error rates, globally and for each semantic category.
Metric description
Recognition rate (Precision, Recall, F1)
The Bag-of-Entities (BoE) recognition rate checks whether predicted entities appear in the ground truth and whether ground truth entities appear in the prediction, regardless of their position.
- The number of True Positives (TP) is the number of entities that appear in both the label and the prediction.
- The number of False Positives (FP) is the number of entities that appear in the prediction, but not in the label.
- The number of False Negatives (FN) is the number of entities that appear in the label, but not in the prediction.
From these counts, the Precision, Recall and F1-scores can be computed:
- The Precision (P) is the fraction of predicted entities that also appear in the ground truth. It is defined by \(\frac{TP}{TP + FP}\).
- The Recall (R) is the fraction of ground truth entities that are predicted by the automatic model. It is defined by \(\frac{TP}{TP + FN}\).
- The F1-score is the harmonic mean of the Precision and Recall. It is defined by \(\frac{2 \times P \times R}{P + R}\).
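As an illustration, here is a minimal Python sketch of these counts, assuming each entity is compared as an exact (tag, text) pair and treating the label and the prediction as multisets. The helper name boe_scores is hypothetical and is not part of the ie-eval API:

```python
from collections import Counter

def boe_scores(label_entities, predicted_entities):
    """Compute BoE Precision, Recall and F1 from two lists of (tag, text) entities."""
    label_bag = Counter(label_entities)
    predicted_bag = Counter(predicted_entities)

    # True positives: entities present in both bags (multiset intersection).
    tp = sum((label_bag & predicted_bag).values())
    fp = sum(predicted_bag.values()) - tp  # predicted but absent from the label
    fn = sum(label_bag.values()) - tp      # labeled but absent from the prediction

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```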
Error rate (bWER)
The Bag-of-Entities (BoE) error rate is derived from the bag-of-words WER (bWER) metric proposed by Vidal et al. in End-to-End Page-Level Assessment of Handwritten Text Recognition. Each entity is defined as the combination of a text span and its semantic tag. For example:
- Label:
[("person", "Georges Washington"), ("date", "the last day of 1798"), ("date", "January 24th")]
- Prediction:
[("person", "Georges Woshington"), ("date", "the last day of 1798")
From ground truth and predicted entities, we count the number of errors and compute the error rate.
- The number of insertions & deletions (\(N_{ID}\)) is the absolute difference between the number of ground truth entities and predicted entities. In this case,
("date", "January 24th")
counts as a deletion, so \(N_{ID} = 1\). - The number of substitutions (\(N_S\)) is defined as \((N_{SID} - N_{ID}) / 2\), where \(N_{SID}\) is the total number of errors. In this case,
("person", "Georges Woshington")
counts as a substitution, so \(N_S = 1\). - The error rate (\(BoE_{WER}\)) is then defined as \((N_{ID} + N_S) / |G|\), where \(|G|\) is the number of ground truth words. In this example, \(BoE_{WER} = 2 / 3 = 0.67\).
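The computation on the example above can be sketched in Python as follows, again comparing entities as exact (tag, text) pairs; the boe_wer helper is hypothetical, not the ie-eval implementation:

```python
from collections import Counter

def boe_wer(label_entities, predicted_entities):
    """Compute the BoE error rate from two lists of (tag, text) entities."""
    label_bag = Counter(label_entities)
    predicted_bag = Counter(predicted_entities)

    # Insertions & deletions: absolute difference between the two bag sizes.
    n_id = abs(sum(label_bag.values()) - sum(predicted_bag.values()))
    # Total errors: entities found in only one of the two bags (symmetric difference).
    n_sid = sum(((label_bag - predicted_bag) + (predicted_bag - label_bag)).values())
    # Substitutions: the remaining errors are paired, one from each bag.
    n_s = (n_sid - n_id) // 2
    return (n_id + n_s) / sum(label_bag.values())


label = [
    ("person", "Georges Washington"),
    ("date", "the last day of 1798"),
    ("date", "January 24th"),
]
prediction = [("person", "Georges Woshington"), ("date", "the last day of 1798")]
print(round(boe_wer(label, prediction), 2))  # 0.67
```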
Parameters
Here are the available parameters for this metric:
| Parameter | Description | Type | Default |
|---|---|---|---|
| --label-dir | Path to the directory containing BIO label files. | pathlib.Path | |
| --prediction-dir | Path to the directory containing BIO prediction files. | pathlib.Path | |
| --by-category | Whether to display the metric for each category. | bool | False |
The parameters are also described when running ie-eval boe --help.
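For reference, a BIO file lists one token per line followed by its IOB tag (B- marks the beginning of an entity, I- its continuation, O a token outside any entity). The snippet below is only an illustration built from the example entities above; it is not taken from the Simara dataset:

```
Georges B-person
Washington I-person
the B-date
last I-date
day I-date
of I-date
1798 I-date
```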
Examples
Global evaluation
Use the following command to compute the overall BoE metrics:
ie-eval boe --label-dir Simara/labels/ \
--prediction-dir Simara/predictions/
It will output the results in Markdown format:
2024-01-24 12:20:26,973 INFO/bio_parser.utils: Loading labels...
2024-01-24 12:20:27,104 INFO/bio_parser.utils: Loading prediction...
2024-01-24 12:20:27,187 INFO/bio_parser.utils: The dataset is complete and valid.
| Category | bWER (%) | Precision (%) | Recall (%) | F1 (%) | N words | N documents |
|:---------|:--------:|:-------------:|:----------:|:------:|:-------:|:-----------:|
| total | 23.23 | 77.06 | 77.34 | 77.20 | 4430 | 804 |
Evaluation for each category
Use the following command to compute the BoE metrics for each semantic category:
ie-eval boe --label-dir Simara/labels/ \
--prediction-dir Simara/predictions/ \
--by-category
It will output the results in Markdown format:
2024-01-24 12:20:48,096 INFO/bio_parser.utils: Loading labels...
2024-01-24 12:20:48,232 INFO/bio_parser.utils: Loading prediction...
2024-01-24 12:20:48,315 INFO/bio_parser.utils: The dataset is complete and valid.
| Category | bWER (%) | Precision (%) | Recall (%) | F1 (%) | N words | N documents |
|:--------------------|:--------:|:-------------:|:----------:|:------:|:-------:|:-----------:|
| total | 23.23 | 77.06 | 77.34 | 77.20 | 4430 | 804 |
| cote_article | 2.81 | 97.21 | 97.78 | 97.49 | 676 | 676 |
| cote_serie | 2.81 | 97.64 | 97.78 | 97.71 | 676 | 676 |
| precisions_sur_cote | 11.85 | 88.28 | 88.15 | 88.21 | 675 | 675 |
| intitule | 56.09 | 43.91 | 43.91 | 43.91 | 804 | 804 |
| date | 5.73 | 94.65 | 94.27 | 94.46 | 751 | 751 |
| analyse_compl | 50.45 | 50.85 | 50.71 | 50.78 | 771 | 771 |
| classement | 25.97 | 74.03 | 74.03 | 74.03 | 77 | 77 |