Bag-of-Word metric

The ie-eval bow command can be used to compute the bag-of-word recognition and error rates, globally and for each semantic category.

Metric description

Recognition rate (Precision, Recall, F1)

The Bag-of-Word (BoW) recognition rate checks whether predicted words appear in the ground truth and if ground truth words appear in the prediction, regardless of their position. Note that words tagged as other (O in the IOB2 notation) are ignored.

  • The number of True Positives (TP) is the number of words that appear in both the label and the prediction.
  • The number of False Positives (FP) is the number of words that appear in the prediction, but not in the label.
  • The number of False Negatives (FN) is the number of words that appear in the label, but not in the prediction.

From these counts, the Precision, Recall and F1-scores can be computed:

  • The Precision (P) is the fraction of predicted words that also appear in the ground truth. It is defined by \(\frac{TP}{TP + FP}\).
  • The Recall (R) is the fraction of ground truth words that are predicted by the automatic model. It is defined by \(\frac{TP}{TP + FN}\).
  • The F1-score is the harmonic mean of the Precision and Recall. It is defined by \(\frac{2 \times P \times R}{P + R}\).
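As a rough sketch (not the actual ie-eval implementation), these counts can be obtained by treating the label and prediction as word multisets and intersecting them:

```python
from collections import Counter


def bow_scores(reference, prediction):
    """Bag-of-words Precision, Recall and F1 from two lists of words."""
    ref, pred = Counter(reference), Counter(prediction)
    tp = sum((ref & pred).values())  # words present in both (multiset intersection)
    fp = sum((pred - ref).values())  # predicted words absent from the reference
    fn = sum((ref - pred).values())  # reference words absent from the prediction
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# One substituted word out of three: P = R = F1 = 2/3
p, r, f1 = bow_scores(["jean", "dupont", "1745"], ["jean", "dupond", "1745"])
```

Using `Counter` rather than `set` matters: a word predicted twice but labelled once must count one true positive and one false positive.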

Error rate (bWER)

Additionally, an error rate is computed, following the bag-of-words WER (bWER) metric proposed by Vidal et al. in End-to-end page-level assessment of handwritten text recognition. From the ground truth and predicted words, we count the number of errors and compute the error rate.

  • The number of insertions & deletions (\(N_{ID}\)) is the absolute difference between the number of ground truth words and predicted words.
  • The number of substitutions (\(N_S\)) is defined as \((N_{SID} - N_{ID}) / 2\), where \(N_{SID}\) is the total number of word errors (substitutions, insertions and deletions combined).
  • The error rate (\(bWER\)) is then defined as \((N_{ID} + N_S) / |G|\), where \(|G|\) is the number of ground truth words.
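The same multiset view gives a compact sketch of these counts (again an illustration under the definitions above, not the ie-eval code): \(N_{SID}\) is the number of words that fail to match in either direction, and \(N_{ID}\) falls out of the length difference.

```python
from collections import Counter


def bwer(reference, prediction):
    """Bag-of-words WER: position-independent word error rate."""
    ref, pred = Counter(reference), Counter(prediction)
    # Insertions + deletions: absolute difference in word counts.
    n_id = abs(sum(ref.values()) - sum(pred.values()))
    # Total errors: reference words missing from the prediction
    # plus predicted words missing from the reference.
    n_sid = sum((ref - pred).values()) + sum((pred - ref).values())
    # Substitutions: each one contributes two mismatches to n_sid.
    n_s = (n_sid - n_id) // 2
    return (n_id + n_s) / sum(ref.values())


# One substitution and one deletion over 3 reference words: bWER = 2/3
rate = bwer(["a", "b", "c"], ["a", "x"])
```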

Parameters

Here are the available parameters for this metric:

| Parameter          | Description                                            | Type           | Default |
|:-------------------|:-------------------------------------------------------|:---------------|:--------|
| `--label-dir`      | Path to the directory containing BIO label files.      | `pathlib.Path` |         |
| `--prediction-dir` | Path to the directory containing BIO prediction files. | `pathlib.Path` |         |
| `--by-category`    | Whether to display the metric for each category.       | `bool`         | `False` |

The parameters are also described when running ie-eval bow --help.

Examples

Global evaluation

Use the following command to compute the overall BoW metrics:

ie-eval bow --label-dir Simara/labels/ \
            --prediction-dir Simara/predictions/

It will output the results in Markdown format:

2024-01-24 12:13:10,379 INFO/bio_parser.utils: Loading labels...
2024-01-24 12:13:10,513 INFO/bio_parser.utils: Loading prediction...
2024-01-24 12:13:10,598 INFO/bio_parser.utils: The dataset is complete and valid.
| Category | bWER (%) | Precision (%) | Recall (%) | F1 (%) | N words | N documents |
|:---------|:--------:|:-------------:|:----------:|:------:|:-------:|:-----------:|
| total    |  16.78   |     85.45     |   84.32    | 84.88  |  17894  |     804     |

Evaluation for each category

Use the following command to compute the BoW metrics for each semantic category:

ie-eval bow --label-dir Simara/labels/ \
            --prediction-dir Simara/predictions/ \
            --by-category

It will output the results in Markdown format:

2024-01-24 12:12:48,179 INFO/bio_parser.utils: Loading labels...
2024-01-24 12:12:48,405 INFO/bio_parser.utils: Loading prediction...
2024-01-24 12:12:48,590 INFO/bio_parser.utils: The dataset is complete and valid.
| Category            | bWER (%) | Precision (%) | Recall (%) | F1 (%) | N words | N documents |
|:--------------------|:--------:|:-------------:|:----------:|:------:|:-------:|:-----------:|
| total               |  16.78   |     85.45     |   84.32    | 84.88  |  17894  |     804     |
| precisions_sur_cote |  14.39   |     90.48     |   87.70    | 89.07  |   813   |     675     |
| intitule            |  20.73   |     82.18     |   81.15    | 81.66  |   8173  |     804     |
| cote_serie          |   3.25   |     97.21     |   97.78    | 97.49  |   676   |     676     |
| cote_article        |   4.28   |     95.94     |   97.64    | 96.78  |   678   |     676     |
| analyse_compl       |  22.92   |     81.50     |   78.97    | 80.22  |   5602  |     771     |
| date                |   2.67   |     97.61     |   97.44    | 97.52  |   1799  |     751     |
| classement          |  13.73   |     86.36     |   86.93    | 86.64  |   153   |      77     |