Scorer

ie_eval.scorer

Scorer.

Attributes

M module-attribute

M = Munkres()

Classes

OiEcerEwer dataclass

OiEcerEwer(
    labels: list[tuple[str, str]],
    predictions: list[tuple[str, str]],
    compute_ecer: bool,
    costs: list[list[float]] = list(),
    errors: float = 0.0,
)

Base class for order independent ECER / EWER computation.

Attributes
labels instance-attribute
labels: list[tuple[str, str]]
predictions instance-attribute
predictions: list[tuple[str, str]]
compute_ecer instance-attribute
compute_ecer: bool
costs class-attribute instance-attribute
costs: list[list[float]] = field(default_factory=list)
errors class-attribute instance-attribute
errors: float = 0.0
num_ne_gt property
num_ne_gt: int

Compute number of NEs in the label.

num_ne_hyp property
num_ne_hyp: int

Compute number of NEs in the prediction.
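
The `costs` matrix and the module-level `Munkres()` instance suggest that order independence comes from a minimum-cost assignment between label and prediction entities. The class internals are not shown here, so the following is only a sketch of that idea under that assumption. It uses a brute-force search in place of the Hungarian algorithm, and `min_cost_assignment` and the cost values are hypothetical.

```python
from itertools import permutations


def min_cost_assignment(costs: list[list[float]]) -> tuple[float, list[tuple[int, int]]]:
    """Brute-force minimum-cost assignment over a square cost matrix.

    Hypothetical helper: fine for tiny matrices, whereas the Hungarian
    algorithm solves the same problem in polynomial time.
    """
    n = len(costs)
    best_cost, best_pairs = float("inf"), []
    for perm in permutations(range(n)):
        # perm[row] is the column (prediction) assigned to this row (label).
        total = sum(costs[row][col] for row, col in enumerate(perm))
        if total < best_cost:
            best_cost, best_pairs = total, list(enumerate(perm))
    return best_cost, best_pairs


# Hypothetical costs: rows are label entities, columns are predicted entities.
costs = [
    [1.0, 0.0],   # label 0 matches prediction 1 exactly
    [0.25, 1.0],  # label 1 is close to prediction 0
]
total, pairs = min_cost_assignment(costs)
print(total, pairs)  # 0.25 [(0, 1), (1, 0)]
```

Because the pairing is chosen by cost rather than by position, swapping the order of the predictions leaves the total unchanged.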

OiNerval dataclass

OiNerval(
    labels: list[tuple[str, str]],
    predictions: list[tuple[str, str]],
    nerval_threshold: float = 0.0,
    costs: list[list[float]] = list(),
    true_positives: int = 0,
    false_positives: int = 0,
    false_negatives: int = 0,
)

Base class for order independent Nerval computation of Precision, Recall and F1 scores.

Attributes
labels instance-attribute
labels: list[tuple[str, str]]
predictions instance-attribute
predictions: list[tuple[str, str]]
nerval_threshold class-attribute instance-attribute
nerval_threshold: float = 0.0
costs class-attribute instance-attribute
costs: list[list[float]] = field(default_factory=list)
true_positives class-attribute instance-attribute
true_positives: int = 0
false_positives class-attribute instance-attribute
false_positives: int = 0
false_negatives class-attribute instance-attribute
false_negatives: int = 0
num_ne_gt property
num_ne_gt: int

Returns the number of NEs in the label.

num_ne_hyp property
num_ne_hyp: int

Returns the number of NEs in the prediction.

BagOfWords

Bases: NamedTuple

Base class for bag-of-words metrics. Extension of bWER defined in End-to-End Page-Level Assessment of Handwritten Text Recognition (https://arxiv.org/pdf/2301.05935.pdf).

Attributes
labels instance-attribute
labels: list[str | tuple[str, str]]
predictions instance-attribute
predictions: list[str | tuple[str, str]]
label_counter property
label_counter: Counter[str]

Split the label into words and count the occurrences of each.

prediction_counter property
prediction_counter: Counter[str]

Split the prediction into words and count the occurrences of each.

true_positives property
true_positives: int

Count true positive words.

false_positives property
false_positives: int

Count false positive words.

false_negatives property
false_negatives: int

Count false negative words.

all_words property
all_words: list[str | tuple[str, str]]

All tagged words.

label_word_vector property
label_word_vector: array

Iterate over the set of tagged words and count occurrences in the label.

prediction_word_vector property
prediction_word_vector: array

Iterate over the set of words and count occurrences in the prediction.

insertions_deletions property
insertions_deletions: int

Count unavoidable insertions and deletions. See Equation 8 from https://arxiv.org/pdf/2301.05935.pdf.

substitutions property
substitutions: int

Count substitutions. See Equation 8 from https://arxiv.org/pdf/2301.05935.pdf.

errors property
errors: int

Count total number of errors.
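
The counts above can be illustrated with plain `Counter` arithmetic. This is a minimal sketch of one reading of the bag-of-words formulation, not the library's implementation (which exposes word vectors as arrays). It assumes that unavoidable insertions and deletions equal the difference in length, and that the remaining mismatches pair up as substitutions. `bag_of_words_counts` is a hypothetical helper.

```python
from collections import Counter


def bag_of_words_counts(labels: list[str], predictions: list[str]) -> dict[str, int]:
    """Order-independent word counts (hypothetical helper, for illustration)."""
    label_counter = Counter(labels)
    prediction_counter = Counter(predictions)

    # Words found in both bags, counted with multiplicity.
    true_positives = sum((label_counter & prediction_counter).values())
    # Predicted words with no counterpart in the label, and vice versa.
    false_positives = sum((prediction_counter - label_counter).values())
    false_negatives = sum((label_counter - prediction_counter).values())

    # Assumption: a length difference can only be explained by
    # insertions or deletions.
    insertions_deletions = abs(len(labels) - len(predictions))
    # Assumption: each remaining pair of mismatches is one substitution.
    substitutions = (false_positives + false_negatives - insertions_deletions) // 2

    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "insertions_deletions": insertions_deletions,
        "substitutions": substitutions,
        "errors": insertions_deletions + substitutions,
    }


counts = bag_of_words_counts(
    labels=["the", "cat", "sat", "down"],
    predictions=["the", "dog", "sat"],
)
print(counts["true_positives"], counts["false_positives"], counts["false_negatives"])  # 2 1 2
print(counts["insertions_deletions"], counts["substitutions"], counts["errors"])  # 1 1 2
```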

MicroAverageErrorRate

MicroAverageErrorRate()

Compute micro-averaged error rates.

Initialize errors and counts.

Examples:

>>> score = MicroAverageErrorRate()
Source code in ie_eval/scorer.py
def __init__(self) -> None:
    """Initialize errors and counts.

    Examples:
        >>> score = MicroAverageErrorRate()
    """
    self.label_word_count = defaultdict(int)
    self.error_count = defaultdict(int)
    self.count = defaultdict(int)
Attributes
label_word_count instance-attribute
label_word_count = defaultdict(int)
error_count instance-attribute
error_count = defaultdict(int)
count instance-attribute
count = defaultdict(int)
error_rate property
error_rate: dict[str, float]

Error rate for each key.

categories property
categories: list[str]

Get all categories in the label.

Functions
update
update(key: str, score: BagOfWords | OiEcerEwer) -> None

Update the score with the current evaluation for a given key.

Parameters:

Name   Type                     Description          Default
key    str                      Category to update.  required
score  BagOfWords | OiEcerEwer  Current score.       required

Examples:

>>> score.update("total", BagOfWords(label.entities, pred.entities))
Source code in ie_eval/scorer.py
def update(self, key: str, score: BagOfWords | OiEcerEwer) -> None:
    """Update the score with the current evaluation for a given key.

    Args:
        key (str): Category to update.
        score (BagOfWords | OiEcerEwer): Current score.

    Examples:
        >>> score.update("total", BagOfWords(label.entities, pred.entities))
    """
    self.label_word_count[key] += len(score.labels)
    self.count[key] += 1
    self.error_count[key] += score.errors
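
The accumulation performed by `update` can be sketched in isolation. The `error_rate` property's implementation is not shown on this page, so this sketch assumes that the rate for a key is the accumulated error count divided by the accumulated number of label words. The scaling (fraction rather than percentage) and the zero handling are also assumptions, and the per-document tuples are hypothetical.

```python
from collections import defaultdict

label_word_count: defaultdict[str, int] = defaultdict(int)
error_count: defaultdict[str, float] = defaultdict(float)

# Hypothetical per-document results: (key, number of label words, errors).
documents = [("total", 10, 2), ("total", 30, 3), ("person", 4, 1)]
for key, n_label_words, errors in documents:
    label_word_count[key] += n_label_words
    error_count[key] += errors

# Micro-average: divide the summed errors by the summed label words,
# so longer documents weigh more than shorter ones.
error_rate = {
    key: error_count[key] / label_word_count[key] if label_word_count[key] else 0.0
    for key in label_word_count
}
print(error_rate)  # {'total': 0.125, 'person': 0.25}
```

For comparison, averaging the two per-document rates for "total" (0.2 and 0.1) would give 0.15, which is why micro- and macro-averages generally differ.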

MicroAverageFScore

MicroAverageFScore()

Compute micro-averaged precision, recall, and F1 scores.

Initialize error counts.

Examples:

>>> score = MicroAverageFScore()
Source code in ie_eval/scorer.py
def __init__(self) -> None:
    """Initialize error counts.

    Examples:
        >>> score = MicroAverageFScore()
    """
    self.label_word_count = defaultdict(int)
    self.count = defaultdict(int)
    self.true_positives = defaultdict(int)
    self.false_positives = defaultdict(int)
    self.false_negatives = defaultdict(int)
Attributes
label_word_count instance-attribute
label_word_count = defaultdict(int)
count instance-attribute
count = defaultdict(int)
true_positives instance-attribute
true_positives = defaultdict(int)
false_positives instance-attribute
false_positives = defaultdict(int)
false_negatives instance-attribute
false_negatives = defaultdict(int)
recall property
recall: dict[str, float]

Recall score for each key.

precision property
precision: dict[str, float]

Precision score for each key.

f1_score property
f1_score: dict[str, float]

F1 score for each key.

categories property
categories: list[str]

Get all categories in the label.

Functions
update
update(key: str, score: BagOfWords) -> None

Update the score with the current evaluation for a given key.

Parameters:

Name   Type        Description          Default
key    str         Category to update.  required
score  BagOfWords  Current score.       required

Examples:

>>> score.update("total", BagOfWords(label.entities, pred.entities))
Source code in ie_eval/scorer.py
def update(self, key: str, score: BagOfWords) -> None:
    """Update the score with the current evaluation for a given key.

    Args:
        key (str): Category to update.
        score (BagOfWords): Current score.

    Examples:
        >>> score.update("total", BagOfWords(label.entities, pred.entities))
    """
    self.label_word_count[key] += len(score.labels)
    self.count[key] += 1
    self.true_positives[key] += score.true_positives
    self.false_positives[key] += score.false_positives
    self.false_negatives[key] += score.false_negatives
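
Once the counts are accumulated per key, the three scores follow from the standard definitions. The properties' implementations are not shown on this page, so this is a sketch using the textbook formulas. `precision_recall_f1` is a hypothetical helper, and returning 0.0 for empty denominators is an assumption.

```python
def precision_recall_f1(
    true_positives: int,
    false_positives: int,
    false_negatives: int,
) -> tuple[float, float, float]:
    """Standard precision, recall and F1 from accumulated counts."""
    predicted = true_positives + false_positives
    expected = true_positives + false_negatives
    # Assumption: a score with an empty denominator is reported as 0.0.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / expected if expected else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1(
    true_positives=6, false_positives=2, false_negatives=6
)
print(precision, recall, round(f1, 3))  # 0.75 0.5 0.6
```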

Functions

calc_dist_sus_entity

calc_dist_sus_entity(
    hyp_ne: tuple[str, str],
    gt_ne: tuple[str, str],
    char_level: bool,
) -> float

Calculate substitution distance between 2 entities (hyp_ne, gt_ne).

Parameters:

Name        Type   Description                                                   Default
hyp_ne      tuple  hypothesized Named Entity, format: (category, transcription)  required
gt_ne       tuple  label Named Entity, format: (category, transcription)         required
char_level  bool   if True, evaluate at character level.                         required

Returns:

Name   Type   Description
float  float  edit distance in range [0.0, 1.0]

Source code in ie_eval/scorer.py
def calc_dist_sus_entity(
    hyp_ne: tuple[str, str],
    gt_ne: tuple[str, str],
    char_level: bool,
) -> float:
    """Calculate substitution distance between 2 entities (hyp_ne, gt_ne).

    Args:
        hyp_ne (tuple): hypothesized Named Entity, format: (category, transcription)
        gt_ne (tuple): label Named Entity, format: (category, transcription)
        char_level (bool): if True, evaluate at character level.

    Returns:
        float: edit distance in range [0.0, 1.0]
    """
    # Check coincidence of NE category
    if hyp_ne[0] != gt_ne[0]:
        return 1.0

    hyp_word_transcription = hyp_ne[1]
    gt_word_transcription = gt_ne[1]

    if char_level is False:
        # Split by word
        hyp_word_transcription = hyp_word_transcription.split()
        gt_word_transcription = gt_word_transcription.split()

    # Tuples of (distance, correct tokens)
    vec_dist_pre = [(i, 0) for i in range(len(gt_word_transcription) + 1)]
    vec_dist_act = [(0, 0)] * (len(gt_word_transcription) + 1)

    # If char_level is True, the string is explored character by character (including spaces)
    for j in range(len(hyp_word_transcription)):
        vec_dist_act[0] = (j + 1, 0)
        for i in range(len(gt_word_transcription)):
            dist_ins = (vec_dist_act[i][0] + 1, vec_dist_act[i][1])
            dist_bor = (vec_dist_pre[i + 1][0] + 1, vec_dist_pre[i + 1][1])

            cost_sus = int(hyp_word_transcription[j] != gt_word_transcription[i])
            dist_sus = (
                vec_dist_pre[i][0] + cost_sus,
                vec_dist_pre[i][1] + (1 - cost_sus),
            )

            vec_dist_act[i + 1] = min(dist_ins, dist_bor, dist_sus)

        vec_dist_pre, vec_dist_act = vec_dist_act, vec_dist_pre

    # Saturation of CER/WER (min(CER, 1.0))
    return min(float(vec_dist_pre[-1][0]) / float(len(gt_word_transcription)), 1.0)
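
To see how the distance behaves on concrete inputs, here is a compact re-implementation for illustration. It is not the library function: `entity_distance` is a hypothetical name, and it tracks only the edit distance (not the count of correct tokens). Like the original, it divides by the length of the label transcription, so it assumes the label transcription is non-empty.

```python
def entity_distance(
    hyp_ne: tuple[str, str],
    gt_ne: tuple[str, str],
    char_level: bool,
) -> float:
    """Normalized, saturated edit distance between two (category, text) entities."""
    # A category mismatch counts as a full error.
    if hyp_ne[0] != gt_ne[0]:
        return 1.0

    # Compare characters directly, or whitespace-separated words.
    hyp = hyp_ne[1] if char_level else hyp_ne[1].split()
    gt = gt_ne[1] if char_level else gt_ne[1].split()

    # Two-row Levenshtein distance (assumes gt is non-empty).
    previous = list(range(len(gt) + 1))
    for j, hyp_token in enumerate(hyp, start=1):
        current = [j]
        for i, gt_token in enumerate(gt, start=1):
            current.append(
                min(
                    current[i - 1] + 1,  # skip a label token
                    previous[i] + 1,  # skip a hypothesis token
                    previous[i - 1] + (hyp_token != gt_token),  # substitution or match
                )
            )
        previous = current

    # Saturate at 1.0, as in the original.
    return min(previous[-1] / len(gt), 1.0)


# One extra character over a 6-character label.
print(round(entity_distance(("person", "Georges"), ("person", "George"), True), 3))  # 0.167
# One wrong word out of two.
print(entity_distance(("person", "Georges Washington"), ("person", "George Washington"), False))  # 0.5
# Different categories.
print(entity_distance(("place", "George"), ("person", "George"), True))  # 1.0
```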