
Scorer

ie_eval.scorer

Scorer.

Attributes

M module-attribute

M = Munkres()
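
The module-level Munkres instance suggests that the order-independent scores below are obtained by solving an optimal assignment (Hungarian algorithm) over a pairwise cost matrix between predicted and label entities. A minimal sketch of such an assignment with the munkres package, using an illustrative cost matrix rather than one produced by this library:

from munkres import Munkres

# Illustrative cost matrix: rows are predicted entities, columns are label entities.
costs = [
    [0.0, 1.0, 1.0],
    [1.0, 0.2, 1.0],
    [1.0, 1.0, 0.5],
]

# compute() returns the list of (row, column) index pairs with minimal total cost.
for row, column in Munkres().compute(costs):
    print(f"prediction {row} is matched with label {column}")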

Classes

OiEcerEwer dataclass

OiEcerEwer(
    labels: list[tuple[str, str]],
    predictions: list[tuple[str, str]],
    compute_ecer: bool,
    costs: list[list[float]] = list(),
    errors: float = 0.0,
)

Base class for order independent ECER / EWER computation.

Attributes
labels instance-attribute
labels: list[tuple[str, str]]
predictions instance-attribute
predictions: list[tuple[str, str]]
compute_ecer instance-attribute
compute_ecer: bool
costs class-attribute instance-attribute
costs: list[list[float]] = field(default_factory=list)
errors class-attribute instance-attribute
errors: float = 0.0
num_ne_gt property
num_ne_gt: int

Compute number of NEs in the label.

num_ne_hyp property
num_ne_hyp: int

Compute number of NEs in the prediction.
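
A minimal usage sketch, assuming (as in the signature above) that labels and predictions are lists of (category, transcription) pairs; the entities are illustrative, and compute_ecer presumably selects the character-level (ECER) rather than word-level (EWER) variant:

from ie_eval.scorer import OiEcerEwer

label_entities = [("person", "Georges Washington"), ("date", "1789")]
predicted_entities = [("date", "1789"), ("person", "George Washington")]

# compute_ecer=True presumably requests the character-level score (ECER).
score = OiEcerEwer(
    labels=label_entities,
    predictions=predicted_entities,
    compute_ecer=True,
)
print(score.num_ne_gt, score.num_ne_hyp)  # number of entities in label and prediction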

OiNerval dataclass

OiNerval(
    labels: list[tuple[str, str]],
    predictions: list[tuple[str, str]],
    nerval_threshold: float = 30.0,
    costs: list[list[float]] = list(),
    true_positives: int = 0,
    false_positives: int = 0,
    false_negatives: int = 0,
)

Base class for order independent Nerval computation of Precision, Recall and F1 scores.

Attributes
labels instance-attribute
labels: list[tuple[str, str]]
predictions instance-attribute
predictions: list[tuple[str, str]]
nerval_threshold class-attribute instance-attribute
nerval_threshold: float = 30.0
costs class-attribute instance-attribute
costs: list[list[float]] = field(default_factory=list)
true_positives class-attribute instance-attribute
true_positives: int = 0
false_positives class-attribute instance-attribute
false_positives: int = 0
false_negatives class-attribute instance-attribute
false_negatives: int = 0
num_ne_gt property
num_ne_gt: int

Returns the number of NEs in the label.

num_ne_hyp property
num_ne_hyp: int

Returns the number of NEs in the prediction.
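
A similar hedged sketch for OiNerval; the default nerval_threshold of 30.0 presumably corresponds to the Nerval edit-distance threshold expressed as a percentage:

from ie_eval.scorer import OiNerval

nerval = OiNerval(
    labels=[("person", "Georges Washington")],
    predictions=[("person", "George Washington")],
    nerval_threshold=30.0,  # default value, made explicit here for illustration
)
print(nerval.num_ne_gt, nerval.num_ne_hyp)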

BagOfWords

Bases: NamedTuple

Base class for bag-of-words metrics. Extension of bWER defined in End-to-End Page-Level Assessment of Handwritten Text Recognition (https://arxiv.org/pdf/2301.05935.pdf).

Attributes
labels instance-attribute
labels: list[str | tuple[str, str]]
predictions instance-attribute
predictions: list[str | tuple[str, str]]
label_counter property
label_counter: Counter[str]

Split the label into words and count each word's occurrences.

prediction_counter property
prediction_counter: Counter[str]

Split the prediction into words and count each word's occurrences.

true_positives property
true_positives: int

Count true positive words.

false_positives property
false_positives: int

Count false positive words.

false_negatives property
false_negatives: int

Count false negative words.

all_words property
all_words: list[str | tuple[str, str]]

All tagged words.

label_word_vector property
label_word_vector: np.array

Iterate over the set of tagged words and count occurrences in the label.

prediction_word_vector property
prediction_word_vector: np.array

Iterate over the set of words and count occurrences in the prediction.

insertions_deletions property
insertions_deletions: int

Count unavoidable insertions and deletions. See Equation 8 from https://arxiv.org/pdf/2301.05935.pdf.

substitutions property
substitutions: int

Count substitutions. See Equation 8 from https://arxiv.org/pdf/2301.05935.pdf.

errors property
errors: int

Count total number of errors.
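
A hedged sketch of scoring a single document with BagOfWords, assuming labels and predictions are lists of (category, word) tuples; the entities are illustrative, and the split between unavoidable insertions/deletions and substitutions follows the bWER decomposition referenced above (Equation 8), as implemented by the library:

from ie_eval.scorer import BagOfWords

bow = BagOfWords(
    labels=[("person", "Georges"), ("person", "Washington"), ("date", "1789")],
    predictions=[("person", "George"), ("person", "Washington")],
)

# Tagged words found on both sides, and those missing from either side.
print(bow.true_positives, bow.false_positives, bow.false_negatives)
# Total bag-of-words errors, split into unavoidable insertions/deletions and substitutions.
print(bow.insertions_deletions, bow.substitutions, bow.errors)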

MicroAverageErrorRate

MicroAverageErrorRate()

Compute total error rates.

Initialize errors and counts.

Examples:

>>> score = MicroAverageErrorRate()
Source code in ie_eval/scorer.py
def __init__(self) -> None:
    """Initialize errors and counts.

    Examples:
        >>> score = MicroAverageErrorRate()
    """
    self.label_word_count = defaultdict(int)
    self.error_count = defaultdict(int)
    self.count = defaultdict(int)
Attributes
label_word_count instance-attribute
label_word_count = defaultdict(int)
error_count instance-attribute
error_count = defaultdict(int)
count instance-attribute
count = defaultdict(int)
error_rate property
error_rate: dict[str, float]

Error rate for each key.

categories property
categories: list[str]

Get all categories in the label.

Functions
update
update(key: str, score: BagOfWords | OiEcerEwer) -> None

Update the score with the current evaluation for a given key.

Parameters:

    key (str): Category to update. Required.
    score (BagOfWords | OiEcerEwer): Current score. Required.

Examples:

>>> score.update("total", BagOfWords(label.entities, pred.entities))
Source code in ie_eval/scorer.py
def update(self, key: str, score: BagOfWords | OiEcerEwer) -> None:
    """Update the score with the current evaluation for a given key.

    Args:
        key (str): Category to update.
        score (BagOfWords | OiEcerEwer): Current score.

    Examples:
        >>> score.update("total", BagOfWords(label.entities, pred.entities))
    """
    self.label_word_count[key] += len(score.labels)
    self.count[key] += 1
    self.error_count[key] += score.errors
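
A hedged end-to-end sketch of accumulating a micro-averaged error rate over several documents, using BagOfWords as the per-document score and illustrative entities:

from ie_eval.scorer import BagOfWords, MicroAverageErrorRate

# Illustrative (label entities, predicted entities) pairs for two documents.
documents = [
    ([("person", "Georges")], [("person", "George")]),
    ([("date", "1789")], [("date", "1789")]),
]

score = MicroAverageErrorRate()
for label_entities, predicted_entities in documents:
    score.update("total", BagOfWords(labels=label_entities, predictions=predicted_entities))

# error_rate presumably divides the accumulated errors by the accumulated label word count, per key.
print(score.error_rate["total"])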

MicroAverageFScore

MicroAverageFScore()

Compute total precision, recall, and f1 scores.

Initialize error counts.

Examples:

>>> score = MicroAverageFScore()
Source code in ie_eval/scorer.py
def __init__(self) -> None:
    """Initialize error counts.

    Examples:
        >>> score = MicroAverageFScore()
    """
    self.label_word_count = defaultdict(int)
    self.count = defaultdict(int)
    self.true_positives = defaultdict(int)
    self.false_positives = defaultdict(int)
    self.false_negatives = defaultdict(int)
Attributes
label_word_count instance-attribute
label_word_count = defaultdict(int)
count instance-attribute
count = defaultdict(int)
true_positives instance-attribute
true_positives = defaultdict(int)
false_positives instance-attribute
false_positives = defaultdict(int)
false_negatives instance-attribute
false_negatives = defaultdict(int)
recall property
recall: dict[str, float]

Recall score for each key.

precision property
precision: dict[str, float]

Precision score for each key.

f1_score property
f1_score: dict[str, float]

F1 score for each key.

categories property
categories: list[str]

Get all categories in the label.

Functions
update
update(key: str, score: BagOfWords) -> None

Update the score with the current evaluation for a given key.

Parameters:

    key (str): Category to update. Required.
    score (BagOfWords): Current score. Required.

Examples:

>>> score.update("total", BagOfWords(label.entities, pred.entities))
Source code in ie_eval/scorer.py
def update(self, key: str, score: BagOfWords) -> None:
    """Update the score with the current evaluation for a given key.

    Args:
        key (str): Category to update.
        score (BagOfWords): Current score.

    Examples:
        >>> score.update("total", BagOfWords(label.entities, pred.entities))
    """
    self.label_word_count[key] += len(score.labels)
    self.count[key] += 1
    self.true_positives[key] += score.true_positives
    self.false_positives[key] += score.false_positives
    self.false_negatives[key] += score.false_negatives
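
And a similar hedged sketch for the micro-averaged F-score, again with illustrative entities:

from ie_eval.scorer import BagOfWords, MicroAverageFScore

score = MicroAverageFScore()
score.update(
    "total",
    BagOfWords(
        labels=[("person", "Georges"), ("person", "Washington")],
        predictions=[("person", "Georges")],
    ),
)

# precision, recall and f1_score are reported per key from the accumulated counts.
print(score.precision["total"], score.recall["total"], score.f1_score["total"])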

Functions

calc_dist_sus_entity

calc_dist_sus_entity(
    hyp_ne: tuple[str, str],
    gt_ne: tuple[str, str],
    char_level: bool,
) -> float

Calculate substitution distance between 2 entities (hyp_ne, gt_ne).

Parameters:

    hyp_ne (tuple): hypothesized Named Entity, format: (category, transcription). Required.
    gt_ne (tuple): label Named Entity, format: (category, transcription). Required.
    char_level (bool): if True, evaluate at character level. Required.

Returns:

    float: edit distance in range [0.0, 1.0]

Source code in ie_eval/scorer.py
def calc_dist_sus_entity(
    hyp_ne: tuple[str, str],
    gt_ne: tuple[str, str],
    char_level: bool,
) -> float:
    """Calculate substitution distance between 2 entities (hyp_ne, gt_ne).

    Args:
        hyp_ne (tuple): hypothesized Named Entity, format: (category, transcription)
        gt_ne (tuple): label Named Entity, format: (category, transcription)
        char_level (bool): if True, evaluate at character level.

    Returns:
        float: edit distance in range [0.0, 1.0]
    """
    # Check coincidence of NE category
    if hyp_ne[0] != gt_ne[0]:
        return 1.0

    hyp_word_transcription = hyp_ne[1]
    gt_word_transcription = gt_ne[1]

    if char_level is False:
        # Split by word
        hyp_word_transcription = hyp_word_transcription.split()
        gt_word_transcription = gt_word_transcription.split()

    # Tuples of (distance, correct tokens)
    vec_dist_pre = [(i, 0) for i in range(len(gt_word_transcription) + 1)]
    vec_dist_act = [(0, 0)] * (len(gt_word_transcription) + 1)

    # If char_level is True, the string is explored character by character (including spaces)
    for j in range(len(hyp_word_transcription)):
        vec_dist_act[0] = (j + 1, 0)
        for i in range(len(gt_word_transcription)):
            dist_ins = (vec_dist_act[i][0] + 1, vec_dist_act[i][1])
            dist_bor = (vec_dist_pre[i + 1][0] + 1, vec_dist_pre[i + 1][1])

            cost_sus = int(hyp_word_transcription[j] != gt_word_transcription[i])
            dist_sus = (
                vec_dist_pre[i][0] + cost_sus,
                vec_dist_pre[i][1] + (1 - cost_sus),
            )

            vec_dist_act[i + 1] = min(dist_ins, dist_bor, dist_sus)

        vec_dist_pre, vec_dist_act = vec_dist_act, vec_dist_pre

    # Saturation of CER/WER (min(CER, 1.0))
    return min(float(vec_dist_pre[-1][0]) / float(len(gt_word_transcription)), 1.0)
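
A short usage example of calc_dist_sus_entity with illustrative entities. With char_level=False the distance is computed over whitespace-separated words, with char_level=True over individual characters (including spaces), and a category mismatch always costs 1.0:

from ie_eval.scorer import calc_dist_sus_entity

gt = ("person", "Georges Washington")
hyp = ("person", "George Washington")

# Word level: "George" vs "Georges" is one substitution over two label words -> 0.5.
print(calc_dist_sus_entity(hyp, gt, char_level=False))

# Character level: one deletion over the 18 label characters -> ~0.056.
print(calc_dist_sus_entity(hyp, gt, char_level=True))

# Different categories always yield the maximal distance of 1.0.
print(calc_dist_sus_entity(("date", "1789"), gt, char_level=False))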