Nerval metrics

The atr-ner-eval nerval command can be used to compute the Nerval precision, recall and F1 score of your automatic workflow, either globally or for each semantic category.

Metric description

Nerval is an evaluation metric for named-entity recognition on noisy text, typically used to measure NER performance on Automatic Text Recognition predictions. It relies on string alignment at character level: the automatic transcription is first aligned with the ground truth by minimizing the Levenshtein distance between them. Each entity in the ground truth is then matched with the corresponding entity in the aligned transcription. If the Character Error Rate (CER) between the two entities is lower than a threshold, the predicted entity is considered recognized. A threshold of 0 would impose perfect matches, while a threshold of 1 would allow completely different strings to be considered a match. For the purpose of matching entities to existing databases, we estimate that the default threshold of 0.3 is fair, as it requires a 70% match between entities.
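
To make the matching rule concrete, here is a minimal sketch of the decision applied to one ground-truth entity and its aligned prediction. It is illustrative only: `levenshtein` and `is_match` are not part of the atr-ner-eval API, the real alignment is computed on the full transcription rather than on isolated strings, and the normalisation and boundary handling shown here are assumptions of the sketch.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ref_char
                    current[j - 1] + 1,      # insert hyp_char
                    previous[j - 1] + cost,  # substitute ref_char by hyp_char
                )
            )
        previous = current
    return previous[-1]


def is_match(reference: str, prediction: str, threshold: float = 0.3) -> bool:
    """Consider a predicted entity recognized if its CER stays within the threshold.

    CER is normalised by the ground-truth length and compared with <=;
    both choices are assumptions of this sketch.
    """
    cer = levenshtein(reference, prediction) / max(len(reference), 1)
    return cer <= threshold


# "Pariis" is one edit away from "Paris": CER = 1 / 5 = 0.2, within the default 0.3.
print(is_match("Paris", "Pariis"))  # True
print(is_match("Paris", "Lyon"))    # False
```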

Parameters

Here are the available parameters for this metric:

| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| `--label-dir` | Path to the directory containing BIO label files (see the example below). | `pathlib.Path` | |
| `--prediction-dir` | Path to the directory containing BIO prediction files. | `pathlib.Path` | |
| `--threshold` | Character Error Rate threshold used to match entities. | `float` | `0.3` |
| `--by-category` | Whether to display metrics for each category. | `bool` | `False` |

The parameters are also described when running atr-ner-eval nerval --help.
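
Both directories are expected to contain BIO files, with one token and its tag per line: `B-<category>` opens an entity, `I-<category>` continues it and `O` marks tokens outside any entity. The snippet below is a made-up illustration using category names from the examples that follow; the exact tokens and file layout of your own data will differ.

```
Lettre B-intitule
du I-intitule
maire I-intitule
, O
13 B-date
juin I-date
1847 I-date
```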

Examples

Global evaluation

Use the following command to compute the overall Nerval precision, recall and F1 score:

```shell
atr-ner-eval nerval --label-dir Simara/labels/ \
                    --prediction-dir Simara/predictions/
```

It will output the results in Markdown format:

```
2024-01-12 15:35:19,790 INFO/bio_parser.utils: Loading labels...
2024-01-12 15:35:19,968 INFO/bio_parser.utils: Loading prediction...
2024-01-12 15:35:20,144 INFO/bio_parser.utils: The dataset is complete and valid.
| Category | Precision | Recall |   F1  | Support |
|:--------:|:---------:|:------:|:-----:|:-------:|
|  total   |   94.96   |  95.3  | 95.13 |   4430  |
```
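
Precision, Recall and F1 are expressed as percentages; F1 is the harmonic mean of precision and recall (here 2 × 94.96 × 95.3 / (94.96 + 95.3) ≈ 95.13) and Support is the number of ground-truth entities over which the scores are computed.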

Evaluation for each category

Use the following command to compute the Nerval metrics for each semantic category:

```shell
atr-ner-eval nerval --label-dir Simara/labels/ \
                    --prediction-dir Simara/predictions/ \
                    --by-category
```

It will output the results in Markdown format:

```
2024-01-12 15:35:39,240 INFO/bio_parser.utils: Loading labels...
2024-01-12 15:35:39,326 INFO/bio_parser.utils: Loading prediction...
2024-01-12 15:35:39,422 INFO/bio_parser.utils: The dataset is complete and valid.
|       Category      | Precision | Recall |   F1  | Support |
|:-------------------:|:---------:|:------:|:-----:|:-------:|
|    analyse_compl    |   92.16   |  93.0  | 92.58 |   771   |
|      classement     |   92.41   | 94.81  | 93.59 |    77   |
|     cote_article    |   97.94   | 98.52  | 98.23 |   676   |
|      cote_serie     |   97.49   | 97.63  | 97.56 |   676   |
|         date        |   97.61   |  98.0  | 97.81 |   751   |
|       intitule      |   94.28   | 94.28  | 94.28 |   804   |
| precisions_sur_cote |    90.8   | 90.67  | 90.73 |   675   |
|        total        |   94.96   |  95.3  | 95.13 |   4430  |
```
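
The per-category Support values (771 + 77 + 676 + 676 + 751 + 804 + 675) sum to the 4430 entities reported on the total line, which aggregates all categories.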

Evaluation for each category with a custom threshold

Use the following command to compute the Nerval metrics for each semantic category with a custom threshold:

```shell
atr-ner-eval nerval --label-dir Simara/labels/ \
                    --prediction-dir Simara/predictions/ \
                    --threshold 0.1 \
                    --by-category
```

It will output the results in Markdown format:

```
2024-01-12 15:35:39,240 INFO/bio_parser.utils: Loading labels...
2024-01-12 15:35:39,326 INFO/bio_parser.utils: Loading prediction...
2024-01-12 15:35:39,422 INFO/bio_parser.utils: The dataset is complete and valid.
|       Category      | Precision | Recall |   F1  | Support |
|:-------------------:|:---------:|:------:|:-----:|:-------:|
|    analyse_compl    |   80.72   | 81.45  | 81.08 |   771   |
|      classement     |   72.15   | 74.03  | 73.08 |    77   |
|     cote_article    |   97.06   | 97.63  | 97.35 |   676   |
|      cote_serie     |   97.49   | 97.63  | 97.56 |   676   |
|         date        |   95.23   | 95.61  | 95.42 |   751   |
|       intitule      |   84.83   | 84.83  | 84.83 |   804   |
| precisions_sur_cote |   89.32   | 89.19  | 89.25 |   675   |
|        total        |   90.13   | 90.45  | 90.29 |   4430  |
```
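
Lowering the threshold from 0.3 to 0.1 only tightens the matching rule, so scores can drop but never improve: categories whose predictions already match the ground truth almost exactly (cote_serie) are unaffected, while noisier categories such as classement, analyse_compl and intitule lose the most.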