# Nerval metrics
The `atr-ner-eval nerval` command computes the Nerval precision, recall and F1 score of your automatic workflow, either globally or for each semantic category.
## Metric description
Nerval is an evaluation metric for named-entity recognition on noisy text, typically used to measure NER performance on Automatic Text Recognition predictions. It relies on character-level string alignment: the automatic transcription is first aligned with the ground truth by minimizing the Levenshtein distance between them. Each entity in the ground truth is then matched with the corresponding entity in the aligned transcription. If the Character Error Rate (CER) between the two entities is lower than a threshold, the predicted entity is considered recognized. A threshold of 0 imposes perfect matches, while a threshold of 1 allows completely different strings to count as a match. For the purpose of matching entities to existing databases, we estimate that the default threshold of 0.3 is fair, as it requires a 70% character match between entities.
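For illustration, here is a minimal Python sketch of the matching rule, assuming the two texts have already been aligned. The `levenshtein` and `is_match` helpers are written for this example and are not the library's actual implementation:

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]


def is_match(label: str, prediction: str, threshold: float = 0.3) -> bool:
    """An aligned predicted entity counts as recognized when its CER
    against the ground-truth entity does not exceed the threshold."""
    cer = levenshtein(label, prediction) / len(label)
    return cer <= threshold


print(is_match("Nerval", "Nerva1"))                 # True: CER = 1/6 ≈ 0.17
print(is_match("Nerval", "Nerva1", threshold=0.0))  # False: only exact matches pass
```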
## Parameters
Here are the available parameters for this metric:
| Parameter | Description | Type | Default |
|---|---|---|---|
| `--label-dir` | Path to the directory containing BIO label files. | `pathlib.Path` | |
| `--prediction-dir` | Path to the directory containing BIO prediction files. | `pathlib.Path` | |
| `--threshold` | Character Error Rate threshold used to match entities. | `float` | `0.3` |
| `--by-category` | Whether to display metrics for each category. | `bool` | `False` |
The parameters are also described when running `atr-ner-eval nerval --help`.
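Both directories should contain one BIO file per document, with matching file names, where each line holds a token and its tag. The snippet below is only an illustrative sketch; the tokens and the `PER`/`LOC` tags are made-up examples, not taken from the Simara dataset:

```
Gérard B-PER
de I-PER
Nerval I-PER
was O
born O
in O
Paris B-LOC
```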
## Examples
### Global evaluation
Use the following command to compute the overall Nerval metrics:
```shell
atr-ner-eval nerval --label-dir Simara/labels/ \
    --prediction-dir Simara/predictions/
```
It will output the results in Markdown format:
```
2024-01-12 15:35:19,790 INFO/bio_parser.utils: Loading labels...
2024-01-12 15:35:19,968 INFO/bio_parser.utils: Loading prediction...
2024-01-12 15:35:20,144 INFO/bio_parser.utils: The dataset is complete and valid.
| Category | Precision | Recall | F1    | Support |
|:--------:|:---------:|:------:|:-----:|:-------:|
| total    | 94.96     | 95.3   | 95.13 | 4430    |
```
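As a sanity check, F1 is the harmonic mean of precision and recall, so the total row can be verified by hand (the Support column reports the number of entities in the ground truth):

$$\mathrm{F1} = \frac{2 \cdot P \cdot R}{P + R} = \frac{2 \times 94.96 \times 95.3}{94.96 + 95.3} \approx 95.13$$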
### Evaluation for each category
Use the following command to compute Nerval metrics for each semantic category:
```shell
atr-ner-eval nerval --label-dir Simara/labels/ \
    --prediction-dir Simara/predictions/ \
    --by-category
```
It will output the results in Markdown format:
```
2024-01-12 15:35:39,240 INFO/bio_parser.utils: Loading labels...
2024-01-12 15:35:39,326 INFO/bio_parser.utils: Loading prediction...
2024-01-12 15:35:39,422 INFO/bio_parser.utils: The dataset is complete and valid.
| Category            | Precision | Recall | F1    | Support |
|:-------------------:|:---------:|:------:|:-----:|:-------:|
| analyse_compl       | 92.16     | 93.0   | 92.58 | 771     |
| classement          | 92.41     | 94.81  | 93.59 | 77      |
| cote_article        | 97.94     | 98.52  | 98.23 | 676     |
| cote_serie          | 97.49     | 97.63  | 97.56 | 676     |
| date                | 97.61     | 98.0   | 97.81 | 751     |
| intitule            | 94.28     | 94.28  | 94.28 | 804     |
| precisions_sur_cote | 90.8      | 90.67  | 90.73 | 675     |
| total               | 94.96     | 95.3   | 95.13 | 4430    |
```
### Evaluation for each category with a custom threshold
Use the following command to compute Nerval metrics for each semantic category with a custom threshold. A lower threshold tolerates fewer transcription errors inside matched entities, so scores typically drop:
```shell
atr-ner-eval nerval --label-dir Simara/labels/ \
    --prediction-dir Simara/predictions/ \
    --threshold 0.1 \
    --by-category
```
It will output the results in Markdown format:
```
2024-01-12 15:35:39,240 INFO/bio_parser.utils: Loading labels...
2024-01-12 15:35:39,326 INFO/bio_parser.utils: Loading prediction...
2024-01-12 15:35:39,422 INFO/bio_parser.utils: The dataset is complete and valid.
| Category            | Precision | Recall | F1    | Support |
|:-------------------:|:---------:|:------:|:-----:|:-------:|
| analyse_compl       | 80.72     | 81.45  | 81.08 | 771     |
| classement          | 72.15     | 74.03  | 73.08 | 77      |
| cote_article        | 97.06     | 97.63  | 97.35 | 676     |
| cote_serie          | 97.49     | 97.63  | 97.56 | 676     |
| date                | 95.23     | 95.61  | 95.42 | 751     |
| intitule            | 84.83     | 84.83  | 84.83 | 804     |
| precisions_sur_cote | 89.32     | 89.19  | 89.25 | 675     |
| total               | 90.13     | 90.45  | 90.29 | 4430    |
```
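To see why the scores drop, consider a single noisy entity. This sketch makes the same assumptions as the earlier one (a hand-rolled `levenshtein` helper, not the library's code) and shows a prediction that passes the default threshold but fails the stricter one:

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


label, prediction = "22 mars 1825", "22 nars 1826"  # two character errors
cer = levenshtein(label, prediction) / len(label)   # 2 / 12 ≈ 0.17
print(cer <= 0.3)  # True: counted as recognized with the default threshold
print(cer <= 0.1)  # False: rejected with --threshold 0.1
```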