A Hybrid Approach to Critical Error Detection
This report visualizes an experiment on the WMT21 Critical Error Detection task. We explore a novel hybrid system combining the COMETKiwi-23 XL quality estimation model with a TinyLlama-1.1B verifier to accurately identify high-impact translation errors across four language pairs.
The Challenge
The WMT21 task asks systems to classify each machine translation as containing a "critical error" (label 1) or not (label 0). A key challenge is the severe class imbalance: critical errors are far rarer than acceptable translations.
Dataset Composition
The experiment uses 4,000 samples from the WMT21 development set, split evenly across four language pairs. This chart shows the label distribution within each pair.
Hybrid System Architecture
Our method generates two distinct signals for each translation and fuses them for a final prediction. This flowchart illustrates the process from input to classification; a code sketch of the fusion step follows the flowchart.
Input: Source & MT sentence
COMETKiwi-23 XL: generates a continuous quality score
TinyLlama Verifier: generates a binary "critical error" flag (Yes/No)
Feature Fusion: [Score, Flag]
Logistic Regression: final prediction (0 or 1)
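The fusion step can be read as a two-feature classifier. Below is a minimal sketch of that step, assuming the COMETKiwi score and verifier flag have already been computed for each source/MT pair; the toy arrays, the use of scikit-learn, and the class_weight setting are illustrative assumptions, not details taken from the experiment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef

# Toy inputs, one row per (source, MT) pair:
# comet_scores   - continuous quality estimates from COMETKiwi-23 XL
# verifier_flags - binary "critical error" flags from the TinyLlama verifier
# labels         - gold labels (1 = critical error, 0 = acceptable)
comet_scores = np.array([0.81, 0.34, 0.77, 0.12, 0.90, 0.25])
verifier_flags = np.array([0, 1, 0, 1, 0, 1])
labels = np.array([0, 1, 0, 1, 0, 0])

# Feature fusion: stack the two signals into a [score, flag] matrix.
X = np.column_stack([comet_scores, verifier_flags])

# Logistic regression maps the fused features to the final 0/1 prediction.
# class_weight="balanced" is one (assumed) way to handle the label imbalance.
clf = LogisticRegression(class_weight="balanced").fit(X, labels)
preds = clf.predict(X)

print("Predictions:", preds)
print("MCC:", matthews_corrcoef(labels, preds))
```

On the real data the fused features would come from held-out model outputs rather than toy arrays; reweighting the classes is only one possible way to compensate for the imbalance noted above.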
Experiment Results
Overall Performance
Matthews Correlation Coefficient (MCC): 0.282 across all four language pairs.
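MCC is a natural headline metric here because it uses all four confusion-matrix cells and remains informative under heavy class imbalance; it ranges from -1 to 1, with 0 corresponding to chance-level prediction. For binary labels it is computed as:

\[
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\]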
Hover over a bar to see a detailed analysis.
This chart compares the model's performance on each language pair, showing how accuracy varies across languages.