Implemented a CTranslate2 backend for the three local translation models
and switched the existing local_nllb / local_marian factories over to it.
The new runtime lives in local_ctranslate2.py, including HF->CT2
auto-conversion, float16 compute type mapping, Marian direction
handling, and NLLB target-prefix decoding. The service wiring is in
service.py (line 113), and the three model configs now point at explicit
ctranslate2-float16 dirs in config.yaml (line 133).
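To make the runtime pieces concrete, here is a minimal sketch of the shape described above. The function names (`pick_compute_type`, `nllb_translate`) and the target-language default are illustrative assumptions, not the actual local_ctranslate2.py API:

```python
def pick_compute_type(device: str) -> str:
    # Compute-type mapping: float16 on GPU, int8 fallback on CPU
    # (a common CTranslate2 choice; the real mapping may differ).
    return "float16" if device == "cuda" else "int8"

def nllb_translate(model_dir: str, tokens: list[list[str]],
                   target_lang: str = "zho_Hans") -> list[list[str]]:
    # Lazy import so this sketch loads even without ctranslate2 installed.
    import ctranslate2
    translator = ctranslate2.Translator(
        model_dir, device="cuda", compute_type=pick_compute_type("cuda"))
    # NLLB decodes with the target language code supplied as a target prefix.
    results = translator.translate_batch(
        tokens, target_prefix=[[target_lang]] * len(tokens))
    # Drop the language-code prefix token from each best hypothesis.
    return [r.hypotheses[0][1:] for r in results]
```

The Marian directions need no target prefix; only the NLLB path uses that decoding step.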
I also updated the setup path so this is usable end-to-end:
ctranslate2>=4.7.0 was added to requirements_translator_service.txt and
requirements.txt, the download script now supports pre-conversion in
download_translation_models.py (line 27), and the docs/config examples
were refreshed in translation/README.md. I installed ctranslate2 into
.venv-translator, pre-converted all three models, and the CT2 artifacts
are already on disk:
models/translation/facebook/nllb-200-distilled-600M/ctranslate2-float16
models/translation/Helsinki-NLP/opus-mt-zh-en/ctranslate2-float16
models/translation/Helsinki-NLP/opus-mt-en-zh/ctranslate2-float16
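The pre-conversion step can be reproduced with CTranslate2's Transformers converter. The directory layout below mirrors the paths above; `convert_model` itself is just an illustrative wrapper, not the actual download-script code:

```python
import os

def ct2_output_dir(models_root: str, hf_id: str) -> str:
    # Mirrors the on-disk layout above: <root>/<hf repo id>/ctranslate2-float16
    return os.path.join(models_root, hf_id, "ctranslate2-float16")

def convert_model(models_root: str, hf_id: str) -> str:
    # Lazy import so the path helper above is usable without ctranslate2.
    from ctranslate2.converters import TransformersConverter
    out_dir = ct2_output_dir(models_root, hf_id)
    # Converts the HF checkpoint to a CT2 model quantized to float16.
    TransformersConverter(hf_id).convert(out_dir, quantization="float16",
                                         force=True)
    return out_dir
```

The equivalent CLI is the `ct2-transformers-converter` entry point with `--quantization float16`.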
Verification was solid. python3 -m compileall passed, direct
TranslationService smoke tests ran successfully in .venv-translator, and
the focused NLLB benchmark on the local GPU showed a clear win:
batch_size=16: HF 0.347 s/batch (46.1 items/s) vs CT2 0.130 s/batch (123.0 items/s)
batch_size=1: HF 0.396 s/request vs CT2 0.126 s/request
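The per-batch figures reduce to a simple timing loop. A generic harness of that shape (an assumed sketch, not the actual benchmark script) looks like:

```python
import time

def bench(translate_fn, batches):
    # Times translate_fn over all batches and returns
    # (seconds per batch, items per second).
    n_items = sum(len(batch) for batch in batches)
    start = time.perf_counter()
    for batch in batches:
        translate_fn(batch)
    elapsed = time.perf_counter() - start
    return elapsed / len(batches), n_items / elapsed
```

Running the HF and CT2 callables through the same harness keeps the comparison apples-to-apples: identical batches, identical tokenization, only the backend swapped.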
One caveat: translation quality on some very short phrases, especially
opus-mt-en-zh, still looks a bit rough in smoke tests, so I’d run your
real quality set before fully cutting over. If you want, I can follow up
by updating the benchmark script and regenerating a full CT2 performance
report for all three models.