1.College of Control Science and Engineering, China University of Petroleum (East China), Qingdao 266580, China
2.School of Computer Science, University of Sydney, New South Wales 2006, Australia
3.JD Explore Academy, JD.com Inc., Beijing 100101, China
4.School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
‡ Corresponding authors
Received: 30 March 2024
Revised: 27 November 2024
Published: August 2025
Changtong ZAN, Liang DING, Li SHEN, et al. Building accurate translation-tailored large language models with language-aware instruction tuning[J]. Frontiers of information technology & electronic engineering, 2025, 26(8): 1341-1355. DOI: 10.1631/FITEE.2400458.
Large language models (LLMs) exhibit remarkable capabilities in various natural language processing tasks, such as machine translation. However, the large number of LLM parameters incurs significant costs during inference. Previous studies have attempted to train translation-tailored LLMs with moderately sized models by fine-tuning them on translation data. Nevertheless, when performing translations in zero-shot directions that are absent from the fine-tuning data, the problem of ignoring instructions and thus producing translations in the wrong language (i.e., the off-target translation issue) remains unresolved. In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability of translation-tailored LLMs, particularly for maintaining accurate translation directions. We first fine-tune LLMs on the translation data to elicit basic translation capabilities. At the second stage, we construct instruction-conflicting samples by randomly replacing the instructions with incorrect ones, and then introduce an extra unlikelihood loss to reduce the probability assigned to those samples. Experiments on two benchmarks using the LLaMA 2 and LLaMA 3 models, spanning 16 zero-shot directions, demonstrate that, compared to the competitive baseline of translation-finetuned LLaMA, our method effectively reduces the off-target translation ratio (by up to 62.4 percentage points), thus improving translation quality (by up to +9.7 bilingual evaluation understudy (BLEU) points). Analysis shows that our method preserves the model's performance on other tasks, such as supervised translation and general tasks. Code is released at https://github.com/alphadl/LanguageAware_Tuning.
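To make the second-stage objective described above concrete, the following is a minimal PyTorch sketch, assuming a Hugging Face-style causal language model (so that model(**batch).loss is the standard next-token cross-entropy and labels use -100 for prompt tokens). The names stage2_loss, unlikelihood_loss, and the weight alpha are illustrative assumptions, not taken from the paper; the authors' actual implementation is available in the repository linked in the abstract.

import torch
import torch.nn.functional as F


def token_log_probs(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Log-probability of each gold target token, shape (batch, seq_len).
    log_probs = F.log_softmax(logits, dim=-1)
    return log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)


def unlikelihood_loss(logits: torch.Tensor, labels: torch.Tensor,
                      mask: torch.Tensor) -> torch.Tensor:
    # Per-token loss -log(1 - p(y_t)): pushes the model to assign a LOW
    # probability to targets paired with a wrong (conflicting) instruction
    # (unlikelihood training in the sense of Welleck et al., 2019).
    p = token_log_probs(logits, labels).exp().clamp(max=1.0 - 1e-6)
    loss = -torch.log1p(-p)
    return (loss * mask).sum() / mask.sum()


def stage2_loss(model, correct_batch, conflicting_batch, alpha: float = 1.0):
    # Likelihood on correct samples + alpha * unlikelihood on conflicting ones.
    # Correct samples: ordinary next-token cross-entropy.
    nll = model(**correct_batch).loss
    # Conflicting samples: same source/target pair, but the translation
    # instruction was randomly replaced with an incorrect one at data-build time.
    out = model(input_ids=conflicting_batch["input_ids"],
                attention_mask=conflicting_batch["attention_mask"])
    logits = out.logits[:, :-1, :]            # predict token t from prefix < t
    labels = conflicting_batch["labels"][:, 1:]
    mask = (labels != -100).float()           # score only the target tokens
    labels = labels.clamp(min=0)              # valid gather indices on ignored positions
    ul = unlikelihood_loss(logits, labels, mask)
    return nll + alpha * ul

In this sketch, alpha balances the standard likelihood term against the unlikelihood penalty on instruction-conflicting samples, and the clamp on p avoids log(0) when the model is already highly confident on a conflicting sample.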