1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
2. Southeast Academy of Information Technology, Beijing Institute of Technology, Putian 351100, China
3. Huawei Noah's Ark Lab, Shenzhen 518129, China
E-mail: yhli@bit.edu.cn
hhy63@bit.edu.cn
puking.w@huawei.com
‡Corresponding author
Received: 2023-12-01
Revised: 2024-05-06
Print publication date: 2025-03
Yinghao LI, Heyan HUANG, Baojun WANG, et al. DRMSpell: dynamically reweighting multimodality for Chinese spelling correction[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(3): 354-366. DOI: 10.1631/FITEE.2300816.
Chinese spelling correction (CSC) is a task that aims to detect and correct the spelling errors that may occur in Chinese texts. However, the Chinese language exhibits a high degree of complexity, characterized by the presence of multiple phonetic representations known as pinyin, which possess distinct tonal variations that can correspond to various characters. Given this complexity, the CSC task is essential for ensuring the accuracy and clarity of written communication. Recent research has incorporated external knowledge into the model through phonological and visual modalities. However, these methods do not use modality information in a targeted way to address the different types of spelling errors. In this paper, we propose a multimodal pretrained language model called DRMSpell for CSC, which takes into consideration the interaction between the modalities. A dynamically reweighting multimodality (DRM) module is introduced to reweight the modalities and thereby obtain more multimodal information. To fully use the multimodal information obtained and to further strengthen the model, an independent-modality masking strategy (IMS) is proposed to independently mask the three modalities of a token in the pretraining stage. Our method achieves state-of-the-art performance on most metrics of widely used benchmarks. The experimental results demonstrate that our method can model the interactive information between modalities and is robust to incorrect modal information.
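The abstract names two mechanisms without giving their formulas: reweighting a token's modalities and masking those modalities independently. The sketch below is only an illustration of those two ideas under stated assumptions, not the DRMSpell implementation. The modality names (`semantic`, `phonetic`, `visual`), the dot-product gate, the softmax normalization, zero vectors as the mask, and all function names are hypothetical choices made for this example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight_modalities(embeddings, gate_vector):
    """Fuse one token's modality embeddings with input-dependent weights.

    embeddings: dict of modality name -> list[float], all of equal length.
    gate_vector: hypothetical learned scoring vector (assumption).
    Returns (weights per modality, fused embedding).
    """
    names = sorted(embeddings)
    # Score each modality by a dot product with the gate (assumed scoring rule).
    scores = [
        sum(g * x for g, x in zip(gate_vector, embeddings[name]))
        for name in names
    ]
    weights = softmax(scores)
    dim = len(gate_vector)
    fused = [
        sum(w * embeddings[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused


def mask_modalities_independently(embeddings, mask_prob, rng):
    """Mask each modality of a token with its own independent random draw.

    A masked modality is replaced by a zero vector (assumption).
    """
    return {
        name: [0.0] * len(vec) if rng.random() < mask_prob else list(vec)
        for name, vec in embeddings.items()
    }


# Toy token with three 3-dimensional modality embeddings (made-up values).
token = {
    "semantic": [0.2, 0.1, 0.4],
    "phonetic": [0.9, 0.3, 0.1],
    "visual": [0.0, 0.5, 0.2],
}
weights, fused = reweight_modalities(token, gate_vector=[1.0, 0.5, -0.5])
masked = mask_modalities_independently(token, mask_prob=0.15, rng=random.Random(0))
```

Because each modality gets its own random draw, a token can keep its phonetic embedding while losing its visual one, which is the sense of "independent" assumed here. How the paper computes its weights and what it substitutes for a masked modality are not stated in the abstract.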