  • YU Junhui, CHEN Yanping, QIN Yongbin, HUANG Hui. Research on Optimization Strategy of Chinese Judicial Entity Recognition Based on Machine Reading Comprehension[J]. Guangxi Sciences, 2023, 30(1): 27-34.
Research on Optimization Strategy of Chinese Judicial Entity Recognition Based on Machine Reading Comprehension
YU Junhui1,2, CHEN Yanping1,2, QIN Yongbin1,2, HUANG Hui1,2
(1.State Key Laboratory of Public Big Data, Guiyang, Guizhou, 550025, China;2.College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou, 550025, China)
This study proposes a joint optimization strategy to address three problems in Chinese judicial information extraction: entities in the dataset are highly specialized, existing Machine Reading Comprehension (MRC) models cannot supply sufficient label semantics through constructed questions, and these models perform poorly on noisy samples. First, a judicial domain dictionary is built by aggregating entities that occur frequently in the judicial corpus, and this domain-specific entity knowledge is injected into the RoBERTa-wwm pre-trained language model during pre-training. Second, entity label semantics are fused into the sentence representation, using a self-attention mechanism to weigh the importance of each word with respect to the different label words. Finally, in the fine-tuning stage, an adversarial training algorithm is applied to improve the model's robustness and generalization ability. Experimental results on the judicial information extraction dataset of the 2021 China Legal Intelligence Evaluation (CAIL2021) show that the proposed method improves the F1 value by 2.79% over the baseline model. The model also won the national third prize in the CAIL2021 judicial information extraction track, which supports the effectiveness of the joint optimization strategy.
Key words: judicial information extraction; pre-training; self-attention mechanism; label semantics; adversarial training
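The abstract describes fusing label semantics into the sentence representation by weighing each word against the label words with attention, but does not give the formula. The following is a minimal sketch of one plausible reading, not the authors' implementation: each token vector attends over the label vectors with scaled dot-product attention, and the resulting label context is added to the token vector. The function name `fuse_label_semantics`, the additive fusion, and the toy 2-dimensional vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fuse_label_semantics(token_vecs, label_vecs):
    """Hypothetical label-attention fusion (an assumption, not the paper's code).

    For every token, compute attention weights over all label vectors,
    build a weighted label context, and add it to the token vector.
    Returns the fused token vectors and the attention weights.
    """
    d = len(label_vecs[0])
    fused, weights = [], []
    for tok in token_vecs:
        # Scaled dot-product score between this token and each label.
        scores = [dot(tok, lab) / math.sqrt(d) for lab in label_vecs]
        attn = softmax(scores)
        # Weighted sum of label vectors, one value per dimension.
        label_ctx = [
            sum(w * lab[i] for w, lab in zip(attn, label_vecs))
            for i in range(d)
        ]
        # Additive fusion is an assumption; concatenation is another option.
        fused.append([t + c for t, c in zip(tok, label_ctx)])
        weights.append(attn)
    return fused, weights


# Toy inputs: two tokens and two label words in a 2-dimensional space.
tokens = [[1.0, 0.0], [0.0, 1.0]]
labels = [[2.0, 0.0], [0.0, 2.0]]
fused, weights = fuse_label_semantics(tokens, labels)
for tok_weights in weights:
    print([round(w, 3) for w in tok_weights])
```

In this toy setup each token places more weight on the label vector it is aligned with, which is the behavior the abstract attributes to the self-attention step. In practice the token and label vectors would come from the pre-trained encoder rather than being written by hand.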

