Cite this article:
ZHONG Weixing, WANG Hairong, WANG Dong, CHE Miao. Image-Text Joint Named Entity Recognition Method Based on Multi-modal Semantic Interaction [J]. Guangxi Sciences, 2022, 29(4): 681-690.
DOI: 10.13656/j.cnki.gxkx.20220919.008
Received: 2022-03-24
Funding: This work was supported by the Natural Science Foundation of Ningxia (2020AAC03218), the University-level Scientific Research Project of North Minzu University (2021XYZJK06), and the Graduate Innovation Project of North Minzu University (YCX21092).
|
Image-Text Joint Named Entity Recognition Method Based on Multi-modal Semantic Interaction
ZHONG Weixing, WANG Hairong, WANG Dong, CHE Miao
(School of Computer Science and Engineering, North Minzu University, Yinchuan, Ningxia 750021, China)
Abstract:
To address the noise impact and the insufficient image-text semantic fusion in existing Multimodal Named Entity Recognition (MNER) research, this article proposes an Image-Text Joint Named Entity Recognition (ITJNER) model based on multi-modal semantic interaction. The ITJNER model adds image descriptions as an additional feature to enrich the multi-modal representation; an image description helps filter out noise introduced by image features and summarizes the semantic information of the image in textual form. A multi-modal semantic fusion model with multi-modal collaborative interaction is also constructed, which strengthens multi-modal information fusion and reduces the semantic deviation of image information. Experiments were performed on the Twitter-2015 and Twitter-2017 datasets, and the results were analyzed and compared with AdaCAN, UMT, UMGF, Object-AGBAN, and other methods. Compared with UMGF, the strongest of the compared methods, the proposed method improves accuracy, recall, and F1 on the Twitter-2017 dataset by 0.67%, 2.26%, and 0.93%, respectively; on the Twitter-2015 dataset, recall improves by 0.19%. The experimental results verify the effectiveness of the proposed method.
Key words: multi-modal named entity recognition; image-text data; multi-modal attention; image description; semantic fusion