Cite this article
  • ZHANG Fengxi, WU Chengchu, ZHANG Yunze, DONG Luobing. Entity Category Balance Optimization Algorithm Based on Improved Loss Function[J]. Guangxi Sciences, 2023, 30(1): 100-105.
Entity Category Balance Optimization Algorithm Based on Improved Loss Function
ZHANG Fengxi1, WU Chengchu1, ZHANG Yunze1, DONG Luobing2
(1. School of Telecommunications Engineering, Xidian University, Xi'an, Shaanxi 710071, China; 2. College of Computer Science & Technology, Xidian University, Xi'an, Shaanxi 710071, China)
Abstract:
Aiming at the problem of imbalanced entity category samples in Named Entity Recognition (NER), a core task in Natural Language Processing (NLP), an entity category balance optimization algorithm based on an improved loss function is proposed. The new algorithm optimizes the loss function of the neural network model: based on an analysis of the characteristics of NER data, a smoothing coefficient and a weight coefficient are introduced on top of the balancing of positive and negative samples, so that during gradient propagation the model pays more attention to hard-to-recognize samples, namely those from rare entity categories or containing nested entities, and less attention to easy-to-recognize samples from frequent categories. Comparative experiments on the public datasets ACE05 and MSRA show that the improved loss function raises the F1 score by 1.53% on ACE05 and by 0.91% on MSRA. These results indicate that the improved loss function effectively alleviates the imbalance between positive and negative as well as hard and easy entity samples.
Key words: natural language processing | named entity recognition | loss function | smoothing coefficient | neural networks | difficult and easy examples
DOI:10.13656/j.cnki.gxkx.20230308.011
Funding: Supported by the National Undergraduate Innovation and Entrepreneurship Training Program (202110701085).
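The exact form of the improved loss function is not given on this page. As a rough illustration of the idea summarized in the abstract, the sketch below shows a focal-loss-style token classification loss in PyTorch, in which a per-class weight coefficient (here `class_weights`, applied as `alpha`) rebalances rare versus frequent entity categories, and a focusing coefficient (here `gamma`, standing in for the smoothing coefficient) down-weights easy, confidently classified samples so that gradients concentrate on hard samples such as nested entities. The function name, parameter names, and weight values are assumptions for illustration, not the authors' formulation.

```python
# Illustrative sketch only (not the paper's exact formula): a focal-loss-style
# token classification loss where a per-class weight coefficient rebalances
# rare vs. frequent entity categories and a focusing coefficient ("gamma",
# assumed) down-weights easy samples.
import torch
import torch.nn.functional as F


def balanced_entity_loss(logits, labels, class_weights, gamma=2.0, ignore_index=-100):
    """logits: (N, C) per-token class scores; labels: (N,) gold class ids;
    class_weights: (C,) tensor, larger for rare entity categories (assumption)."""
    log_probs = F.log_softmax(logits, dim=-1)                  # (N, C)
    mask = labels != ignore_index                              # drop padding positions
    labels_kept = labels[mask]
    log_p_gold = log_probs[mask].gather(1, labels_kept.unsqueeze(1)).squeeze(1)
    p_gold = log_p_gold.exp()                                  # model confidence on the gold class

    alpha = class_weights[labels_kept]                         # weight coefficient per token
    focal = (1.0 - p_gold) ** gamma                            # easy (high-confidence) tokens get small weight
    return -(alpha * focal * log_p_gold).mean()


if __name__ == "__main__":
    # Toy usage: 3 classes ("O", "PER", "ORG"), with the "O" class heavily over-represented.
    logits = torch.randn(8, 3)
    labels = torch.tensor([0, 0, 0, 0, 0, 1, 2, -100])
    class_weights = torch.tensor([0.25, 1.0, 1.0])             # down-weight the majority "O" class (assumed values)
    print(balanced_entity_loss(logits, labels, class_weights))
```

In this sketch, a larger `gamma` suppresses the loss contribution of tokens the model already classifies with high confidence, while larger class weights amplify rare entity categories; both knobs mirror the stated goal of shifting gradient updates away from frequent, easy samples and toward rare, hard ones.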
