Electronic Science and Technology, 2023, Vol. 36, Issue (3): 7-13. doi: 10.16180/j.cnki.issn1007-7820.2023.03.002
HE Chuanpeng, YIN Ling, HUANG Bo, WANG Mingsheng, GUO Ruyan, ZHANG Shuai, JU Jiaji
Received: 2021-08-21
Online: 2023-03-15
Published: 2023-03-16
HE Chuanpeng, YIN Ling, HUANG Bo, WANG Mingsheng, GUO Ruyan, ZHANG Shuai, JU Jiaji. Text Keyword Extraction Method Based on BERT and LightGBM[J]. Electronic Science and Technology, 2023, 36(3): 7-13.
Table 1. Comparison of results for multiple keyword extraction methods

| Model | Metric | N=3 | N=4 | N=5 | N=6 |
|---|---|---|---|---|---|
| TextRank | P | 0.711 | 0.701 | 0.688 | 0.661 |
| TextRank | R | 0.694 | 0.685 | 0.674 | 0.642 |
| TextRank | F1 | 0.702 | 0.693 | 0.681 | 0.652 |
| LDA | P | 0.772 | 0.764 | 0.742 | 0.722 |
| LDA | R | 0.756 | 0.746 | 0.731 | 0.701 |
| LDA | F1 | 0.764 | 0.755 | 0.734 | 0.711 |
| LightGBM | P | 0.812 | 0.802 | 0.786 | 0.756 |
| LightGBM | R | 0.801 | 0.795 | 0.771 | 0.745 |
| LightGBM | F1 | 0.807 | 0.797 | 0.778 | 0.751 |
| LB-LightGBM | P | 0.852 | 0.833 | 0.813 | 0.791 |
| LB-LightGBM | R | 0.843 | 0.825 | 0.804 | 0.786 |
| LB-LightGBM | F1 | 0.848 | 0.829 | 0.808 | 0.788 |
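Table 1 reports P, R, and F1 for each method. As a minimal sketch of how such scores are commonly obtained, the Python below assumes the standard set-based definitions of precision and recall, with F1 as their harmonic mean. The source does not state how keyword matches were counted, so the exact-match rule and the function names `precision_recall_f1` and `f1_from_pr` are illustrative assumptions, not the authors' evaluation code.

```python
def precision_recall_f1(predicted, reference):
    """Return (P, R, F1) for predicted keywords against reference keywords.

    Assumption: a predicted keyword counts as correct only if it matches a
    reference keyword exactly; the paper may use a different matching rule.
    """
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(reference) if reference else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


def f1_from_pr(p, r):
    """Harmonic mean of precision p and recall r (0.0 when both are zero)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Toy example with made-up keywords (not data from the paper).
    print(precision_recall_f1(["bert", "lightgbm", "lda"],
                              ["bert", "lightgbm", "keyword", "extraction"]))
    # F1 recomputed from the reported TextRank P and R at N=3.
    print(round(f1_from_pr(0.711, 0.694), 3))
```

Because the table values are rounded to three decimals, an F1 recomputed from the printed P and R may differ from the printed F1 in the last digit.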