电子科技 (Electronic Science and Technology) ›› 2024, Vol. 37 ›› Issue (9): 87-94. doi: 10.16180/j.cnki.issn1007-7820.2024.09.013

短文本新闻标题生成方法

赵明   

  1. 福建省龙岩市新罗区教育局 网络中心,福建 龙岩 364099
  • 收稿日期:2023-01-08 出版日期:2024-09-15 发布日期:2024-09-20
  • 作者简介:赵明(1976-),男,工程师。研究方向:信息技术。
  • 基金资助:
    国家重点研发计划(2022YFF0903404)

Research on Short Text News Title Generation Method

ZHAO Ming   

  1. Network Center, Fujian Longyan Xinluo District Education Bureau, Longyan 364099, China
  • Received: 2023-01-08 Online: 2024-09-15 Published: 2024-09-20
  • About the author: ZHAO Ming (1976-), male, engineer. Research interest: information technology.
  • Supported by:
    National Key R&D Program of China (2022YFF0903404)

摘要:

当今新闻具有文本短、发布频繁、时效性强等特点,一个媒体账号一天内发布数十条新闻。为大量新闻制定适用且有吸引力的标题已经成为媒体工作者的一项主要工作内容。媒体工作者需要一个自动生成短文本标题的系统来缓解工作压力。为解决该问题,文中提出了一种短文本新闻标题生成模型。该模型采用序列到序列结构,在编码器和解码器分别应用预训练语言模型和分层自注意力解码器。为了使生成标题包含原始新闻的关键信息,提出一种基于LCSTS数据集和Weibo4数据集的分阶段训练方法,并使模型分别从这两个数据集学习提取关键新闻信息和构建风格化表达,使模型生成标题能够准确表达新闻的核心内容从而吸引读者。

关键词: 新闻标题生成, 预训练语言模型, 分层自注意力解码器, 编码器, 文本提取, 文本生成

Abstract:

News today is characterized by short texts, frequent releases, and strong timeliness; a single media account may publish dozens of news items in a day. Writing suitable and attractive headlines for this volume of news has become a major part of media workers' jobs, and they need a system that automatically generates headlines for short texts to ease the workload. To address this problem, this study proposes a short-text news headline generation model. The model adopts a sequence-to-sequence structure, with a pre-trained language model in the encoder and a layered self-attention decoder in the decoder. To ensure that the generated headlines contain the key information of the original news, a staged training method based on the LCSTS and Weibo4 datasets is proposed: the model learns to extract key news information from the first dataset and to construct stylized expressions from the second, so that the generated headlines accurately express the core content of the news and attract readers.

Key words: news headline generation, pre-trained language model, layered self-attention decoder, encoder, text extraction, text generation

CLC Number:

  • TP391