Electronic Science and Technology, 2023, Vol. 36, Issue (12): 72-78. doi: 10.16180/j.cnki.issn1007-7820.2023.12.010

Research on Generating News Text Summarization Based on Improved T5 PEGASUS Model

ZHANG Qi, FAN Yongsheng

  1. School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
  • Received: 2022-08-10 Online: 2023-12-15 Published: 2023-12-05
  • Supported by:
    Humanities and Social Science Research Project of Ministry of Education (18XJC880002); Science and Technology Project of Chongqing Education Commission (KJQN201800539); Chongqing Normal University (Talent Introduction/Doctoral Program) Foundation Project (17XCB008)

Abstract:

News text summarization aims to spare readers the wasted time and reading fatigue that arise when they cannot quickly grasp the key points of a news article. At present, the best text summarization model for Chinese is the T5 PEGASUS model, but little research has been done on it. This study improves the Chinese word segmentation step of the T5 PEGASUS model by using Pkuseg, a word segmentation method better suited to the news domain, and verifies its effectiveness on three public datasets with different news lengths: NLPCC2017, LCSTS, and SogouCS. The results show that Pkuseg segmentation is better suited to the T5 PEGASUS model. The ROUGE scores of the summaries generated by the T5 PEGASUS model are positively correlated with news text length, whereas the training loss and the rate at which it decreases are negatively correlated with news text length. Even with a small training set, the model achieves high ROUGE scores, which indicates strong few-shot learning ability.
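
As a rough illustration of the ROUGE metric mentioned above, the following minimal sketch computes ROUGE-N precision, recall, and F1 over pre-segmented tokens. It is not the evaluation code used in the study, which the abstract does not describe. The function names `ngrams` and `rouge_n` and the two hand-segmented token lists are hypothetical; in practice the tokens would come from a word segmenter such as Pkuseg, and scoring would normally rely on an established ROUGE package.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N precision, recall, and F1 for two token lists."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical, hand-segmented example (not taken from the paper's datasets).
reference = ["重庆", "师范", "大学", "发布", "新闻", "摘要", "模型"]
candidate = ["重庆", "大学", "发布", "摘要", "模型"]

rouge1 = rouge_n(candidate, reference, n=1)
rouge2 = rouge_n(candidate, reference, n=2)
print(rouge1)
print(rouge2)
```

Because the n-grams are built from segmented words, a different segmentation of the same sentence changes which n-grams match, which is one reason the choice of segmenter can affect reported ROUGE scores.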

Key words: text summarization, generative model, T5 PEGASUS, news text, Chinese word segmentation, Pkuseg, few-shot learning, ROUGE

CLC Number: 

  • TP391.1