Electronic Science and Technology ›› 2024, Vol. 37 ›› Issue (7): 16-24.doi: 10.16180/j.cnki.issn1007-7820.2024.07.003


Automatic Summarization of Small Samples Based on Enhanced Regularization

LI Qing, WAN Weibing   

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Received:2023-02-04 Online:2024-07-15 Published:2024-07-17
  • Supported by:
    Scientific and Technological Innovation 2030-Major Project of New Generation Artificial Intelligence(2020AAA0109300)

Abstract:

Automatic text summarization aims to extract the main statements of a text in order to compress its information. Existing abstractive summarization methods do not take full advantage of pre-trained models to learn the semantics of the source text, so important information is lost in the generated content, and on data sets with few samples the models are prone to overfitting. To address these problems and obtain better fine-tuning performance, this study takes the pre-trained model mT5 (multilingual T5) as a baseline and combines it with R-Drop (Regularized Dropout) as enhanced regularization during fine-tuning to improve the model's learning ability, while Sparse Softmax is used to reduce the ambiguity of the predicted output and ensure its accuracy. BLEU (Bilingual Evaluation Understudy) is computed for hyperparameter tests on the Chinese data sets LCSTS and CSL, and Rouge is used as the evaluation index on data sets of different orders of magnitude. The experimental results show that the optimized pre-trained model learns the semantic representation of the source text better, maintains a good fit on small samples, and generates more useful results.
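The two techniques named in the abstract can be sketched in a few lines. This is a minimal, framework-free illustration, not the authors' implementation: `sparse_softmax` here is a simple top-k variant (the cutoff `k` is an assumption), and `r_drop_loss` shows the general shape of the R-Drop objective, which averages the cross-entropy of two dropout forward passes and adds a symmetric-KL consistency term weighted by a coefficient `alpha`.

```python
import math

def sparse_softmax(logits, k=2):
    """Top-k sparse softmax (illustrative sketch): normalize only the k
    largest logits; every other class gets exactly zero probability,
    which sharpens the predicted distribution."""
    top = sorted(range(len(logits)), key=lambda i: logits[i])[-k:]
    m = max(logits[i] for i in top)                      # for numerical stability
    z = sum(math.exp(logits[i] - m) for i in top)
    return [math.exp(logits[i] - m) / z if i in top else 0.0
            for i in range(len(logits))]

def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two probability vectors."""
    def kl(a, b):
        return sum(x * math.log((x + eps) / (y + eps)) for x, y in zip(a, b))
    return 0.5 * (kl(p, q) + kl(q, p))

def r_drop_loss(ce1, ce2, p1, p2, alpha=4.0):
    """R-Drop-style objective: average the cross-entropy losses of two
    forward passes (each with independent dropout) and add an
    alpha-weighted symmetric KL term that pushes the two output
    distributions p1 and p2 to agree."""
    return 0.5 * (ce1 + ce2) + alpha * sym_kl(p1, p2)
```

When the two dropout passes produce identical distributions, the consistency term vanishes and the loss reduces to the plain averaged cross-entropy; the further the two distributions drift apart, the larger the penalty, which is what regularizes the fine-tuning on small samples.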

Key words: automatic text summarization, text generation, pre-trained model, small sample data, enhanced regularization, sparse output, semantic representation learning, mT5

CLC Number: 

  • TP391.1