Electronic Science and Technology ›› 2024, Vol. 37 ›› Issue (10): 48-54. doi: 10.16180/j.cnki.issn1007-7820.2024.10.007


Character Enhancement Based on Named Entity Recognition for Industrial Equipment Faults

ZHANG Yang, LIU Jin   

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201600, China
  • Received: 2023-03-02 Online: 2024-10-15 Published: 2024-11-04
  • Supported by:
    National Natural Science Foundation of China (U1831133); Shanghai Science and Technology Commission Science and Technology Innovation Action Plan (22S31903700); Shanghai Science and Technology Commission Science and Technology Innovation Action Plan (21S31904200)

Abstract:

To address sparse training data, complex entity structures, and uneven entity distribution in the industrial equipment fault domain, this study constructs a named entity recognition corpus for industrial equipment faults. Because character-level named entity recognition models struggle to represent the professional vocabulary of this domain, the study proposes a character-enhanced named entity recognition model for industrial equipment faults. In the embedding layer, professional vocabulary information is fused directly between the Transformer layers of RoBERTa-WWM (Robustly Optimized BERT Pretraining Approach with Whole Word Masking), so that each word's information is allocated to its constituent characters to enrich their semantics. A BiLSTM (Bidirectional Long Short-Term Memory) network then captures global semantic information, and a CRF (Conditional Random Field) learns the dependencies between adjacent labels to produce the optimal sentence-level label sequence. Experimental results show that the proposed model performs well on the industrial equipment fault named entity recognition task, with an average F1 score of 92.403%.
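
The idea of allocating word information to each constituent character can be illustrated with a minimal sketch. It covers only the lexicon-matching step: for every character in a sentence, it collects the domain words that cover that character. The function name `match_words_per_char`, the `max_len` parameter, the tiny lexicon, and the example sentence are all hypothetical and are not taken from the paper. How the matched words are embedded and fused between the Transformer layers is not described in the abstract and is not shown here.

```python
from typing import List, Set


def match_words_per_char(sentence: str, lexicon: Set[str], max_len: int = 6) -> List[List[str]]:
    """For each character position, list the lexicon words that cover it.

    Hypothetical helper for illustration; only words of two or more
    characters and at most `max_len` characters are considered.
    """
    matches: List[List[str]] = [[] for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try every candidate span beginning at `start`.
        for end in range(start + 2, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                # Assign the matched word to each character it contains.
                for i in range(start, end):
                    matches[i].append(word)
    return matches


if __name__ == "__main__":
    # Assumed toy lexicon: motor, bearing, wear, bearing wear.
    lexicon = {"电机", "轴承", "磨损", "轴承磨损"}
    # Assumed example sentence: "The motor bearing is severely worn."
    sentence = "电机轴承磨损严重"
    for char, words in zip(sentence, match_words_per_char(sentence, lexicon)):
        print(char, words)
```

In a full model, each character's list of matched words would presumably be mapped to word embeddings and combined with that character's hidden state before the BiLSTM and CRF layers. That fusion step depends on design details given in the paper body, not in this abstract.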

Key words: industrial equipment failure, corpus, named entity recognition, RoBERTa-WWM, professional word embedding, BiLSTM, CRF, deep learning

CLC Number: 

  • TP183