Long Text Classification for Web News Based on Enhanced Language Representation Model

doi:10.16389/j.cnki.cn42-1737/n.2024.04.004

Journal of Jianghan University (Natural Science Edition), 2024, Vol. 52, Issue (4): 37-44. doi: 10.16389/j.cnki.cn42-1737/n.2024.04.004

XU Nanxi, KE Yuanyuan, HU Xiaoli*

School of Artificial Intelligence, Jianghan University, Wuhan 430056, Hubei, China

Published: 2024-09-29
Contact: HU Xiaoli

Abstract

Using real-time news content collected from the Internet, the authors classify news topics on a time-sensitive data set of long Chinese texts. A word segmentation scheme enhanced with annual keywords is used to improve segmentation accuracy. To handle the particular difficulties of long Chinese text, the authors also adopt a long text compression method: key sentences are selected, keywords are extracted from the long text with the TF-IDF algorithm, and word vectors are then trained on the combined new text. Finally, an enhanced language representation model is used to classify the news topics, and it is compared with six machine learning and deep learning models on recall, accuracy, precision, and F1 score. The experimental results show that the model can effectively classify long texts in real-time news when 16 important words are extracted.
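The compression step described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names (`tfidf_scores`, `top_keywords`, `compress`), the smoothed IDF formula, and the rule for choosing key sentences (keep the sentences containing the most keywords) are all assumptions made for the example, and the input is assumed to be already segmented into words.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per pre-tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        # Assumed smoothed IDF: terms present in every document score 0.
        scores.append({
            term: (count / total) * math.log((1 + n_docs) / (1 + df[term]))
            for term, count in tf.items()
        })
    return scores


def top_keywords(doc_scores, k=16):
    """Pick the k highest-scoring terms; ties are broken by term order."""
    ranked = sorted(doc_scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def compress(sentences, keywords, n_sentences=1):
    """Keep the sentences richest in keywords (hypothetical selection rule),
    in their original order, and append the keywords to form the new text."""
    kw = set(keywords)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -sum(tok in kw for tok in sentences[i]),
    )
    keep = sorted(ranked[:n_sentences])
    tokens = [tok for i in keep for tok in sentences[i]]
    return tokens + list(keywords)


# Toy corpus: each article is a list of segmented sentences.
article_a = [["股市", "今日", "上涨"], ["投资者", "关注", "股市", "政策"]]
article_b = [["球队", "今日", "夺冠"], ["球迷", "庆祝", "球队", "胜利"]]
docs = [[tok for sent in art for tok in sent] for art in (article_a, article_b)]

scores = tfidf_scores(docs)
keywords_a = top_keywords(scores[0], k=1)
compressed_a = compress(article_a, keywords_a, n_sentences=1)
```

The default `k=16` mirrors the 16 important words mentioned in the abstract; the compressed token list would then be passed to whatever word vector training and classification model is used downstream.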

Key words: ERNIE model, pretraining model, news classification, long text processing, Chinese text

CLC Number: TP391.1

XU Nanxi, KE Yuanyuan, HU Xiaoli. Long Text Classification for Web News Based on Enhanced Language Representation Model[J]. Journal of Jianghan University (Natural Science Edition), 2024, 52(4): 37-44.

