Journal of Jianghan University (Natural Science Edition), 2024, Vol. 52, Issue (3): 74-86. doi: 10.16389/j.cnki.cn42-1737/n.2024.03.008

• Computer Science and Technology •

An Intrusion Detection and Recognition Model Based on a Voting Network for Handling Sample Imbalance

LI Xi (李熙)1, MEI Qian (梅倩)*2, TAO Jie (陶洁)1, YU Jiawei (余嘉伟)1, FENG Changqi (冯常奇)1

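  1. Wuhan Institute of Shipbuilding Technology, Wuhan 430050, Hubei, China; 2. Hubei Education Press, Wuhan 430070, Hubei, China
  • Published: 2024-06-17
  • Corresponding author: MEI Qian
  • About the author: LI Xi (b. 1987), male, senior engineer, master's degree. Research interests: natural language processing, attack and defense for computer neural networks, and multimodal object recognition.
  • Funding:
    Research project on building public foundational information technology courses for engineering majors in higher vocational colleges from the perspective of "new infrastructure" (2021-AFCEC-093)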

Voting-based Framework for Auto Cyber Intrusion Detection System in Imbalanced Dataset Environment

LI Xi1, MEI Qian*2, TAO Jie1, YU Jiawei1, FENG Changqi1

  1. Wuhan Institute of Shipbuilding Technology, Wuhan 430050, Hubei, China; 2. Hubei Education Press, Wuhan 430070, Hubei, China
  • Published: 2024-06-17
  • Contact: MEI Qian

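Abstract: Mainstream intrusion detection systems currently learn from manually labeled network traffic data to gain the ability to detect unknown threats automatically. When the manually labeled data are biased, incomplete, or contain too few minority-class samples, however, samples that are actually attacks are often classified as benign, and the intrusion detection system fails. Most intrusion detection studies use overall performance as the quantitative measure of detection performance and neglect the original purpose of intrusion detection, which is to warn that a system is under attack. To address these problems, an intelligent recognition model based on a voting network is proposed to handle the imbalance in the training data of intrusion detection systems. A trainable voting model integrates traditional machine learning models with deep learning models and raises the detection rate of fatal attacks while still attending to overall performance. Experimental results show that the model performs well overall on three datasets with different sample distributions and effectively improves the detection rate of minority classes.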

Keywords: intrusion detection, network attack recognition, imbalanced sample datasets, deep learning, machine learning

Abstract: Modern cyber attack intrusion detection systems learn from manually labeled network flows to build the ability to detect potential threats automatically. Labeling errors, missing essential features, and too few samples in minority classes severely restrict that capability: attack samples are often classified as benign, which is a fatal flaw for such a system. Most researchers treat overall performance measurements as the benchmark for intrusion detection systems and overlook the systems' original purpose, which is to warn people about dangerous network attacks. This article therefore proposes a voting-based framework for automatic cyber intrusion detection on imbalanced datasets. Built on a trainable voting network, the framework integrates machine learning and deep learning techniques to address dataset imbalance, and it aims to raise the detection rate of fatal attacks without compromising the system's overall performance. The experimental results suggest that the proposed model performs stably and well overall on three datasets with different sample distributions and that it effectively improves the detection rate of the minority class.

Key words: intrusion detection, cyber attack recognition, imbalanced sample dataset, deep learning, machine learning
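
The abstract gives no architectural details, so the following is only a minimal sketch of the general idea of a trainable vote over several classifiers, not the authors' network. It assumes each base model (for example one traditional learner and one deep model) already outputs class probabilities, learns one softmax-normalized weight per model by gradient descent, and optionally weights the loss by inverse class frequency. All function names, the toy probabilities, and the weighting scheme are hypothetical.

```python
import math

def softmax(scores):
    """Turn unconstrained scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def inverse_frequency_weights(labels):
    """Hypothetical helper: weight each class by n / (k * count)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def vote(alphas, model_probs, i):
    """Weighted soft vote for sample i: sum over models of alpha_m * P_m(class)."""
    n_classes = len(model_probs[0][i])
    return [sum(a * probs[i][c] for a, probs in zip(alphas, model_probs))
            for c in range(n_classes)]

def train_vote_weights(model_probs, labels, class_weights, lr=0.5, epochs=200):
    """Gradient descent on the class-weighted negative log-likelihood of the vote."""
    scores = [0.0] * len(model_probs)          # start from equal voting weights
    norm = sum(class_weights[y] for y in labels)
    for _ in range(epochs):
        alphas = softmax(scores)
        grad_alpha = [0.0] * len(model_probs)  # dLoss / dalpha_m
        for i, y in enumerate(labels):
            p_true = max(vote(alphas, model_probs, i)[y], 1e-12)
            for m, probs in enumerate(model_probs):
                grad_alpha[m] -= class_weights[y] * probs[i][y] / (p_true * norm)
        # Chain rule through the softmax: dLoss/dscore_k = alpha_k * (g_k - sum_m alpha_m g_m)
        mean_grad = sum(a * g for a, g in zip(alphas, grad_alpha))
        scores = [s - lr * a * (g - mean_grad)
                  for s, a, g in zip(scores, alphas, grad_alpha)]
    return softmax(scores)

def predict(alphas, model_probs, i):
    """Return the class with the highest combined probability for sample i."""
    combined = vote(alphas, model_probs, i)
    return max(range(len(combined)), key=combined.__getitem__)

# Toy data (made up): 8 benign samples (class 0) and 2 attacks (class 1).
labels = [0] * 8 + [1] * 2
model_a = [[0.9, 0.1]] * 10                       # always leans benign
model_b = [[0.6, 0.4]] * 8 + [[0.2, 0.8]] * 2     # recognizes the attacks
model_probs = [model_a, model_b]

plain = train_vote_weights(model_probs, labels, {0: 1.0, 1: 1.0})
balanced = train_vote_weights(model_probs, labels, inverse_frequency_weights(labels))

attack_index = 8
print("unweighted loss:", plain, "->", predict(plain, model_probs, attack_index))
print("class-weighted :", balanced, "->", predict(balanced, model_probs, attack_index))
```

On this toy data the vote trained with an unweighted loss keeps roughly equal trust in both models and labels the attack sample benign, while the class-weighted vote shifts its trust to the model that recognizes attacks and flags the sample. This is the kind of gap between overall performance and minority-class detection rate that the abstract describes.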

CLC number: