A review on cyber security named entity recognition

高宸; 张璇; 韩梦婷; 刘会

doi:10.1631/FITEE.2000286

Regular article | Updated: 2022-05-19

- A review on cyber security named entity recognition
- Frontiers of Information Technology & Electronic Engineering, 2021, 22(9): 1153-1168
- Affiliations：
  
  School of Software, Yunnan University, Kunming 650091, China
  Key Laboratory of Software Engineering of Yunnan Province, Kunming 650091, China
  Engineering Research Center of Cyberspace, Kunming 650091, China
- Author bio：
  
  Xuan ZHANG, E-mail: zhxuan@ynu.edu.cn
- Funds：
  
  Project supported by the National Natural Science Foundation of China (Nos. 61862063, 61502413, and 61262025), the National Social Science Foundation of China (No. 18BJL104), the Natural Science Foundation of Key Laboratory of Software Engineering of Yunnan Province, China (No. 2020SE301), the Yunnan Science and Technology Major Project (Nos. 202002AE090010 and 202002AD080002-5), and the Data Driven Software Engineering Innovative Research Team Funding of Yunnan Province, China (No. 2017HC012)
- DOI：10.1631/FITEE.2000286
  CLC number: TP393.08
- Received: 2020-06-13
- Revised: 2021-08-24
- Published in print: 2021-09

Chen GAO, Xuan ZHANG, Mengting HAN, et al. A review on cyber security named entity recognition[J]. Frontiers of Information Technology & Electronic Engineering, 2021, 22(9): 1153-1168.

Abstract

With the rapid development of Internet technology and the advent of the era of big data, more and more cyber security texts are available on the Internet. These texts include not only security concepts, incidents, tools, guidelines, and policies, but also risk management approaches, best practices, assurances, technologies, and more. Integrating this large-scale, heterogeneous, unstructured cyber security information and then identifying and classifying the cyber security entities in it can help handle cyber security issues. Because texts in the cyber security domain are complex and diverse, security entities in this domain are difficult to identify with traditional named entity recognition (NER) methods. This paper describes various approaches and techniques for NER in this domain, including the rule-based approach, the dictionary-based approach, and the machine learning based approach, and discusses the problems faced by NER research in this domain, such as conjunction and disjunction of entity phrases, non-standardized naming conventions, abbreviations, and massive nesting. Three future directions of NER in cyber security are proposed: (1) application of unsupervised or semi-supervised techniques; (2) development of a more comprehensive cyber security ontology; (3) application of more effective deep learning models.
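
As a rough illustration of the rule-based and dictionary-based approaches named in the abstract, the following minimal Python sketch combines two regular-expression rules with a small gazetteer lookup. The entity labels, the patterns, the gazetteer entries, and the function name `recognize` are all assumptions made for this example; none of them is taken from the paper.

```python
import re

# Hypothetical rules: regular expressions for entity types with a fixed surface form.
RULES = {
    "VULNERABILITY_ID": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

# Hypothetical gazetteer (dictionary) mapping known names to entity types.
GAZETTEER = {
    "wannacry": "MALWARE",
    "windows": "SOFTWARE",
    "sql injection": "ATTACK_TYPE",
}


def recognize(text):
    """Return (start, end, surface, label) tuples sorted by position."""
    found = []
    # Rule-based pass: every regex match becomes an entity of that rule's type.
    for label, pattern in RULES.items():
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), m.group(), label))
    # Dictionary-based pass: case-insensitive whole-word lookup of known names.
    for name, label in GAZETTEER.items():
        lookup = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for m in lookup.finditer(text):
            found.append((m.start(), m.end(), m.group(), label))
    return sorted(found)


sample = "WannaCry exploited CVE-2017-0144 on Windows hosts such as 10.0.0.5."
for start, end, surface, label in recognize(sample):
    print(f"{start:>3}-{end:<3} {label:<17} {surface}")
```

A sketch like this only finds names that match a rule or appear in the dictionary. That limitation is consistent with the problems the abstract lists for this domain, such as non-standardized naming conventions, abbreviations, and nested entities.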
