A survey of binary code representation technology

Taiyan WANG; Qingsong XIE; Lu YU; Zulie PAN; Min ZHANG

doi:10.1631/FITEE.2400088

Regular Papers | Updated: 2025-06-09

- Chinese title: 二进制代码表征技术研究进展综述
- Frontiers of Information Technology & Electronic Engineering, Vol. 26, Issue 5, Pages 671-694 (2025)
- Affiliations:
  1. College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China
  2. Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation, Hefei 230037, China
- CLC: TP312
- Received: 06 February 2024; Revised: 24 June 2024; Published: 2025-05
Taiyan WANG, Qingsong XIE, Lu YU, et al. A survey of binary code representation technology[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(5): 671-694. DOI: 10.1631/FITEE.2400088.

Abstract

Binary analysis, as an important foundational technology, supports numerous applications in software engineering and security research. With the continuous growth in software scale and the increasingly complex evolution of software architectures, binary analysis technology faces new challenges. To break through existing bottlenecks, researchers have applied artificial intelligence (AI) technology to the understanding and analysis of binary code. The core problem is how to represent binary code, i.e., how to use intelligent methods to generate representation vectors that contain semantic information for binary code and apply them to multiple downstream binary analysis tasks. In this paper, we provide a comprehensive survey of recent advances in binary code representation technology and introduce the workflow of existing research in two parts, i.e., binary code feature selection methods and binary code feature embedding methods. The feature selection part mainly includes two components: the definition and classification of features, and feature construction. First, the abstract definition and classification of features are systematically explained; second, the process of constructing concrete representations of features is introduced in detail. In the feature embedding part, the embedding methods are classified into four categories according to the intelligent semantic understanding models they use, namely how they use text-embedding and graph-embedding models. Finally, we summarize the overall development of existing research and discuss potential research directions related to binary code representation technology.
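As a rough illustration of the two-stage workflow above (feature selection/extraction, then embedding into vectors for a downstream task), the Python sketch below walks toy functions through both stages and uses cosine similarity as a stand-in downstream task (binary code similarity detection). It is not the method of any surveyed system. Everything in it is a hypothetical simplification chosen for illustration: the instruction normalization rule (immediates replaced by `IMM`), the bag-of-tokens "text" embedding, the mean-aggregation message passing over an undirected control flow graph (CFG) used as the "graph" embedding, the helper names (`normalize`, `text_embed`, `graph_embed`, `cosine`), and the hand-written assembly snippets, which stand in for the output of a real disassembler.

```python
import math
import re
from collections import Counter

# Toy sketch: every rule and name here is a hypothetical simplification,
# not the method of any particular surveyed system.
Vector = dict[str, float]  # sparse vector: feature token -> weight

IMM_RE = re.compile(r"-?(0x[0-9a-f]+|[0-9]+)")


def normalize(insn: str) -> list[str]:
    """Feature extraction: split one instruction into opcode/operand tokens and
    replace immediate constants with 'IMM' (assumed normalization rule)."""
    tokens = insn.lower().replace(",", " ").split()
    return ["IMM" if IMM_RE.fullmatch(t) else t for t in tokens]


def text_embed(block: list[str]) -> Vector:
    """'Text' embedding stand-in: bag of normalized tokens for one basic block."""
    counts = Counter(tok for insn in block for tok in normalize(insn))
    return {tok: float(c) for tok, c in counts.items()}


def _add(u: Vector, v: Vector, wu: float = 1.0, wv: float = 1.0) -> Vector:
    """Weighted sum wu*u + wv*v of two sparse vectors."""
    out = {k: wu * x for k, x in u.items()}
    for k, x in v.items():
        out[k] = out.get(k, 0.0) + wv * x
    return out


def graph_embed(blocks: dict[str, list[str]],
                edges: list[tuple[str, str]], rounds: int = 2) -> Vector:
    """'Graph' embedding stand-in: start each CFG node from its block's text
    vector, average it with the mean of its neighbours for a few rounds
    (edges treated as undirected), then sum-pool all nodes."""
    neigh: dict[str, list[str]] = {b: [] for b in blocks}
    for src, dst in edges:
        neigh[src].append(dst)
        neigh[dst].append(src)
    h = {b: text_embed(insns) for b, insns in blocks.items()}
    for _ in range(rounds):
        new_h = {}
        for b, ns in neigh.items():
            mean: Vector = {}
            for n in ns:
                mean = _add(mean, h[n], 1.0, 1.0 / len(ns))
            # Isolated nodes keep their own vector unchanged.
            new_h[b] = _add(h[b], mean, 0.5, 0.5) if ns else h[b]
        h = new_h
    pooled: Vector = {}
    for vec in h.values():
        pooled = _add(pooled, vec)
    return pooled


def cosine(u: Vector, v: Vector) -> float:
    """Downstream-task stand-in: cosine similarity between two embeddings."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hand-written toy functions: f2 differs from f1 only in constants,
# f3 uses different instructions and a different CFG shape.
TRIANGLE = [("entry", "then"), ("entry", "exit"), ("then", "exit")]
f1 = ({"entry": ["push rbp", "mov rbp, rsp", "cmp edi, 0x10"],
       "then": ["add eax, 1"], "exit": ["pop rbp", "ret"]}, TRIANGLE)
f2 = ({"entry": ["push rbp", "mov rbp, rsp", "cmp edi, 0x20"],
       "then": ["add eax, 7"], "exit": ["pop rbp", "ret"]}, TRIANGLE)
f3 = ({"entry": ["xor eax, eax", "test rsi, rsi"],
       "loop": ["imul eax, edx", "dec rsi", "jnz loop"], "exit": ["ret"]},
      [("entry", "loop"), ("loop", "loop"), ("loop", "exit")])

sim12 = cosine(graph_embed(*f1), graph_embed(*f2))
sim13 = cosine(graph_embed(*f1), graph_embed(*f3))
print(f"f1 vs f2: {sim12:.3f}, f1 vs f3: {sim13:.3f}")
```

Two functions that differ only in constants map to the same vector, while a lexically and structurally different function scores lower. The four embedding categories named in the abstract differ in how learned text-embedding and graph-embedding models are used; the hand-written `text_embed` and `graph_embed` functions here only mark where such learned models would sit in the pipeline.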
