FOLLOWUS
1.Department of Computer Engineering, Hacettepe University, Ankara 06800, Türkiye
2.Department of Computer Science, University of Maryland, College Park, Maryland 20742, USA
3.Department of Computer Engineering, Gazi University, Ankara 06570, Türkiye
‡ Corresponding author
mengu@hacettepe.edu.tr
ocankur@umd.edu
cagrisahin@gazi.edu.tr
Received:06 February 2024,
Revised:23 July 2024,
Published:2025-06
Scan QR Code
Adnan OZSOY, Mengu NAZLI, Onur CANKUR, et al. CUSMART: effective parallelization of string matching algorithms using GPGPU accelerators[J]. Frontiers of information technology & electronic engineering, 2025, 26(6): 877-895.
Adnan OZSOY, Mengu NAZLI, Onur CANKUR, et al. CUSMART: effective parallelization of string matching algorithms using GPGPU accelerators[J]. Frontiers of information technology & electronic engineering, 2025, 26(6): 877-895. DOI: 10.1631/FITEE.2400091.
提出一种字符串匹配算法研究工具(SMART)库的并行版本,该版本在NVDIA的统一计算设备架构(CUDA)平台上实现,采用通用图形处理器(GPGPU)编程概念以提升性能及深入了解这些字符匹配算法的并行版本。利用CUDA应用程序编程接口(API)开发了CUDA增强的SMART(CUSMART)库,该库集成了64种字符串匹配算法的并行迭代。为确保全面且公正的比较,在各种场景下评估这些算法的性能,进而识别它们在特定应用场景中的优势和劣势。探索并建立了优化技术,以评估它们对算法性能的影响。该研究的结果通过算法的可扩展性突出了GPGPU计算在字符串匹配应用中的潜力,表明性能有显著提高。此外,确定了不同场景下表现最佳和最差的算法。
This study presents a parallel version of the string matching algorithms research tool (SMART) library
implemented on NVIDIA’s compute unified device architecture (CUDA) platform
and uses general-purpose computing on graphics processing unit (GPGPU) programming concepts to enhance performance and gain insight into the parallel versions of these algorithms. We have developed the CUDA-enhanced SMART (CUSMART) library
which incorporates parallelized iterations of 64 string matching algorithms
leveraging the CUDA application programming interface. The performance of these algorithms has been assessed across various scenarios to ensure a comprehensive and impartial comparison
allowing for the identification of their strengths and weaknesses in specific application contexts. We have explored and established optimization techniques to gauge their influence on the performance of these algorithms. The results of this study highlight the potential of GPGPU computing in string matching applications through the scalability of algorithms
suggesting significant performance improvements. Furthermore
we have identified the best and worst performing algorithms in various scenarios.
Adams MD , Celniker SE , Holt RA , et al. , 2000 . The genome sequence of Drosophila melanogaster . Science , 287 ( 5461 ): 2185 - 2195 . https://doi.org/10.1126/science.287.5461.2185 https://doi.org/10.1126/science.287.5461.2185
Adey SP , 2013 . GPU Accelerated Pattern Matching Algorithm for DNA Sequences to Detect Cancer Using CUDA . MS Thesis, College of Engineering , Pune, India .
Ashkiani S , Amenta N , Owens JD , 2016 . Parallel approaches to the string matching problem on the GPU . Proc 28 th ACM Symp on Parallelism in Algorithms and Architectures , p. 275 - 285 . https://doi.org/10.1145/2935764.2935800 https://doi.org/10.1145/2935764.2935800
Baeza-Yates R , Gonnet GH , 1992 . A new approach to text searching . Commun ACM , 35 ( 10 ): 74 - 82 . https://doi.org/10.1145/135239.135243 https://doi.org/10.1145/135239.135243
Bellekens X , Atkinson RC , Renfrew C , et al. , 2013 . Investigation of GPU-based pattern matching . Proc 14 th Post Graduate Symp on the Convergence of T elecommunications, Networking and Broadcasting , Article 5 .
Blelloch GE , 1990 . Vector Models for Data-Parallel Computing . MIT Press , Cambridge, USA .
Ceruzzi PE , 2003 . A History of Modern Computing . MIT Press , Cambridge, USA .
Cormen TH , Leiserson CE , Rivest RL , et al. , 2009 . Introduction to Algorithms ( 3rd Ed .). MIT Press , Cambridge, USA .
Crochemore M , Rytter W , 1994 . Text Algorithms . Oxford University Press, Inc. , New York, USA .
Faro S , Lecroq T , 2010 . The exact string matching problem: a comprehensive experimental evaluation . http://arxiv.org/abs/1012.2547 http://arxiv.org/abs/1012.2547
Faro S , Lecroq T , 2013 . The exact online string matching problem . ACM Comput Surv , 45 ( 2 ): 13 . https://doi.org/10.1145/2431211.2431212 https://doi.org/10.1145/2431211.2431212
Faro S , Lecroq T , Borzì S , et al. , 2016 . The string matching algorithms research tool . Proc Prague Stringology Conf , p. 99 - 111 .
Feng WC , Cameron K , 2007 . The Green500 list: encouraging sustainable supercomputing . Computer , 40 ( 12 ): 50 - 55 . https://doi.org/10.1109/MC.2007.445 https://doi.org/10.1109/MC.2007.445
Girkar M , Polychronopoulos CD , 1995 . Extracting task-level parallelism . ACM Trans Program Lang Syst , 17 ( 4 ): 600 - 634 . https://doi.org/10.1145/210184.210189 https://doi.org/10.1145/210184.210189
Hains D , Cashero Z , Ottenberg M , et al. , 2011 . Improving CUDASW++, a parallelization of Smith–Waterman for CUDA enabled devices . IEEE Int Symp on Parallel and Distributed Processing Workshops and PhD Forum , p. 490 - 501 . https://doi.org/10.1109/IPDPS.2011.182 https://doi.org/10.1109/IPDPS.2011.182
Han TS , Ko SK , Kang J , 2007 . Efficient subsequence matching using the longest common subsequence with a dual match index . 5 th Int Conf on Machine Learning and Data Mining in Pattern Recognition , p. 585 - 600 . https://doi.org/10.1007/978-3-540-73499-4_44 https://doi.org/10.1007/978-3-540-73499-4_44
Harris M , 2012 . How to Overlap Data Transfers in CUDA C/C++ . https://developer.nvidia.com/blog/how-overlap-data-transfers-cuda-cc/ https://developer.nvidia.com/blog/how-overlap-data-transfers-cuda-cc/ [Accessed on Feb. 3, 2024 ] .
He LT , Fang BX , Sui J , 2005 . The wide window string matching algorithm . Theor Comput Sci , 332 ( 1-3 ): 391 - 404 . https://doi.org/10.1016/j.tcs.2004.12.002 https://doi.org/10.1016/j.tcs.2004.12.002
Horspool RN , 1980 . Practical fast searching in strings . Softw Pract Exp , 10 ( 6 ): 501 - 506 . https://doi.org/10.1002/spe.4380100608 https://doi.org/10.1002/spe.4380100608
Jaiswal M , 2014 . Accelerating enhanced Boyer–Moore string matching algorithm on multicore GPU for network security . Int J Comput Appl , 97 ( 1 ): 30 - 35 https://doi.org/10.5120/16973-6934 https://doi.org/10.5120/16973-6934
Kadhim HA , AbdulRashidx N , 2014 . Maximum-shift string matching algorithms . Int Conf on Computer and Information Sciences , p. 1 - 6 . https://doi.org/10.1109/ICCOINS.2014.6868423 https://doi.org/10.1109/ICCOINS.2014.6868423
Kouzinopoulos CS , Michailidis PD , Margaritis KG , 2015 . Multiple string matching on a GPU using CUDAs . Scalable Comput , 16 ( 2 ): 121 - 137 . https://doi.org/10.12694/scpe.v16i2.1085 https://doi.org/10.12694/scpe.v16i2.1085
Lee CL , Lin YS , Chen YC , 2015 . A hybrid CPU/GPU pattern-matching algorithm for deep packet inspection . PLoS ONE , 10 ( 10 ): e0139301 . https://doi.org/10.1371/journal.pone.0139301 https://doi.org/10.1371/journal.pone.0139301
Ligowski L , Rudnicki W , 2009 . An efficient implementation of Smith Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases . IEEE Int Symp on Parallel & Distributed Processing , p. 1 - 8 . https://doi.org/10.1109/IPDPS.2009.5160931 https://doi.org/10.1109/IPDPS.2009.5160931
Lin CY , Och FJ , 2004 . Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics . Proc 42 nd Annual Meeting of the Association for Computational Linguistics , p. 605 - 612 . https://doi.org/10.3115/1218955.1219032 https://doi.org/10.3115/1218955.1219032
Lin DL , Huang TW , 2021 . Efficient GPU computation using task graph parallelism . 27 th Int Conf on Parallel and Distributed Computing on Euro-Par: Parallel Processing , p. 435 - 450 .
Lu SY , Fu KS , 1978 . A sentence-to-sentence clustering procedure for pattern analysis . IEEE Trans Syst Man Cybern , 8 ( 5 ): 381 - 389 . https://doi.org/10.1109/TSMC.1978.4309979 https://doi.org/10.1109/TSMC.1978.4309979
Mitani Y , Ino F , Hagihara K , 2017 . Parallelizing exact and approximate string matching via inclusive scan on a GPU . IEEE Trans Parall Distrib Syst , 28 ( 7 ): 1989 - 2002 . https://doi.org/10.1109/TPDS.2016.2645222 https://doi.org/10.1109/TPDS.2016.2645222
Nagaveni V , Raju GT , 2014 . Various string matching algorithms for DNA sequences to detect breast cancer using CUDA processors . Int J Eng Technol , 14 ( 3 ): 42 - 47 .
Navarro G , Raffinot M , 1998 . A bit-parallel approach to suffix automata: fast extended string matching . 9 th Annual Symp on Combinatorial Pattern Matching , p. 14 - 33 .
NVIDIA , 2019 . Record 136 NVIDIA GPU-Accelerated Supercomputers Feature in TOP500 Ranking . https://blogs.nvidia.com/blog/record-gpu-accelerated-supercomputers-top500/ https://blogs.nvidia.com/blog/record-gpu-accelerated-supercomputers-top500/ [Accessed on Feb. 3, 2024 ] .
Peng JF , Chen H , 2010 . CUgrep: a GPU-based high performance multi-string matching system . Proc 2 nd Int Conf on Future Computer and Communication , p. 77 - 81 . https://doi.org/10.1109/ICFCC.2010.5497832 https://doi.org/10.1109/ICFCC.2010.5497832
Petrakis EGM , 1993 . Image Representation, Indexing and Retrieval Based on Spatial Relationships and Properties of Objects . PhD Thesis, University of Crete , Crete, Greece .
Pungila C , Negru V , 2012 . A highly-efficient memory-compression approach for GPU-accelerated virus signature matching . 15 th Int Conf on Information Security , p. 354 - 369 . https://doi.org/10.1007/978-3-642-33383-5_22 https://doi.org/10.1007/978-3-642-33383-5_22
Quinn MJ , 2004 . Parallel Programming in C with MPI and OpenMP . McGraw-Hill Higher Education , Boston, USA .
Ramos-Frías R , Vargas-Lombardo M , 2017 . A review of string matching algorithms and recent implementations using GPU . Int J Secur Appl , 11 ( 6 ): 69 - 76 . https://doi.org/10.14257/ijsia.2017.11.6.06 https://doi.org/10.14257/ijsia.2017.11.6.06
Rasool A , Khare N , 2012 . Parallelization of KMP string matching algorithm on different SIMD architectures: multi-core and GPGPUs . Int J Comput Appl , 49 ( 11 ): 26 - 28 . https://doi.org/10.5120/7672-0963 https://doi.org/10.5120/7672-0963
Sellis TK , 1988 . Multiple-query optimization . ACM Trans Database Syst , 13 ( 1 ): 23 - 52 . https://doi.org/10.1145/42201.42203 https://doi.org/10.1145/42201.42203
Sharma J , Singh M , 2015 . CUDA based Rabin-Karp pattern matching for deep packet inspection on a multicore GPU . Int J Comput Netw Inform Secur , 7 ( 10 ): 70 - 77 . https://doi.org/10.5815/ijcnis.2015.10.08 https://doi.org/10.5815/ijcnis.2015.10.08
Subhlok J , Stichnoth JM , O’Hallaron DR , et al. , 1993 . Exploiting task and data parallelism on a multicomputer . Proc 4 th ACM SIGPLAN Symp on Principles and Practice of Parallel Programming , p. 13 - 22 . https://doi.org/10.1145/155332.155334 https://doi.org/10.1145/155332.155334
Tian XX , Song YL , Wang XL , et al. , 2012 . Shortest path based potential common friend recommendation in social networks . 2 nd Int Conf on Cloud and Green Computing , p. 541 - 548 .
Tran NP , Lee M , 2013 . High performance string matching for security applications . Int Conf on ICT for Smart Society , p. 1 - 5 . https://doi.org/10.1109/ICTSS.2013.6588052 https://doi.org/10.1109/ICTSS.2013.6588052
Tran NP , Lee M , Hong S , et al. , 2012 . Memory efficient parallelization for Aho–Corasick algorithm on a GPU . IEEE 14 th Int Conf on High Performance Computing and Communication & IEEE 9 th Int Conf on Embedded Software and Systems , p. 432 - 438 . https://doi.org/10.1109/HPCC.2012.65 https://doi.org/10.1109/HPCC.2012.65
Weiner P , 1973 . Linear pattern matching algorithms . 14 th Annual Symp on Switching and Automata Theory , p. 1 - 11 . https://doi.org/10.1109/SWAT.1973.13 https://doi.org/10.1109/SWAT.1973.13
Xu KF , Cui WK , Hu Y , et al. , 2013 . Bit-parallel multiple approximate string matching based on GPU . Proc Comput Sci , 17 : 523 - 529 . https://doi.org/10.1016/j.procs.2013.05.067 https://doi.org/10.1016/j.procs.2013.05.067
Yao ACC , 1979 . The complexity of pattern matching for a random string . SIAM J Comput , 8 ( 3 ): 368 - 387 . https://doi.org/10.1137/0208029 https://doi.org/10.1137/0208029
Yong KK , Karuppiah EK , 2013 . Hash match on GPU . IEEE Conf on Open Systems , p. 150 - 155 . https://doi.org/10.1109/ICOS.2013.6735065 https://doi.org/10.1109/ICOS.2013.6735065
Zha XY , Sahni S , 2013 . GPU-to-GPU and host-to-host multipattern string matching on a GPU . IEEE Trans Comput , 62 ( 6 ): 1156 - 1169 . https://doi.org/10.1109/TC.2012.61 https://doi.org/10.1109/TC.2012.61
Ziv J , Lempel A , 1977 . A universal algorithm for sequential data compression . IEEE Trans Inform Theory , 23 ( 3 ): 337 - 343 . https://doi.org/10.1109/TIT.1977.1055714 https://doi.org/10.1109/TIT.1977.1055714
Publicity Resources
Related Articles
Related Author
Related Institution