A novel JSON based regular expression language for pattern matching in the internet of things

  • Raihan ur Rasool
  • Maleeha Najam
  • Hafiz Farooq Ahmad
  • Hua Wang
  • Zahid Anwar
Original Research

Abstract

The Internet of Things works by constantly sensing physical properties in the vicinity of the user, such as ambient light, sound, motion and temperature. The sensors involved produce huge volumes of data that have to be sifted efficiently for the relevant events that trigger certain actions. In addition, filtering has to be performed to ensure that privacy-sensitive confidential data is not leaked. Efficient and expressive pattern matching is thus a key enabling technology for the full realization of ambient and humanized computing. The bulk of research in this area has focused on the use of specialized hardware and on reducing the memory footprint. Unfortunately, there has been limited work, if any, on optimizing the core elements of pattern matching: the regular expression (RE) language and the compilation process that converts patterns into internal data structures. The importance of writing good REs, so that compilation does not lead to unrealizable data structures, is comparatively poorly understood. In this research, we empirically compare different RE processing engines and demonstrate in practice that the compilation phase is highly memory-intensive and time-consuming compared with the matching phase, and hence worth exploring for new techniques and optimizations. As a second important contribution, we propose a novel technique for defining regular expressions using JavaScript Object Notation (JSON). Our evaluation with carefully created patterns shows that the performance of the proposed technique is on par with competing approaches. It is also less ambiguous, extensible, more expressive and better suited to defining large and complex patterns.
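To make the idea of a JSON-defined regular expression concrete, the following is a minimal sketch only. The abstract does not give the authors' notation, so the node types used here ("seq", "alt", "literal", "class", "repeat") and the helper names `to_regex` and `compile_json_pattern` are assumptions chosen for illustration, and Python's standard `re` module stands in for whichever engine the paper evaluates. The sketch shows a structured pattern being parsed, translated into conventional RE syntax, compiled once and then used for matching.

```python
import json
import re

# Hypothetical JSON pattern description. The node types ("seq", "alt",
# "literal", "class", "repeat") are illustrative, not the paper's schema.
PATTERN_JSON = """
{
  "type": "seq",
  "items": [
    {"type": "literal", "value": "temp="},
    {"type": "repeat", "min": 1, "max": 3,
     "item": {"type": "class", "value": "0-9"}},
    {"type": "alt", "items": [
      {"type": "literal", "value": "C"},
      {"type": "literal", "value": "F"}
    ]}
  ]
}
"""


def to_regex(node):
    """Recursively translate a JSON pattern node into conventional RE syntax."""
    kind = node["type"]
    if kind == "literal":
        # Escape metacharacters so the value is matched verbatim.
        return re.escape(node["value"])
    if kind == "class":
        return "[" + node["value"] + "]"
    if kind == "seq":
        return "".join(to_regex(child) for child in node["items"])
    if kind == "alt":
        return "(?:" + "|".join(to_regex(child) for child in node["items"]) + ")"
    if kind == "repeat":
        return "(?:%s){%d,%d}" % (to_regex(node["item"]), node["min"], node["max"])
    raise ValueError("unknown node type: %r" % kind)


def compile_json_pattern(text):
    """Parse the JSON description, then compile it once (the compilation phase)."""
    return re.compile(to_regex(json.loads(text)))


matcher = compile_json_pattern(PATTERN_JSON)

# The matching phase reuses the compiled pattern for each input.
print(bool(matcher.search("sensor-7 temp=23C")))     # True
print(bool(matcher.search("sensor-7 humidity=40")))  # False
```

In this sketch every sub-pattern is an explicit, named node, so grouping and precedence never depend on parenthesis placement or escaping rules. This is one plausible reading of the "less ambiguous" and "extensible" claims, not a statement of how the proposed language is actually specified.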

Keywords

  • Deep packet inspection/Deep content inspection
  • Efficient matching
  • JavaScript Object Notation (JSON)
  • Pattern matching
  • Parsing
  • Regular expressions

Notes

Acknowledgements

This research has been supported by DSR, King Faisal University, Saudi Arabia. We are grateful to Ms. Michela Becchi of the Department of Electrical and Computer Engineering at The University of Missouri, Columbia, for providing us with the Regular Expression Processor. We are also thankful to Prof. Andrew A. Chien of the Large Scale Systems Group at The University of Chicago for helpful discussions.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Victoria University, Melbourne, Australia
  2. Department of Electronic Engineering, Fatima Jinnah Women University (FJWU), The Mall, Rawalpindi, Pakistan
  3. College of Computer Sciences and Information Technology, King Faisal University, Al-Ahsa, Saudi Arabia
  4. National University of Sciences and Technology, Islamabad, Pakistan
  5. Fontbonne University, Saint Louis, USA
