A novel JSON based regular expression language for pattern matching in the internet of things

Abstract

The Internet of Things works by constantly sensing physical properties in the vicinity of the user, such as ambient light, sound, motion and temperature. These sensors produce huge volumes of data that have to be sifted efficiently for the relevant events required to trigger certain actions. In addition, filtering has to be performed to ensure that privacy-sensitive data is not leaked. Efficient and expressive pattern matching is thus a key enabling technology for the full realization of ambient and humanized computing. The bulk of research in this area has focused on the use of specialized hardware and on reducing the memory footprint. Unfortunately, there has been little work, if any, on optimizing the core elements of pattern matching: the regular expression (RE) language and the compilation process that converts patterns into internal data structures. The importance of writing good REs, so that compilation does not lead to unrealizable data structures, is comparatively poorly understood. In this research, we empirically compare different RE processing engines and demonstrate in practice that the compilation phase is highly memory-intensive and time-consuming compared with the matching phase, and hence is worth exploring for new techniques and optimizations. As a second contribution, we propose a novel technique for defining regular expressions using JavaScript Object Notation (JSON). Our evaluation with carefully constructed patterns shows that the performance of the proposed technique is on par with competing approaches. It is also less ambiguous, extensible, more expressive and better suited to defining large and complex patterns.
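The abstract does not reproduce the proposed JSON grammar, so the following is only a rough sketch of the general idea of describing a pattern as a JSON tree and compiling it before matching. The node types (`literal`, `class`, `seq`, `alt`, `repeat`), their field names and the `to_regex` helper are hypothetical and are not taken from the paper; Python's standard `re` module stands in for the RE engine.

```python
import json
import re

def to_regex(node):
    """Translate a hypothetical JSON pattern tree into a Python regex string."""
    kind = node["type"]
    if kind == "literal":
        # Escape metacharacters so the value is matched verbatim.
        return re.escape(node["value"])
    if kind == "class":
        # Character class; "chars" is used as-is, e.g. "0-9".
        return "[" + node["chars"] + "]"
    if kind == "seq":
        return "".join(to_regex(n) for n in node["items"])
    if kind == "alt":
        return "(?:" + "|".join(to_regex(n) for n in node["options"]) + ")"
    if kind == "repeat":
        lo = node.get("min", 0)
        hi = node.get("max")  # None means unbounded
        bound = "{%d,%s}" % (lo, "" if hi is None else hi)
        return "(?:" + to_regex(node["item"]) + ")" + bound
    raise ValueError("unknown node type: %r" % kind)

# Assumed example: a sensor reading such as "temp=42" (one to three digits).
PATTERN_JSON = """
{"type": "seq", "items": [
  {"type": "literal", "value": "temp="},
  {"type": "repeat", "min": 1, "max": 3,
   "item": {"type": "class", "chars": "0-9"}}
]}
"""

spec = json.loads(PATTERN_JSON)
pattern = re.compile(to_regex(spec))   # compilation phase
print(bool(pattern.fullmatch("temp=42")))    # matching phase, prints True
print(bool(pattern.fullmatch("temp=1234")))  # prints False
```

In a structured form like this, each construct is a named node rather than a run of metacharacters, which is one way a JSON description could be less ambiguous and easier to extend for large patterns.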

Acknowledgements

This research has been supported by DSR, King Faisal University, Saudi Arabia. We are grateful to Ms. Michela Becchi of the Department of Electrical and Computer Engineering at the University of Missouri, Columbia, for providing us with the Regular Expression Processor. We also thank Prof. Andrew A. Chien of the Large Scale Systems Group at the University of Chicago for helpful discussions.

Author information

Corresponding author

Correspondence to Raihan ur Rasool.

Additional information

This research has been supported by DSR (Grant: 160088), King Faisal University, Saudi Arabia.

About this article

Cite this article

Rasool, R.u., Najam, M., Ahmad, H.F. et al. A novel JSON based regular expression language for pattern matching in the internet of things. J Ambient Intell Human Comput 10, 1463–1481 (2019). https://doi.org/10.1007/s12652-018-0869-1

Keywords

  • Deep packet inspection/Deep content inspection
  • Efficient matching
  • JavaScript Object Notation (JSON)
  • Pattern matching
  • Parsing
  • Regular expressions