Syntactical Similarity Learning by Means of Grammatical Evolution

  • Alberto Bartoli
  • Andrea De Lorenzo
  • Eric Medvet
  • Fabiano Tarlao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9921)


Several research efforts have shown that a similarity function synthesized from examples may capture an application-specific similarity criterion in a way that fits the application needs more effectively than a generic distance definition. In this work, we propose a similarity learning algorithm tailored to problems of syntax-based entity extraction from unstructured text streams. The algorithm takes as input pairs of strings, each with an indication of whether or not the two strings adhere to the same syntactic pattern. Our approach is based on Grammatical Evolution and systematically explores a space of similarity definitions that includes all functions expressible in a simple, specialized language that we defined for this purpose. We assessed our proposal on patterns representative of practical applications. The results suggest that the proposed approach is indeed feasible and that the learned similarity function is more effective than the Levenshtein distance and the Jaccard similarity index.
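For readers unfamiliar with the two baselines named above, the following is a minimal sketch of the Levenshtein distance and the Jaccard similarity index applied to labelled string pairs. It is not the authors' implementation: the function names, the example pairs, and the choice of character bigrams as the sets compared by the Jaccard index are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard index over sets of character n-grams (n=2 is an assumption)."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a and not grams_b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical labelled pairs: (string, string, same syntactic pattern?)
pairs = [
    ("2016-09-17", "1998-04-14", True),
    ("2016-09-17", "192.168.0.1", False),
]

for s1, s2, same_pattern in pairs:
    print(s1, s2, same_pattern, levenshtein(s1, s2), round(jaccard(s1, s2), 3))
```

Both functions compare literal characters, so two strings that follow the same pattern but share few characters (such as two different dates) can still look dissimilar. This is the kind of gap that a similarity function learned from labelled pairs is meant to close.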


Distance learning, Entity extraction, String patterns



We are grateful to Michele Furlanetto, who contributed to the implementation of our proposed method.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Alberto Bartoli (1)
  • Andrea De Lorenzo (1)
  • Eric Medvet (1), corresponding author
  • Fabiano Tarlao (1)

  1. Department of Engineering and Architecture, University of Trieste, Trieste, Italy
