
Adaptive Rule Adaptation in Unstructured and Dynamic Environments

  • Alireza Tabebordbar
  • Amin Beheshti
  • Boualem Benatallah
  • Moshe Chai Barukh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11881)

Abstract

Rule-based systems have been used to augment machine learning-based algorithms for annotating data in unstructured and dynamic environments. Rules can alleviate many of the shortcomings inherent in pure algorithmic approaches. However, rule adaptation is a challenging and error-prone task: in a rule-based system, an analyst must continually adapt rules to keep them applicable and precise. In this paper, we present an approach for adapting data annotation rules in unstructured and constantly changing environments. Our approach relieves analysts of the burden of adapting rules and autonomically identifies the optimal modification for a rule using a Bayesian multi-armed-bandit algorithm. We conduct experiments on different curation domains and compare the performance of our approach with systems relying on analysts. The experimental results show that our approach performs comparably to analysts in adapting rules.
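
The abstract does not spell out the bandit formulation, so the following is only a minimal, hypothetical sketch of how a Bayesian multi-armed bandit (here, Thompson sampling with Beta posteriors) can select among candidate rule modifications. Names such as BetaArm, choose_modification, and the simulated accuracy values are illustrative assumptions, not part of the paper.

```python
import random

# Hypothetical sketch: Thompson sampling over candidate rule modifications.
# Each "arm" is one candidate modification; its reward is whether the modified
# rule annotated a sampled item correctly (1) or not (0), according to some
# external feedback signal. All names here are illustrative, not from the paper.

class BetaArm:
    """Tracks a Beta posterior over an arm's success probability."""
    def __init__(self):
        self.successes = 1  # Beta(1, 1) uniform prior
        self.failures = 1

    def sample(self):
        # Draw one sample from the current posterior.
        return random.betavariate(self.successes, self.failures)

    def update(self, reward):
        # Update the posterior with an observed binary reward.
        if reward:
            self.successes += 1
        else:
            self.failures += 1


def choose_modification(arms):
    """Pick the arm whose posterior sample is highest (Thompson sampling)."""
    samples = [(arm.sample(), i) for i, arm in enumerate(arms)]
    return max(samples)[1]


# Toy usage: three candidate modifications with unknown true accuracies.
true_accuracy = [0.55, 0.70, 0.62]  # simulated ground truth, for illustration only
arms = [BetaArm() for _ in true_accuracy]

for _ in range(1000):
    i = choose_modification(arms)
    reward = random.random() < true_accuracy[i]  # simulated annotation feedback
    arms[i].update(reward)

best = max(range(len(arms)),
           key=lambda i: arms[i].successes / (arms[i].successes + arms[i].failures))
print(f"Most promising modification: arm {best}")
```

In this toy run the bandit concentrates its trials on the modification with the highest observed annotation accuracy, which mirrors the idea of replacing a manual analyst decision with posterior sampling over candidate rule changes.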

Keywords

Rule adaptation · Data annotation · Rule-based systems · Data curation

Notes

Acknowledgements

We acknowledge the AI-enabled Processes (AIP) Research Centre for funding part of this research.

We acknowledge the Data to Decisions CRC (D2D CRC) and the Cooperative Research Centres Program for funding part of this research.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Alireza Tabebordbar, University of New South Wales, Sydney, Australia
  • Amin Beheshti, Macquarie University, Sydney, Australia
  • Boualem Benatallah, University of New South Wales, Sydney, Australia
  • Moshe Chai Barukh, University of New South Wales, Sydney, Australia
