Abstract
Rule-based systems have been used to augment machine-learning-based algorithms for annotating data in unstructured and dynamic environments. Rules can alleviate many of the shortcomings inherent in purely algorithmic approaches. However, in a rule-based system an analyst must continually adapt the rules to keep them applicable and precise, and this adaptation is a challenging and error-prone task. In this paper, we present an approach for adapting data annotation rules in unstructured and constantly changing environments. Our approach relieves analysts of rule adaptation and autonomously identifies the optimal modification for each rule using a Bayesian multi-armed bandit algorithm. We conduct experiments on different curation domains and compare the performance of our approach with that of systems relying on analysts. The experimental results show that our approach performs comparably to analysts in adapting rules.
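The abstract names a Bayesian multi-armed bandit but does not specify its model here. As a minimal sketch, assume each candidate modification of a rule is one arm, each annotation it produces is judged correct (reward 1) or incorrect (reward 0), and arms are chosen by Beta-Bernoulli Thompson sampling, a standard Bayesian bandit policy. The names `Arm`, `choose_arm`, `update` and `simulate`, as well as the modification names and precisions used below, are hypothetical and not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class Arm:
    """One candidate rule modification (hypothetical representation)."""
    name: str
    alpha: float = 1.0  # Beta posterior: 1 + number of correct annotations observed
    beta: float = 1.0   # Beta posterior: 1 + number of incorrect annotations observed


def choose_arm(arms, rng):
    """Thompson sampling: draw once from each arm's Beta posterior, pick the largest draw."""
    return max(arms, key=lambda arm: rng.betavariate(arm.alpha, arm.beta))


def update(arm, correct):
    """Bernoulli reward: a correct annotation raises alpha, an incorrect one raises beta."""
    if correct:
        arm.alpha += 1
    else:
        arm.beta += 1


def simulate(true_precision, rounds, seed=0):
    """Run the bandit against assumed (hypothetical) per-modification precisions."""
    rng = random.Random(seed)
    arms = [Arm(name) for name in true_precision]
    for _ in range(rounds):
        arm = choose_arm(arms, rng)
        # Simulated analyst feedback: the annotation is correct with the arm's true precision.
        update(arm, rng.random() < true_precision[arm.name])
    return arms


if __name__ == "__main__":
    arms = simulate({"add_keyword": 0.55, "drop_keyword": 0.80, "keep_rule": 0.60}, rounds=2000)
    for arm in arms:
        pulls = int(arm.alpha + arm.beta - 2)
        print(arm.name, pulls, round(arm.alpha / (arm.alpha + arm.beta), 3))
```

Thompson sampling is used in this sketch because it balances exploration and exploitation without a tuning schedule: arms with uncertain posteriors still get sampled occasionally, while the most precise modification is chosen most often. The sketch assumes stationary precisions; a constantly changing environment would likely need some form of discounting or windowing of old feedback, which is omitted here.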
Notes
1. Annotates data with a precision below \(\jmath \).
2. Because feature \(f_1\) is the root feature, it annotates data above the average and thus satisfies the restriction condition.
3. Features \(\{f_3,f_4\}\) are the siblings of feature \(f_2\).
4. Annotates data below the average number of items annotated by its siblings.
5.
6. @lucianaberger: Mental health services facing serious shortages of mental health nurses decrease of 12% since 2010 psychiatrists.
7. If i have to hire a car and drive home from belgium i am going to go mental stupid french air traffic control wanks on strike.
8. We set the values of \(\epsilon \) and Q experimentally, using simulated data.
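Notes 1 to 4 appear to describe when a feature of a rule counts as imprecise or fails a restriction condition. Read in isolation, one plausible interpretation (an assumption, not the paper's definition) is: a feature is imprecise if its precision falls below the threshold \(\jmath \), and it fails the restriction condition if it annotates fewer items than the average of its siblings. The function names and the treatment of a feature with no siblings are hypothetical.

```python
from statistics import mean


def is_imprecise(precision, threshold):
    """Hypothetical reading of note 1: precision below the threshold (written \\jmath)."""
    return precision < threshold


def satisfies_restriction(count, sibling_counts):
    """Hypothetical reading of notes 2 and 4: the feature fails only if it annotates
    fewer items than the average number annotated by its siblings.

    Assumption: a feature with no siblings (such as the root f_1) is treated as
    satisfying the condition.
    """
    if not sibling_counts:
        return True
    return count >= mean(sibling_counts)


if __name__ == "__main__":
    # Hypothetical counts: f_2 annotates 10 items, its siblings f_3 and f_4 annotate 20 and 30.
    print(satisfies_restriction(10, [20, 30]))
    print(is_imprecise(0.6, 0.7))
```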
Acknowledgements
We acknowledge the AI-enabled Processes (AIP) Research Centre, the Data to Decisions CRC (D2D CRC), and the Cooperative Research Centres Program for funding parts of this research.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Tabebordbar, A., Beheshti, A., Benatallah, B., Barukh, M.C. (2019). Adaptive Rule Adaptation in Unstructured and Dynamic Environments. In: Cheng, R., Mamoulis, N., Sun, Y., Huang, X. (eds) Web Information Systems Engineering – WISE 2019. WISE 2020. Lecture Notes in Computer Science, vol 11881. Springer, Cham. https://doi.org/10.1007/978-3-030-34223-4_21
DOI: https://doi.org/10.1007/978-3-030-34223-4_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34222-7
Online ISBN: 978-3-030-34223-4
eBook Packages: Computer Science, Computer Science (R0)