Abstract
Implementing the rules, constraints, and requirements contained in regulatory documents such as standards or guidelines is a mandatory task for organizations and institutions across several domains. Because of the amount of domain-specific information and actions encoded in these documents, organizations often need to establish cooperation among several departments and consulting experts to guide managers and employees in eliciting compliance requirements. The aim of this paper is to provide computer-based guidance and support for this often costly and tedious compliance task. The presented methodology uses well-known text mining techniques and clustering algorithms to classify (families of) documents according to topics and to derive significant sentences that help users understand and implement compliance-related documents. Applying the approach to collections of documents from the security and medical domains demonstrates that text mining is a promising domain-independent means of supporting the understanding, extraction, and analysis of regulatory documents.
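The idea of grouping documents by topic can be illustrated with a minimal sketch. The abstract does not name the specific techniques used, so TF-IDF term weighting and cosine similarity are assumed here as common representatives of "well-known text mining techniques"; the four toy documents and all function names are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: two security-style and two medical-style texts.
docs = [
    "access control policy for passwords and user access",
    "password policy and access control for user accounts",
    "melanoma diagnosis and treatment guideline for patients",
    "treatment of melanoma patients after diagnosis",
]
vectors = tfidf_vectors(docs)

# Print the pairwise similarities; same-domain pairs should score higher.
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        print(i, j, round(cosine(vectors[i], vectors[j]), 3))
```

In this toy corpus the two security texts are more similar to each other than to either medical text, and the same holds for the medical pair. A clustering algorithm applied to such similarities would recover the two topic groups; the choice of algorithm and of the number of clusters is left open in this sketch.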
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Winter, K., Rinderle-Ma, S., Grossmann, W., Feinerer, I., Ma, Z. (2017). Characterizing Regulatory Documents and Guidelines Based on Text Mining. In: Panetto, H., et al. On the Move to Meaningful Internet Systems. OTM 2017 Conferences. OTM 2017. Lecture Notes in Computer Science, vol 10573. Springer, Cham. https://doi.org/10.1007/978-3-319-69462-7_1
DOI: https://doi.org/10.1007/978-3-319-69462-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69461-0
Online ISBN: 978-3-319-69462-7
eBook Packages: Computer Science, Computer Science (R0)