Characterizing Regulatory Documents and Guidelines Based on Text Mining

  • Conference paper
On the Move to Meaningful Internet Systems. OTM 2017 Conferences (OTM 2017)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10573)

Abstract

Implementing the rules, constraints, and requirements contained in regulatory documents such as standards or guidelines is a mandatory task for organizations and institutions across several domains. Because of the amount of domain-specific information and actions encoded in these documents, organizations often need to establish cooperation between several departments and consulting experts to guide managers and employees in eliciting compliance requirements. The aim of this paper is to provide computer-based guidance and support for this often costly and tedious compliance task. The presented methodology uses well-known text mining techniques and clustering algorithms to classify documents (or families of documents) according to topics and to derive significant sentences that support users in understanding and implementing compliance-related documents. Applying the approach to collections of documents from the security and medical domains demonstrates that text mining is a promising domain-independent means of supporting the understanding, extraction, and analysis of regulatory documents.
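The topic-based grouping mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a specific weighting scheme or clustering algorithm, so TF-IDF weighting and a cosine-based (spherical) k-means are assumed here as common choices. The sample texts, the function names, and the fixed seed documents are all hypothetical and chosen only to keep the example deterministic.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    # Build unit-length TF-IDF vectors, stored as {term: weight} dictionaries.
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for counts in counts_per_doc for term in counts)
    vectors = []
    for counts in counts_per_doc:
        vec = {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a, b):
    # For unit-length vectors the dot product equals the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def spherical_kmeans(vectors, seeds, iterations=10):
    # Assumption: initial centroids are the documents at the given seed indices.
    centroids = [dict(vectors[i]) for i in seeds]
    labels = []
    for _ in range(iterations):
        # Assign every document to the centroid with the highest cosine similarity.
        labels = [
            max(range(len(centroids)), key=lambda c: dot(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the normalized sum of its member vectors.
        for c in range(len(centroids)):
            total = defaultdict(float)
            for v, label in zip(vectors, labels):
                if label == c:
                    for t, w in v.items():
                        total[t] += w
            norm = math.sqrt(sum(w * w for w in total.values()))
            if norm > 0:
                centroids[c] = {t: w / norm for t, w in total.items()}
    return labels


# Hypothetical mini-corpus: two security-style and two medical-style texts.
docs = [
    "Access control policy for passwords and encryption of information systems",
    "Encryption keys and passwords protect access to information systems",
    "Treatment of melanoma patients requires diagnosis and surgery",
    "Diagnosis and follow-up treatment for melanoma patients",
]
labels = spherical_kmeans(tfidf_vectors(docs), seeds=(0, 2))
print(labels)
```

In this toy corpus the two security-style texts and the two medical-style texts share most of their distinctive vocabulary, so they end up in separate clusters. A realistic setup would add stop-word removal and stemming, and would choose the number of clusters with a validation measure rather than fixing it in advance.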

Notes

  1. http://www.wst.univie.ac.at/projects/sprint/index.php?t=tm

  2. https://www.R-project.org/

  3. https://CRAN.R-project.org/package=tm

  4. https://CRAN.R-project.org/package=openNLP

Author information

Corresponding author

Correspondence to Karolin Winter.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Winter, K., Rinderle-Ma, S., Grossmann, W., Feinerer, I., Ma, Z. (2017). Characterizing Regulatory Documents and Guidelines Based on Text Mining. In: Panetto, H., et al. On the Move to Meaningful Internet Systems. OTM 2017 Conferences. OTM 2017. Lecture Notes in Computer Science, vol 10573. Springer, Cham. https://doi.org/10.1007/978-3-319-69462-7_1

  • DOI: https://doi.org/10.1007/978-3-319-69462-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69461-0

  • Online ISBN: 978-3-319-69462-7

  • eBook Packages: Computer Science, Computer Science (R0)
