
Using k-Means for Redundancy and Inconsistency Detection: Application to Industrial Requirements

  • Conference paper
  • First Online:
Natural Language Processing and Information Systems (NLDB 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10859)

Abstract

Requirements are usually “hand-written” and suffer from several problems, such as redundancy and inconsistency. These problems, whether between individual requirements or between sets of requirements, negatively impact the success of the final product. Handling these issues manually is time-consuming and very costly. In this paper, we propose to handle redundancy and inconsistency automatically using a classification approach. The main contribution of this paper is the use of the k-means algorithm for redundancy and inconsistency detection in a new context: Requirements Engineering. We also introduce a preprocessing step based on Natural Language Processing techniques in order to assess its impact on the k-means results. We use Part-Of-Speech (POS) tagging and noun chunking to detect the technical business terms associated with the requirements documents we analyze. We evaluate this approach on real industrial datasets. The results show the effectiveness of the k-means clustering algorithm, especially when the preprocessing step is applied.
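The abstract does not state how requirements are turned into vectors or how cluster membership becomes a redundancy flag. The following is a minimal Python sketch, not the authors' implementation, built on stated assumptions: each requirement is represented as a TF-IDF bag-of-words vector; k-means is seeded deterministically with a farthest-first rule instead of random initialization; and a hypothetical decision rule flags two requirements as candidate redundancies when they fall in the same cluster and their cosine similarity reaches a threshold (0.5 here, chosen arbitrarily). All function names and the example requirements are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF vector (list of floats) per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed inverse document frequency (an assumption, not from the paper).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def farthest_first_init(vectors, k):
    """Deterministic seeding: start at vector 0, then add the farthest point."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd's k-means; returns the cluster label of each vector."""
    centroids = farthest_first_init(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def candidate_redundancies(requirements, k, threshold=0.5):
    """Hypothetical rule: same cluster and cosine similarity >= threshold."""
    vectors = tfidf_vectors(requirements)
    labels = kmeans(vectors, k)
    pairs = []
    for i in range(len(requirements)):
        for j in range(i + 1, len(requirements)):
            # Vectors are unit length, so the dot product is the cosine.
            cos = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            if labels[i] == labels[j] and cos >= threshold:
                pairs.append((i, j))
    return labels, pairs


reqs = [
    "The pump shall start when the tank level is low.",
    "When the tank level is low, the pump shall start.",
    "The display shall show the battery voltage.",
    "The display shall show the current battery voltage.",
]
labels, pairs = candidate_redundancies(reqs, k=2)
print(labels, pairs)  # [0, 0, 1, 1] [(0, 1), (2, 3)]
```

Clustering acts here as a filter: a pair is reported only if both requirements share a cluster. For simplicity the sketch still scans every pair; detecting inconsistencies would need a further check on the flagged pairs (for example, conflicting values), which this sketch does not attempt.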

Notes

  1. http://www.standishgroup.com.

  2. They studied 50,000 projects around the world, ranging from tiny enhancements to massive systems re-engineering implementations.

  3. https://spacy.io/.

  4. A noun chunk is a noun plus the words describing the noun.

  5. http://www.semiosapp.com/index.php?lang=en.
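Notes 3 and 4 point to spaCy and define a noun chunk as a noun plus the words describing it; spaCy exposes such chunks through `Doc.noun_chunks`, which requires a downloaded language model. To stay self-contained, the sketch below instead applies a simplified, hypothetical rule to tokens that already carry Universal POS tags (hand-assigned here for illustration): a noun chunk is a maximal run of determiners, adjectives, numerals and nouns that ends on a noun. This only approximates spaCy's dependency-based chunker. Dropping the determiner yields a candidate technical business term.

```python
# Universal POS tags allowed inside a noun chunk (simplified, assumed rule).
CHUNK_TAGS = {"DET", "ADJ", "NUM", "NOUN", "PROPN"}
# A chunk must end on one of these head tags.
HEAD_TAGS = {"NOUN", "PROPN"}


def noun_chunks(tagged):
    """Group (word, tag) pairs into maximal noun chunks ending in a noun."""
    chunks, current = [], []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the last chunk
        # A determiner may open a chunk but never continues one.
        if tag in CHUNK_TAGS and not (tag == "DET" and current):
            current.append((word, tag))
            continue
        # Trim trailing modifiers so the chunk ends on its head noun.
        while current and current[-1][1] not in HEAD_TAGS:
            current.pop()
        if current:
            chunks.append(current)
        current = [(word, tag)] if tag == "DET" else []
    return chunks


def chunk_text(chunk):
    """Join the words of a chunk back into a string."""
    return " ".join(word for word, _ in chunk)


def business_terms(tagged):
    """Candidate technical terms: noun chunks without determiners, lowercased."""
    return [
        " ".join(word.lower() for word, tag in chunk if tag != "DET")
        for chunk in noun_chunks(tagged)
    ]


tagged = [
    ("The", "DET"), ("main", "ADJ"), ("pump", "NOUN"), ("shall", "AUX"),
    ("supply", "VERB"), ("the", "DET"), ("cooling", "NOUN"),
    ("circuit", "NOUN"), ("with", "ADP"), ("water", "NOUN"), (".", "PUNCT"),
]
print([chunk_text(c) for c in noun_chunks(tagged)])
# ['The main pump', 'the cooling circuit', 'water']
print(business_terms(tagged))
# ['main pump', 'cooling circuit', 'water']
```

Multi-word terms such as "cooling circuit" are the kind of domain vocabulary that a preprocessing step could keep as single features before clustering; how the paper feeds them to k-means is not stated in this abstract.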

Acknowledgements

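This work is financially supported by the Occitanie region of France within the framework of the CLE-ELENAA project (CLE: Contrat de recherche Laboratoires-Entreprises, a laboratory-company research contract; ELENAA: des Exigences en LanguEs Naturelles à leurs Analyses Automatiques, "from natural-language requirements to their automatic analysis").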

Author information

Correspondence to Manel Mezghani.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Mezghani, M., Kang, J., Sèdes, F. (2018). Using k-Means for Redundancy and Inconsistency Detection: Application to Industrial Requirements. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_52

  • DOI: https://doi.org/10.1007/978-3-319-91947-8_52

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91946-1

  • Online ISBN: 978-3-319-91947-8

  • eBook Packages: Computer Science (R0)
