Abstract
Requirements are usually “hand-written” and suffer from several problems, such as redundancy and inconsistency. These problems, whether between individual requirements or between sets of requirements, negatively affect the success of the final product. Handling them manually is time-consuming and costly. In this paper, we propose to handle redundancy and inconsistency automatically with a classification approach. The main contribution is the use of the k-means algorithm for redundancy and inconsistency detection in a new context, namely Requirements Engineering. We also introduce a preprocessing step based on Natural Language Processing techniques in order to assess its impact on the k-means results. We use Part-Of-Speech (POS) tagging and noun chunking to detect the technical business terms in the requirements documents that we analyze. We evaluate this approach on real industrial datasets. The results show the effectiveness of the k-means clustering algorithm, especially when it is combined with the preprocessing.
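To make the idea concrete, the following is a minimal sketch of how k-means can group similar requirement statements so that candidate redundant or inconsistent pairs end up in the same cluster. It is not the authors' pipeline: the four requirement sentences, the function names (`tokenize`, `vectorize`, `kmeans`), the plain term-frequency representation, and the fixed initial centroids are all assumptions made for illustration, and the POS tagging and noun chunking step is omitted.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and strip surrounding punctuation (illustrative tokenizer only).
    return [w.strip(".,;:()").lower() for w in text.split() if w.strip(".,;:()")]

def vectorize(docs):
    # Build L2-normalised term-frequency vectors over a shared vocabulary.
    vocab = sorted({t for d in docs for t in tokenize(d)})
    vectors = []
    for d in docs:
        counts = Counter(tokenize(d))
        vec = [float(counts[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, init_indices, max_iter=100):
    # Lloyd's algorithm; initial centroids are chosen explicitly so the
    # run is deterministic (an assumption of this sketch).
    centroids = [list(vectors[i]) for i in init_indices]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every vector.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical requirements: 0 and 1 are redundant, 2 and 3 are inconsistent.
requirements = [
    "The system shall encrypt user passwords.",
    "User passwords shall be encrypted by the system.",
    "The pump shall stop when pressure exceeds the limit.",
    "The pump shall not stop when pressure exceeds the limit.",
]
labels = kmeans(vectorize(requirements), init_indices=[0, 2])
print(labels)
```

In this toy run the two password requirements share one cluster and the two pump requirements share the other. Clustering alone does not say whether a pair is redundant or contradictory; it only narrows down which requirements a reviewer, or a later analysis step, should compare.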
Notes
- 2. They studied 50,000 projects around the world, ranging from tiny enhancements to massive systems re-engineering implementations.
- 4. A noun chunk is a noun plus the words describing the noun.
Acknowledgements
This work is financially supported by the Occitanie region of France in the framework of CLE (Contrat de recherche Laboratoires-Entreprises)-ELENAA (des Exigences en LanguEs Naturelles à leurs Analyses Automatiques) project.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Mezghani, M., Kang, J., Sèdes, F. (2018). Using k-Means for Redundancy and Inconsistency Detection: Application to Industrial Requirements. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_52
DOI: https://doi.org/10.1007/978-3-319-91947-8_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91946-1
Online ISBN: 978-3-319-91947-8
eBook Packages: Computer Science, Computer Science (R0)