Using Clustering to Improve the Structure of Natural Language Requirements Documents

  • Alessio Ferrari
  • Stefania Gnesi
  • Gabriele Tolomei
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7830)


[Context and motivation] System requirements are normally provided in the form of natural language documents. Such documents need to be properly structured to ease the uptake of the requirements by their readers. A structure that supports a proper understanding of a requirements document shall satisfy two main quality attributes: (i) requirements relatedness: each requirement is conceptually connected with the other requirements in the same section; (ii) sections independence: each section is conceptually separated from the others. [Question/Problem] Automatically identifying the parts of the document that lack requirements relatedness and sections independence may help improve the document structure. [Principal idea/results] To this end, we define a novel clustering algorithm named Sliding Head-Tail Component (S-HTC). The algorithm groups together similar requirements that are contiguous in the requirements document. We claim that such an algorithm allows discovering the structure of the document as it is perceived by the reader. If the structure originally provided by the document does not match the structure discovered by the algorithm, hints are given to identify the parts of the document that lack requirements relatedness and sections independence. [Contribution] We evaluate the effectiveness of the algorithm with a pilot test on a requirements standard of the railway domain (583 requirements).
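
The idea of grouping similar requirements that are contiguous in the document can be illustrated with a minimal sketch. This is not S-HTC itself, whose details are not given in this abstract. The sketch assumes a simple Jaccard similarity over lowercase word sets and a fixed threshold; the names `tokens`, `jaccard`, `group_contiguous` and the value `0.3` are hypothetical choices made only for illustration.

```python
import re


def tokens(text):
    """Return the set of lowercase alphabetic words in a requirement."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def group_contiguous(requirements, threshold=0.3):
    """Group adjacent requirements whose lexical similarity is high enough.

    Each requirement is compared only with the last requirement of the
    current group, so groups always consist of contiguous requirements.
    The threshold is an assumed value, not one taken from the paper.
    """
    groups = []
    for req in requirements:
        if groups and jaccard(tokens(groups[-1][-1]), tokens(req)) >= threshold:
            groups[-1].append(req)   # similar to its predecessor: same group
        else:
            groups.append([req])     # dissimilar: start a new group
    return groups


if __name__ == "__main__":
    sample = [
        "The radio shall display the train number.",
        "The radio shall display the driver identity.",
        "The handset battery shall last eight hours.",
        "The handset battery shall be replaceable.",
    ]
    for group in group_contiguous(sample):
        print(group)
```

In this sketch, a section of the original document that spans several discovered groups would hint at weak requirements relatedness, while a group that spans several sections would hint at weak sections independence.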


Keywords: requirements analysis, requirements documents structure, requirements quality, similarity-based clustering, lexical clustering





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Alessio Ferrari (1)
  • Stefania Gnesi (1)
  • Gabriele Tolomei (2)

  1. ISTI-CNR, Pisa, Italy
  2. DAIS, Università Ca’ Foscari Venezia, Italy