Using semantic roles to improve text classification in the requirements domain
Engineering activities often produce considerable documentation as a by-product of the development process. Due to their complexity, technical analysts can benefit from text processing techniques able to identify concepts of interest and analyze deficiencies of the documents in an automated fashion. In practice, text sentences from the documentation are usually transformed to a vector space model, which is suitable for traditional machine learning classifiers. However, such transformations suffer from problems of synonyms and ambiguity that cause classification mistakes. For alleviating these problems, there has been a growing interest in the semantic enrichment of text. Unfortunately, using general-purpose thesaurus and encyclopedias to enrich technical documents belonging to a given domain (e.g. requirements engineering) often introduces noise and does not improve classification. In this work, we aim at boosting text classification by exploiting information about semantic roles. We have explored this approach when building a multi-label classifier for identifying special concepts, called domain actions, in textual software requirements. After evaluating various combinations of semantic roles and text classification algorithms, we found that this kind of semantically-enriched data leads to improvements of up to 18% in both precision and recall, when compared to non-enriched data. Our enrichment strategy based on semantic roles also allowed classifiers to reach acceptable accuracy levels with small training sets. Moreover, semantic roles outperformed Wikipedia- and WordNET-based enrichments, which failed to boost requirements classification with several techniques. These results drove the development of two requirements tools, which we successfully applied in the processing of textual use cases.
KeywordsText classification Natural language processing Knowledge representation Semantic enrichment Use case specification
This work was partially supported by ANPCyT (Argentina) through PICT Project 2015 No. 2565. The authors are grateful to the doctoral students that helped to manually tag the sentences of the case-studies with DAs. The authors would like to make a special mention to Paula Frade, Miguel Ruival, German Attanasio and Rodrigo Gonzalez for testing the DA classifier and helping us to make adjustments to the implementation. The authors also thank the anonymous reviewers for their feedback that helped to improve the quality of the manuscript.
- Bai, R., Wang, X., & Liao, J. (2010). Extract semantic information from wordnet to improve text classification performance. In T. H. Kim & H. Adeli (Eds.), Advances in computer science and information technology, Lecture notes in computer science (Vol. 6059, pp. 409–420). Berlin: Springer. https://doi.org/10.1007/978-3-642-13577-4_36.CrossRefGoogle Scholar
- Björkelund, A., Bohnet, B., Hafdell, L., & Nugues, P. (2010). A high-performance syntactic and semantic dependency parser. 23rd International conference on computational linguistics: Demonstrations (COLING ’10) (pp. 33–36). Beijing: Association for Computational Linguistics.Google Scholar
- Compeau, P., & Pevzner, P. (2015). Bioinformatics algorithms: An active learning approach (2nd ed.). San Diego: Active Learning Publishers.Google Scholar
- Huang, L. (2011). Concept-based text clustering. Doctoral thesis, The University of Waikato, HamiltonGoogle Scholar
- Hull, E., Jackson, K., & Dick, J. (2014). Requirements engineering (3rd ed.). Berlin: Springer.Google Scholar
- Kamalrudin M., Hosking J. G., & Grundy, J. (2011). Improving requirements quality using essential use case interaction patterns. In ICSE’11, Hawaii (pp. 531–540). https://doi.org/10.1145/1985793.1985866.
- Kelleher, J. D., Namee, B. M., & D’Arcy, A. (2015). Fundamentals of machine learning for predictive data analytics: Algorithms, worked examples, and case studies (1st ed.). Cambridge: MIT Press.Google Scholar
- Mahmoud A., & Carver, D. (2015). Exploiting online human knowledge in requirements engineering. In 23rd International requirements engineering conference (RE’15), IEEE (pp. 262–267). https://doi.org/10.1109/RE.2015.7320434
- Mansuy T., & Hilderman, R. J. (2006). A characterization of wordnet features in Boolean models for text classification. In 5th Australasian conference on data mining and analystics (AusDM’06) (Vol. 61, pp. 103–109).Google Scholar
- Mund, J, Fernandez, D, M., Femmer, H., & Eckhardt, J. (2015) Does quality of requirements specifications matter? Combined results of two empirical studies. In ACM/IEEE international symposium on empirical software engineering and measurement (ESEM’15) (pp. 1–10). https://doi.org/10.1109/ESEM.2015.7321195.
- Navigli, R., Faralli, S., Soroa, A., de Lacalle, O., & Agirre, E. (2011). Two birds with one stone: Learning semantic models for text categorization and word sense disambiguation. In 20th ACM international conference on information and knowledge management (CIKM’11) (pp. 2317–2320). https://doi.org/10.1145/2063576.2063955.
- Nguyen, T. H., Grundy, J., & Almorsy, M. (2015). Rule-based extraction of goal-use case models from text. In 10th Joint meeting on foundations of software engineering (FSE’2015) (pp. 591–601). https://doi.org/10.1145/2786805.2786876.
- Palmer, M., Gildea, D., & Xue, N. (2010). Semantic role labeling. Synthesis lectures on human language technologies. San Rafael: Morgan & Claypool.Google Scholar
- Rago, A., Marcos, C., & Diaz-Pace, A. (2016c). Opportunities for analyzing hardware specifications with NLP techniques. In 3rd Workshop on design automation for understanding hardware designs (DUHDe’16), design, automation and test in Europe conference and exhibition (DATE’16), Dresden, Germany.Google Scholar
- Rosadini, B., Ferrari, A., Gori, G., Fantechi, A., Gnesi, S., Trotta, I., & Bacherini, S. (2017). Using NLP to detect requirements defects: An industrial experience in the railway domain. In chap 23rd International working conference REFSQ 2017, Essen, Germany, February 27–March 2, 2017, Proceedings (pp. 344–360). Springer International Publishing. https://doi.org/10.1007/978-3-319-54045-0_24.
- Roth, M., & Klein, E. (2015). Parsing software requirements with an ontology-based semantic role labeler. In 1st Workshop on language and ontologies at the 11th international conference on computational semantics (IWCS’15) (pp. 15–21). London, United Kingdom.Google Scholar
- Roth, M., Diamantopoulos, T., Klein, E., & Symeonidis, A. (2014). Software requirements: A new domain for semantic parsers. In Workshop on semantic parsing at the conference of the association for computational linguistics (ACL’14) (pp. 50–54). Baltimore, MD.Google Scholar
- Sengupta, S., Ramnani, R. R., Das, S., & Chandran, A. (2015). Verb-based semantic modelling and analysis of textual requirements. In 8th India software engineering conference (ISEC’15) (pp. 30–39). https://doi.org/10.1145/2723742.2723745.
- Sinha, A., Paradkar, A., Takeuchi, H., & Nakamura, T. (2010). Extending automated analysis of natural language use cases to other languages. In 18th IEEE international requirements engineering conference (RE’10) (pp. 364–369). https://doi.org/10.1109/RE.2010.52
- Szu-ting, Y. (2015). Robust semantic role labeling. United States: LAP Lambert Academic Publishing.Google Scholar
- Tsoumakas, G., Katakis, I., & Vlahavas, I. (2010). Mining multi-label data. In O. Maimon & L. Rokach (Eds.), Data mining and knowledge discovery handbook (pp. 667–685). Boston, MA: Springer. https://doi.org/10.1007/978-0-387-09823-4_34.
- Wang, P., & Domeniconi, C. (2008). Building semantic kernels for text classification using wikipedia. In 14th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’08) (pp. 713–721). https://doi.org/10.1145/1401890.1401976.
- Wiegers, K., & Beatty, J. (2013). Software requirements (3rd ed.). Developer best practices. Redmond, WA: Microsoft Press.Google Scholar
- Zhang, H. (2004). The optimality of naive bayes. In V. Barr & Z. Markov (Eds.), 17th International Florida Artificial Intelligence Research Society conference (FLAIRS 2004) (pp. 562–567). Miami Beach, FL: AAAI Press.Google Scholar