Generalizability of Document Features for Identifying Rationale

Rogers, Benjamin; Justice, Connor; Mathur, Tanmay; Burge, Janet E.

doi:10.1007/978-3-319-44989-0_34

Abstract

One of the challenges in using statistical machine learning for text mining is choosing the right set of text features. We have developed a system that uses genetic algorithms (GAs) to evaluate candidate feature sets for classifying sentences in a document. We applied this tool to find design rationale (the reasons behind design decisions) in two datasets, Chrome bug reports and transcripts of design sessions, both to evaluate our approach for finding rationale and to see how features might differ for the same classification target in different types of data. We found that using a smaller set of features, those common to the feature sets optimized for each document type, gave results with less overfitting.
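
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: a GA searches over feature subsets encoded as bitmasks, and the subsets found for two document types are then intersected. The function names, the toy fitness function, the GA settings (tournament selection, one-point crossover, bit-flip mutation, elitism), and the feature indices are all illustrative assumptions and are not taken from the chapter. In a real system the fitness would presumably come from training and scoring a sentence classifier on the selected features.

```python
import random


def evolve_feature_mask(n_features, fitness, generations=30, pop_size=20,
                        mutation_rate=0.05, seed=0):
    """Return the best feature bitmask found by a small GA (n_features >= 2)."""
    rng = random.Random(seed)
    # Each individual is a list of 0/1 flags: 1 means "use this feature".
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_gen = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of two random individuals wins.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with a small per-gene probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best


def make_toy_fitness(useful):
    """Hypothetical stand-in for classifier quality on one document type."""
    def fitness(mask):
        hits = sum(1 for i, g in enumerate(mask) if g and i in useful)
        return hits - 0.1 * sum(mask)  # small penalty per selected feature
    return fitness


def selected(mask):
    """Indices of the features switched on in a bitmask."""
    return {i for i, g in enumerate(mask) if g}


# Two hypothetical document types with partly overlapping useful features.
bug_mask = evolve_feature_mask(8, make_toy_fitness({0, 1, 2, 5}))
design_mask = evolve_feature_mask(8, make_toy_fitness({1, 2, 6}), seed=1)

# Keep only the features chosen for both document types.
common = selected(bug_mask) & selected(design_mask)
print(sorted(common))
```

Intersecting the two optimized sets yields a smaller feature set that does not depend on features specific to either corpus, which is one plausible reading of why the common set would overfit less.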

Acknowledgements

We would like to thank Miami graduate students John Malloy and Jennifer Flowers for their work in annotating the SPSD data. The design sessions that produced the SPSD data were funded by the National Science Foundation (Award CCF-0845840). We would like to thank the workshop organizers, André van der Hoek, Marian Petre, and Alex Baker for granting access to the transcripts. We would also like to thank Dr. Mike Zmuda for suggesting we move the information gain calculation outside of the GA. This work was supported by NSF CAREER Award CCF-0844638 (Burge). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Author information

Authors and Affiliations

Miami University, Oxford, Ohio, USA
Benjamin Rogers & Tanmay Mathur
Wesleyan University, Middletown, Connecticut, USA
Connor Justice & Janet E. Burge

Corresponding author

Correspondence to Janet E. Burge.

Editor information

Editors and Affiliations

Department of Computer Science and School of Architecture, University of North Carolina at Charlotte, Charlotte, North Carolina, USA
John S. Gero

About this paper

Cite this paper

Rogers, B., Justice, C., Mathur, T., Burge, J.E. (2017). Generalizability of Document Features for Identifying Rationale. In: Gero, J. (eds) Design Computing and Cognition '16. Springer, Cham. https://doi.org/10.1007/978-3-319-44989-0_34

DOI: https://doi.org/10.1007/978-3-319-44989-0_34
Published: 03 January 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44988-3
Online ISBN: 978-3-319-44989-0
eBook Packages: Engineering, Engineering (R0)
