Generalizability of Document Features for Identifying Rationale

  • Conference paper
Design Computing and Cognition '16

Abstract

One of the challenges in using statistical machine learning for text mining is choosing the right set of text features. We have developed a system that uses genetic algorithms (GAs) to evaluate candidate feature sets for classifying sentences in a document. We applied this tool to find design rationale (the reasons behind design decisions) in two datasets, Chrome bug reports and transcripts of design sessions, both to evaluate our approach for finding rationale and to see how features might differ for the same classification target in different types of data. We found that using a smaller set of features, those common to the sets optimized for each document type, gave results with less overfitting.
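The idea of using a GA to search over candidate feature sets can be illustrated with a small sketch. This is not the system described in the paper, whose features, classifier, and GA settings are not given here. It assumes a bit-mask encoding (one bit per candidate feature), a toy leave-one-out nearest-neighbour classifier as the fitness function, a small per-feature penalty, tournament selection, one-point crossover, and elitism. The data and the names `fitness`, `evolve`, and `common_features` are made up for illustration.

```python
import random

# Toy data (invented): each row is one sentence described by four binary
# features; y marks whether the sentence contains rationale.
X = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1], [1, 1, 1, 1],
     [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 1, 1, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def fitness(mask, X, y, penalty=0.01):
    """Leave-one-out 1-NN accuracy on the selected features, minus a small
    penalty per selected feature (an assumption, to favour smaller sets)."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        # Nearest other sentence by Hamming distance over selected features.
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum(row[k] != X[j][k] for k in selected))
        correct += y[nearest] == y[i]
    return correct / len(X) - penalty * len(selected)


def evolve(X, y, pop_size=12, generations=15, p_mut=0.05, seed=0):
    """Evolve feature masks; return the best mask and the best fitness seen
    in each generation (non-decreasing because the top two are kept)."""
    rng = random.Random(seed)
    n = len(X[0])

    def score(mask):
        return fitness(mask, X, y)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        children = [pop[0][:], pop[1][:]]              # elitism
        while len(children) < pop_size:
            mum = max(rng.sample(pop, 3), key=score)   # tournament selection
            dad = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)                  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Flip each bit with probability p_mut.
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        pop = children
    best = max(pop, key=score)
    return best, history + [score(best)]


def common_features(mask_a, mask_b):
    """Keep only the features selected in both masks."""
    return [a & b for a, b in zip(mask_a, mask_b)]


best_mask, best_per_generation = evolve(X, y)
```

Running `evolve` once per dataset and passing the two resulting masks to `common_features` mirrors, in a very simplified form, the idea of keeping only the features shared by the sets optimized for each document type.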

Acknowledgements

We would like to thank Miami graduate students John Malloy and Jennifer Flowers for their work in annotating the SPSD data. The design sessions that produced the SPSD data were funded by the National Science Foundation (Award CCF-0845840). We would like to thank the workshop organizers, André van der Hoek, Marian Petre, and Alex Baker for granting access to the transcripts. We would also like to thank Dr. Mike Zmuda for suggesting we move the information gain calculation outside of the GA. This work was supported by NSF CAREER Award CCF-0844638 (Burge). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Author information

Corresponding author

Correspondence to Janet E. Burge.

Copyright information

© 2017 Springer International Publishing Switzerland

About this paper

Cite this paper

Rogers, B., Justice, C., Mathur, T., Burge, J.E. (2017). Generalizability of Document Features for Identifying Rationale. In: Gero, J. (ed) Design Computing and Cognition '16. Springer, Cham. https://doi.org/10.1007/978-3-319-44989-0_34

  • DOI: https://doi.org/10.1007/978-3-319-44989-0_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44988-3

  • Online ISBN: 978-3-319-44989-0

  • eBook Packages: Engineering, Engineering (R0)
