Automated Extraction of Vulnerability Information for Home Computer Security

  • Sachini Weerawardhana
  • Subhojeet Mukherjee
  • Indrajit Ray
  • Adele Howe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8930)


Online vulnerability databases provide a wealth of information about vulnerabilities present in application software, operating systems, and firmware. Extracting information from these databases in a form that applications such as vulnerability scanners and security monitoring tools can use is a challenging task. This paper presents two approaches to information extraction from online vulnerability databases: a machine-learning-based solution and a solution that exploits linguistic patterns revealed by part-of-speech tagging. The two systems are evaluated to compare their accuracy in recognizing security concepts in previously unseen vulnerability description texts. We also discuss design considerations that should be taken into account when implementing information retrieval systems for the security domain.
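
As a rough illustration of the second approach, the sketch below shows how a part-of-speech pattern could pick candidate software names out of a tagged vulnerability description. It is a minimal sketch under stated assumptions, not the system described in the paper: the function name extract_candidates, the pattern (a run of proper nouns optionally followed by numbers), and the hand-tagged sample sentence are all hypothetical. Tags follow the Penn Treebank convention.

```python
# Hypothetical sketch: the tag pattern and the sample sentence are illustrative
# assumptions, not the rules or data used in the paper.
from typing import List, Tuple

Tagged = Tuple[str, str]  # (token, Penn Treebank POS tag)

PROPER_NOUN_TAGS = ("NNP", "NNPS")


def extract_candidates(tagged: List[Tagged]) -> List[str]:
    """Collect maximal runs of proper nouns, each optionally followed by
    cardinal numbers (CD), as candidate software names."""
    candidates = []
    i = 0
    while i < len(tagged):
        if tagged[i][1] in PROPER_NOUN_TAGS:
            j = i
            # Extend over consecutive proper nouns.
            while j < len(tagged) and tagged[j][1] in PROPER_NOUN_TAGS:
                j += 1
            # Absorb trailing numbers, e.g. a version string.
            while j < len(tagged) and tagged[j][1] == "CD":
                j += 1
            candidates.append(" ".join(tok for tok, _ in tagged[i:j]))
            i = j
        else:
            i += 1
    return candidates


# Hand-tagged example; a real system would obtain tags from a POS tagger.
sentence = [
    ("Buffer", "NN"), ("overflow", "NN"), ("in", "IN"),
    ("Adobe", "NNP"), ("Reader", "NNP"), ("9.3", "CD"),
    ("allows", "VBZ"), ("remote", "JJ"), ("attackers", "NNS"),
    ("to", "TO"), ("execute", "VB"), ("arbitrary", "JJ"), ("code", "NN"),
]

print(extract_candidates(sentence))  # ['Adobe Reader 9.3']
```

A single pattern like this would miss names that the tagger labels as common nouns, which is one reason a comparison against a learned model on unseen descriptions is informative.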


Security · Vulnerability · Information extraction · Named entity recognition



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Sachini Weerawardhana (1)
  • Subhojeet Mukherjee (1)
  • Indrajit Ray (1), corresponding author
  • Adele Howe (1)
  1. Computer Science Department, Colorado State University, Fort Collins, USA
