How Question Answering Technology Helps to Locate Malevolent Online Content
The inherent lack of control over Internet content has resulted in the proliferation of potentially harmful online material. For example, the infamous “Anarchist Cookbook”, which teaches how to make weapons, homemade bombs, and poisons, keeps reappearing in various places. Some websites teach how to break into computer networks to steal passwords and credit card information. Law enforcement, security experts, and public watchdogs have started to locate, monitor, and act on such malevolent content when it surfaces on the Internet. Since the resources of law enforcement are limited, it may take some time before potentially malevolent content is located, long enough for it to disseminate and cause harm. The only practical way for law enforcement, security experts, and public watchdogs to search Internet content is to use a search engine such as Google, AOL, or MSN. We have suggested and empirically evaluated an alternative technology, automated question answering (QA), capable of locating potentially malevolent online content. We have implemented a proof-of-concept prototype that finds web pages providing answers to given questions (e.g., “How to build a pipe bomb?”). Using students as subjects in a controlled experiment, we have empirically established that our QA prototype finds web pages that are more likely to provide answers to given questions than simple keyword search using Google. This suggests that QA technology can be a good replacement for, or addition to, traditional keyword searching for the task of locating malevolent online content and, possibly, for the more general task of interactive online information exploration.
Keywords: Question Answering, Software Piracy, Security Expert, Question Answering System, Reciprocal Rank