Improving Analysis and Decision-Making Through Intelligent Web Crawling
Analysts across national security domains must sift through large amounts of data to find and compile relevant information in a form that enables decision makers to take action in high-consequence scenarios. However, even the most experienced analysts cannot be 100% consistent and accurate across an entire dataset, cannot remain unbiased towards familiar documentation, and cannot synthesize and process large amounts of information in a short amount of time. Sandia National Laboratories has sought to address this problem by developing an intelligent web crawler called Huntsman. Huntsman acts as a personal research assistant: it browses the internet or offline datasets in a way similar to the human search process, only much faster (millions of documents per day), submitting queries to search engines and assessing the usefulness of the returned pages by analyzing their full-page content with a suite of text analytics. This paper discusses Huntsman's capability to both mirror and enhance human analysts using intelligent web crawling with analysts-in-the-loop. The goal is to demonstrate how weaknesses in human cognitive processing can be compensated for by fusing human processes with text analytics and web crawling systems, which ultimately reduces analysts' cognitive burden and increases mission effectiveness.
Keywords: Text analytics, Intelligent web crawling, Decision making, Cognitive consistency
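The search-and-assess loop described in the abstract (retrieve candidate pages, score each on its full text, and follow only the promising ones) can be illustrated with a minimal sketch. It is not Huntsman's implementation: the in-memory `CORPUS`, the term-overlap `relevance` score, the `threshold` value, and the `crawl` function are all assumptions introduced here for illustration, standing in for a real search engine and a real text-analytics suite.

```python
import heapq
import re

# Hypothetical offline corpus (an assumption, not data from the paper):
# page id -> (full page text, outgoing links).
CORPUS = {
    "a": ("intelligent web crawling for analysts", ["b", "c"]),
    "b": ("recipes for sourdough bread", ["d"]),
    "c": ("text analytics support web crawling decisions", ["d"]),
    "d": ("crawling the web with text analytics and analysts in the loop", []),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relevance(text, query_terms):
    """Fraction of query terms found anywhere in the full page text.

    A deliberately simple stand-in for a suite of text analytics.
    """
    if not query_terms:
        return 0.0
    words = set(tokenize(text))
    return sum(1 for term in query_terms if term in words) / len(query_terms)


def crawl(seeds, query, corpus, threshold=0.5, max_pages=100):
    """Best-first crawl: keep relevant pages and expand only their links."""
    query_terms = set(tokenize(query))
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(0.0, seed) for seed in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    kept = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, page = heapq.heappop(frontier)
        text, links = corpus.get(page, ("", []))
        fetched += 1
        score = relevance(text, query_terms)
        if score < threshold:
            continue  # judged not useful: drop the page, do not follow its links
        kept.append((page, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                # Links inherit the score of the page that pointed to them.
                heapq.heappush(frontier, (-score, link))
    return kept


results = crawl(["a"], "web crawling text analytics", CORPUS)
```

In this toy corpus the off-topic page `b` is fetched, scored, and discarded, while `a`, `c`, and `d` are kept. An analyst-in-the-loop variant could adjust `threshold` or the query terms between runs based on the pages returned.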