Acquisition and Preparation of Data for OSINT Investigations

Chapter
Part of the Advanced Sciences and Technologies for Security Applications book series (ASTSA)

Abstract

Data underpins every open source intelligence (OSINT) investigation: without it, there is nothing to build upon, combine, analyse or draw conclusions from. This chapter outlines some of the processes an investigator can use to obtain data from open sources, together with methods for preparing that data in usable formats for further analysis. First, it discusses why data needs to be collected from open sources. Second, it introduces the types of data that may be encountered, including unstructured and structured sources, and where to obtain them. Third, it reviews methods for information extraction, the first step in preparing data for further analysis. Finally, it covers some of the privacy, legal and ethical good practices that should be followed when accessing, interrogating and using open source data.

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. CENTRIC/Sheffield Hallam University, Sheffield, UK
