Soft Computing, Volume 21, Issue 5, pp 1245–1252

Searching the Web for illegal content: the anatomy of a semantic search engine

Methodologies and Application

Abstract

In this paper, we describe the challenges in building a semantic search engine designed to help law enforcement agencies fight online drug marketplaces where new psychoactive substances are sold. This search engine was developed under the Semantic Illegal Content Hunter (SICH) Project, with the financial support of the Prevention of and Fight Against Crime Programme (ISEC 2012) of the European Commission. The specific objective of the SICH Project is to develop new strategic tools and assessment techniques, based on semantic analysis of texts, to support the dynamic mapping and automatic identification of illegal content on the Net. A Web search engine can be roughly divided into three main components: (a) the crawler, which is in charge of collecting the Web pages to be indexed; (b) the indexer, which parses and stores the collected data; and (c) the query processor, which interacts with the user by parsing a query and returning the relevant documents. We detail each of these components of the SICH search engine, highlighting the differences from a traditional Web search engine.
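
The three-component split described in the abstract can be illustrated with a minimal sketch. This is not the SICH implementation: the in-memory `PAGES` dictionary, the `.example` URLs, and the function names `crawl`, `build_index` and `search` are all hypothetical, and the sketch uses plain keyword matching rather than any semantic analysis.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
PAGES = {
    "a.example": ("new psychoactive substances for sale", ["b.example", "c.example"]),
    "b.example": ("garden tools and seeds", ["a.example"]),
    "c.example": ("psychoactive research chemicals shop", []),
}

def crawl(seed, fetch):
    """(a) Crawler: breadth-first collection of pages reachable from the seed."""
    seen, queue, collected = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = fetch(url)
        collected[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collected

def build_index(collected):
    """(b) Indexer: parse each page into terms and store an inverted index."""
    index = defaultdict(set)
    for url, text in collected.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(url)
    return index

def search(index, query):
    """(c) Query processor: parse the query, return pages containing all terms."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)

index = build_index(crawl("a.example", PAGES.__getitem__))
print(search(index, "psychoactive"))
```

A real crawler would fetch pages over the network and a real query processor would rank its results; the sketch only shows how the output of each stage feeds the next.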

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Department of Computer, Control, and Management Engineering “Antonio Ruberti”, “Sapienza” University of Rome, Rome, Italy
  2. Research Centre for Transport and Logistics (CTL), “Sapienza” Università di Roma, Rome, Italy
  3. CeRSI-Research Center in Information Systems, LUISS Guido Carli University, Rome, Italy
