Soft Computing, Volume 21, Issue 5, pp 1245–1252

Searching the Web for illegal content: the anatomy of a semantic search engine

Methodologies and Application

DOI: 10.1007/s00500-015-1857-4

Cite this article as:
Laura, L. & Me, G. Soft Comput (2017) 21: 1245. doi:10.1007/s00500-015-1857-4

Abstract

In this paper, we describe the challenges in building a semantic search engine designed to help law enforcement agencies in the fight against online drug marketplaces where new psychoactive substances are sold. This search engine has been developed under the Semantic Illegal Content Hunter (SICH) Project, with the financial support of the Prevention of and Fight Against Crime Programme ISEC 2012 of the European Commission. The specific objective of the SICH Project is to develop new strategic tools and assessment techniques, based on semantic analysis of texts, to support the dynamic mapping and automatic identification of illegal content on the Net. A Web search engine can be roughly divided into three main components: (a) the crawler, which collects the Web pages to be indexed; (b) the indexer, which parses and stores the collected data; and (c) the query processor, which interacts with the user by parsing a query and returning the relevant documents. In this paper, we detail each of these components of the SICH search engine, highlighting the differences from a traditional Web search engine.
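The three-component division described in the abstract can be illustrated with a minimal sketch. This is not the SICH implementation: it assumes a tiny in-memory page graph instead of real HTTP fetching, and it uses plain keyword matching rather than semantic analysis. All names (PAGES, crawl, build_index, search) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque
import re

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over the network instead.
PAGES = {
    "a": ("new psychoactive substances for sale", ["b", "c"]),
    "b": ("forum thread about research chemicals", ["a"]),
    "c": ("gardening tips and seeds for sale", []),
}


def crawl(seed):
    """Crawler: breadth-first traversal from a seed URL; returns URL -> text."""
    seen, queue, fetched = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        fetched[url] = text
        for link in links:
            # Skip unknown targets and pages that are already scheduled.
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(fetched):
    """Indexer: build an inverted index mapping each term to the URLs containing it."""
    index = defaultdict(set)
    for url, text in fetched.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Query processor: return the sorted URLs that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


if __name__ == "__main__":
    index = build_index(crawl("a"))
    print(search(index, "for sale"))
    print(search(index, "psychoactive substances"))
```

A semantic engine of the kind the paper describes would presumably replace the exact-term matching in search with some form of meaning-aware matching; the sketch only shows how the crawler, indexer and query processor hand data to one another.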

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Department of Computer, Control, and Management Engineering “Antonio Ruberti”, “Sapienza” University of Rome, Rome, Italy
  2. Research Centre for Transport and Logistics (CTL), “Sapienza” Università di Roma, Rome, Italy
  3. CeRSI-Research Center in Information Systems, LUISS Guido Carli University, Rome, Italy