Searching the Web for illegal content: the anatomy of a semantic search engine
- Cite this article as:
- Laura, L. & Me, G. Soft Comput (2017) 21: 1245. doi:10.1007/s00500-015-1857-4
In this paper, we describe the challenges in building a semantic search engine designed to help law enforcement agencies in the fight against online drug marketplaces, where new psychoactive substances are sold. This search engine has been developed under the Semantic Illegal Content Hunter (SICH) Project, with the financial support of the Prevention of and Fight Against Crime Programme (ISEC 2012) of the European Commission. The specific objective of the SICH Project is to develop new strategic tools and assessment techniques, based on the semantic analysis of texts, to support the dynamic mapping and the automatic identification of illegal content on the Net. A Web search engine can be roughly divided into three main components: (a) the crawler, which is in charge of collecting the Web pages to be indexed; (b) the indexer, which parses and stores the collected data; and (c) the query processor, which interacts with the user by parsing a query and returning the relevant documents. In this paper, we detail each of these components of the SICH search engine, highlighting the differences from a traditional Web search engine.
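The three-component split described above can be illustrated with a minimal Python sketch. It is not the SICH implementation and performs no semantic analysis: it assumes a tiny in-memory "Web" (the hypothetical `PAGES` dictionary), a plain keyword inverted index, and a query processor that returns pages containing every query term. All names (`PAGES`, `crawl`, `build_index`, `search`) are illustrative assumptions.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
PAGES = {
    "a.example": ("Buy new research chemicals online", ["b.example", "c.example"]),
    "b.example": ("Forum thread about gardening tools", ["a.example"]),
    "c.example": ("New psychoactive substances for sale", []),
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl(seed, pages):
    """(a) Crawler: breadth-first traversal from a seed URL, collecting page text."""
    seen, queue, collected = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        if url not in pages:
            continue  # unreachable or unknown page
        text, links = pages[url]
        collected[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collected

def build_index(collected):
    """(b) Indexer: build an inverted index mapping each term to the URLs containing it."""
    index = defaultdict(set)
    for url, text in collected.items():
        for term in tokenize(text):
            index[term].add(url)
    return index

def search(index, query):
    """(c) Query processor: return the URLs that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)

index = build_index(crawl("a.example", PAGES))
print(search(index, "new substances"))  # ['c.example']
```

A real crawler would fetch pages over the network and a semantic engine would replace the exact-term matching with text analysis; the sketch only shows how the three stages hand data to one another.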