Searching the Web for illegal content: the anatomy of a semantic search engine

Methodologies and Application
Published: 04 September 2015

Volume 21, pages 1245–1252 (2017)
Soft Computing

Abstract

In this paper, we describe the challenges of building a semantic search engine designed to help law enforcement agencies fight online drug marketplaces where new psychoactive substances are sold. The search engine was developed under the Semantic Illegal Content Hunter (SICH) Project, with the financial support of the Prevention of and Fight Against Crime Programme (ISEC 2012) of the European Commission. The specific objective of the SICH Project is to develop new strategic tools and assessment techniques, based on the semantic analysis of texts, to support the dynamic mapping and automatic identification of illegal content on the Net. A Web search engine can be roughly divided into three main components: (a) the crawler, which collects the Web pages to be indexed; (b) the indexer, which parses and stores the collected data; and (c) the query processor, which interacts with the user by parsing a query and returning the relevant documents. We detail each of these components of the SICH search engine, highlighting the differences from a traditional Web search engine.
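
The three-component division described in the abstract can be illustrated with a minimal sketch. It is not the SICH implementation: it assumes a tiny in-memory "web" (the hypothetical `PAGES` dictionary), a plain keyword index, and a Boolean AND query, and it does not attempt the semantic analysis the paper is about. All names and URLs are invented for illustration.

```python
from collections import defaultdict, deque
import re

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("welcome to the shop index",
                       ["a.example/item1", "a.example/item2"]),
    "a.example/item1": ("green tea blend for sale", ["a.example/home"]),
    "a.example/item2": ("ceramic teapot for sale", []),
}


def crawl(seed, pages):
    """(a) Crawler: breadth-first traversal starting from a seed URL."""
    seen, queue, collected = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = pages[url]
        collected[url] = text
        for link in links:
            # Follow only links that exist and have not been visited yet.
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return collected


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(collected):
    """(b) Indexer: inverted index mapping each term to the URLs containing it."""
    index = defaultdict(set)
    for url, text in collected.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """(c) Query processor: return URLs that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)


collected = crawl("a.example/home", PAGES)
index = build_index(collected)
print(search(index, "for sale"))
```

A real crawler would fetch pages over the network and a real query processor would rank its results; both are left out here to keep the division of responsibilities visible.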

References

Arapakis I (2015) System and user aspects of web search latency. http://www.slideshare.net/iarapakis/upf15
Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval, vol 463. ACM Press, New York
Bitcoin (2011) Bitcoin P2P digital currency
Brandes U, Gaertler M, Wagner D (2003) Experiments on graph clustering algorithms. Springer, New York
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1):107–117
Camastra F, Ciaramella A, Staiano A (2013) Machine learning and soft computing for ICT security: an overview of current trends. J Ambient Intell Humaniz Comput 4(2):235–247
Cho J, Garcia-Molina H (2002) Parallel crawlers. In: Proceedings of the 11th international conference on World Wide Web. ACM, pp 124–135
Corazza O, Assi S, Simonato P, Corkery J, Bersani FS, Demetrovics Z, Stair J, Fergus S, Pezzolesi C, Pasinetti M, Deluca P, Drummond C, Davey Z, Blaszko U, Moskalewicz J, Mervo B, Furia LD, Farre M, Flesland L, Pisarska A, Shapiro H, Siemann H, Skutle A, Sferrazza E, Torrens M, Sambola F, van der Kreeft P, Scherbaum N, Schifano F (2013) Promoting innovation and excellence to face the rapid diffusion of novel psychoactive substances in the EU: the outcomes of the rednet project. Hum Psychopharmacol Clin Exp 28(4):317–323
Corazza O, Valeriani G, Bersani FS, Corkery J, Martinotti G, Bersani G, Schifano F (2014) “Spice”, “Kryptonite”, “Black Mamba”: an overview of brand names and marketing strategies of novel psychoactive substances on the Web. J Psychoact Drugs 46(4):287–294
Deluca P, Davey Z, Corazza O, Furia LD, Farre M, Flesland LH, Mannonen M, Majava A, Peltoniemi T, Pasinetti M, Pezzolesi C, Scherbaum N, Siemann H, Skutle A, Torrens M, van der Kreeft P, Iversen E, Schifano F (2012) Identifying emerging trends in recreational drug use; outcomes from the psychonaut web mapping project. Prog Neuro Psychopharmacol Biol Psychiatr 39(2):221–226 (new drugs of abuse)
Diestel R (2012) Graph theory, Graduate texts in mathematics, vol 173, 4th edn. Springer, Heidelberg
Fruchterman TM, Reingold EM (1991) Graph drawing by force-directed placement. Softw Pract Exp 21(11):1129–1164
Han X, Ma J, Wu Y, Cui C (2014) A novel machine learning approach to rank web forum posts. Soft Comput 18(5):941–959
Hoque E, Hoeber O, Strong G, Gong M (2013) Combining conceptual query expansion and visual search results exploration for web image retrieval. J Ambient Intell Humaniz Comput 4(3):389–400
Hout MCV, Bingham T (2013a) Silk Road, the virtual drug marketplace: a single case study of user experiences. Int J Drug Policy 24(5):385–391
Hout MCV, Bingham T (2013b) Surfing the Silk Road: a study of users' experiences. Int J Drug Policy 24(6):524–529
Hout MCV, Bingham T (2014) Responsible vendors, intelligent consumers: Silk Road, the online revolution in drug trading. Int J Drug Policy 25(2):183–189
Jansen BJ (2006) Adversarial information retrieval aspects of sponsored search. In: AIRWeb, pp 33–36
Laura L, Me G (2015) Searching the web for illegal content: the anatomy of a semantic search engine. In: Proceedings of the 10th international conference on global security, safety & sustainability. Springer
Maleki-Dizaji S, Siddiqi J, Soltan-Zadeh Y, Rahman F (2014) Adaptive information retrieval system via modelling user behaviour. J Ambient Intell Humaniz Comput 5(1):105–110
Nikravesh M, Loia V, Azvine B (2002) Fuzzy logic and the internet (flint): Internet, world wide web, and search engines. Soft Comput 6(5):287–299
Ogiela M, Sukowski P (2014) Protocol for irreversible off-line transactions in anonymous electronic currency exchange. Soft Comput 18(12):2587–2594
Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: bringing order to the web. Technical Report, Stanford InfoLab. http://ilpubs.stanford.edu:8090/422/
Pereira RAM, Molinari A, Pasi G (2005) Contextual weighted representations and indexing models for the retrieval of html documents. Soft Comput 9(7):481–492
Tor project (2011) Anonymity online. https://www.torproject.org/. Accessed 20 Sept 2012
United Nations Office on Drugs and Crime (UNODC) (2014) Global synthetic drugs assessment (United Nations publication, Sales No. E.14.XI.6). https://www.unodc.org/documents/scientific/2014_Global_Synthetic_Drugs_Assessment_web.pdf
Witten IH, Moffat A, Bell TC (1999) Managing gigabytes: compressing and indexing documents and images. Morgan Kaufmann, San Francisco

Acknowledgments

The EU ISEC programme funded the 2-year national project SICH for the consortium formed by Expert System and RiSSC (Centro Ricerche e Studi su Sicurezza e Criminalità, http://www.rissc.it/). A preliminary version of part of this paper appeared in Laura and Me (2015).

Author information

Authors and Affiliations

Department of Computer, Control, and Management Engineering “Antonio Ruberti”, “Sapienza” University of Rome, Via Ariosto 25, 00185, Rome, Italy
Luigi Laura
Research Centre for Transport and Logistics (CTL), “Sapienza” Università di Roma, Rome, Italy
Luigi Laura
CeRSI-Research Center in Information Systems, LUISS Guido Carli University, Rome, Italy
Gianluigi Me

Corresponding author

Correspondence to Luigi Laura.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Laura, L., Me, G. Searching the Web for illegal content: the anatomy of a semantic search engine. Soft Comput 21, 1245–1252 (2017). https://doi.org/10.1007/s00500-015-1857-4

Published: 04 September 2015
Issue Date: March 2017
DOI: https://doi.org/10.1007/s00500-015-1857-4
