Filtrage automatique de courriels: Une approche adaptative et multiniveau

Nouali, Omar; Blache, Philippe

doi:10.1007/BF03219858

Email automatic filtering: an adaptive and multi-level approach

Published: December 2005

Volume 60, pages 1466–1487 (2005)
Annales Des Télécommunications



Summary

This article proposes a configurable email system with several levels of filtering: simple filtering based on the information contained in the email header; Boolean filtering based on the presence or absence of keywords in the email body; vector filtering based on the contribution weights of the email's keywords; and in-depth filtering based on the linguistic properties that characterize the structure and content of the email.

We propose an adaptive solution that lets the system learn from data, modify its knowledge, and adapt both to the evolution of the user's interests and to changes in the nature of emails over time. In addition, we use a lexical network that improves the representation of the email by taking its semantic aspect into account.

Abstract

This article proposes an electronic mail system with several levels of filtering: simple filtering based on content analysis of the message header fields; Boolean filtering based on the presence or absence of keywords in the body of the email; vector filtering based on keyword weights; and linguistic filtering based on the linguistic properties characterizing the structure and content of the email.
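To make the first three levels concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the profile contents, the function names, the threshold value, the use of cosine similarity for the vector level, and the choice to chain the levels as a cascade are all assumptions. The linguistic level is omitted because the abstract does not say which properties it uses.

```python
import math
import re
from collections import Counter

# Hypothetical user profile; every value here is illustrative.
PROFILE = {
    "blocked_senders": {"promo@ads.example"},
    "required_keywords": {"meeting", "report"},
    "term_weights": {"meeting": 2.0, "report": 1.5, "budget": 1.0},
    "threshold": 0.3,
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def header_filter(headers, profile):
    """Level 1: decide from header fields alone (here, only the sender)."""
    return headers.get("From", "").lower() not in profile["blocked_senders"]

def boolean_filter(body, profile):
    """Level 2: accept if at least one profile keyword occurs in the body."""
    return bool(set(tokenize(body)) & profile["required_keywords"])

def vector_score(body, profile):
    """Level 3: cosine similarity between term counts and profile weights."""
    counts = Counter(tokenize(body))
    weights = profile["term_weights"]
    dot = sum(counts[term] * weight for term, weight in weights.items())
    norm_doc = math.sqrt(sum(c * c for c in counts.values()))
    norm_profile = math.sqrt(sum(w * w for w in weights.values()))
    if norm_doc == 0 or norm_profile == 0:
        return 0.0
    return dot / (norm_doc * norm_profile)

def classify(headers, body, profile=PROFILE):
    """Apply the levels in order (assumed cascade); stop at the first rejection."""
    if not header_filter(headers, profile):
        return "rejected:header"
    if not boolean_filter(body, profile):
        return "rejected:boolean"
    if vector_score(body, profile) < profile["threshold"]:
        return "rejected:vector"
    return "accepted"
```

In this sketch the cheap checks run first, so a message from a blocked sender never reaches the vector level. A real system could equally let the user enable each level independently.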

We propose an adaptive solution that uses a machine learning method, allowing the filtering system to learn from data, to modify its knowledge, and to adapt to the user's interests. Moreover, we use a lexical network to improve message representation and to take the semantic aspect into account.
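The two ideas in this paragraph can be sketched as follows. The abstract does not name the learning method or describe the lexical network, so everything here is an assumption: the toy network, the reduced weight given to related terms, and the Rocchio-style feedback rule are stand-ins chosen for brevity, not the authors' technique.

```python
from collections import Counter

# Hypothetical lexical network: each term maps to a set of related terms.
LEXICAL_NETWORK = {
    "meeting": {"conference", "seminar"},
    "invoice": {"bill"},
}

def expand_terms(tokens, network=LEXICAL_NETWORK, related_weight=0.5):
    """Build a weighted term vector, adding related terms at a reduced weight."""
    vector = Counter()
    for token in tokens:
        vector[token] += 1.0
        for related in network.get(token, ()):
            vector[related] += related_weight
    return dict(vector)

def update_profile(profile, message_vector, relevant, rate=0.1):
    """Rocchio-style feedback (assumed rule).

    Weights move toward messages the user marks as relevant and away
    from the others; weights are clipped at zero.
    """
    sign = 1.0 if relevant else -1.0
    updated = dict(profile)
    for term, value in message_vector.items():
        updated[term] = max(0.0, updated.get(term, 0.0) + sign * rate * value)
    return updated
```

Calling `update_profile` after each user judgment lets the term weights drift with the user's interests, and the expansion step lets a message about a "seminar" contribute to a profile built around "meeting".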

Author information

Authors and Affiliations

Laboratoire de logiciels de base, CE.R.I.S.T., rue des 3 frères Aïssiou, 16030, Ben Aknoun, Alger, Algérie
Omar Nouali
LPL - Université de Provence, 29, Av. Robert Schuman, F-13621, Aix-en-Provence, France
Omar Nouali & Philippe Blache

About this article

Cite this article

Nouali, O., Blache, P. Filtrage automatique de courriels: Une approche adaptative et multiniveau. Ann. Télécommun. 60, 1466–1487 (2005). https://doi.org/10.1007/BF03219858

Received: 25 May 2004
Accepted: 21 April 2005
Issue Date: December 2005
DOI: https://doi.org/10.1007/BF03219858
