Natural Language Processing of Semitic Languages, pp 371-408
Automatic Summarization
Abstract
This chapter addresses automatic summarization of Semitic languages. After presenting the theoretical background and current challenges of automatic summarization, we describe the different approaches proposed to cope with these challenges. The main approaches dealing with Semitic languages (mainly Arabic, Hebrew, Maltese, and Amharic) are then discussed. Finally, a case study of a specific Arabic automatic summarization system is presented.
Keywords
Source Text, Arabic Text, Text Summarization, Sentence Position, Semitic Language
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© Springer-Verlag Berlin Heidelberg 2014