Using Unsupervised Deep Learning for Automatic Summarization of Arabic Documents

Alami, Nabil; En-nahnahi, Noureddine; Ouatik, Said Alaoui; Meknassi, Mohammed

doi:10.1007/s13369-018-3198-y

Research Article - Computer Engineering and Computer Science
Published: 28 March 2018

Arabian Journal for Science and Engineering, Volume 43, pages 7803–7815 (2018)

Nabil Alami¹, Noureddine En-nahnahi¹, Said Alaoui Ouatik¹ & Mohammed Meknassi¹

Abstract

Traditional Arabic text summarization (ATS) systems are based on the bag-of-words representation, which yields sparse and high-dimensional input data. Dimensionality reduction is therefore needed to increase the discriminative power of the features. In this paper, we present a new method for ATS that uses a variational auto-encoder (VAE) model to learn a feature space from high-dimensional input data. We explore several input representations, including term frequency (tf) and tf-idf weighting over both local and global vocabularies. All sentences are ranked according to the latent representation produced by the VAE. We investigate the impact of using the VAE with two summarization approaches: graph-based and query-based. Experiments on two benchmark datasets specifically designed for ATS show that the VAE using the tf-idf representation of global vocabularies clearly provides a more discriminative feature space and improves the recall of other models. Experimental results confirm that the proposed method performs better than most state-of-the-art extractive summarization approaches in both the graph-based and query-based settings.
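To make the two ranking settings named in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation: the full method is not visible in this preview, so the sketch skips the VAE entirely and ranks sentences directly on their tf-idf vectors, whereas the proposed method would rank on the latent vectors produced by the VAE encoder. The function names (`fit_idf`, `vectorize`, `graph_rank`, `query_rank`), the smoothed idf formula, the damping factor and the toy English tokens are all assumptions chosen for illustration. The vocabulary is "local" in the sense that it is built only from the sentences of the document being summarized.

```python
import math
from collections import Counter


def fit_idf(sentences):
    """Smoothed idf weights over the local vocabulary (assumed formula)."""
    n = len(sentences)
    # Document frequency: number of sentences that contain each term.
    df = Counter(tok for sent in sentences for tok in set(sent))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in sorted(df.items())}


def vectorize(tokens, idf):
    """tf-idf vector of one token list, in the vocabulary order of `idf`."""
    tf = Counter(tokens)
    return [tf[t] * w for t, w in idf.items()]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def graph_rank(vectors, damping=0.85, iterations=50):
    """PageRank-style scores on the cosine-similarity graph of sentences."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                scores[j] * sim[j][i] / out_weight[j]
                for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def query_rank(vectors, query_vector):
    """Score each sentence by its cosine similarity to the query."""
    return [cosine(v, query_vector) for v in vectors]


# Toy, already tokenized input (hypothetical; real input would be
# preprocessed Arabic sentences).
sentences = [
    ["summarization", "selects", "important", "sentences"],
    ["important", "sentences", "form", "the", "summary"],
    ["weather", "is", "sunny"],
]
idf = fit_idf(sentences)
vectors = [vectorize(s, idf) for s in sentences]

graph_scores = graph_rank(vectors)
query_scores = query_rank(vectors, vectorize(["summary", "sentences"], idf))
```

In this toy input the third sentence shares no term with the other two, so it receives the lowest graph score and a query score of zero. Swapping `vectors` for encoder outputs would be the natural place to plug in a learned latent representation, under the assumptions stated above.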

Author information

Authors and Affiliations

Faculty of Science Dhar EL Mahraz, Laboratory of Informatics and Modeling (LIM), Sidi Mohamed Ben Abdellah University, Fez, Morocco
Nabil Alami, Noureddine En-nahnahi, Said Alaoui Ouatik & Mohammed Meknassi

Corresponding author

Correspondence to Nabil Alami.

About this article

Cite this article

Alami, N., En-nahnahi, N., Ouatik, S.A. et al. Using Unsupervised Deep Learning for Automatic Summarization of Arabic Documents. Arab J Sci Eng 43, 7803–7815 (2018). https://doi.org/10.1007/s13369-018-3198-y

Received: 25 August 2017
Accepted: 20 March 2018
Published: 28 March 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s13369-018-3198-y
