Abdelaziz, I., Abdou, S.: Altecondb: a large-vocabulary arabic online handwriting recognition database. arXiv:1412.7626 (2014)
Dobais, M.A.A, Alrasheed, F.A.G., Latif, G., Alzubaidi, L.: Adoptive thresholding and geometric features based physical layout analysis of scanned arabic books. In: 2018 IEEE 2nd international workshop on arabic and derived script analysis and recognition (ASAR), pp. 171–176. IEEE (2018)
Albadi, N., Kurdi, M., Mishra, S.: Are they our brothers? Analysis and detection of religious hate speech in the arabic twittersphere. In: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 69–76 (2018)
Alexey, B., Yao, W.C., Yuan, L.H.: Yolov4: optimal speed and accuracy of object detection. In arXiv:2004.10934 (2020)
Almutairi, A., Almashan, M.: Instance segmentation of newspaper elements using mask R-CNN. In: 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), pp. 1371–1375. IEEE (2019)
Alshameri, A., Abdou, S., Mostafa, K.: A combined algorithm for layout analysis of Arabic document images and text lines extraction. Int. J. Comput. Appl. 49(23), 30–37 (2012)
ALTEC dataset. http://www.altec-center.org/conference/?page_id=87
Amazon Mechanical Turk. https://www.mturk.com/mturk/welcome
The ASAR Physical Layout Analysis Challenge at the 2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition, London, U.K., March 2018. https://asar.ieee.tn/competition/
2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition, London, U.K., March 2018
Asi, A., Cohen, R., Kedem, K., El-Sana, J., Dinstein,I.: A coarse-to-fine approach for layout analysis of ancient manuscripts. In: 14th International Conference on Frontiers in Handwriting Recognition, pp. 140–145 (2014)
Barakat, B., Droby, A., Kassis, M., El-Sana, J.: Text line segmentation for challenging handwritten document images using fully convolutional network. In: 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 374–379 (2018)
Barakat, B.K., El-Sana, J.: Binarization free layout analysis for arabic historical documents using fully convolutional networks. In: 2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR), pp. 151–155. IEEE (2018)
Belaïd, A., Ouwayed, N.: Segmentation of ancient Arabic documents. In: Märgner, V., El Abed, H. (eds.) Guide to OCR for Arabic Scripts, pp. 103–122. Springer, London (2012)
Chapter
Google Scholar
Boussellaa, W., Zahour, A., Taconet, B., Alimi, A., Benabdelhafid, A.: PRAAD: preprocessing and analysis tool for Arabic ancient documents. In: 9th International Conference on Document Analysis and Recognition (ICDAR), vol. 2, pp. 1058–1062 (2007)
Bukhari, S.S., Azawi, A., Ali, M.I., Shafait, F., Breuel, T.M.: Document image segmentation using discriminative learning over connected components. In: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, Boston, pp. 183–190 (2010)
Bukhari, S.S., Breuel, T.M., Asi, A., El Sana, J.: Layout analysis for arabic historical document images using machine learning. In: 2012 International Conference on Frontiers in Handwriting Recognition, pp. 639–644. IEEE (2012)
Buslaev, A., Iglovikov, V.I., Khvedchenya, E., Parinov, A., Druzhinin, M., Kalinin, A.A.: Albumentations: fast and flexible image augmentations. Information 11(2), 125 (2020)
Article
Google Scholar
Chen, K., Liu, C.L., Seuret, M., Liwicki, M., Hennebert, J., Ingold, R.: Page segmentation for historical document images based on superpixel classification with unsupervised feature learning. In: 12th IAPR workshop on document analysis systems (DAS), pp. 299–304 (2016)
Chen, K., Seuret, M., Liwicki, M., Hennebert, J., Ingold, R.: Page segmentation of historical document images with convolutional autoencoders. In: 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1011–1015 (2015)
Cotterell, R., Callison-Burch, C.: A multi-dialect, multi-genre corpus of informal written arabic. In: LREC, pp. 241–245 (2014)
Cotterell., Ryan, B., Chris, C..: A multi-dialect, multi-genre corpus of informal written Arabic. In: LREC, pp. 241–245 (2014)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. Comput. Vis. Pattern Recognit. CVPR 2009, 248–255 (2009)
Google Scholar
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei, L.F.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Abed, H.E., Märgner, V., Kherallah, M., Alimi, A.M.: ICDAR 2009 online arabic handwriting recognition competition. In: 2009 10th International Conference on Document Analysis and Recognition, pp. 1388–1392. IEEE (2009)
El-Mawass, N., Alaboodi, S.: Detecting arabic spammers and content polluters on twitter. In: Sixth International Conference on Digital Information Processing and Communications (ICDIPC), pp. 53–58 (2016)
Elanwar, R., Betke, M.: The ASAR 2018 competition on physical layout analysis of scanned arabic books (PLA-SAB 2018). In: 2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR), pp. 177–182. IEEE (2018)
Elanwar, R., Qin, W., Betke, M.: Making scanned arabic documents machine accessible using an ensemble of SVM classifiers. Int. J. Doc. Anal. Recognit. (IJDAR) 21(1–2), 59–75 (2018)
Article
Google Scholar
Farra, N., McKeown, K., Habash, N.: Annotating targets of opinions in Arabic using crowdsourcing. In: Second workshop on Arabic natural language processing, pp. 89–98 (2015)
Girshick, Ross.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Hadjar, K., Ingold, R.: Arabic newspaper page segmentation. In: 7th International Conference on Document Analysis and Recognition, pp. 895—899 (2003)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hesham, A.M., Rashwan, M.A.A., Barhamtoshy, H.M.A., Abdou, S.M., Badr, A.A., Farag, I.: Arabic document layout analysis. Pattern Anal. Appl. 20(4), 1275–1287 (2017)
MathSciNet
Article
Google Scholar
Kassis, M., El-Sana, J.: Scribble based interactive page layout segmentation using gabor filter. In: 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 13–18 (2016)
Ibn Khedher, M., Jmila, H., El-Yacoubi, M.A.: Automatic processing of historical arabic documents: a comprehensive survey. Pattern Recognit. 100, 107144 (2020)
Article
Google Scholar
Lawson, N., Eustice, K., Perkowitz, M., Yetisgen-Yildiz, M.: Annotating large email datasets for named entity recognition with Mechanical Turk. In: Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, pp. 71–79 (2010)
LabelMe tool. http://labelme.csail.mit.edu/Release3.0/
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Mahmoud, S.A., Ahmad, I., Khatib, W.G.A., Alshayeb, M., Parvez, M.T., Märgner, V., Fink, G.A.: KHATT: an open arabic offline handwritten text database. Pattern Recognit. 47(3), 1096–1112 (2014)
Article
Google Scholar
Mahmoud, S.A., Luqman, H., Al-Helali, B.M., BinMakhashen, G., Parvez, M.T.: Online-khatt: an open-vocabulary database for arabic online-text processing. Open Cybern. Syst. J. 12(1), 42–59 (2018)
Minghao, L., Yiheng, X., Lei, C., Shaohan, H., Furu, W., Zhoujun, L., Ming, Z.: Docbank: a benchmark dataset for document layout analysis. arXiv:2006.01038 (2020)
Neche, C., Belaid, A., Kacem-Echi, A.: Arabic handwritten documents segmentation into text-lines and words using deep learning. In: International Conference on Document Analysis and Recognition Workshops (ICDARW), pp. 19–24 (2019)
Nikolaou, N., Makridis, M., Gatos, B., Stamatopoulos, N., Papamarkos, N.: Segmentation of historical machine-printed documents using adaptive run length smoothing and skeleton segmentation paths. Image Vis. Comput. 28(4), 590–604 (2010)
Article
Google Scholar
Pastor-Pellicer, J., Afzal, M.Z., Liwicki, M., Castro-Bleda, M.J.: Complete system for text line extraction using convolutional neural networks and water-shed transform. In: 12th IAPR Workshop on Document Analysis Systems (DAS), pp. 30-35 (2016)
Pechwitz, M., Maddouri, S.S., Märgner, V., Ellouze, N., Amiri, H.: IFN/ENIT-database of handwritten arabic words. In: Proceedings of CIFED, volume 2, pp. 127–136. Citeseer (2002)
Pletschacher S., Antonacopoulos, A.: The PAGE (page analysis and ground-truth elements) format framework. In: 20th International Conference on Pattern Recognition (ICPR), pp. 257–260 (2010)
PyTorch sytem of libraries and tools for machine learning. https://pytorch.org/ (2020)
Rashtchian, C., Youngand, P., Hodosh, M., Hockenmaier, J.: Collecting image annotations using Amazon’s mechanical turk. In: Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, pp. 139–147 (2010)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp. 91–99 (2015)
Saad, R.S.M., Elanwar, R., Abdel Kader, N.S., Mashali, S., Betke, M., Asar 2018 layout analysis challenge: using random forests to analyze scanned Arabic books. In: 2nd IEEE International Workshop on Arabic and derived Script Analysis and Recognition (ASAR 2018), London, March 2018, 2018. p. 6
Rana S.M.S., Randa I.E., Abdel Kader, N.S., Samia, M., Margrit, B.: BCE-Arabic-v1 dataset: towards interpreting arabic document images for people with visual impairments. In: Proceedings of the 9th ACM International Conference on Pervasive Technologies Related to Assistive Environments, pp. 1–8 (2016)
Shafait, Faisal, Keysers, D., Breuel, T.: Performance evaluation and benchmarking of six-page segmentation algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 30(6), 941–954 (2008)
Article
Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014)
Slimane, F., Ingold, R., Kanoun, S., Alimi, A.M., Hennebert, J.: A new arabic printed text image database and evaluation protocols. In: 2009 10th International Conference on Document Analysis and Recognition, pp. 946–950. IEEE (2009)
Strassel, S.: Linguistic resources for arabic handwriting recognition. In: Proceedings of the Second International Conference on Arabic Language Resources and Tools, Cairo, Egypt (2009)
Studer, L., Alberti, M., Pondenkandath, V., Goktepey, P., Kolonko, T., Fischeryz, A., Liwicki, M., Ingold, R.: A comprehensive study of imagenet pre-training for historical document image analysis. In: 15th International Conference on Document Analysis and Recognition (ICDAR), pp. 720–725 (2019)
Torralba, A., Fergus, R., Freeman, W.T.: 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 30(11), 1958–1970 (2008)
Article
Google Scholar
Wei, H., Seuret, M., Chen, K., Fischer, A., Liwicki, M., Ingold, R.: Selecting autoencoder features for layout analysis of historical documents. In: ACM 3rd International Workshop on Historical Document Imaging and Processing, pp. 55–62 (2015)
Wick, C., Puppe, F.: Fully convolutional neural networks for page segmentation of historical document images. In: 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), pp. 287–292. IEEE (2018)
Wray, S., Mubarak, H., Ali,A.: Best practices for crowdsourcing dialectal arabic speech transcription. In: ANLP Workshop, p. 99 (2015)
Wray, S., Mubarak, H., Ali, A.: Best practices for crowdsourcing dialectal arabic speech transcription. In: Proceedings of the Second Workshop on Arabic Natural Language Processing, pp. 99–107 (2015)
Zaidan, O.F., Callison-Burch, C.: The Arabic online commentary dataset: an annotated dataset of informal Arabic with high dialectal content. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers, 2:37–41 (2011)
Zaidan, O.F., Callison-Burch, C.: Arabic dialect identification. Comput. Linguist. 40(1), 171–202 (2014)
Article
Google Scholar
Zaidan, O.F., Burch, C.C..: The arabic online commentary dataset: an annotated dataset of informal arabic with high dialectal content. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies: short papers-volume 2, pp. 37–41. Association for Computational Linguistics (2011)
Zaidan, O.F., Burch, C.C.: Arabic dialect identification. Comput. Linguist. 40(1), 171–202 (2014)
Zhong, X., Jianbin, T., Jimeno, Y.A.: Publaynet: largest dataset ever for document layout analysis. In: 15th International Conference on Document Analysis and Recognition (ICDAR) (2019)