Aiello, M., Monz, C., Todoran, L., Worring, M.: Document understanding for a broad class of documents. Int. J. Doc. Anal. Recogn. 5(1), 1–16 (2002). doi:10.1007/s10032-002-0080-x
Beel, J., Langer, S., Genzmehr, M., Müller, C.: Docear’s PDF inspector: title extraction from PDF files. In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2013) (2013)
Constantin, A., Pettifer, S., Voronkov, A.: PDFX: fully-automated PDF-to-XML conversion of scientific literature. In: Proceedings of the 13th ACM Symposium on Document Engineering (2013)
Councill, I.G., Giles, C.L., Kan, M.Y.: ParsCit: An Open-Source CRF Reference String Parsing Package. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Tapias, D. (eds.) Proceedings of LREC, vol. 2008, pp. 661–667. Citeseer, European Language Resources Association (ELRA) (2008). doi:10.1.1.150.6790
Dejean, H., Meunier, J.L.: A system for converting PDF documents into structured XML format. In: Document Analysis Systems VII, pp. 129–140 (2006)
Doucet, A., Kazai, G., Colutto, S., Mühlberger, G.: Overview of the ICDAR 2013 competition on book structure extraction. In: Proceedings of the Twelfth International Conference on Document Analysis and Recognition (ICDAR’2013), p. 6. Washington DC, USA (2013)
Esposito, F., Ferilli, S., Basile, T.M.A.: Machine learning for digital document processing: from layout analysis to metadata extraction. World Wide Web Internet Web Inform. Syst. 138(2008), 1–35 (2008). doi:10.1007/978-3-540-76280-5_5
Ferilli, S., Basile, T., Mauro, N.D.: Markov logic networks for document layout correction. In: Modern Approaches in Applied Intelligence, pp. 275–284 (2011)
Gao, L., Tang, Z., Lin, X., Liu, Y., Qiu, R., Wang, Y.: Structure extraction from PDF-based book documents. In: Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, pp. 11–20 (2011)
O’Gorman, L.: The document spectrum for page layout analysis. IEEE Trans. Pattern Anal. Mach. Intell. 15(11), 1162–1173 (1993)
Granitzer, M., Hristakeva, M., Knight, R., Jack, K.: A comparison of metadata extraction techniques for crowdsourced bibliographic metadata management. In: Proceedings of the 27th Symposium On Applied Computing (to appear). ACM, New York (2012)
Granitzer, M., Hristakeva, M., Knight, R., Jack, K., Kern, R.: A comparison of layout based bibliographic metadata extraction techniques. In: WIMS12—International Conference on Web Intelligence, Mining and Semantics, pp. 19:1–19:8. ACM, New York (2012)
Kern, R., Jack, K., Hristakeva, M., Granitzer, M.: TeamBeam—meta-data extraction from scientific literature. In: 1st International Workshop on Mining Scientific Publications (2012)
Kern, R., Klampfl, S.: Extraction of references using layout and formatting information from scientific articles. D-Lib Magazine 19(9/10) (2013). doi:10.1045/september2013-kern
Klink, S., Dengel, A., Kieninger, T.: Document structure analysis based on layout and textual features. In: Proceedings of International Workshop on Document Analysis Systems (2000)
Lin, X.: Header and footer extraction by page-association. Proc. SPIE 5010, 164–171 (2002). doi:10.1117/12.472833
Liu, Y., Bai, K., Mitra, P., Giles, C.L.: Improving the table boundary detection in PDFs by fixing the sequence error of the sparse lines. In: 2009 10th International Conference on Document Analysis and Recognition, pp. 1006–1010 (2009). doi:10.1109/ICDAR.2009.138
Liu, Y., Mitra, P., Giles, C.L.: A fast preprocessing method for table boundary detection: narrowing down the sparse lines using solely coordinate information. In: 2008 The Eighth IAPR International Workshop on Document Analysis Systems, pp. 431–438. IEEE (2008). doi:10.1109/DAS.2008.77
Liu, Y., Mitra, P., Giles, C.L.: Identifying table boundaries in digital documents via sparse line detection. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM ’08), pp. 1311–1320. ACM Press (2008). doi:10.1145/1458082.1458255
Luong, M.T., Nguyen, T.D., Kan, M.Y.: Logical structure recovery in scholarly articles with rich document features. Int. J. Digital Libr. Syst. 1(4), 1–23 (2011). doi:10.4018/jdls.2010100101
Malerba, D., Ceci, M., Berardi, M.: Machine learning for reading order detection in document image understanding. In: Machine Learning in Document Analysis, pp. 45–69 (2008)
Mao, S., Rosenfeld, A., Kanungo, T.: Document structure analysis algorithms: a literature survey. Proc. SPIE 5010(1), 197–207 (2003). doi:10.1117/12.476326
Meunier, J.L.: Optimized XY-cut for determining a page reading order. In: Eighth International Conference on Document Analysis and Recognition ICDAR05 1, pp. 347–351 (2005). doi:10.1109/ICDAR.2005.182
Nagy, G., Seth, S., Viswanathan, M.: A prototype document image analysis system for technical journals. Computer 25(7), 10–22 (1992). doi:10.1109/2.144436
Peng, F., McCallum, A.: Accurate information extraction from research papers using conditional random fields. In: HLT-NAACL 2004, vol. 2004, pp. 329–336 (2004). doi:10.1.1.10.5644
Ramakrishnan, C., Patnia, A., Hovy, E., Burns, G.A.: Layout-aware text extraction from full-text PDF of scientific articles. Source Code Biol Med 7(1), 7 (2012). doi:10.1186/1751-0473-7-7
Summers, K.: Automatic discovery of logical document structure. Ph.D. thesis (1998)
Tkaczyk, D., Bolikowski, L., Czeczko, A., Rusek, K.: A modular metadata extraction system for born-digital articles. In: 2012 10th IAPR International Workshop on Document Analysis Systems, pp. 11–16 (2012). doi:10.1109/DAS.2012.4
Tkaczyk, D., Czeczko, A., Rusek, K.: GROTOAP: ground truth for open access publications. In: Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 381–382 (2012)
Zanibbi, R., Blostein, D., Cordy, J.R.: A survey of table recognition. Int. J. Doc. Anal. Recogn. 7(1), 1–16 (2004). doi:10.1007/s10032-004-0120-9
Zhang, K., Shasha, D.: Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput. 18(6), 1245–1262 (1989). doi:10.1137/0218082