Abstract
Information and knowledge retrieval has been recognized as a key issue in engineering design. A great deal of the design-related information used and generated within engineering companies is formally recorded in documents. These documents become more useful if they are structured consistently, so that they can be retrieved and their contents accessed more effectively. Achieving useful structure in electronic documents relies on embedding some form of computer-understandable mark-up or coding, but manual mark-up is time-consuming and costly. This paper proposes a knowledge engineering approach to automatic document mark-up that employs XML (the eXtensible Mark-up Language) to ‘tag’ structural information explicitly. The focus is on long and complex engineering documents. A three-level model is explored to achieve automatic semantic mark-up using a set of document decomposition schemes. The model comprises a strategic level, which identifies typographical features of a document based on such things as styles, inference or templates; a tactical level, which defines the rules that realize semantic mark-up according to those features; and an operational level, which performs the computational implementation of the mark-up rules. Making document structure explicit allows information retrieval to be more focused: it can return not just whole documents but the document components that are most relevant or of most interest to the engineering designer, and it can match the designer’s need with respect to both document structure and content, not content alone. In addition, the human user’s interpretation of useful structure can be hardwired into documents, which moves us closer to true semantic-level retrieval.
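The three-level model described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the sample lines, the style names (`Title`, `Heading1`, `Body`), the tag names and the functions `mark_up` and `retrieve_fragments` are all hypothetical, and the strategic level is assumed to have already paired each line of text with a style name.

```python
import xml.etree.ElementTree as ET

# Strategic level (assumed output): each line of the document is paired with
# the typographical style detected for it. The sample data is invented.
STYLED_LINES = [
    ("Title", "Pump Housing Design Report"),
    ("Heading1", "Requirements"),
    ("Body", "The housing shall withstand 10 bar."),
    ("Heading1", "Test Results"),
    ("Body", "No leakage was observed at 12 bar."),
]

# Tactical level: hypothetical rules mapping a style to an XML tag.
MARKUP_RULES = {"Title": "title", "Heading1": "section", "Body": "para"}


def mark_up(styled_lines, rules):
    """Operational level: apply the rules to build an XML tree."""
    root = ET.Element("document")
    current = root  # element that receives the next paragraph
    for style, text in styled_lines:
        tag = rules.get(style)
        if tag is None:
            continue  # no rule for this style, so the line is skipped
        if tag == "section":
            # A heading opens a new section under the document root.
            current = ET.SubElement(root, "section", {"heading": text})
        else:
            ET.SubElement(current, tag).text = text
    return root


def retrieve_fragments(root, keyword):
    """Return the sections with a paragraph containing the keyword."""
    needle = keyword.lower()
    return [
        section
        for section in root.iter("section")
        if any(needle in (p.text or "").lower() for p in section.iter("para"))
    ]


if __name__ == "__main__":
    doc = mark_up(STYLED_LINES, MARKUP_RULES)
    for section in retrieve_fragments(doc, "leakage"):
        print(ET.tostring(section, encoding="unicode"))
```

Because the structure is explicit in the XML tree, a query can return a single section rather than the whole document, which is the kind of focused retrieval the abstract argues for.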
Cite this article
Liu, S., McMahon, C.A., Darlington, M.J. et al. An automatic mark-up approach for structured document retrieval in engineering design. Int J Adv Manuf Technol 38, 418–425 (2008). https://doi.org/10.1007/s00170-007-1342-z
DOI: https://doi.org/10.1007/s00170-007-1342-z