
An automatic mark-up approach for structured document retrieval in engineering design

  • ORIGINAL ARTICLE
The International Journal of Advanced Manufacturing Technology

Abstract

Information and knowledge retrieval has been recognized as a key issue in engineering design. A great deal of the design-related information used and generated within engineering companies is formally recorded in documents. These documents become more useful if they are structured in a consistent way, so that they can be retrieved and their contents accessed more effectively. Achieving useful structure in electronic documents relies on embedding mark-up or coding that a computer can interpret. Manual mark-up is time-consuming and costly. This paper proposes a knowledge engineering approach to automatic document mark-up that employs XML (the eXtensible Mark-up Language) to 'tag' the structural information explicitly. The focus is on long and complex engineering documents. A three-level model is explored to achieve automatic semantic mark-up using a set of document decomposition schemes. The model comprises a strategic level, which identifies document typographical features based on styles, inference or templates; a tactical level, which defines the rules that realize semantic mark-up according to the document features; and an operational level, which performs the computational implementation of the mark-up rules. Making document structure explicit allows information retrieval to be more focused: it can return not just whole documents but the document components most relevant or of most interest to the engineering designer, matching the designer's need with respect to both document structure and content rather than content alone. In addition, the human user's interpretation of useful structure can be hardwired into documents, which moves closer to true semantic-level retrieval.
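The three levels described above can be illustrated with a minimal sketch. It is not the authors' implementation: the style names, the rule table, the XML tag names and the helper functions `mark_up` and `retrieve` are all assumptions made for illustration, and only the style-based case of the strategic level is shown.

```python
import xml.etree.ElementTree as ET

# Strategic level (assumed input): paragraphs paired with a typographical
# style name, as a word processor export might provide them.
PARAGRAPHS = [
    ("Heading 1", "Test report"),
    ("Heading 2", "Summary"),
    ("Normal", "The gearbox passed the endurance test."),
    ("Heading 2", "Conclusions"),
    ("Normal", "No redesign is required."),
]

# Tactical level (hypothetical rules): map a style to a structural XML tag.
STYLE_RULES = {"Heading 1": "title", "Heading 2": "section"}


def mark_up(paragraphs, rules):
    """Operational level: apply the rules and build an XML tree."""
    root = ET.Element("document")
    current = root  # element that receives body paragraphs
    for style, text in paragraphs:
        tag = rules.get(style)
        if tag == "title":
            ET.SubElement(root, "title").text = text
        elif tag == "section":
            # A new section becomes the container for following paragraphs.
            current = ET.SubElement(root, "section", name=text)
        else:
            ET.SubElement(current, "para").text = text
    return root


def retrieve(root, section_name):
    """Return the paragraphs of one named section, not the whole document."""
    return [
        para.text
        for section in root.iter("section")
        if section.get("name") == section_name
        for para in section.findall("para")
    ]


doc = mark_up(PARAGRAPHS, STYLE_RULES)
print(ET.tostring(doc, encoding="unicode"))
print(retrieve(doc, "Conclusions"))
```

Once the structure is explicit in the XML, a query can target a single component such as the conclusions, which is the kind of focused retrieval the abstract refers to.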



Author information

Corresponding author

Correspondence to S. Liu.

About this article

Cite this article

Liu, S., McMahon, C.A., Darlington, M.J. et al. An automatic mark-up approach for structured document retrieval in engineering design. Int J Adv Manuf Technol 38, 418–425 (2008). https://doi.org/10.1007/s00170-007-1342-z

  • DOI: https://doi.org/10.1007/s00170-007-1342-z
