Implementation of an Algorithm to Classify Discourse Segments from Documents for Knowledge Acquisition

Madhusudanan, N.; Chakrabarti, Amaresh; Gurumoorthy, B.

doi:10.1007/978-81-322-2229-3_37


Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 35)


Abstract

The overall objective of this paper is to acquire diagnostic knowledge about aircraft assembly in an automated manner, in order to minimize the recurrence of issues in new, similar situations. This research uses documents prepared by experts as a source of knowledge. The first step in knowledge acquisition is to segment the relevant sections of these documents. Of the many methods that currently exist for such segmentation and classification, one, namely ‘discourse analysis’, is chosen for analyzing documents (with future knowledge considerations in mind). Discourse analysis extracts entities from sentences to identify what is being discussed in a chunk of text. These entities are then compared with a domain knowledge base, such as an ontology, to determine how semantically close the discussion is to the domain of interest. A method for such segmentation was proposed previously and is summarised here; this paper describes a partial implementation of that method. The implementation uses computer-based tools: the Natural Language Toolkit (NLTK) for text processing such as tokenization, Boxer for discourse analysis, and ontologies as a knowledge base of domain terminology. The method calculates a semantic score for each sentence against terms taken from related domain ontologies. If a sentence contains terms that match those in the ontology, it is classified as related to the domain of aircraft assembly. The implementation is then applied to test documents to evaluate its performance.
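The sentence-level classification step described above can be illustrated with a minimal, stdlib-only Python sketch. It is not the authors' implementation, and several parts are assumptions: a regular-expression tokenizer stands in for NLTK, the Boxer discourse-analysis step is omitted (raw tokens are compared instead of discourse entities), `ONTOLOGY_TERMS` is a hypothetical flat term set standing in for a domain ontology, and the score (matched terms per token) is an assumed formula, since the abstract does not give the actual one. As in the abstract, a sentence is labelled in-domain if any ontology term matches.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for terms harvested from an aircraft-assembly ontology
# (for example, class labels exported from an OWL file built in Protégé).
ONTOLOGY_TERMS = {
    "rivet", "riveting", "fuselage", "wing", "drill", "fastener",
    "skin panel", "assembly jig",
}

# Lower-case words, optionally hyphenated (e.g. "wing-box").
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def split_sentences(text: str) -> list[str]:
    """Naive splitter on ., ! or ? followed by whitespace (stand-in for NLTK's sent_tokenize)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens (stand-in for NLTK's word_tokenize)."""
    return TOKEN_RE.findall(text.lower())


@dataclass
class ScoredSentence:
    text: str
    matches: list[str]   # ontology terms found in the sentence
    score: float         # hypothetical score: matched terms per token
    in_domain: bool      # True if at least one term matched


def score_sentence(sentence: str, terms: set[str] = ONTOLOGY_TERMS) -> ScoredSentence:
    tokens = tokenize(sentence)
    # Pad with spaces so " term " only matches whole tokens or whole token runs.
    padded = f" {' '.join(tokens)} "
    # Normalize terms the same way as sentences so multi-word terms line up.
    normalized = {" ".join(tokenize(t)) for t in terms}
    matches = sorted(t for t in normalized if t and f" {t} " in padded)
    score = len(matches) / len(tokens) if tokens else 0.0
    return ScoredSentence(sentence, matches, score, bool(matches))


def classify(text: str, terms: set[str] = ONTOLOGY_TERMS) -> list[ScoredSentence]:
    """Score every sentence of a document against the ontology terms."""
    return [score_sentence(s, terms) for s in split_sentences(text)]


if __name__ == "__main__":
    doc = "Drill the skin panel before riveting. The weather was pleasant."
    for r in classify(doc):
        print(r.in_domain, round(r.score, 2), r.matches, "|", r.text)
```

For the sample document, the first sentence matches `drill`, `riveting` and `skin panel` (score 0.5) and is labelled in-domain, while the second matches nothing. Exact matching is deliberately crude: "rivets" would not match "rivet". A fuller version might lemmatize tokens (for example with NLTK's `WordNetLemmatizer`) or use WordNet-based similarity, and would compare the discourse entities produced by Boxer rather than raw tokens.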


References

Website of IHS Goldfire: http://www.ihs.com/products/design/software-methods/goldfire/index.aspx. Accessed 18 Sept 2014
Madhusudanan, N., Gurumoorthy, B., Chakrabarti, A.: Segregating discourse segments from engineering documents for knowledge acquisition. In: PLM 2014, the IFIP WG 5.1 11th International Conference on Product Lifecycle Management, Yokohama, Japan, 7–9 July 2014
Wijewickrema, C.M., Gamage, R.: An ontology based fully automatic document classification system using an existing semi-automatic system. IFLA WLIC 2013—Singapore—Future Libraries: Infinite Possibilities (2013)
Nyberg, K.: Document Classification Using Machine Learning and Ontologies. M.Sc. thesis, Aalto University, School of Science, Degree Programme of Information Networks, Jan 2011
Benno, S.: Topic identification: framework and application. In: Proceedings of I-KNOW ’04, Graz, Austria, June 30–July 2 2004
Reynar, J.C.: Statistical models for topic segmentation. In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics. Association for Computational Linguistics (1999)
WordNet Documentation: http://wordnet.princeton.edu/wordnet/documentation/. Accessed 18 Sept 2014
Bos, J.: Wide-coverage semantic analysis with Boxer. In: Proceedings of the 2008 Conference on Semantics in Text Processing. Association for Computational Linguistics (2008)
Blackburn, P., Bos, J.: Representation and inference for natural language. A first course in computational semantics, working with discourse representation structures, vol. II. University of Saarland (unpublished manuscript) (1999)
NLTK WordNet HOWTO: http://www.nltk.org/howto/wordnet.html. Accessed 26 June 2014
Protégé: http://protege.stanford.edu/. Accessed 26 June 2014
Ast, M., Glas, M., Roehm, T. (Bauhaus Luftfahrt e.V.): Creating an Ontology for Aircraft Design. Deutsche Gesellschaft für Luft- und Raumfahrt – Lilienthal-Oberth e.V., Germany (2014)
BART—co-reference and anaphora toolkit: http://www.bart-coref.org/. Accessed 26 June 2014
Riveting—Wikipedia, a free encyclopedia: www.en.wikipedia.org/wiki/Rivet. Accessed 04 June 2014


Acknowledgments

The authors thank all the participants in the classification study.

Author information

Authors and Affiliations

Centre for Product Design and Manufacturing, Indian Institute of Science, Bangalore, Karnataka, India
N. Madhusudanan, Amaresh Chakrabarti & B. Gurumoorthy


Corresponding author

Correspondence to N. Madhusudanan.

Editor information

Editors and Affiliations

Centre for Product Design and Manufacturing, Indian Institute of Science, Bangalore, India
Amaresh Chakrabarti


About this paper

Cite this paper

Madhusudanan, N., Chakrabarti, A., Gurumoorthy, B. (2015). Implementation of an Algorithm to Classify Discourse Segments from Documents for Knowledge Acquisition. In: Chakrabarti, A. (ed.) ICoRD’15 – Research into Design Across Boundaries Volume 2. Smart Innovation, Systems and Technologies, vol 35. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2229-3_37


DOI: https://doi.org/10.1007/978-81-322-2229-3_37
Published: 24 December 2014
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2228-6
Online ISBN: 978-81-322-2229-3
eBook Packages: Engineering, Engineering (R0)
