Text Classification Techniques in Oil Industry Applications

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 239)


The development of automatic methods that produce usable structured information from unstructured text sources is extremely valuable to the oil and gas industry. A structured resource would allow researchers and industry professionals to write relatively simple queries to retrieve all the information regarding the transcription of any accident. Instead of the thousands of abstracts returned by querying the unstructured corpus, queries on the structured corpus would return a few hundred well-formed results.

In this paper we propose and evaluate information extraction techniques for the occupational health control process, in particular for the automatic detection of accidents from unstructured texts. Our proposal divides the problem into subtasks such as text analysis, recognition and classification of failed occupational health controls, and resolving accidents.
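
As a rough illustration of the text analysis and classification subtasks, the sketch below trains a small multinomial Naive Bayes classifier that labels a report as "accident" or "other". It is not the method evaluated in the paper: the choice of Naive Bayes, the `tokenize` helper, the `NaiveBayes` class, the label names, and the toy training sentences are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Text analysis step (assumed): lowercase and keep alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()                       # all tokens seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior of the class plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus, invented for illustration only.
TRAIN = [
    ("worker injured by falling pipe on the drilling platform", "accident"),
    ("gas leak caused a fire and two workers were burned", "accident"),
    ("operator slipped on the deck and fractured an arm", "accident"),
    ("routine maintenance of the pump was completed on schedule", "other"),
    ("monthly production report for the offshore platform", "other"),
    ("safety training session held for new operators", "other"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict("a worker was injured in a fire"))
```

A real system would need a far larger labelled corpus, language-specific preprocessing, and possibly domain knowledge such as an ontology; the sketch only shows how the analysis and classification steps fit together.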


Keywords: text classification, ontology, oil and gas industry




Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. ADDLabs, Fluminense Federal University, Niterói, Brazil
  2. Dept. of Electrical Engineering, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, Brazil
