A data- and ontology-driven text mining-based construction of reliability model to analyze and predict component failures

  • Regular Paper
Knowledge and Information Systems

Abstract

We propose a real-life reliability system that fuses field warranty failure data with failure modes extracted from unstructured repair verbatim data, using an ontology-based natural language processing technique, to enable accurate estimation of component reliability. Traditionally, reliability estimation relies on warranty data alone, which offers limited support for handling the “failure confounding” problem, whereby the different failure modes associated with a component failure are confounded into a single failure mode. The resulting reliability estimates lack the required level of precision. Because our model takes into account the textual failure modes associated with component failures, it improves the overall reliability estimation. We evaluate our system against a baseline system in terms of absolute prediction error, using real-life data from the automotive domain (e.g., headlamp failures) collected at different mileage exposures. In the best case, the absolute error of our model improved by 97% relative to the baseline model (which does not consider failure modes); in the worst case, the improvement was 71%.

Notes

  1. The experimental results in Jung and Bai [16] show that when the age and usage variables are strongly correlated, the univariate and bivariate approaches perform comparably. When the correlation is weak, however, the bivariate approach performs markedly better than the univariate approach.

  2. Because of a data non-disclosure agreement with a third party, we report dummy values in place of the data collected in the field.

  3. In our domain, the data can be right or left skewed, and the Weibull model was used to accommodate this varying nature of the data. The Weibull model gave us the flexibility to model decreasing, increasing, or constant hazard functions and to describe the different phases of a component’s lifetime.

  4. RDFS is a World Wide Web Consortium (W3C) standard for specifying metadata models.

  5. https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html.

  6. PMI-IR, proposed by Turney [41], can be defined as follows: \(Sim(t,t_j)=\log_2\left(1+\frac{hits(t \text{ AND } t_j)^{2}}{hits(t)\cdot hits(t_j)}\right)\), where \(hits(t)\) and \(hits(t_j)\) are the number of times \(t\) and \(t_j\) are observed in the corpus, and \(hits(t \text{ AND } t_j)\) is the number of documents where both tuples are present. The proximity between the positions of two words is exploited to construct the tuples.

  7. The word window size is a free parameter, which can be defined in many ways [27]. In our work, we experimented with word window sizes of five, seven, and ten. The larger window sizes, i.e., seven and ten words, yielded noisy results; hence, we settled on a window of five words when constructing the tuples.
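
Note 3 motivates the Weibull model by the shapes its hazard function can take. The following is a minimal sketch of that property, not the model fitted in the paper: it assumes the standard two-parameter Weibull form with hazard \(h(t)=\frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}\), and the names `shape` (\(\beta\)), `scale` (\(\eta\)), and `hazard_trend` are illustrative choices.

```python
import math


def weibull_hazard(t, shape, scale):
    """Hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape, and scale must all be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_reliability(t, shape, scale):
    """Survival probability R(t) = exp(-(t / scale) ** shape), for t >= 0."""
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be non-negative; shape and scale must be positive")
    return math.exp(-((t / scale) ** shape))


def hazard_trend(shape, scale=1.0, times=(0.5, 1.0, 2.0)):
    """Classify the hazard as decreasing, constant, or increasing over `times`."""
    values = [weibull_hazard(t, shape, scale) for t in times]
    pairs = list(zip(values, values[1:]))
    if all(math.isclose(a, b) for a, b in pairs):
        return "constant"
    if all(a > b for a, b in pairs):
        return "decreasing"
    if all(a < b for a, b in pairs):
        return "increasing"
    return "mixed"
```

With these definitions, a shape parameter below one gives a decreasing hazard (early-life failures), a shape of exactly one gives a constant hazard (the exponential case), and a shape above one gives an increasing hazard (wear-out).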
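
Notes 6 and 7 describe a similarity score over tuples built from words that occur close together. The sketch below shows one way to compute it under stated assumptions: a tuple is taken to be an ordered pair of words falling inside the same five-word window, each repair verbatim is a pre-tokenized list of words, and all three hit counts are counted per document. The function names `window_tuples`, `sim`, and `corpus_sim` are hypothetical and do not come from the paper.

```python
import math


def window_tuples(tokens, window=5):
    """Return the ordered word pairs that fall inside one window of `window` words."""
    pairs = set()
    for i, left in enumerate(tokens):
        # A window of `window` words starting at position i covers i .. i + window - 1.
        for right in tokens[i + 1 : i + window]:
            pairs.add((left, right))
    return pairs


def sim(hits_t, hits_tj, hits_both):
    """Sim(t, t_j) = log2(1 + hits(t AND t_j)^2 / (hits(t) * hits(t_j)))."""
    if hits_t <= 0 or hits_tj <= 0:
        raise ValueError("hits(t) and hits(t_j) must be positive")
    return math.log2(1 + hits_both ** 2 / (hits_t * hits_tj))


def corpus_sim(t, t_j, documents, window=5):
    """Score two tuples against a corpus of tokenized documents.

    Assumption: every hit count is a document count. Returns 0.0 when
    either tuple never occurs, since the score is undefined in that case.
    """
    doc_tuples = [window_tuples(doc, window) for doc in documents]
    hits_t = sum(t in d for d in doc_tuples)
    hits_tj = sum(t_j in d for d in doc_tuples)
    hits_both = sum(t in d and t_j in d for d in doc_tuples)
    if hits_t == 0 or hits_tj == 0:
        return 0.0
    return sim(hits_t, hits_tj, hits_both)
```

The score is zero when two tuples never share a document and grows as their co-occurrence count approaches their individual counts. Passing a larger `window` admits more distant word pairs, which is consistent with the noisier results Note 7 reports for windows of seven and ten words.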

References

  1. Beckett D (ed) (2004) RDF/XML Syntax Specification (Revised). W3C Recommendation. http://www.w3.org/TR/rdf-syntax-grammar/

  2. Benedettini O, Baines TS, Lightfoot HW, Greenough RM (2009) State-of-the-art in integrated vehicle health management. J Aerosp Eng 223(2):157–170

  3. Boufaden N (2003) An ontology-based semantic tagger for IE system. In: Proceedings of the 41st annual meeting on association for computational linguistics, Stroudsburg, PA, USA, pp 7–14

  4. Brill E (1995) Unsupervised learning of disambiguation rules for part of speech tagging. In: Yarowsky D, Church K (eds) Natural language processing using very large corpora. Kluwer Academic Press, Cambridge, Massachusetts, pp 1–13

  5. Caers JFJM, Zhao XJ, Mooren J, Stulens L, Eggink E (2010) Design for reliability—a reliability engineering framework. In: Proceedings of the 11th international conference on electronic packaging technology & high density packaging, pp 1108–1113

  6. Church K (1988) A stochastic parts program and noun phrase parser for unrestricted text. In: Proceedings of the second conference on applied natural language processing, Stroudsburg, PA, USA, pp 136–143

  7. Charniak E, Hendrickson C, Jacobson N, Perkowitz M (1993) Equations for part of speech tagging. In: Proceedings of the conference of the American Association for Artificial Intelligence, Menlo Park, pp 784–789

  8. Cimiano P, Hotho A, Staab S (2005) Learning concept hierarchies from text corpora using formal concept analysis. J Art Intel Res 24:305–339

  9. Coit DW, Dey KA (1999) Analysis of grouped data from field-failure reporting systems. Reliab Eng Syst Saf 65(2):95–101

  10. DeRose S (1988) Grammatical category disambiguation by statistical optimization. Comput Linguist 14(1):31–39

  11. Deroualt A, Merialdo B (1986) Natural language modeling for phoneme-to-text transcription. IEEE Trans Pattern Anal Mach Intel 8(6):742–749

  12. Emmanuel R, Schabes Y (1995) Deterministic POS tagging with finite-state transducers. Comput Linguist 21(2):227–253

  13. Greene BB, Rubin GM (1971) Automatic grammatical tagging of English. Department of Linguistics, Brown University, Providence, Rhode Island, Technical report

  14. Hindle D (1989) Acquiring disambiguation rules from text. In: Proceedings of the 27th annual meeting of the association for computational linguistics. Vancouver, British Columbia, pp 118–125

  15. Hunter JJ (1974) Renewal theory in two dimensions: basic results. Adv Appl Probab 6:376–391

  16. Jung M, Bai DS (2007) Analysis of field data under two-dimensional warranty. Reliab Eng Syst Saf 92(2):135–143

  17. Kalbfleisch JD, Lawless JF (1988) Estimation of reliability in field-performance studies. Technometrics 30:365–388

  18. Kalbfleisch JD, Lawless JF, Robinson JA (1991) Methods for the analysis and prediction of warranty claims. Technometrics 33:273–285

  19. Karim MR, Suzuki K (2005) Analysis of warranty claim data: a literature review. Int J Qual Reliab Manag 22(7):667–686

  20. Klein S, Simmons R (1963) A computational approach to grammatical coding of English words. J ACM 10(3):334–347

  21. Ken S (1998) Lazy transformation-based learning. In: Proceedings of the 11th international Florida artificial intelligence research symposium conference, Sanibel Island, Florida, USA, pp 235–239

  22. Kilgarriff A, Rychly P, Smrz P, Tugwell D (2004) The sketch engine. In: Proceedings of Euralex, Lorient, France, pp 105–116

  23. Kleyner A, Sandborn P (2008) Minimizing life cycle cost by managing product dependability via validation plan and warranty return cost. Int J Prod Econ 112(2):796–807

  24. Lawless JF (1983) Statistical methods in reliability. Technometrics 25:305–335

  25. Lawless JF, Hu J, Cao J (1995) Methods for the estimation of failure distributions and rates from automobile warranty data. Lifetime Data Anal 1:227–240

  26. Lawless JF (1998) Statistical analysis of product warranty data. Int Stat Rev 66:41–60

  27. Lund K, Burgess C (1996) Producing high-dimensional semantic spaces using lexical co-occurrence. Behav Res Methods 28(2):203–208

  28. Majeske KD, Caris TL, Herrin G (1997) Evaluating product and process design changes with warranty data. Int J Prod Econ 50:79–89

  29. Majeske KD (2003) A mixture model for automobile warranty data. Reliab Eng and Sys Saf 81:71–77

  30. Meteer M, Schwartz R, Weischedel R (1991) POST: using probabilities in language processing. In: Proceedings of the twelfth international conference on artificial intelligence, pp 960–965

  31. Mukherjee S, Chakraborty A (2007) Automated fault tree generation: bridging reliability with text mining. In: Proceedings of reliability and maintainability symposium, Orlando FL, pp 83–88

  32. Murthy DNP, Blischke WR (1992) Product warranty management—III: a review of mathematical models. Eur J Oper Res 62:1–34

  33. Murthy DNP, Iskandar BP, Wilson RJ (1995) Two-dimensional failure-free warranty policies: two-dimensional point process models. Oper Res 43(2):356–366

  34. Ngai G, Radu F (2001) Transformation-based learning in the fast lane. In: Proceedings of the second conference of the North American chapter of the association for computational linguistics, Pittsburgh, PA, pp 1–8

  35. Oh YS, Bai DS (2001) Field data analyses with additional after warranty field-data. Reliab Eng Syst Saf 72(1):1–8

  36. Radu F, Henderson JC, Ngai G (2000) Coaxing confidences from an old friend: probabilistic classifications from transformation rule lists. In: Proceedings of joint SIGDAT conference on empirical methods in natural language processing and very large corpora, pp 26–34

  37. Rajpathak D, Chougule R (2011) A generic ontology development framework for data integration and decision support in a distributed environment. Int J Comput Integr Manuf 24(2):154–170

  38. Rajpathak D, Chougule R, Bandyopadhyay P (2011) A domain specific decision support system for knowledge discovery using association and text mining. Int J Knowl Inf Syst 31(3):405–432

  39. Rajpathak D (2013) An ontology based text mining system for knowledge discovery from the diagnosis data in the automotive domain. Int J Comput Ind 64(5):565–580

  40. Singpurwalla ND, Wilson SP (1994) Software reliability modeling. Int Stat Rev 62:289–317

  41. Turney P (2001) Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In: Proceedings of the twelfth European conference on machine learning, Freiburg, Germany, pp 491–502

  42. Wasserman GS (1992) An application of dynamic linear models for predicting warranty claims. Comput Ind Eng 22(1):37–47

  43. Wessel F (2002) Word posterior probabilities for large vocabulary continuous speech recognition. Ph.D. thesis, RWTH Aachen University. Aachen, Germany

Acknowledgments

The authors would like to thank the reviewers and GM’s internal paper review committee for their valuable comments on earlier drafts of this manuscript.

Author information

Corresponding author

Correspondence to Dnyanesh Rajpathak.

About this article

Cite this article

Rajpathak, D., De, S. A data- and ontology-driven text mining-based construction of reliability model to analyze and predict component failures. Knowl Inf Syst 46, 87–113 (2016). https://doi.org/10.1007/s10115-014-0806-3

  • DOI: https://doi.org/10.1007/s10115-014-0806-3
