Discovering Potential Clinical Profiles of Multiple Sclerosis from Clinical and Pathological Free Text Data with Constrained Non-negative Matrix Factorization

  • Conference paper
Applications of Evolutionary Computation (EvoApplications 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9597)

Abstract

Constrained non-negative matrix factorization (CNMF) is an effective machine learning technique for clustering documents in the presence of class label constraints. In this work, we provide a novel application of this technique in research on neurodegenerative diseases. Specifically, we consider a dataset of documents from the Netherlands Brain Bank containing free text that describes clinical and pathological information about donors affected by Multiple Sclerosis. The goal is to use CNMF to identify clinical profiles, with pathological information as constraints. After pre-processing the documents by means of standard filtering techniques, a feature representation of the documents in terms of bi-grams is constructed. The high-dimensional feature space is reduced by applying a trimming procedure. The resulting datasets of clinical and pathological bi-grams are first clustered using non-negative matrix factorization (NMF); the clinical data are then clustered using CNMF with constraints induced by the clustering of the pathological data. Results indicate the presence of interesting clinical profiles, for instance related to vision or movement problems. In particular, the use of CNMF leads to the identification of a clinical profile related to diabetes mellitus. Pathological characteristics and duration of disease of the identified profiles are analysed. Although highly promising, the results of this investigation should be interpreted with care due to the relatively small size of the considered datasets.
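
The pipeline summarised in the abstract (bi-gram features, trimming, NMF, then CNMF with label constraints) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration: the four toy documents are invented and are not Netherlands Brain Bank data, the trimming rule (keep bi-grams that occur in at least `min_docs` documents) is a guess at what a trimming procedure might look like, and the function names are hypothetical. The constrained variant shown is one well-known label-constrained formulation, in which documents sharing a label are forced to share a single representation; the abstract does not say whether the paper uses exactly this form.

```python
import numpy as np
from collections import Counter

def bigram_counts(doc):
    """Count adjacent word pairs (bi-grams) in a whitespace-tokenised document."""
    tokens = doc.lower().split()
    return Counter(zip(tokens, tokens[1:]))

def build_matrix(docs, min_docs=2):
    """Return a (bi-grams x documents) count matrix, keeping only bi-grams
    that occur in at least `min_docs` documents (assumed trimming rule)."""
    counts = [bigram_counts(d) for d in docs]
    doc_freq = Counter(bg for c in counts for bg in c)
    vocab = sorted(bg for bg, df in doc_freq.items() if df >= min_docs)
    X = np.array([[c[bg] for c in counts] for bg in vocab], dtype=float)
    return X, vocab

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF, X ~ U V^T, using Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], k))
    V = rng.random((X.shape[1], k))
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
    return U, V

def constraint_matrix(labels):
    """One column per known class plus one column per unlabelled document.
    A label of -1 means 'no constraint' (assumed convention)."""
    labels = np.asarray(labels)
    classes = sorted(set(labels[labels >= 0].tolist()))
    A = np.zeros((len(labels), len(classes) + int((labels < 0).sum())))
    free = len(classes)
    for i, lab in enumerate(labels):
        if lab >= 0:
            A[i, classes.index(lab)] = 1.0
        else:
            A[i, free] = 1.0
            free += 1
    return A

def cnmf(X, labels, k, n_iter=200, seed=0, eps=1e-9):
    """Label-constrained NMF sketch: X ~ U (A Z)^T, so documents with the
    same label receive the same row of Z as their representation."""
    A = constraint_matrix(labels)
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], k))
    Z = rng.random((A.shape[1], k))
    for _ in range(n_iter):
        V = A @ Z
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        Z *= (A.T @ (X.T @ U)) / (A.T @ A @ Z @ (U.T @ U) + eps)
    return U, A @ Z

# Invented toy "clinical" documents, for illustration only.
docs = [
    "blurred vision optic neuritis",
    "optic neuritis blurred vision",
    "muscle weakness walking difficulty",
    "walking difficulty muscle weakness",
]
X, vocab = build_matrix(docs, min_docs=2)

# Unconstrained clustering: assign each document to its dominant factor.
U, V = nmf(X, k=2)
clusters = V.argmax(axis=1)

# Hypothetical constraints, standing in for labels obtained by clustering
# pathological data: documents 0 and 1 share a label, the rest are free.
U_c, V_c = cnmf(X, labels=[0, 0, -1, -1], k=2)
clusters_c = V_c.argmax(axis=1)
```

In this formulation the constraints are hard: constrained documents with the same label always end up in the same cluster, so leaving some documents unlabelled is what allows the factorization to discover additional structure.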

Acknowledgments

This work has been partially funded by the Netherlands Organization for Scientific Research (NWO) within the NWO project 612.001.119.

Corresponding author

Correspondence to Elena Marchiori.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Acquarelli, J., The Netherlands Brain Bank, Bianchini, M., Marchiori, E. (2016). Discovering Potential Clinical Profiles of Multiple Sclerosis from Clinical and Pathological Free Text Data with Constrained Non-negative Matrix Factorization. In: Squillero, G., Burelli, P. (eds) Applications of Evolutionary Computation. EvoApplications 2016. Lecture Notes in Computer Science, vol 9597. Springer, Cham. https://doi.org/10.1007/978-3-319-31204-0_12

  • DOI: https://doi.org/10.1007/978-3-319-31204-0_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31203-3

  • Online ISBN: 978-3-319-31204-0

  • eBook Packages: Computer Science (R0)
