Biomedical Article Classification Using an Agent-Based Model of T-Cell Cross-Regulation

  • Alaa Abi-Haidar
  • Luis M. Rocha
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6209)


We propose a novel bio-inspired solution for biomedical article classification. Our method draws from an existing model of T-cell cross-regulation in the vertebrate immune system (IS), which is a complex adaptive system of millions of cells interacting to distinguish between harmless and harmful intruders. Analogously, automatic biomedical article classification assumes that the interaction and co-occurrence of thousands of words in text can be used to identify conceptually related classes of articles: at a minimum, two classes with relevant and irrelevant articles for a given concept (e.g., articles with protein-protein interaction information). Our agent-based method for document classification expands the existing analytical model of Carneiro et al. [1] by allowing us to deal simultaneously with many distinct T-cell features (epitopes) and their collective dynamics using agent-based modeling. We previously extended this model to develop a bio-inspired spam-detection system [2, 3]. Here we develop our agent-based model further and test it on a dataset of publicly available full-text biomedical articles provided by the BioCreative challenge [4]. We study several new parameter configurations, leading to encouraging results comparable to state-of-the-art classifiers. These results help us understand both T-cell cross-regulation and its applicability to document classification in general. Our bio-inspired algorithm is therefore a promising novel method for biomedical article classification and for binary document classification in general.


Artificial Immune System; Bio-medical Document Classification; T-cell Cross-Regulation; Bio-inspired Computing; Artificial Intelligence; BioCreative




  1. Carneiro, J., Leon, K., Caramalho, Í., van den Dool, C., Gardner, R., Oliveira, V., Bergman, M., Sepúlveda, N., Paixão, T., Faro, J., et al.: When three is not a crowd: a Crossregulation Model of the dynamics and repertoire selection of regulatory CD4 T cells. Immunological Reviews 216(1), 48–68 (2007)
  2. Abi-Haidar, A., Rocha, L.: Adaptive Spam Detection Inspired by a Cross-Regulation Model of Immune Dynamics: A Study of Concept Drift. In: Bentley, P.J., Lee, D., Jung, S. (eds.) ICARIS 2008. LNCS, vol. 5132, p. 36. Springer, Heidelberg (2008)
  3. Abi-Haidar, A., Rocha, L.: Adaptive spam detection inspired by the immune system. In: Bullock, S., Noble, J., Watson, R., Bedau, M.A. (eds.) Artificial Life XI: Proceedings of the Eleventh International Conference on the Simulation and Synthesis of Living Systems, pp. 1–8. MIT Press, Cambridge (2008)
  4. Krallinger, M., et al.: The BioCreative II.5 challenge overview. In: Proc. the BioCreative II.5 Workshop 2009 on Digital Annotations, pp. 7–9 (2009)
  5. Myers, G.: Whole-genome DNA sequencing. Computing in Science & Engineering [see also IEEE Computational Science and Engineering] 1(3), 33–43 (1999)
  6. Schena, M., Shalon, D., Davis, R., Brown, P., et al.: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science (Washington) 270(5235), 467–470 (1995)
  7. Hunter, L., Cohen, K.: Biomedical Language Processing: What’s Beyond PubMed? Molecular Cell 21(5), 589–594 (2006)
  8.
  9. Jensen, L.J., Saric, J., Bork, P.: Literature mining for the biologist: from information retrieval to biological discovery. Nat. Rev. Genet. 7(2), 119–129 (2006)
  10. Feldman, R., Sanger, J.: The Text Mining Handbook: advanced approaches in analyzing unstructured data. Cambridge University Press, Cambridge (2006)
  11. Abi-Haidar, A., Kaur, J., Maguitman, A., Radivojac, P., Rechtsteiner, A., Verspoor, K., Wang, Z., Rocha, L.: Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks. Genome Biology 9(2), S11 (2008)
  12. Krallinger, M., Valencia, A.: Evaluating the detection and ranking of protein interaction relevant articles: the BioCreative challenge interaction article sub-task (IAS). In: Proceedings of the Second BioCreative Challenge Evaluation Workshop (2007)
  13. Kolchinsky, A., Abi-Haidar, A., Kaur, J., Hamed, A., Rocha, L.: Classification of protein-protein interaction documents using text and citation network features (in press)
  14. Hofmeyr, S.: An Interpretative Introduction to the Immune System. In: Design Principles for the Immune System and Other Distributed Autonomous Systems (2001)
  15. Timmis, J.: Artificial immune systems today and tomorrow. Natural Computing 6(1), 1–18 (2007)
  16. Twycross, J., Cayzer, S.: An immune system approach to document classification. Master’s thesis, COGS, University of Sussex, UK (2002)
  17. Metsis, V., Androutsopoulos, I., Paliouras, G.: Spam Filtering with Naive Bayes–Which Naive Bayes? In: Third Conference on Email and Anti-Spam, CEAS (2006)
  18. Joachims, T.: Learning to classify text using support vector machines: methods, theory, and algorithms. Kluwer Academic Publishers, Dordrecht (2002)
  19. Abi-Haidar, A., Kaur, J., Maguitman, A., Radivojac, P., Rechtsteiner, A., Verspoor, K., Wang, Z., Rocha, L.: Uncovering protein-protein interactions in the bibliome. In: Proceedings of the Second BioCreative Challenge Evaluation Workshop, pp. 247–255 (2007) ISBN 84-933255-6-2
  20. Kolchinsky, A., Abi-Haidar, A., Kaur, J., Hamed, A., Rocha, L.: Classification of protein-protein interaction documents using text and citation network features. In: BioCreative II.5 Workshop 2009: Special Session on Digital Annotations, Madrid, Spain, October 7-9, p. 34 (2009)
  21. de Sepulveda, N.H.S.: How is the T-cell repertoire shaped (2009)
  22. Porter, M.: An algorithm for suffix stripping. In: Program 1966-2006: Celebrating 40 Years of ICT in Libraries, Museums and Archives (2006)
  23. Sokolova, M., Japkowicz, N., Szpakowicz, S.: Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In: Sattar, A., Kang, B.-h. (eds.) AI 2006. LNCS (LNAI), vol. 4304, pp. 1015–1021. Springer, Heidelberg (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Alaa Abi-Haidar (1, 2)
  • Luis M. Rocha (1, 2)
  1. School of Informatics and Computing, Indiana University, Bloomington, USA
  2. FLAD Computational Biology Collaboratorium, Instituto Gulbenkian de Ciência, Oeiras, Portugal
