Collective classification of textual documents by guided self-organization in T-Cell cross-regulation dynamics

  • Special Issue
  • Evolutionary Intelligence

Abstract

We present and study an agent-based model of T-cell cross-regulation in the adaptive immune system, which we apply to binary classification. Our method extends an existing analytical model of T-cell cross-regulation (Carneiro et al. in Immunol Rev 216(1):48–68, 2007) that was used to study the self-organizing dynamics of a single population of T-cells interacting with an idealized antigen-presenting cell capable of presenting a single antigen. With agent-based modeling we are able to study the self-organizing dynamics of multiple populations of distinct T-cells that interact via antigen-presenting cells presenting hundreds of distinct antigens. Moreover, we show that such self-organizing dynamics can be guided to produce an effective binary classification of antigens, which is competitive with existing machine learning methods when applied to biomedical text classification. More specifically, we test our model on a dataset of publicly available full-text biomedical articles provided by the BioCreative challenge (Krallinger in The BioCreative II.5 challenge overview, p 19, 2009). We study the robustness of our model’s parameter configurations and show that it leads to encouraging results comparable to state-of-the-art classifiers. Our results help us understand both T-cell cross-regulation as a general principle of guided self-organization and its applicability to document classification. We therefore conclude that our bio-inspired algorithm is a promising novel method for biomedical article classification and for binary document classification in general.

Notes

  1. We use the terminology of self/nonself discrimination, though perhaps a more accurate description is classification of harmless vs. harmful substances; harmless can also include antigens from bacteria that are necessary for vertebrate bodies, and harmful can also include body’s own tumor cells.

  2. A good, though already somewhat dated, overview of the vertebrate immune system for the artificial life community is Hofmeyr’s [12].

  3. The simplification of proliferation to mere duplication adopted in the canonical CRM model is maintained in our agent-based model to minimize the number of parameters (excluding proliferation rates) and the size of the parameter search space.

  4. Every \(E_f\) or \(R_f\) has equal probability of binding to the APC that presents feature \(f\).

  5. The list of common (stop) words includes 33 of the most common English words, from which we manually excluded the word “with”, as we know it to be important for protein-protein interaction (PPI).

  6. For feature extraction we used the training data of both BioCreative 2.5 and BioCreative 2, as described in [11]; all classifiers used the exact same feature set.

  7. TF.IDF is a common text weighting measure to evaluate the importance of a feature/word in a document in a corpus. TF stands for term frequency in a document and IDF for inverse document frequency in the corpus.

  8. Notice that this parameter search on the provided labeled training data uses only the information available to the teams participating in the BioCreative 2.5 challenge, and none of the test data whose labels were revealed post-challenge.

  9. \(\hbox{F-score} = {\frac{2 \cdot \hbox{Precision} \cdot \hbox{Recall}}{\hbox{Precision} + \hbox{Recall}}}\) where \(\hbox{Precision} = {\frac{\hbox{TP}}{\hbox{TP} + \hbox{FP}}}\) and \(\hbox{Recall} = {\frac{\hbox{TP}}{\hbox{TP} + \hbox{FN}}}\). True Positives (TP) and False Positives (FP) are the documents the classifier correctly and incorrectly predicts as relevant, while True Negatives (TN) and False Negatives (FN) are the documents it correctly and incorrectly predicts as irrelevant.
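
The F-score in Note 9 can be computed directly from confusion-matrix counts. The following is a minimal sketch; the function name `f_score` and the convention of returning 0.0 when precision or recall is undefined are assumptions made for illustration, not part of the article.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """F-score from true positives, false positives and false negatives.

    Returns 0.0 when precision or recall is undefined (assumed convention).
    """
    # Precision = TP / (TP + FP): fraction of predicted-relevant documents that are relevant.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall = TP / (TP + FN): fraction of relevant documents that were retrieved.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

Note that true negatives do not enter the measure, so the F-score is insensitive to how many irrelevant documents are correctly rejected.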

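Note 7 describes TF.IDF only in words, and several variants of the weighting exist. The sketch below uses one common form, relative term frequency multiplied by the logarithm of the inverse document frequency; this choice of variant, the function name `tf_idf` and the token-list input format are assumptions for illustration and may differ from the weighting used in the article.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF.IDF weights for a corpus given as a list of token lists.

    Returns one {term: weight} dictionary per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts in this document
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights
```

Under this variant a term that occurs in every document receives weight zero, which is why very common words carry little discriminative information.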
References

  1. Carneiro J, Leon K, Caramalho I, van den Dool C, Gardner R, Oliveira V, Bergman ML, Sepúlveda N, Paixão T, Faro J, Demengeot J (2007) When three is not a crowd: a crossregulation model of the dynamics and repertoire selection of regulatory cd4 t cells. Immunol Rev 216(1):48–68

  2. Krallinger M (2009) The BioCreative II.5 challenge overview, p 19

  3. Hunter L, Cohen KB (2006) Biomedical language processing: what’s beyond pubmed? Mol Cell 21(5):589–594

  4. Jensen L, Saric J, Bork P (2006) Literature mining for the biologist: from information retrieval to biological discovery. Nat Rev Genet 7(2):119–129. doi:10.1038/nrg1768

  5. Shatkay H, Feldman R (2003) Mining the biomedical literature in the genomic era: an overview. J Comput Biol 10(6):821–856

  6. Hersh W, Bhupatiraju RT, Corley S (2004) Enhancing access to the bibliome: the trec genomics track. Medinfo 11(Pt 2):773–777

  7. Hirschman L, Yeh A, Blaschke C, Valencia A (2005) Overview of biocreative: critical assessment of information extraction for biology. BMC Bioinform 6(Suppl 1):S1

  8. Krallinger M, Valencia A (2007) Evaluating the detection and ranking of protein interaction relevant articles: the biocreative challenge interaction article sub-task (ias). In: Proceedings of the 2nd biocreative challenge evaluation workshop

  9. Feldman R, Sanger J (2006) The text mining handbook: advanced approaches in analyzing unstructured data. Cambridge University Press, Cambridge

  10. Abi-Haidar A, Kaur J, Maguitman A, Radivojac P, Rechtsteiner A, Verspoor K, Wang Z, Rocha LM (2008) Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks. Genome Biol 9(Suppl 2):S11

  11. Kolchinsky A, Abi-Haidar A, Kaur J, Hamed AA, Rocha LM (2010) Classification of protein-protein interaction full-text documents using text and citation network features. IEEE/ACM Trans Comput Biol Bioinform 7(3):400–411. doi:10.1109/TCBB.2010.55. URL http://www.computer.org/portal/web/csdl/doi/10.1109/TCBB.2010.55

  12. Hofmeyr SA (2001) An interpretative introduction to the immune system. Design principles for the immune system and other distributed autonomous systems

  13. Segel LA, Cohen I (2001) Design principles for the immune system and other distributed autonomous systems. Oxford University Press, Oxford

  14. Mitchell M (2006) Complex systems: network thinking. Artif Intell 170(18):1194–1212

  15. Peak D, West JD, Messinger SM, Mott KA (2004) Evidence for complex, collective dynamics and distributed emergent computation in plants. PNAS 101(4):918–922

  16. Helikar T, Konvalina J, Heidel J, Rogers JA (2008) Emergent decision-making in biological signal transduction networks. Proc Natl Acad Sci USA 105(6):1913–1918. doi:10.1073/pnas.0705088105

  17. Walters M, Sperandio V (2006) Quorum sensing in escherichia coli and salmonella. Int J Med Microbiol 296(2–3):125–131. doi:10.1016/j.ijmm.2006.01.041

  18. Pratt SC (2005) Quorum sensing by encounter rates in the ant temnothorax albipennis. Behav Ecol 16(2):488–496. doi:10.1093/beheco/ari020

  19. Crutchfield J, Mitchell M (1995) The evolution of emergent computation. PNAS 92(23)

  20. Rocha LM, Hordijk W (2005) Material representations: from the genetic code to the evolution of cellular automata. Artif Life 11(1–2):189–214

  21. Shalizi C, Haslinger R, Rouquier J-B, Klinkner K, Moore C (2006) Automatic filters for the detection of coherent structure in spatiotemporal systems. Phys Rev E 73

  22. Timmis J (2007) Artificial immune systems today and tomorrow. Nat Comput 6(1):1–18

  23. Twycross J, Cayzer S (2002) An immune system approach to document classification. Master’s thesis, COGS, University of Sussex, UK

  24. Dasgupta D, Nino F (2008) Immunological computation: theory and applications. AUERBACH

  25. Garrett SM (2003) A paratope is not an epitope: implications for immune networks and clonal selection. pp 217–228

  26. Abi-Haidar A, Rocha LM (2008) Artificial immune systems (Proc. ICARIS), pp 36–47

  27. Abi-Haidar A, Rocha LM (2008) Artificial life XI: 11th international conference on the simulation and synthesis of living systems. MIT Press, Cambridge, pp 1–9

  28. Tsymbal A (2004) The problem of concept drift: definitions and related work. Comput Sci Dep Trinity Coll Dublin 4(C):200415

  29. Paul WE, Technologies IO (1993) Fundamental immunology. Raven Press, New York

  30. Burnet SFM (1959) The clonal selection theory of acquired immunity. Vanderbilt University Press, Nashville

  31. De Castro LN, Timmis J (2002) Artificial immune systems: a new computational intelligence approach. Springer, Berlin

  32. Sepulveda NH (2009) How is the t-cell repertoire shaped. Ph.D. thesis, Instituto Gulbenkian de Ciencia

  33. Abi-Haidar A, Rocha LM (2010) ICARIS 2010: Proceedings of the 9th international conference on artificial immune systems. In: pp 237–249

  34. Abi-Haidar A, Rocha LM (2010) Artificial life XII: twelfth international conference on the simulation and synthesis of living systems. In: pp 706–713

  35. Metsis V, Androutsopoulos I, Paliouras G (2006) Spam filtering with Naive Bayes–Which Naive Bayes? In: Third Conference on Email and Anti-Spam (CEAS)

  36. Joachims T (2002) Learning to classify text using support vector machines: methods, theory, and algorithms. Kluwer, Dordrecht

  37. Porter MF (1980) An algorithm for suffix stripping. Program 13(3):130–137

  38. Sokolova M, Japkowicz N, Szpakowicz S (2006) Beyond accuracy, f-score and roc: a family of discriminant measures for performance evaluation, pp 1015–1021

Acknowledgments

This work was partially supported by a grant from the FLAD Computational Biology Collaboratorium at the Instituto Gulbenkian de Ciencia in Portugal. We also thank the ICARIS2010 committee board for encouraging this work. We acknowledge the computational resources provided by Indiana University used to conduct the simulations we report.

Author information

Corresponding author

Correspondence to Alaa Abi-Haidar.

About this article

Cite this article

Abi-Haidar, A., Rocha, L.M. Collective classification of textual documents by guided self-organization in T-Cell cross-regulation dynamics. Evol. Intel. 4, 69–80 (2011). https://doi.org/10.1007/s12065-011-0052-5

  • DOI: https://doi.org/10.1007/s12065-011-0052-5
