Collective classification of textual documents by guided self-organization in T-Cell cross-regulation dynamics

Abi-Haidar, Alaa; Rocha, Luis M.

doi:10.1007/s12065-011-0052-5

Special Issue
Published: 05 February 2011

Evolutionary Intelligence, Volume 4, pages 69–80 (2011)

Alaa Abi-Haidar^1,2 &
Luis M. Rocha^1,2

Abstract

We present and study an agent-based model of T-cell cross-regulation in the adaptive immune system, which we apply to binary classification. Our method extends an existing analytical model of T-cell cross-regulation (Carneiro et al. in Immunol Rev 216(1):48–68, 2007) that was used to study the self-organizing dynamics of a single population of T-cells interacting with an idealized antigen-presenting cell capable of presenting a single antigen. With agent-based modeling we are able to study the self-organizing dynamics of multiple populations of distinct T-cells, which interact via antigen-presenting cells that present hundreds of distinct antigens. Moreover, we show that such self-organizing dynamics can be guided to produce an effective binary classification of antigens, which is competitive with existing machine learning methods when applied to biomedical text classification. Specifically, we test our model on a dataset of publicly available full-text biomedical articles provided by the BioCreative challenge (Krallinger in The BioCreative II.5 challenge overview, p 19, 2009). We study the robustness of our model’s parameter configurations and show that it leads to encouraging results comparable to state-of-the-art classifiers. Our results help us understand both T-cell cross-regulation as a general principle of guided self-organization and its applicability to document classification. We therefore show that our bio-inspired algorithm is a promising novel method for biomedical article classification and for binary document classification in general.

Notes

We use the terminology of self/nonself discrimination, though perhaps a more accurate description is classification of harmless vs. harmful substances; harmless can also include antigens from bacteria that are necessary for vertebrate bodies, and harmful can also include the body’s own tumor cells.
A good, though already somewhat dated, overview of the vertebrate immune system for the artificial life community is Hofmeyr’s [12].
The simplification of proliferation to mere duplication adopted in the canonical CRM model is maintained in our agent-based model to minimize the number of parameters (excluding proliferation rates) and the parameter search space.
Every \(E_f\) or \(R_f\) has equal probability of binding to the APC that presents feature \(f\).
The list of common (stop) words includes 33 of the most common English words, from which we manually excluded the word “with”, as we know it to be of importance to protein-protein interaction (PPI).
For feature extraction we used the training data of both BioCreative 2.5 and BioCreative 2, as described in [11]; all classifiers used the exact same feature set.
TF.IDF is a common text weighting measure to evaluate the importance of a feature/word in a document in a corpus. TF stands for term frequency in a document and IDF for inverse document frequency in the corpus.
Notice that this parameter search on the provided labeled training data uses only the information available to the teams participating in the BioCreative 2.5 challenge, and none of the test data, whose labels were revealed post-challenge.
\(\hbox{F-score} = {\frac{2 \cdot \hbox{Precision} \cdot \hbox{Recall}}{\hbox{Precision} + \hbox{Recall}}}\), where \(\hbox{Precision} = {\frac{\hbox{TP}}{\hbox{TP} + \hbox{FP}}}\) and \(\hbox{Recall} = {\frac{\hbox{TP}}{\hbox{TP} + \hbox{FN}}}\). True Positives (TP) and False Positives (FP) are the documents the classifier correctly and incorrectly predicts to be relevant, while True Negatives (TN) and False Negatives (FN) are the documents it correctly and incorrectly predicts to be irrelevant.
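
The TF.IDF weighting and the F-score described in the notes above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `tf_idf` and `f_score` are hypothetical, and the particular TF.IDF variant used here (term count divided by document length, multiplied by the natural logarithm of the number of documents over the document frequency) is an assumption, since the notes do not say which variant was used.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall, as defined in the notes."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy corpus of already-tokenized documents (illustrative only).
    docs = [["protein", "binds", "protein"], ["protein", "interacts"]]
    print(tf_idf(docs))
    print(f_score(tp=8, fp=2, fn=2))
```

Under this variant, a term that occurs in every document receives weight zero, which is why stop-word removal and TF.IDF weighting tend to complement each other.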

References

[1] Carneiro J, Leon K, Caramalho I, van den Dool C, Gardner R, Oliveira V, Bergman ML, Sepúlveda N, Paixão T, Faro J, Demengeot J (2007) When three is not a crowd: a crossregulation model of the dynamics and repertoire selection of regulatory CD4 T cells. Immunol Rev 216(1):48–68
[2] Krallinger M (2009) The BioCreative II.5 challenge overview, p 19
[3] Hunter L, Cohen KB (2006) Biomedical language processing: what’s beyond PubMed? Mol Cell 21(5):589–594
[4] Jensen L, Saric J, Bork P (2006) Literature mining for the biologist: from information retrieval to biological discovery. Nat Rev Genet 7(2):119–129. doi:10.1038/nrg1768
[5] Shatkay H, Feldman R (2003) Mining the biomedical literature in the genomic era: an overview. J Comput Biol 10(6):821–856
[6] Hersh W, Bhupatiraju RT, Corley S (2004) Enhancing access to the bibliome: the TREC genomics track. Medinfo 11(Pt 2):773–777
[7] Hirschman L, Yeh A, Blaschke C, Valencia A (2005) Overview of BioCreative: critical assessment of information extraction for biology. BMC Bioinform 6(Suppl 1):S1
[8] Krallinger M, Valencia A (2007) Evaluating the detection and ranking of protein interaction relevant articles: the BioCreative challenge interaction article sub-task (IAS). In: Proceedings of the 2nd BioCreative challenge evaluation workshop
[9] Feldman R, Sanger J (2006) The text mining handbook: advanced approaches in analyzing unstructured data. Cambridge University Press, Cambridge
[10] Abi-Haidar A, Kaur J, Maguitman A, Radivojac P, Retchsteiner A, Verspoor K, Wang Z, Rocha LM (2008) Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks. p 9(Suppl 2):S11
[11] Kolchinsky A, Abi-Haidar A, Kaur J, Hamed AA, Rocha LM (2010) Classification of protein-protein interaction full-text documents using text and citation network features. IEEE/ACM Trans Comput Biol Bioinform 7(3):400–411. doi:10.1109/TCBB.2010.55. URL http://www.computer.org/portal/web/csdl/doi/10.1109/TCBB.2010.55
[12] Hofmeyr SA (2001) An interpretative introduction to the immune system. In: Design principles for the immune system and other distributed autonomous systems
[13] Segel LA, Cohen I (2001) Design principles for the immune system and other distributed autonomous systems. Oxford University Press, Oxford
[14] Mitchell M (2006) Complex systems: network thinking. Artif Intell 170(18):1194–1212
[15] Peak D, West JD, Messinger SM, Mott KA (2004) Evidence for complex, collective dynamics and distributed emergent computation in plants. PNAS 101(4):918–922
[16] Helikar T, Konvalina J, Heidel J, Rogers JA (2008) Emergent decision-making in biological signal transduction networks. Proc Natl Acad Sci USA 105(6):1913–1918. doi:10.1073/pnas.0705088105
[17] Walters M, Sperandio V (2006) Quorum sensing in Escherichia coli and Salmonella. Int J Med Microbiol 296(2–3):125–131. doi:10.1016/j.ijmm.2006.01.041
[18] Pratt SC (2005) Quorum sensing by encounter rates in the ant Temnothorax albipennis. Behav Ecol 16(2):488–496. doi:10.1093/beheco/ari020
[19] Crutchfield J, Mitchell M (1995) The evolution of emergent computation. PNAS 92(23)
[20] Rocha LM, Hordijk W (2005) Material representations: from the genetic code to the evolution of cellular automata. Artif Life 11(1–2):189–214
[21] Shalizi C, Haslinger R, Rouquier J-B, Klinkner K, Moore C (2006) Automatic filters for the detection of coherent structure in spatiotemporal systems. Phys Rev E 73
[22] Timmis J (2007) Artificial immune systems today and tomorrow. Nat Comput 6(1):1–18
[23] Twycross J, Cayzer S (2002) An immune system approach to document classification. Master’s thesis, COGS, University of Sussex, UK
[24] Dasgupta D, Nino F (2008) Immunological computation: theory and applications. AUERBACH
[25] Garrett SM (2003) A paratope is not an epitope: implications for immune networks and clonal selection. pp 217–228
[26] Abi-Haidar A, Rocha LM (2008) Artificial immune systems (Proc. ICARIS), pp 36–47
[27] Abi-Haidar A, Rocha LM (2008) Artificial life XI: 11th international conference on the simulation and synthesis of living systems. MIT Press, Cambridge, pp 1–9
[28] Tsymbal A (2004) The problem of concept drift: definitions and related work. Comput Sci Dep Trinity Coll Dublin 4(C):200415
[29] Paul WE, Technologies IO (1993) Fundamental immunology. Raven Press, New York
[30] Burnet SFM (1959) The clonal selection theory of acquired immunity. Vanderbilt University Press, Nashville
[31] De Castro LN, Timmis J (2002) Artificial immune systems: a new computational intelligence approach. Springer, Berlin
[32] Sepulveda NH (2009) How is the T-cell repertoire shaped. Ph.D. thesis, Instituto Gulbenkian de Ciencia
[33] Abi-Haidar A, Rocha LM (2010) ICARIS 2010: Proceedings of the 9th international conference on artificial immune systems, pp 237–249
[34] Abi-Haidar A, Rocha LM (2010) Artificial life XII: twelfth international conference on the simulation and synthesis of living systems, pp 706–713
[35] Metsis V, Androutsopoulos I, Paliouras G (2006) Spam filtering with Naive Bayes: which Naive Bayes? In: Third Conference on Email and Anti-Spam (CEAS)
[36] Joachims T (2002) Learning to classify text using support vector machines: methods, theory, and algorithms. Kluwer, Dordrecht
[37] Porter MF (1980) An algorithm for suffix stripping. Program 13(3):130–137
[38] Sokolova M, Japkowicz N, Szpakowicz S (2006) Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation, pp 1015–1021

Acknowledgments

This work was partially supported by a grant from the FLAD Computational Biology Collaboratorium at the Instituto Gulbenkian de Ciencia in Portugal. We also thank the ICARIS2010 committee board for encouraging this work. We acknowledge the computational resources provided by Indiana University used to conduct the simulations we report.

Author information

Authors and Affiliations

School of Informatics and Computing, Indiana University, Bloomington, IN, 47401, USA
Alaa Abi-Haidar & Luis M. Rocha
FLAD Computational Biology Collaboratorium, Instituto Gulbenkian de Ciência, Oeiras, Portugal
Alaa Abi-Haidar & Luis M. Rocha

Corresponding author

Correspondence to Alaa Abi-Haidar.

About this article

Cite this article

Abi-Haidar, A., Rocha, L.M. Collective classification of textual documents by guided self-organization in T-Cell cross-regulation dynamics. Evol. Intel. 4, 69–80 (2011). https://doi.org/10.1007/s12065-011-0052-5

Received: 01 October 2010
Revised: 17 January 2011
Accepted: 18 January 2011
Published: 05 February 2011
Issue Date: June 2011
DOI: https://doi.org/10.1007/s12065-011-0052-5
