Bringing Named Entity Recognition on Drupal Content Management System

Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 294)

Abstract

Content management systems and frameworks (CMS/F) play a key role in Web development. They support common Web operations and offer a number of optional modules for implementing customized functionality. Given the increasing demand for text mining (TM) applications, it seems logical that CMS/F extend their offering of TM modules. In this regard, this work contributes to the Drupal CMS/F with modules that support customized named entity recognition and enable the construction of domain-specific document search engines. The implementation relies on well-recognized Apache information retrieval and TM initiatives, namely Apache Lucene, Apache Solr and Apache Unstructured Information Management Architecture (UIMA). As a proof of concept, we present the development of a Drupal CMS/F that retrieves biomedical articles and performs automatic recognition of organism names to enable further organism-driven document screening.
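
The idea of organism-driven document screening can be illustrated with a minimal sketch. It is not the authors' implementation, which builds on Drupal modules, Apache Solr and Apache UIMA. Instead, it assumes a plain dictionary-based matcher and an in-memory index; the dictionary entries, document texts and the names `recognize_organisms` and `build_organism_index` are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical mini-dictionary: lower-cased surface form -> canonical organism name.
ORGANISM_DICTIONARY = {
    "escherichia coli": "Escherichia coli",
    "e. coli": "Escherichia coli",
    "saccharomyces cerevisiae": "Saccharomyces cerevisiae",
    "s. cerevisiae": "Saccharomyces cerevisiae",
}

# One alternation over all surface forms, longest first so longer names win.
# The lookarounds keep a form from matching inside a longer word.
_PATTERN = re.compile(
    "|".join(
        r"(?<!\w)" + re.escape(form) + r"(?!\w)"
        for form in sorted(ORGANISM_DICTIONARY, key=len, reverse=True)
    ),
    re.IGNORECASE,
)


def recognize_organisms(text):
    """Return (start, end, canonical_name) for each dictionary match in text."""
    return [
        (m.start(), m.end(), ORGANISM_DICTIONARY[m.group(0).lower()])
        for m in _PATTERN.finditer(text)
    ]


def build_organism_index(documents):
    """Map each canonical organism name to the ids of documents that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for _, _, name in recognize_organisms(text):
            index[name].add(doc_id)
    return index


# Made-up documents standing in for retrieved biomedical articles.
DOCUMENTS = {
    "doc1": "Growth of E. coli was measured in minimal medium.",
    "doc2": "Saccharomyces cerevisiae and Escherichia coli were co-cultured.",
    "doc3": "No organism is named in this abstract.",
}

ORGANISM_INDEX = build_organism_index(DOCUMENTS)
```

Looking up `ORGANISM_INDEX["Escherichia coli"]` then yields the documents to screen for that organism. In a search-engine setting, the recognized names would more plausibly be stored as an indexed field so that results can be filtered or faceted by organism.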

Keywords

Drupal, text mining, named entity recognition, Apache Lucene, Apache Solr, Apache UIMA

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, University of Vigo, Ourense, Spain
  2. IBB - Institute for Biotechnology and Bioengineering, Centre of Biological Engineering, University of Minho, Braga, Portugal
