Abstract
Content management systems and frameworks (CMS/F) play a key role in Web development. They support common Web operations and provide a number of optional modules for implementing customized functionality. Given the increasing demand for text mining (TM) applications, it seems logical for CMS/F to extend their offering of TM modules. To this end, this work contributes modules to the Drupal CMS/F that support customized named entity recognition and enable the construction of domain-specific document search engines. The implementation relies on well-established Apache information retrieval and TM projects, namely Apache Lucene, Apache Solr and the Apache Unstructured Information Management Architecture (UIMA). As a proof of concept, we present the development of a Drupal-based system that retrieves biomedical articles and automatically recognizes organism names to enable further organism-driven document screening.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ferrnandes, J., Lourenço, A. (2014). Bringing Named Entity Recognition on Drupal Content Management System. In: Saez-Rodriguez, J., Rocha, M., Fdez-Riverola, F., De Paz Santana, J. (eds) 8th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2014). Advances in Intelligent Systems and Computing, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-319-07581-5_31
DOI: https://doi.org/10.1007/978-3-319-07581-5_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07580-8
Online ISBN: 978-3-319-07581-5
eBook Packages: Engineering (R0)