Bookmarklet-Triggered Literature Metadata Extraction System Using Cloud Plugins

  • Kun Ma
  • Ajith Abraham
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 355)


In this paper, we design a bookmarklet-triggered literature metadata extraction system using cloud plugins to extract metadata from publishers' Web pages. First, we propose selector-syntax extractors that use a CSS-like syntax. Second, we deploy these extractors in the cloud. Finally, we propose a bookmarklet-triggered mechanism that executes the cloud-hosted script to extract metadata from the current Web page. Compared with existing methods, the system works across browser platforms, is flexible and extensible, and requires no additional plugins to be installed.
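The abstract does not give the extractor syntax itself. As a minimal sketch, assuming a hypothetical selector form `tag[attr=value]@target` (a CSS attribute selector plus the name of the attribute to read), a selector-syntax extractor can be written with only the Python standard library. The rule names, the `RULES` table, and the use of Highwire-style `citation_*` meta tags (which many publisher pages expose) are illustrative assumptions, not the paper's actual rules; the sketch also runs on an HTML string rather than on the live page a bookmarklet would see.

```python
import re
from html.parser import HTMLParser

# Hypothetical selector syntax (not taken from the paper): tag[attr=value]@target
#   tag         element name to match, e.g. "meta"
#   attr=value  one CSS-style attribute equality test; quotes around value are optional
#   @target     attribute whose value is returned
SELECTOR_RE = re.compile(r"^(\w+)\[([\w-]+)=([^\]]+)\]@([\w-]+)$")


def parse_selector(selector):
    """Split a selector string into a (tag, attr, value, target) tuple."""
    match = SELECTOR_RE.match(selector.strip())
    if match is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    tag, attr, value, target = match.groups()
    return tag.lower(), attr.lower(), value.strip("\"'"), target.lower()


class SelectorExtractor(HTMLParser):
    """Record the target-attribute value of every start tag that matches a rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = {field: parse_selector(sel) for field, sel in rules.items()}
        self.results = {field: [] for field in rules}

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        attr_map = dict(attrs)
        for field, (r_tag, r_attr, r_value, r_target) in self.rules.items():
            found = attr_map.get(r_target)
            if tag == r_tag and attr_map.get(r_attr) == r_value and found is not None:
                self.results[field].append(found)


def extract(html, rules):
    """Apply every rule to an HTML string; fields with no match map to []."""
    parser = SelectorExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.results


# Example rules for Highwire-style citation meta tags (an assumption about the page).
RULES = {
    "title": "meta[name=citation_title]@content",
    "authors": "meta[name=citation_author]@content",
    "doi": "meta[name=citation_doi]@content",
}

page = """<html><head>
<meta name="citation_title" content="Bookmarklet-Triggered Literature Metadata Extraction System Using Cloud Plugins">
<meta name="citation_author" content="Kun Ma">
<meta name="citation_author" content="Ajith Abraham">
</head><body></body></html>"""

print(extract(page, RULES))
```

Because the sample page has no `citation_doi` tag, the `doi` field comes back empty. In this sketch, adding a field is a one-line change to the rule table; supporting class, descendant, or text-content selectors would require a full CSS selector engine rather than the single attribute test used here.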


Keywords: Metadata extraction, Content negotiation, Bookmarklet, Cloud computing, Plugins



This work was supported by the Doctoral Fund of University of Jinan (XBS1237) and the Shandong Provincial Natural Science Foundation (ZR2014FQ029).



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
  2. Machine Intelligence Research Labs, Scientific Network for Innovation and Research Excellence, Auburn, USA
