A Tool for Converting Different Data Representation Formats

  • Sanjay Chatterji
  • Subrangshu Sengupta
  • Bagadhi Gopal Rao
  • Debarghya Banerjee
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8891)

Abstract

Data analysis and processing has recently become one of the most active and demanding fields in both academia and industry. A large number of tools are openly available on the web, but different tools take input and return output in different data representation formats. Building an appropriate converter for a pair of data representation formats requires both sufficient time and in-depth knowledge of the formats. Here, we discuss the CoNLL, SSF, XML and JSON data representation formats and develop a tool for conversion between them. Other conversions will be included in the extended version.
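
As a rough illustration of what one such conversion might involve, the sketch below reads tab-separated CoNLL-X text and re-expresses it as JSON and as XML. It is not the authors' tool. The function names, the lower-case field names, and the XML element names (corpus, sentence, token) are assumptions made for this example, and SSF is left out because its layout is not described here.

```python
import json
import xml.etree.ElementTree as ET

# The ten columns of the CoNLL-X format (tab-separated, one token per line).
# The lower-case key names are a choice made for this sketch.
CONLL_X_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                  "feats", "head", "deprel", "phead", "pdeprel"]


def parse_conll(text):
    """Parse CoNLL-X text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():              # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split("\t")
        current.append(dict(zip(CONLL_X_FIELDS, columns)))
    if current:                           # last sentence without trailing blank line
        sentences.append(current)
    return sentences


def to_json(sentences):
    """Serialize parsed sentences as a JSON array of arrays of token objects."""
    return json.dumps(sentences, ensure_ascii=False, indent=2)


def to_xml(sentences):
    """Serialize parsed sentences as XML, one attribute per CoNLL-X column."""
    root = ET.Element("corpus")           # hypothetical element names
    for number, tokens in enumerate(sentences, start=1):
        sentence = ET.SubElement(root, "sentence", id=str(number))
        for token in tokens:
            ET.SubElement(sentence, "token", token)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = ("1\tJohn\tjohn\tN\tNNP\t_\t2\tSUBJ\t_\t_\n"
              "2\tsleeps\tsleep\tV\tVBZ\t_\t0\tROOT\t_\t_\n")
    parsed = parse_conll(sample)
    print(to_json(parsed))
    print(to_xml(parsed))
```

Going through a shared in-memory representation (here, lists of token dictionaries) means each format needs only one reader and one writer, rather than a separate converter for every pair of formats.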

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Sanjay Chatterji (1)
  • Subrangshu Sengupta (1)
  • Bagadhi Gopal Rao (1)
  • Debarghya Banerjee (1)
  1. Samsung R&D Institute India, Bangalore, India
