Abstract
Data analysis and processing has recently become one of the most active and demanding fields in both academia and industry. A large number of tools are openly available on the web, but different tools take inputs and return outputs in different data representation formats. Building an appropriate converter for a pair of data representation formats requires both sufficient time and in-depth knowledge of the formats. Here, we discuss the CoNLL, SSF, XML and JSON data representation formats and develop a tool for conversion between them. Other conversions will be included in the extended version.
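To make the kind of conversion discussed here concrete, the following is a minimal sketch of one direction, CoNLL to JSON. It is not the authors' tool. It assumes the ten tab-separated columns of the CoNLL-X format with blank lines between sentences, and the names `CONLL_X_FIELDS`, `conll_to_sentences`, `conll_to_json` and `SAMPLE` are hypothetical, introduced only for illustration.

```python
import json

# Column names of the CoNLL-X format (assumed input layout for this sketch).
CONLL_X_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                  "feats", "head", "deprel", "phead", "pdeprel"]


def conll_to_sentences(text):
    """Parse CoNLL-X text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split("\t")
        # Pair each column value with its field name; values stay strings.
        current.append(dict(zip(CONLL_X_FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences


def conll_to_json(text):
    """Serialize the parsed sentences as a JSON string."""
    return json.dumps(conll_to_sentences(text), ensure_ascii=False, indent=2)


# Hypothetical two-token sentence used only to exercise the sketch.
SAMPLE = (
    "1\tJohn\tjohn\tN\tNNP\t_\t2\tSUBJ\t_\t_\n"
    "2\tsleeps\tsleep\tV\tVBZ\t_\t0\tROOT\t_\t_\n"
)

if __name__ == "__main__":
    print(conll_to_json(SAMPLE))
```

Keeping every value as a string avoids guessing at types (for example, `_` marks an unspecified field), so the reverse conversion can reproduce the input columns unchanged.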
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Chatterji, S., Sengupta, S., Rao, B.G., Banerjee, D. (2014). A Tool for Converting Different Data Representation Formats. In: Prasath, R., O’Reilly, P., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. Lecture Notes in Computer Science, vol 8891. Springer, Cham. https://doi.org/10.1007/978-3-319-13817-6_28
DOI: https://doi.org/10.1007/978-3-319-13817-6_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13816-9
Online ISBN: 978-3-319-13817-6
eBook Packages: Computer Science (R0)