A Tool for Converting Different Data Representation Formats

Chatterji, Sanjay; Sengupta, Subrangshu; Rao, Bagadhi Gopal; Banerjee, Debarghya

doi:10.1007/978-3-319-13817-6_28

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8891)

Abstract

Data analysis and processing has recently become one of the most interesting and demanding fields in both academia and industry. A large number of tools are openly available on the web, but different tools take inputs and return outputs in different data representation formats. Building an appropriate converter for a pair of data representation formats requires both sufficient time and in-depth knowledge of the formats. Here, we discuss the CoNLL, SSF, XML and JSON data representation formats and develop a tool for conversion between them. Other conversions will be included in the extended version.
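
To make the idea of format conversion concrete, here is a minimal sketch of one of the pairs named above, CoNLL to JSON and back. It is not the authors' tool. It assumes the ten tab-separated columns of the CoNLL-X format with blank lines between sentences, and the JSON layout (a list of sentences, each a list of token objects) is a hypothetical choice made only for illustration, as are all function names.

```python
import json

# Column names of the CoNLL-X format, in order.
CONLL_X_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                  "feats", "head", "deprel", "phead", "pdeprel"]


def conll_to_sentences(text):
    """Parse CoNLL-X text into a list of sentences (lists of token dicts)."""
    sentences, tokens = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append(tokens)
                tokens = []
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLL_X_FIELDS):
            raise ValueError(
                f"expected {len(CONLL_X_FIELDS)} columns, got {len(columns)}")
        tokens.append(dict(zip(CONLL_X_FIELDS, columns)))
    if tokens:
        sentences.append(tokens)
    return sentences


def sentences_to_conll(sentences):
    """Serialize the sentence structure back to CoNLL-X text."""
    blocks = []
    for sentence in sentences:
        rows = ["\t".join(token[name] for name in CONLL_X_FIELDS)
                for token in sentence]
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks) + "\n"


def conll_to_json(text):
    """CoNLL-X text -> JSON string (illustrative layout, an assumption)."""
    return json.dumps(conll_to_sentences(text), ensure_ascii=False, indent=2)


def json_to_conll(json_text):
    """JSON string in the layout above -> CoNLL-X text."""
    return sentences_to_conll(json.loads(json_text))
```

Keeping every column as a string, including the `_` placeholder for empty fields, means a round trip through JSON reproduces the original CoNLL text unchanged. Going through one shared in-memory structure also suggests how further formats such as SSF or XML could be added without writing a converter for every pair.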

Author information

Authors and Affiliations

Samsung R&D Institute India, Bangalore, India
Sanjay Chatterji, Subrangshu Sengupta, Bagadhi Gopal Rao & Debarghya Banerjee

Editor information

Editors and Affiliations

University College Cork, 011927, Cork, Ireland
Rajendra Prasath & Philip O’Reilly
V.H.N.Senthikumara Nadar College, 626 001, Tamil Nadu, India
T. Kathirvalavakumar

About this paper

Cite this paper

Chatterji, S., Sengupta, S., Rao, B.G., Banerjee, D. (2014). A Tool for Converting Different Data Representation Formats. In: Prasath, R., O’Reilly, P., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. Lecture Notes in Computer Science, vol 8891. Springer, Cham. https://doi.org/10.1007/978-3-319-13817-6_28

DOI: https://doi.org/10.1007/978-3-319-13817-6_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13816-9
Online ISBN: 978-3-319-13817-6
eBook Packages: Computer Science, Computer Science (R0)
