Apache Avro

  • Deepak Vohra


Apache Avro is a compact binary data serialization format that supports a variety of data structures. Avro uses schemas written in JSON notation to serialize and deserialize data. Avro data is stored in a container file (an .avro file), and its schema, which may be defined in a separate .avsc file, is stored with the data in that file. Unlike some similar systems, such as Protocol Buffers, Avro does not require code generation and uses dynamic typing. Because the schema accompanies the data, the data itself is untagged, which keeps the data file compact. Avro supports versioning: different versions of Avro data files (having different columns) may coexist along with their schemas. Another benefit of Avro is interoperability across programming languages, because its efficient binary format is language-independent. The Apache Hadoop ecosystem supports Avro in several of its projects: Apache Hive can store a table as Avro, the Apache Sqoop import command can import relational data into an Avro data file, and Apache Flume supports Avro as both a source and a sink type.
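The claim that Avro data is untagged can be made concrete with a small sketch. The Python code below is a stdlib-only illustration that follows the binary encoding rules of the Avro specification for a few primitive types (zigzag variable-length longs, length-prefixed UTF-8 strings, one-byte booleans) and for records, whose fields are written in schema order with no names or tags. It is not the API of the `avro` library and omits the object container file format (header, embedded schema, sync markers) as well as unions, arrays, maps, and other types. The `Employee` schema, its fields, and the helper names `encode_datum`, `decode_datum`, `encode_long`, and `decode_long` are hypothetical, chosen only for this example.

```python
import json

# Hypothetical record schema in Avro's JSON notation; an .avsc file would hold this text.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Employee",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "active", "type": "boolean"}
  ]
}
"""


def encode_long(n: int) -> bytes:
    """Encode an Avro int/long: zigzag, then base-128 varint (7 bits per byte)."""
    z = (n << 1) ^ (n >> 63)  # zigzag: 0, -1, 1, -2 -> 0, 1, 2, 3
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_long(buf: bytes, pos: int) -> tuple:
    """Read a varint starting at pos; return (value, next_pos)."""
    z, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return (z >> 1) ^ -(z & 1), pos  # undo zigzag


def _type_of(schema):
    # A schema is either a bare type name ("long") or a JSON object with a "type" key.
    return schema["type"] if isinstance(schema, dict) else schema


def encode_datum(schema, value) -> bytes:
    t = _type_of(schema)
    if t == "record":
        # Fields are written in schema order with no field names or tags.
        return b"".join(encode_datum(f["type"], value[f["name"]]) for f in schema["fields"])
    if t in ("int", "long"):
        return encode_long(value)
    if t == "string":
        data = value.encode("utf-8")
        return encode_long(len(data)) + data  # length prefix, then UTF-8 bytes
    if t == "boolean":
        return b"\x01" if value else b"\x00"
    raise ValueError(f"type not covered by this sketch: {t}")


def decode_datum(schema, buf: bytes, pos: int = 0) -> tuple:
    """Decoding needs the writer's schema: the bytes alone do not say which field is which."""
    t = _type_of(schema)
    if t == "record":
        record = {}
        for f in schema["fields"]:
            record[f["name"]], pos = decode_datum(f["type"], buf, pos)
        return record, pos
    if t in ("int", "long"):
        return decode_long(buf, pos)
    if t == "string":
        length, pos = decode_long(buf, pos)
        return buf[pos:pos + length].decode("utf-8"), pos + length
    if t == "boolean":
        return buf[pos] == 1, pos + 1
    raise ValueError(f"type not covered by this sketch: {t}")


if __name__ == "__main__":
    schema = json.loads(SCHEMA_JSON)
    record = {"id": 42, "name": "Avro", "active": True}
    encoded = encode_datum(schema, record)
    print(encoded.hex())  # 54084176726f01 (7 bytes, no field names)
    print(decode_datum(schema, encoded))  # ({'id': 42, 'name': 'Avro', 'active': True}, 7)
```

The encoded record is only 7 bytes because it carries no field names or type markers; without the schema, a reader could not tell where `id` ends or that the next byte is a string length. This is the trade-off the abstract describes: the data stays compact because the schema travels with it in the container file.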


Keywords: Type String, Code String, Category String, External Table, Heap Space

Copyright information

© Deepak Vohra 2016

Authors and Affiliations

  • Deepak Vohra (1)

  1. Apt 105, White Rock, Canada
