Different file formats used in Hadoop and HBase

I have been investigating the file formats used in Hadoop and HBase to understand how they contribute to the speedups we've all seen in the Hadoop big data world. I also recommend that all Java developers dig into the Hadoop and HBase source code; you will learn a lot and improve your Java skills along the way.

The file formats used in Hadoop include SequenceFile, TFile, and Avro files, whereas HFile is used exclusively in HBase.
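To get a feel for what a key/value file format does at the lowest level, here is a minimal sketch in plain Java. It is not Hadoop's SequenceFile implementation and does not reproduce its on-disk layout (no header, sync markers, or compression). It only illustrates the basic idea of a flat stream of length-prefixed binary key/value records. The class name ToyKeyValueFile and its methods are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy format: NOT the real SequenceFile layout.
class ToyKeyValueFile {

    // Append one record as: key length, key bytes, value length, value bytes.
    static void writeRecord(DataOutputStream out, String key, String value) throws IOException {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(k.length);
        out.write(k);
        out.writeInt(v.length);
        out.write(v);
    }

    // Serialize all records, in iteration order, into one byte array.
    static byte[] writeAll(Map<String, String> records) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Map.Entry<String, String> e : records.entrySet()) {
                writeRecord(out, e.getKey(), e.getValue());
            }
        }
        return buffer.toByteArray();
    }

    // Read records back sequentially until the input is exhausted.
    static Map<String, String> readAll(byte[] data) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                byte[] k = new byte[in.readInt()];
                in.readFully(k);
                byte[] v = new byte[in.readInt()];
                in.readFully(v);
                result.put(new String(k, StandardCharsets.UTF_8),
                           new String(v, StandardCharsets.UTF_8));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> records = new LinkedHashMap<>();
        records.put("row1", "alpha");
        records.put("row2", "beta");
        byte[] data = writeAll(records);
        System.out.println(readAll(data));
    }
}
```

A file like this can only be scanned from start to finish. Roughly speaking, indexed formats such as TFile and HFile add an index on top of the data so that a reader can jump close to a key instead of scanning everything; the blog post linked below covers HFile's actual structure.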

I found an interesting and detailed explanation of the internal structure of HFile in the following blog post:

http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html

Enjoy

BigDataExplorer