Different file formats used in Hadoop and HBase

I have been investigating the different file formats used in Hadoop and HBase to understand how they contribute to the speedups we have all seen in the Hadoop big data world. I also recommend that every Java developer dig into the Hadoop and HBase source code: you will learn a lot and improve your Java skills.

The file formats used in Hadoop include SequenceFile, TFile, and Avro data files, whereas HFile is used exclusively in HBase.

I found an interesting and detailed explanation of the internal structure of HFile in the following blog post:

http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html
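
To give a feel for what "block indexed" means before you read that post, here is a minimal Java sketch of the general idea: sorted key/value pairs are grouped into blocks, and a small index of each block's first key lets a lookup jump straight to the one block that could hold the key. This is a toy, in-memory illustration under my own assumptions, not the real HFile layout or the HBase API. The class name `ToyBlockIndexedFile`, its methods, and the `entriesPerBlock` parameter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy block-indexed store. Hypothetical; only illustrates the lookup idea. */
public class ToyBlockIndexedFile {
    // Each block holds a run of consecutive sorted key/value pairs.
    private final List<TreeMap<String, String>> blocks = new ArrayList<>();
    // Block index: first key of each block -> position of that block.
    private final TreeMap<String, Integer> blockIndex = new TreeMap<>();

    public ToyBlockIndexedFile(TreeMap<String, String> sortedData, int entriesPerBlock) {
        if (entriesPerBlock < 1) {
            throw new IllegalArgumentException("entriesPerBlock must be >= 1");
        }
        TreeMap<String, String> current = null;
        for (Map.Entry<String, String> e : sortedData.entrySet()) {
            // Start a new block when there is none yet or the current one is full.
            if (current == null || current.size() == entriesPerBlock) {
                current = new TreeMap<>();
                blockIndex.put(e.getKey(), blocks.size());
                blocks.add(current);
            }
            current.put(e.getKey(), e.getValue());
        }
    }

    /** Returns the value for key, or null if the key is absent. */
    public String get(String key) {
        // The candidate block is the one whose first key is the
        // greatest first key that is <= the requested key.
        Map.Entry<String, Integer> entry = blockIndex.floorEntry(key);
        if (entry == null) {
            return null; // key sorts before the first block
        }
        return blocks.get(entry.getValue()).get(key);
    }

    public int blockCount() {
        return blocks.size();
    }

    public static void main(String[] args) {
        TreeMap<String, String> data = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            data.put("row" + i, "value" + i);
        }
        ToyBlockIndexedFile file = new ToyBlockIndexedFile(data, 2);
        System.out.println(file.blockCount() + " blocks, row3 -> " + file.get("row3"));
    }
}
```

The point of the design is that only the small index has to be searched in full; a lookup then touches a single block instead of scanning the whole file. A real on-disk format sizes blocks in bytes rather than entry counts and adds metadata, compression, and a trailer, which the linked post covers.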

Enjoy

BigDataExplorer
