Cache Management in HDFS

The popularity of Spark has spurred great interest in providing in-memory caching in HDFS. With the release of Hadoop 2.3.0, centralized cache management has been introduced.

See the following JIRA.

You can view the design doc here. It is an excellent read. Kudos to both Andrew Wang and Colin Patrick McCabe, the key contributors to this feature.

This memory cache feature allows memory-local HDFS clients to use zero-copy reads (backed by mlock native calls) on the data cached in memory, which further speeds up read operations. The mlock native call tells the operating system to lock a specified memory range into RAM, and mlockall does the same for the whole address space of the calling process. This prevents the locked pages from being swapped out, so accessing them later does not cause page faults.
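To make the memory-mapping idea concrete, here is a minimal sketch in plain Java. It is not the HDFS implementation: the class and method names (MappedReadSketch, readViaMmap, roundTrip) are made up for illustration. The standard JDK exposes memory mapping through FileChannel.map but has no binding for mlock, which is why Hadoop goes through native code for the locking step. The sketch only shows the mapping half, with MappedByteBuffer.load() as a best-effort hint that does not pin pages.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Hadoop.
class MappedReadSketch {

  /** Maps the whole file read-only and decodes its contents as UTF-8. */
  static String readViaMmap(Path file) {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      // map() is the JVM counterpart of mmap(2): the buffer is backed by the
      // OS page cache rather than copied into a heap array first.
      MappedByteBuffer buffer =
          channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
      // load() only asks the OS to bring the pages into physical memory.
      // Unlike mlock(2), it does not pin them, so they may still be evicted.
      buffer.load();
      return StandardCharsets.UTF_8.decode(buffer).toString();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Writes text to a temporary file, reads it back through the mapping. */
  static String roundTrip(String text) {
    try {
      Path tmp = Files.createTempFile("mmap-sketch", ".txt");
      try {
        Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
        return readViaMmap(tmp);
      } finally {
        tmp.toFile().deleteOnExit();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("hello hdfs cache"));
  }
}
```

Running main should print the text that was written to the temporary file. The point of the sketch is the contrast: mapping avoids an extra copy, while the locking that keeps the pages resident needs a native call such as mlock.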

You can learn about mlock from the following page.

You can also find the Java native IO mlock and munlock call to memory mapped data in

and the corresponding C native code

/**
 * Used to manipulate the operating system cache.
 */
public static class CacheManipulator {
  public void mlock(String identifier, ByteBuffer buffer,
      long len) throws IOException {
    POSIX.mlock(buffer, len);
  }

  public long getMemlockLimit() {
    return NativeIO.getMemlockLimit();
  }

  public long getOperatingSystemPageSize() {
    return NativeIO.getOperatingSystemPageSize();
  }

  public void posixFadviseIfPossible(String identifier,
      FileDescriptor fd, long offset, long len, int flags)
      throws NativeIOException {
    NativeIO.POSIX.posixFadviseIfPossible(identifier, fd, offset,
        len, flags);
  }

  public boolean verifyCanMlock() {
    return NativeIO.isAvailable();
  }
}

With this feature, the NameNode has to keep track of which DataNodes hold a cached replica of each block. The cached-block information is exchanged between them via the heartbeat protocol. As a result, additional metadata, e.g. the cachedHosts array, is introduced in BlockLocation, as seen below.

public class BlockLocation {
  private String[] hosts; // Datanode hostnames
  private String[] cachedHosts; // Datanode hostnames with a cached replica
  private String[] names; // Datanode IP:xferPort for accessing the block
  private String[] topologyPaths; // Full path name in network topology
  private long offset;  // Offset of the block in the file
  private long length;
  private boolean corrupt;
  // ...
}
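One way a client or scheduler might use the cachedHosts metadata is to try hosts with a cached replica first. The sketch below is only an illustration under that assumption: Location is a hypothetical stand-in that copies just the hosts and cachedHosts fields, and preferCached is a made-up helper, not a Hadoop API.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration class; not part of Hadoop.
class CachedReplicaSketch {

  /** Stand-in for the two BlockLocation fields used in this sketch. */
  static final class Location {
    final String[] hosts;        // all DataNodes holding a replica
    final String[] cachedHosts;  // the subset holding a cached replica

    Location(String[] hosts, String[] cachedHosts) {
      this.hosts = hosts;
      this.cachedHosts = cachedHosts;
    }
  }

  /** Returns all hosts, with cached-replica hosts first, without duplicates. */
  static String[] preferCached(Location loc) {
    // LinkedHashSet keeps insertion order and drops repeated entries, so the
    // cached hosts stay at the front when the full host list is added.
    Set<String> ordered = new LinkedHashSet<>(Arrays.asList(loc.cachedHosts));
    ordered.addAll(Arrays.asList(loc.hosts));
    return ordered.toArray(new String[0]);
  }

  public static void main(String[] args) {
    Location loc = new Location(
        new String[] {"dn1", "dn2", "dn3"},
        new String[] {"dn3"});
    System.out.println(Arrays.toString(preferCached(loc))); // [dn3, dn1, dn2]
  }
}
```

Here dn3 moves to the front because it is the only host reporting a cached replica, and the remaining hosts keep their original order.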

Stay tuned for more info…

Quick and Easy Macbook Pro Development Setup

I just bought a new Macbook Pro to replace my old one. I documented the following steps to set it up for Java development.

1) Download iTerm2

2) Download color schemes for iTerm2
4) Set up the JAVA_HOME environment variable. In the terminal, type the following commands
a) echo export "JAVA_HOME=\$(/usr/libexec/java_home)" >> ~/.bash_profile
b) source ~/.bash_profile
5) Install Homebrew
In the terminal, run the following command
6) Install maven
brew install maven
Run the following command to check the maven version
mvn -version
7) Download and Install Eclipse Kepler
Download the google java coding style xml for Eclipse
Run the command
svn checkout google-styleguide-read-only
You will find the eclipse java style xml at
In Eclipse, import the above eclipse-java-google-style.xml by following these steps
Click on Window -> Preferences -> Java -> Code Style -> Formatter
Import the Google code style
  • To install Eclipse Color Themes, get the following plugin
8) Install Git
brew install git
9) Install Tmux
brew install tmux
You could use the following handy tmux.conf provided by Christian Pelczarski
You might also want to use the following useful applications on your Mac
Dropbox, Evernote, Wunderlist
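
Once the steps above are done, a small Java program can confirm that the JAVA_HOME setting from step 4 is visible to a new process. This is just a sketch: the class name JavaHomeCheck and the describe helper are made up for illustration. It only reads the environment and two standard system properties.

```java
import java.util.Map;

// Hypothetical helper for checking the setup; the names are illustrative.
class JavaHomeCheck {

  /** Describes JAVA_HOME as seen in the given environment map. */
  static String describe(Map<String, String> env) {
    String home = env.get("JAVA_HOME");
    return (home == null || home.isEmpty())
        ? "JAVA_HOME is not set"
        : "JAVA_HOME=" + home;
  }

  public static void main(String[] args) {
    // What the shell exported, if anything.
    System.out.println(describe(System.getenv()));
    // What the running JVM itself reports.
    System.out.println("java.home=" + System.getProperty("java.home"));
    System.out.println("java.version=" + System.getProperty("java.version"));
  }
}
```

If the first line reports that JAVA_HOME is not set, open a new terminal window or run source ~/.bash_profile again before retrying.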