September | 2014 | Wei Shung Chung

The popularity of Spark has spurred great interest in providing memory cache capability in HDFS. With the new release of Hadoop 2.3.0, centralized cache management has been introduced.

See the following JIRA.

https://issues.apache.org/jira/browse/HDFS-4949

You can view the design doc here. It is an excellent read. Kudos to both Andrew Wang and Colin Patrick McCabe. They are the instrumental and key contributors of this feature.

This cache memory feature allows memory local HDFS clients to use zero copy read (mlock native calls) to access these memory cached data to further speed up read operation. The mlock and mlockall native calls tell the system to lock to a specified memory range. This prevents the allocated memory from page fault.

You can learn about mlock from the following page.

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Reference_Guide/sect-Realtime_Reference_Guide-Memory_allocation-Using_mlock_to_avoid_memory_faults.html

You can also find the Java native IO mlock and munlock call to memory mapped data in org.apache.hadoop.io.nativeio.NativeIO.java

https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java

and the corresponding C native code
https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c

/**
* Used to manipulate the operating system cache.
*/
@VisibleForTesting
public static class CacheManipulator {
public void mlock(String identifier, ByteBuffer buffer,
long len) throws IOException {
POSIX.mlock(buffer, len);
}

public long getMemlockLimit() {
return NativeIO.getMemlockLimit();
}

public long getOperatingSystemPageSize() {
return NativeIO.getOperatingSystemPageSize();
}

public void posixFadviseIfPossible(String identifier,
FileDescriptor fd, long offset, long len, int flags)
throws NativeIOException {
NativeIO.POSIX.posixFadviseIfPossible(identifier, fd, offset,
len, flags);
}

public boolean verifyCanMlock() {
return NativeIO.isAvailable();
}
}

NameNode has to keep track of the memory cache replicas at different DataNodes with this feature. The cache blocks infos are exchanged between them via heartbeat protocol. As a result, additional metadata e.g.. cachedHosts array is introduced in BlockLocation as seen below.


public class BlockLocation {
  private String[] hosts; // Datanode hostnames
  <strong>private String[] cachedHosts; // Datanode hostnames with a cached replica</strong>
  private String[] names; // Datanode IP:xferPort for accessing the block
  private String[] topologyPaths; // Full path name in network topology
  private long offset;  // Offset of the block in the file
  private long length;
  private boolean corrupt;

Stay tuned for more infos…

I just bought a new Macbook Pro to replace my old one. I documented the following steps to set it up for Java development.

1) Download iTerm2

https://iterm2.com/downloads.html

2) Download color schemes for iTerm2

http://iterm2colorschemes.com

3) Download java from http://support.apple.com/kb/dl1572

4) Set up JAVA_HOME environment variable. On terminal, type the following commands

a) echo export “JAVA_HOME=\$(/usr/libexec/java_home)” >> ~/.bash_profile

b) source ~/.bash_profile

5) Install homebrew

on terminal, run the following command

ruby -e “$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)”

6) Install maven

brew install maven

Run the following command to check the maven version

mvn -version

7) Download and Install Eclipse Kepler

https://www.eclipse.org/kepler/

Download the google java coding style xml for Eclipse

Run the command

svn checkout http://google-styleguide.googlecode.com/svn/trunk/ google-styleguide-read-only

You will find the eclipse java style xml at

google-styleguide-read-only/eclipse-java-google-style.xml

Go to Eclipse, and install the above eclipse-java-google-style.xml. Follow the below steps

Click on Window -> Preferences -> Java -> Code Style -> Formatter
import the google code style

To install Eclipse Color Themes, get the following plugin http://eclipsecolorthemes.org/?view=plugin

8) Install Git

brew install git

9) Install Maven

brew install maven

10) Install Tmux

brew install tmux

You could use the following handy tmux.conf provided by Christian Pelczarski

https://github.com/minimul/dotfiles/blob/master/.tmux.conf

You might also want to use to following useful applications on your Mac

Dropbox, Evernote, Wunderlist

Wei Shung Chung

Wei Shung Chung – Hadoop, HBase, MapReduce, Spark, Spark ML, Machine Learning, Deep Learning

Monthly Archives: September 2014

Cache Management in HDFS

Quick and Easy Macbook Pro Development Setup