Friday, August 1, 2014

Size of container

In Hadoop YARN, a container is a subunit of a physical DataNode. The size of a container greatly affects MapReduce performance, especially when the application itself supports multiple threads.
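
For reference, the container budget of a node and the size of each MapReduce container are controlled by a handful of properties in Hadoop 2.x. The snippet below is only a sketch: the property names are the standard YARN/MapReduce ones, but the jar name, job name and values are assumptions made up for a 4-core / 8 GB node.

    # In yarn-site.xml, per NodeManager (values assumed):
    #   yarn.nodemanager.resource.memory-mb  = 6554   # memory YARN may hand out on this node (~6.4 GB)
    #   yarn.nodemanager.resource.cpu-vcores = 4      # vcores YARN may hand out on this node
    #
    # Per job, the map-task container size can be requested at submission time,
    # assuming the job parses generic options (jar and class names are hypothetical):
    hadoop jar myapp.jar MyJob \
        -D mapreduce.map.memory.mb=6554 \
        -D mapreduce.map.cpu.vcores=4 \
        -D mapreduce.map.java.opts=-Xmx6000m \
        input output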

Let us say we have one DataNode with 4 cores and 8 GB of memory, and we want to run BWA with the input "A_1.fastq". What are the options?

1) One container per DataNode. This container gets all 4 cores and 6.4 GB of memory (we do not want to starve the host DataNode). So we have only one BWA process, running like "bwa mem -t 4 ... A_1.fastq", with 6.4 GB of memory available to it.

2) Four containers per DataNode, each with 1 core and 1.6 GB of memory. So we have to split "A_1.fastq" into "A_1_1.fastq" ... "A_4_1.fastq", then start 4 parallel BWA processes running like "bwa mem -t 1 ... A_1_1.fastq", "bwa mem -t 1 ... A_2_1.fastq", etc., with 1.6 GB of memory available per process.

Finally we have to merge the resulting SAM files. Since our goal is to optimize execution time, the question is: which one is faster? (A rough sketch of both options follows below.)
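
To make the two options concrete, here is a rough shell sketch of what each one boils down to on a single node. It assumes single-end reads, a BWA index called "ref.fa", and that keeping one SAM header and concatenating the alignment records is an acceptable merge; every file name here is illustrative.

    # Option (1): one big container, one multi-threaded BWA process
    bwa mem -t 4 ref.fa A_1.fastq > A.sam

    # Option (2): four small containers, so split the input first
    # (a FASTQ record is 4 lines, so the chunk size must be a multiple of 4)
    total=$(wc -l < A_1.fastq)
    chunk=$(( (total / 16 + 1) * 4 ))            # lines per chunk, whole records only
    split -l "$chunk" A_1.fastq part_
    i=0
    for f in part_*; do i=$((i+1)); mv "$f" "A_${i}_1.fastq"; done

    # one single-threaded BWA process per slice, run in parallel
    for fq in A_*_1.fastq; do
        bwa mem -t 1 ref.fa "$fq" > "${fq%_1.fastq}.sam" &
    done
    wait

    # merge: keep the header from the first piece, append alignments from all pieces
    grep '^@' A_1.sam > A_merged.sam
    for s in A_[0-9].sam; do grep -v '^@' "$s" >> A_merged.sam; done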

Before jumping to the answer, we have to consider:

1) Smaller available memory means the input FASTQ files must be small; otherwise the process will fail.

2) The overhead. Splitting and merging add time to the overall run, and every BWA process has to load the genome index into memory before mapping any reads.

Since BWA itself supports multiple threads, it seems like the best choice is option (1), one container per DataNode. Is it really the best solution? No! Why? Because the ApplicationMaster itself will occupy one container.

Assume we have 5 DataNodes and each DataNode has one container. When we start a YARN-based MapReduce application, the ResourceManager will find a container to host the ApplicationMaster, and the ApplicationMaster will subsequently start the computing containers. Since the ApplicationMaster itself occupies a full container, we are left with only 4 computing containers. This is a waste of computing resources, because we know the ApplicationMaster does not need that much (4 cores and 6.4 GB of memory). The figure shows the node, in the red box, that is running the ApplicationMaster without doing any of the "real computation".

If we ssh into that ApplicationMaster node, we can see that it is running a process named "MRAppMaster".
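
For example (the host name is hypothetical; jps ships with the JDK and lists the running JVMs):

    ssh datanode3        # the node that is hosting the ApplicationMaster
    jps                  # one of the listed JVMs should be MRAppMaster
    # or, equivalently:
    ps aux | grep MRAppMaster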

In other words, we wasted 20% of the computing resources. This is not a problem if you are running a 100-node cluster, where only 1% of the resources would be "wasted". However, a small team does not need a big boss.
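
For what it is worth, the MapReduce ApplicationMaster requests its own container size, and its defaults are modest (1536 MB and 1 vcore in stock Hadoop 2.x), which is exactly why handing it a whole 4-core node is wasteful. A sketch of overriding it at submission time (values and jar name are assumptions):

    hadoop jar myapp.jar MyJob \
        -D yarn.app.mapreduce.am.resource.mb=1024 \
        -D yarn.app.mapreduce.am.resource.cpu-vcores=1 \
        input output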

What about two containers per DataNode? Then we would have 2*5 = 10 containers in total, with 10% of them wasted. But we run into the multiple-container problem again - the overhead...

This is just one of the tradeoff, or balancing, problems that we have encountered here and there.

