An Overview of Hadoop Scheduler Algorithms

  •  Faten Hamad    


Hadoop is a cloud computing open source system, used in large-scale data processing. It became the basic computing platforms for many internet companies. With Hadoop platform users can develop the cloud computing application and then submit the task to the platform. Hadoop has a strong fault tolerance, and can easily increase the number of cluster nodes, using linear expansion of the cluster size, so that clusters can process larger datasets. However Hadoop has some shortcomings, especially in the actual use of the process of exposure to the MapReduce scheduler, which calls for more researches on Hadoop scheduling algorithms.

This survey provides an overview of the default Hadoop scheduler algorithms and the problem they have. It also compare between five Hadoop framework scheduling algorithms in term of the default scheduler algorithm to be enhanced, the proposed scheduler algorithm, type of cluster applied either heterogeneous or homogeneous, methodology, and clusters classification based on performance evaluation. Finally, a new algorithm based on capacity scheduling and use of perspective resource utilization to enhance Hadoop scheduling is proposed.

This work is licensed under a Creative Commons Attribution 4.0 License.