An Overview of Hadoop Scheduler Algorithms
- Faten Hamad
Abstract
Hadoop is an open-source cloud computing system used for large-scale data processing. It has become the basic computing platform for many internet companies. With the Hadoop platform, users can develop cloud computing applications and then submit tasks to the platform. Hadoop has strong fault tolerance and can easily increase the number of cluster nodes, scaling the cluster size linearly so that clusters can process larger datasets. However, Hadoop has some shortcomings, particularly in the MapReduce scheduler, that are exposed in actual use, which calls for more research on Hadoop scheduling algorithms.
This survey provides an overview of the default Hadoop scheduler algorithms and the problems they have. It also compares five Hadoop framework scheduling algorithms in terms of the default scheduler algorithm to be enhanced, the proposed scheduler algorithm, the type of cluster applied (heterogeneous or homogeneous), the methodology, and the cluster classification based on performance evaluation. Finally, a new algorithm based on capacity scheduling and the use of perspective resource utilization to enhance Hadoop scheduling is proposed.
- DOI:10.5539/mas.v12n8p69