Hadoop Based Data Intensive Computation on IaaS Cloud Platforms

DOI: 10.5539/cis.v8n3p103

Sanjay Ahuja

Abstract

Cloud computing is a relatively new form of computing that uses virtualized resources, is dynamically scalable, and is often provided as a pay-per-use service over the Internet, an intranet, or both. With increasing demand for data storage in the cloud, the study of data-intensive applications is becoming a primary focus. Data-intensive applications involve high CPU usage and process large volumes of data, typically hundreds of gigabytes, terabytes, or petabytes in size. This study was conducted on Amazon Elastic Compute Cloud (EC2) and Amazon Elastic MapReduce (EMR) using the HiBench Hadoop benchmark suite, which was used to run and evaluate Hadoop-based data-intensive computation on both cloud platforms. Both quantitative and qualitative comparisons of Amazon EC2 and Amazon EMR were performed, including a study of their pricing models, and measures are suggested for future studies and research.