Apache Hadoop Training | BigData training in Chennai

Apache Hadoop
Write Once, Read Many Times!

About Apache Hadoop

Apache Hadoop® is built for big data, insight, and innovation: a cost-effective solution with simple programming models.

Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.

  • Highly reliable storage
  • Distributed processing
  • Cost-effective solution

Hadoop Topics

The following topics are covered under Hadoop.

HDFS

The Hadoop Distributed File System (HDFS) is the core component, the backbone, of the Hadoop Ecosystem.
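
The idea can be sketched in a few lines of plain Python (a conceptual model, not the real HDFS API): a file is split into fixed-size blocks, and each block is replicated across several datanodes. The block size, replication factor, and node names below are illustrative defaults.

```python
# Conceptual sketch of HDFS block placement (not the real HDFS API).
# HDFS splits a file into fixed-size blocks (128 MB by default) and
# stores each block on several datanodes (replication factor 3 by default).

BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size: 128 MB
REPLICATION = 3                 # default replication factor

def place_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a block-id -> datanode-list placement map for a file of file_size bytes."""
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block_id in range(num_blocks):
        # Simple round-robin placement; real HDFS is rack-aware.
        replicas = [datanodes[(block_id + r) % len(datanodes)]
                    for r in range(min(replication, len(datanodes)))]
        placement[block_id] = replicas
    return placement

# A 300 MB file on a 4-node cluster needs 3 blocks, each held by 3 nodes.
nodes = ["dn1", "dn2", "dn3", "dn4"]
plan = place_blocks(300 * 1024 * 1024, nodes)
print(plan[0])  # the first block's replica set
```

Because every block lives on multiple nodes, losing one datanode never loses data; this is what makes HDFS "highly reliable" on commodity hardware.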

YARN

YARN acts as the brain of your Hadoop Ecosystem: it performs all processing activities by allocating resources and scheduling tasks.
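
A minimal sketch of that allocate-and-schedule idea, in plain Python (not the YARN API): a resource manager tracks free memory and vcores per node and grants a container to any request that fits. The node names and capacities are made up for illustration.

```python
# Conceptual sketch of YARN-style container allocation (not the YARN API).
# The ResourceManager tracks free memory/vcores per node and grants
# containers to requests that fit; requests that don't fit must wait.

nodes = {"node1": {"mem": 8192, "vcores": 8},
         "node2": {"mem": 4096, "vcores": 4}}

def allocate(request_mem, request_vcores):
    """Grant a container on the first node with enough free resources."""
    for name, free in nodes.items():
        if free["mem"] >= request_mem and free["vcores"] >= request_vcores:
            free["mem"] -= request_mem        # reserve the resources
            free["vcores"] -= request_vcores
            return name                        # container granted on this node
    return None  # nothing fits; the request is deferred

a1 = allocate(6144, 4)  # fits on node1
a2 = allocate(4096, 4)  # node1 is now too small, so it lands on node2
a3 = allocate(4096, 4)  # nothing fits; deferred until resources free up
print(a1, a2, a3)
```

Real YARN adds queues, fairness/capacity policies, and locality preferences on top of this basic fit test.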

MAPREDUCE

MapReduce is a software framework that helps in writing applications that process large data sets using distributed and parallel algorithms inside the Hadoop environment.
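
The classic word-count example shows the model's three phases. This is a conceptual sketch in plain Python, not the Hadoop API: in a real cluster the map and reduce functions run in parallel across machines, and the framework performs the shuffle.

```python
# Conceptual word count in the MapReduce style (plain Python, not the Hadoop API).
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["the"])  # 3
```

Because map and reduce are pure functions over key/value pairs, the framework can rerun a failed task on another node without affecting the result.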

APACHE PIG

Pig has two parts: Pig Latin, the language, and the Pig runtime, the execution environment. You can think of them as analogous to Java and the JVM.

APACHE HIVE

Facebook created Hive for people who are fluent in SQL, so Hive makes them feel at home while working in a Hadoop Ecosystem.

APACHE HBASE

HBase is an open-source, non-relational distributed database; in other words, a NoSQL database.
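
HBase's data model differs from a relational table: rows are sparse, columns belong to column families, and each cell keeps timestamped versions. A rough sketch in plain Python (a toy model, not the HBase API; the class and column names are invented for illustration):

```python
# Conceptual sketch of HBase's data model (a toy, not the HBase API).
# Cells are addressed as: row key -> "family:qualifier" -> versioned values.
import time
from collections import defaultdict

class SketchTable:
    def __init__(self):
        # Rows are sparse: absent columns cost no storage at all.
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value, ts=None):
        # Each cell keeps timestamped versions; the newest wins on read.
        self.rows[row_key].setdefault(column, []).append((ts or time.time(), value))

    def get(self, row_key, column):
        versions = self.rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return max(versions)[1]  # value of the newest version

users = SketchTable()
users.put("user1", "info:name", "Asha", ts=1)
users.put("user1", "info:name", "Asha R.", ts=2)  # a newer version
print(users.get("user1", "info:name"))  # "Asha R."
```

Real HBase adds region splitting, write-ahead logging, and storage in HDFS on top of this key-to-versioned-cell model.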

Course Contents

The following are the course contents offered for Big Data / Apache Hadoop:

• Introduction to Big Data and Hadoop
• Getting Started with Hadoop
• Introduction to the Big Data Stack and Spark
• The Motivation for Hadoop
• Hadoop Overview
• Data Storage: HDFS
• Distributed Data Processing: MapReduce
• Data Processing and Analysis: Pig
• Data Integration: Sqoop & Flume
• Other Hadoop Data Tools & Ecosystem
• Hive as a Data Warehouse
• HBase as NoSQL
• Oozie for Workflow Management & Scheduling
• Cluster Computing and Hadoop Clusters
• Hadoop Components and the Hadoop Ecosphere
• What Do Hadoop Administrators Do?
• Key Differences between Hadoop 1 and Hadoop 2
• Distributed Data Processing: MapReduce and Spark
• Data Integration: Apache Sqoop
• Key Areas of Hadoop Administration
• Distributed Computing and Hadoop
• Hadoop 2 Architecture
• Data Storage: the Hadoop Distributed File System
• HDFS: Hadoop Distributed File System
• HDFS Architecture
• Hadoop 1.x Components
• NameNode
• Fault Tolerance & High Availability
• Failure Handling: FSImage
• HDFS Commands
• Hadoop Distributions and Installation Types
• Understanding the Configuration Files
• Configuration Property Names and Values
• Setting Up a Portable Hadoop File System
• Setting Up a Pseudo-Distributed Hadoop 2 Cluster
• Performing the Initial Hadoop Configuration
• Operating the New Hadoop Cluster
• Hands-On Exercise
• Planning Your Hadoop Cluster
• Going from a Single Rack to Multiple Racks
• Creating a Multi-Node Cluster
• Modifying the Hadoop Configuration
• Starting Up the Cluster
• Configuring Hadoop Services
• Hands-On Exercise
• MapReduce Anatomy
• MapReduce Examples
• Running MapReduce Programs in Hadoop
• Hadoop 2.x Components
• Block Size and Performance
• YARN
• Hadoop 2.x vs Hadoop 1.x
• Hands-On Exercise
• Single-Node Setup
• Hands-On Exercise
• Multi-Node Setup
• Scaling a Hadoop Cluster Up/Down
• Replication Distribution and Automatic Discovery
• Hands-On Exercise
• Using Combiners
• Reducing Intermediate Data with Combiners
• Using the Distributed Cache
• Logging
• Splittable File Formats
• Determining the Optimal Number of Reducers
• Map-Only MapReduce Jobs
• Hands-On Exercise
• Sqoop Introduction & Architecture
• Importing RDBMS Data to HDFS
• Importing RDBMS Data to Hive
• Apache Pig Introduction
• Apache Pig Setup
• Apache Pig Commands
• FILTER
• Structured (including XML/JSON) Data Processing Using Apache Pig
• Parameter Substitution
• Macros in Pig
• Unstructured Data Processing Using Apache Pig
• Best Practices for Pig
• Pig UDFs
• Pig Advanced
• Flume Introduction
• Flume with the Local Filesystem
• Flume with HDFS
• Flume with Hive
• Flume with HBase
• Apache Hive: Introduction
• Apache Hive: Setup
• Managed Tables & External Tables
• Apache Hive: Commands
• Unstructured Data Handling with Big Data Tools
• Hands-On Use Case PoC
• Best Practices for Monitoring a Hadoop Cluster
• Using Logs and Stack Traces for Monitoring and Troubleshooting
• Using Open-Source Tools to Monitor a Hadoop Cluster

Download

Download the Apache Hadoop course plan.
