Apache Hadoop Training | BigData training in Chennai

Apache Hadoop
Write Once, Read Many Times!

About Apache Hadoop

Apache Hadoop® is built for big data, insight, and innovation: a cost-effective solution with simple programming models.

Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.

  • Highly reliable storage
  • Distributed processing
  • Cost-effective solution

Hadoop Topics

The following topics are covered under Hadoop.

HDFS

The Hadoop Distributed File System (HDFS) is the core component, the backbone, of the Hadoop Ecosystem.
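
The idea can be sketched in a few lines of plain Python (a conceptual model, not the real HDFS API): a file is split into fixed-size blocks, and each block is replicated across several datanodes. The block size, replication factor, and node names below are illustrative defaults.

```python
# Conceptual sketch of HDFS block placement (not the real HDFS API).
# HDFS splits a file into fixed-size blocks (128 MB by default) and
# stores each block on several datanodes (replication factor 3 by default).

BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size: 128 MB
REPLICATION = 3                 # default replication factor

def place_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a block-id -> datanode-list placement map for a file of file_size bytes."""
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block_id in range(num_blocks):
        # Simple round-robin placement; real HDFS is rack-aware.
        replicas = [datanodes[(block_id + r) % len(datanodes)]
                    for r in range(min(replication, len(datanodes)))]
        placement[block_id] = replicas
    return placement

# A 300 MB file on a 4-node cluster needs 3 blocks, each held by 3 nodes.
nodes = ["dn1", "dn2", "dn3", "dn4"]
plan = place_blocks(300 * 1024 * 1024, nodes)
print(plan[0])  # the first block's replica set
```

Because every block lives on multiple nodes, losing one datanode never loses data; this is what makes HDFS "highly reliable" on commodity hardware.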

YARN

YARN acts as the brain of your Hadoop Ecosystem: it performs all processing activities by allocating resources and scheduling tasks.
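
A minimal sketch of that allocate-and-schedule idea, in plain Python (not the YARN API): a resource manager tracks free memory and vcores per node and grants a container to any request that fits. The node names and capacities are made up for illustration.

```python
# Conceptual sketch of YARN-style container allocation (not the YARN API).
# The ResourceManager tracks free memory/vcores per node and grants
# containers to requests that fit; requests that don't fit must wait.

nodes = {"node1": {"mem": 8192, "vcores": 8},
         "node2": {"mem": 4096, "vcores": 4}}

def allocate(request_mem, request_vcores):
    """Grant a container on the first node with enough free resources."""
    for name, free in nodes.items():
        if free["mem"] >= request_mem and free["vcores"] >= request_vcores:
            free["mem"] -= request_mem        # reserve the resources
            free["vcores"] -= request_vcores
            return name                        # container granted on this node
    return None  # nothing fits; the request is deferred

a1 = allocate(6144, 4)  # fits on node1
a2 = allocate(4096, 4)  # node1 is now too small, so it lands on node2
a3 = allocate(4096, 4)  # nothing fits; deferred until resources free up
print(a1, a2, a3)
```

Real YARN adds queues, fairness/capacity policies, and locality preferences on top of this basic fit test.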

MAPREDUCE

MapReduce is a software framework that helps in writing applications that process large data sets using distributed and parallel algorithms inside the Hadoop environment.
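
The classic word-count example shows the model's three phases. This is a conceptual sketch in plain Python, not the Hadoop API: in a real cluster the map and reduce functions run in parallel across machines, and the framework performs the shuffle.

```python
# Conceptual word count in the MapReduce style (plain Python, not the Hadoop API).
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["the"])  # 3
```

Because map and reduce are pure functions over key/value pairs, the framework can rerun a failed task on another node without affecting the result.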

APACHE PIG

Pig has two parts: Pig Latin, the language, and the Pig runtime, the execution environment. You can think of them as analogous to Java and the JVM.

APACHE HIVE

Facebook created Hive for people who are fluent in SQL, so Hive makes them feel at home while working in a Hadoop Ecosystem.

APACHE HBASE

HBase is an open-source, non-relational distributed database; in other words, a NoSQL database.
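
HBase's data model differs from a relational table: rows are sparse, columns belong to column families, and each cell keeps timestamped versions. A rough sketch in plain Python (a toy model, not the HBase API; the class and column names are invented for illustration):

```python
# Conceptual sketch of HBase's data model (a toy, not the HBase API).
# Cells are addressed as: row key -> "family:qualifier" -> versioned values.
import time
from collections import defaultdict

class SketchTable:
    def __init__(self):
        # Rows are sparse: absent columns cost no storage at all.
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value, ts=None):
        # Each cell keeps timestamped versions; the newest wins on read.
        self.rows[row_key].setdefault(column, []).append((ts or time.time(), value))

    def get(self, row_key, column):
        versions = self.rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return max(versions)[1]  # value of the newest version

users = SketchTable()
users.put("user1", "info:name", "Asha", ts=1)
users.put("user1", "info:name", "Asha R.", ts=2)  # a newer version
print(users.get("user1", "info:name"))  # "Asha R."
```

Real HBase adds region splitting, write-ahead logging, and storage in HDFS on top of this key-to-versioned-cell model.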

Course Contents

The following are the course contents offered for Big Data / Apache Hadoop:

• Introduction to Big Data and Hadoop
• Getting Started with Hadoop
• Introduction to the Big Data Stack and Spark
• The Motivation for Hadoop
• Hadoop Overview
• Data Storage: HDFS
• Distributed Data Processing: MapReduce
• Data Processing and Analysis: Pig
• Data Integration: Sqoop & Flume
• Other Hadoop Data Tools & Ecosystem
• Hive as a Data Warehouse
• HBase as NoSQL
• Oozie for Workflow Management & Scheduling
• Cluster Computing and Hadoop Clusters
• Hadoop Components and the Hadoop Ecosphere
• What Do Hadoop Administrators Do?
• Key Differences between Hadoop 1 and Hadoop 2
• Distributed Data Processing: MapReduce and Spark
• Data Integration: Apache Sqoop
• Key Areas of Hadoop Administration
• Distributed Computing and Hadoop
• Hadoop 2 Architecture
• Data Storage: the Hadoop Distributed File System
• HDFS: Hadoop Distributed File System
• HDFS Architecture
• Hadoop 1.x Components
• NameNode
• Fault Tolerance & High Availability
• Failure Handling: FSImage
• HDFS Commands
• Hadoop Distributions and Installation Types
• Understanding the Configuration Files
• Configuration Property Names and Values
• Setting Up a Portable Hadoop File System
• Setting Up a Pseudo-Distributed Hadoop 2 Cluster
• Performing the Initial Hadoop Configuration
• Operating the New Hadoop Cluster
• Hands-On Exercise
• Planning Your Hadoop Cluster
• Going from a Single Rack to Multiple Racks
• Creating a Multi-Node Cluster
• Modifying the Hadoop Configuration
• Starting Up the Cluster
• Configuring Hadoop Services
• Hands-On Exercise
• MapReduce Anatomy
• MapReduce Examples
• Running MapReduce Programs in Hadoop
• Hadoop 2.x Components
• Block Size and Performance
• YARN
• Hadoop 2.x vs Hadoop 1.x
• Hands-On Exercise
• Single-Node Setup
• Hands-On Exercise
• Multi-Node Setup
• Scaling a Hadoop Cluster Up/Down
• Replication Distribution and Automatic Discovery
• Hands-On Exercise
• Using Combiners
• Reducing Intermediate Data with Combiners
• Using the Distributed Cache
• Logging
• Splittable File Formats
• Determining the Optimal Number of Reducers
• Map-Only MapReduce Jobs
• Hands-On Exercise
• Sqoop Introduction & Architecture
• Importing RDBMS Data to HDFS
• Importing RDBMS Data to Hive
• Apache Pig Introduction
• Apache Pig Setup
• Apache Pig Commands
• FILTER
• Structured (including XML/JSON) Data Processing Using Apache Pig
• Parameter Substitution
• Macros in Pig
• Unstructured Data Processing Using Apache Pig
• Best Practices for Pig
• Pig UDFs
• Pig Advanced
• Flume Introduction
• Flume with the Local Filesystem
• Flume with HDFS
• Flume with Hive
• Flume with HBase
• Apache Hive: Introduction
• Apache Hive: Setup
• Managed Tables & External Tables
• Apache Hive: Commands
• Unstructured Data Handling with Big Data Tools
• Hands-On Use Case PoC
• Best Practices for Monitoring a Hadoop Cluster
• Using Logs and Stack Traces for Monitoring and Troubleshooting
• Using Open-Source Tools to Monitor a Hadoop Cluster

Download

Download the Apache Hadoop course plan.
