Courses

We tailor our courses to the Hadoop objectives of every data professional, from developers, administrators, analysts, and data scientists to decision makers!

Promotions & Discounts

Discount Group 1 (1A + 1B)

  • 1A: Early Bird Registration (2 weeks prior to start date): 10%
  • 1B: Group Registrations (3+): additional 5%

Discount Group 2 (2A + 2B)

  • 2A: Community Support (students and unemployed): 15%
  • 2B: Group Registrations (3+): additional 5%

   * Discount Group 1 and Discount Group 2 cannot be combined.
   * The discount amount will be refunded on the first day of class.

Big Data Courses


Objective

This course is designed for IT professionals (developers, non-Hadoop engineers, managers, and freshers) who want to dive deep into the Hadoop stack through detailed design discussion and hands-on exercises. After completing this course, you will be able to design and implement Big Data solutions to complex data problems.

Target Audience

Developers, Team Leads, Data Scientists, Managers, Analysts, or anyone who wants to gain hands-on knowledge of the Hadoop stack.

Pre-requisite

Basic programming knowledge in Java or another language is preferred.

Course Outline

Getting Started with Hadoop

  • Introducing the MapReduce Model
  • Introducing Hadoop
  • Installing Hadoop
  • Running Hadoop Examples and Tests

The Hadoop Distributed Filesystem

  • The Design of HDFS
  • HDFS Concepts
  • Blocks, NameNodes, and DataNodes
  • HDFS High-Availability
  • The Command-Line Interface
  • Basic Filesystem Operations
  • Working with files in HDFS

Components of Hadoop

  • NameNode
  • JobTracker
  • Secondary NameNode
  • DataNode
  • TaskTracker
  • Anatomy of a MapReduce program
  • Reading and writing

Starting Hadoop

  • The building blocks of Hadoop
  • Setting up SSH for a Hadoop cluster
  • Running Hadoop
  • Web-based cluster UI
  • Advanced HDFS
  • Benchmarking HDFS
  • Adding a new DataNode
  • Decommissioning DataNodes

The Basics of a MapReduce Job

  • The Parts of a Hadoop MapReduce Job
  • Configuring a Job
  • Running a Job

How MapReduce Works

  • MapReduce Types and Formats
  • Input Formats
  • Text Input
  • Multiple Inputs
  • Database Input (and Output)
  • Output Formats

Setting Up a Hadoop Cluster

  • Setting Up a Hadoop Cluster
  • Managing the Hadoop System
  • Monitoring
  • Performance Tuning
  • Best practices

Running Hadoop in the cloud

  • Introducing Amazon Web Services
  • Setting up AWS
  • Setting up Hadoop on EC2
  • Running MapReduce programs on EC2
  • Cleaning up and shutting down your EC2 instances
  • Amazon Elastic MapReduce and other AWS services
  • Introduction to Amazon Elastic MapReduce (EMR)
  • Running a MapReduce job on the Cloud

Hive

  • Installing Hive
  • The Hive Shell
  • Running Hive
  • Configuring Hive
  • Hive Services
  • The Metastore
  • Comparison with Traditional Databases
  • Schema on Read Versus Schema on Write
  • Updates, Transactions, and Indexes
  • Running a SQL-style query with Hive
  • Performing a join with Hive
  • Case Study & Example

Programming with Pig

  • Installing Pig
  • Running Pig
  • Set operations (join, union)
  • Sorting with Pig
  • Speaking Pig Latin
  • Working with user-defined functions
  • Working with scripts
  • Case Study & Example

Other Hadoop Ecosystem Components

  • ZooKeeper
  • Sqoop
  • HBase

HBase

  • Installation
  • Configuration
  • Basic Hadoop/ZooKeeper/HBase configurations
  • HBase Versus RDBMS

Sqoop

  • A Sample Import
  • Working with Imported Data
  • Importing into HDFS
  • Importing into HBase

Hadoop Distributions

  • Apache Hadoop – The Core Distribution
  • CDH (Cloudera Distribution of Apache Hadoop)
  • Microsoft HDInsight

Course Length: 32 hours / 4 days (available on weekdays or weekends)

Course Fee: USD 2690.00

Objective

This course is designed for system administrators and others responsible for managing Apache Hadoop clusters in production or development environments.

Target Audience

This course is appropriate for system administrators who will be setting up or maintaining a Hadoop cluster.

Pre-requisite

Basic Linux system administration experience is a prerequisite for this training session. Prior knowledge of Hadoop is not required.

Course Outline

The Case for Apache Hadoop

  • A Brief History of Hadoop
  • Core Hadoop Components
  • Fundamental Concepts

The Hadoop Distributed File System

  • HDFS Features
  • HDFS Design Assumptions
  • Overview of HDFS Architecture
  • Writing and Reading Files
  • NameNode Considerations
  • An Overview of HDFS Security
  • Hands-On Exercise

MapReduce

  • What Is MapReduce?
  • Features of MapReduce
  • Basic MapReduce Concepts
  • Architectural Overview
  • MapReduce Version 2
  • Failure Recovery
  • Hands-On Exercise

An Overview of the Hadoop Ecosystem

  • What is the Hadoop Ecosystem?
  • Integration Tools
  • Analysis Tools
  • Data Storage and Retrieval Tools

Planning your Hadoop Cluster

  • General Planning Considerations
  • Choosing the Right Hardware
  • Network Considerations
  • Configuring Nodes

Hadoop Installation

  • Deployment Types
  • Installing Hadoop
  • Using Cloudera Manager for Easy Installation
  • Basic Configuration Parameters
  • Hands-On Exercise

Advanced Configuration

  • Advanced Parameters
  • Configuring Rack Awareness
  • Configuring Federation
  • Configuring High Availability
  • Using Configuration Management Tools

Hadoop Security

  • Why Hadoop Security Is Important
  • Hadoop’s Security System Concepts
  • What Kerberos Is and How it Works
  • Configuring Kerberos Security
  • Integrating a Secure Cluster with Other Systems

Managing and Scheduling Jobs

  • Managing Running Jobs
  • Hands-On Exercise
  • The FIFO Scheduler
  • The FairScheduler
  • Configuring the FairScheduler
  • Hands-On Exercise

Cluster Maintenance

  • Checking HDFS Status
  • Hands-On Exercise
  • Copying Data Between Clusters
  • Adding and Removing Cluster Nodes
  • Rebalancing the Cluster
  • Hands-On Exercise
  • NameNode Metadata Backup
  • Cluster Upgrading

Cluster Monitoring and Troubleshooting

  • General System Monitoring
  • Managing Hadoop’s Log Files
  • Using the NameNode and JobTracker Web UIs
  • Hands-On Exercise
  • Cluster Monitoring with Ganglia
  • Common Troubleshooting Issues
  • Benchmarking Your Cluster

Populating HDFS From External Sources

  • An Overview of Flume
  • Hands-On Exercise
  • An Overview of Sqoop
  • Best Practices for Importing Data

Installing and Managing Other Hadoop Projects

  • Hive
  • Pig
  • HBase

Course Length: 24 hours / 3 days (available on weekdays or weekends)

Course Fee: USD 2190.00

Objective

While the Big Data and Hadoop industry is growing fast, it is hard to identify and understand the right tools. This course covers the basics of various Big Data tools and vendors.

Target Audience

Decision-makers, Team Leads, Managers, Analysts, or anyone who wants to know about the Big Data stack available in the market.

Course Outline

  • Intro to Big Data
  • Big Data Stack
  • Why do traditional systems fail?
  • The 3 Vs: Variety, Volume, and Velocity
  • Intro to Hadoop
  • MapReduce
  • HDFS
  • Communicating the logic to Hadoop
  • What is the native API?
  • What is the Streaming API?
  • What are ecosystem tools?
  • How to use non-Java programming languages with Hadoop
  • Everything is from HDFS
  • HDFS Replacements and Support
  • Apache Pig
  • Transformation using Apache Pig
  • Apache Hive
  • Analysis using Apache Hive
  • Batch Processing vs Real-Time
  • Real-Time vs Near Real-Time
  • NoSQL Databases
  • Why is an RDBMS not enough?
  • RDBMS vs NoSQL
  • Extracting and Loading Data
  • What is Sqoop?
  • When to use Sqoop
  • Owned vs acquired data
  • Collecting unstructured data from non-RDBMS sources
  • Why choose HBase?
  • Online Systems with HBase
  • Hive vs HBase
  • Compare a PC with a laptop, not a printer
  • HBase vs HDFS
  • What is Cassandra?
  • Cassandra Features
  • What is MongoDB?
  • Cassandra vs HBase vs MongoDB
  • Apache Hadoop: the vanilla version
  • Apache Hadoop vs CDH vs HDP
  • Hadoop: classic vs YARN
  • What is YARN?
  • Benefits of YARN
  • Which version of Hadoop is good for you?
  • What is HDInsight?
  • What is IBM InfoSphere BigInsights?
  • Performance Benefits

Course Length: 9 hours (available as 2 evenings or 1 day)

Course Fee: USD 800.00

Highlights


TRAINERS

  • Industry experts with 6+ years of experience in providing Hadoop training & consultation
  • Leading Big Data solutions providers
  • Creators of the products InfraStudio and DataFlow Engine

CLASS

  • Well-balanced hours to cover theory and exercises
  • Limited class size for better attention
  • Instructor-led hands-on exercises to build confidence
  • Post-training: free cloud access for one month, so you can continue working on your exercises and experiments

ENVIRONMENT

We do not just believe in giving you the best instructors in the industry and well-formed courses; we also believe in a complete classroom experience that lets you focus on the training. We set up the classroom in a comfortable, relaxed environment at Lake Bellevue. We also provide espresso, tea, and refreshments throughout the day to keep your mind fresh. You also have access to phone, internet, and admin services. We are located in downtown Bellevue, right next to I-405 for an easy commute and access to all of Bellevue’s finest restaurants and services.

CORPORATE TRAINING

Training should not be one-size-fits-all. We work with you to understand your unique needs and develop a solution that's right for you. Our training programs are designed to be both flexible and scalable, providing you with an optimal and worry-free training solution, whether you’re training a local or a globally distributed audience. Train large or small groups privately, efficiently, and cost-effectively with our On-Site training program. You'll receive expert instruction and a tailored curriculum delivered by our lead instructor at your location and/or live over the Internet, whichever you choose.