Big Data Programming

To stay competitive, a business needs to know as much as it can about people, the environment it operates in, and who and where its competitors are. The amount of data companies collect keeps growing, and there is an urgent need for a strategy to make sense of it all. Star Big Data Programming is a certification course that helps learners master the skills they need to establish a successful career as a data engineer. The program covers HDFS, MapReduce, HBase, Hive, Pig, YARN, Oozie, Flume, and Sqoop using real-time use cases from the retail, social media, aviation, tourism, and finance industries. It equips learners with in-depth knowledge of writing code using the MapReduce framework and managing large data sets with HBase.

Audience

Intermediate

Big Data Programming Course Objectives

In this course, you will learn about:

Big data and its business applications
Apache Hadoop and its big data eco-system
Deploying Hadoop in a clustered environment
Interacting with NoSQL databases
Managing key Hadoop components (HDFS, YARN and Hive)
Spark - the next-generation computational framework
Installing and working with Hadoop
Hadoop-related technologies – Avro, Flume, Sqoop, Pig, Oozie, etc.
Advanced topics like Hadoop security, Cloudera, IBM InfoSphere and more

Course Outcome

After completing this course, you will be able to:

Understand the finer points of Big Data technology
Work with Big Data-related tools, platforms, and their architecture to store, program, process, and manage data
Deploy Hadoop and its related technologies
Use the Hadoop ecosystem to manage your data
Apply machine learning concepts with Mahout

Table Of Contents Outline

Introducing Data and Big Data
Identifying the Business Applications of Big Data
Big Data and Hadoop
HDFS - Storing Data in Hadoop
Introduction to MapReduce
YARN and MapReduce - Processing Data in Hadoop
Developing a First Application for MapReduce
Exploring the Working of a MapReduce Process
Avro
Parquet
Flume - Service for Streaming Event Data
Sqoop (MySQL to Hadoop)
Apache Pig
Hive – Data Warehouse
Oozie – Workflow Scheduler
Exploring Crunch - Joining and Data Integration
Exploring Spark and Scala
Exploring HBase - Big Data Store
Zookeeper - Coordination Service for Distributed Applications
Exploring Storm
Machine Learning with Mahout
Interacting with NoSQL Databases
Hadoop and Security
Apache Drill and Google BigQuery
Exploring Cloudera
Exploring Hortonworks
HDInsight
IBM InfoSphere
Hadoop and AWS
Appendix - Exploring Pivotal HD Case Studies

Labs

Chapter 1. Setting up the required environment for Apache Hadoop installation

Chapter 2. Installing the Single-Node Hadoop configuration on the system

Chapter 3. Exploring the Web-Based User Interface of Hadoop Cluster

Chapter 4. Implementing a MapReduce Program for Word Count

Chapter 5. Implementing a Basic Pig Latin Script

Chapter 6. Implementing Basic Hive Query Language Operations

Chapter 7. Using Apache Flume to fetch open-source user tweets from Twitter
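
To give a feel for the word count exercise in Chapter 4, here is a minimal sketch of the map, shuffle, and reduce steps written in plain Python. It is not the course's lab code and does not use the Hadoop API: everything runs in memory on one machine, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, mimicking the step between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Sum all the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, used only for this illustration.
    sample = ["big data big ideas", "data at scale"]
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

In a real Hadoop job the same three roles are split across a mapper, the framework's shuffle and sort, and a reducer running on many nodes; this sketch only shows the data flow.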

Exam Details

Exam Codes	Big Data Programming S07-116 (Academy customers use the same codes)
Launch Date	Jul 01 2017
Number of Questions	75
Type of Questions	Multiple choice
Length of Test	150 Minutes
Passing Score	70%
Recommended Experience	Graduates with a background in Java programming are eligible for the Big Data Hadoop training. Basic knowledge of a programming language such as Java, C, or Python, familiarity with Linux, and a strong grasp of object-oriented programming (OOP) concepts are an added advantage.
Languages	English