
What Does Hadoop Require of Developers and Administrators?

Date: 2013-11-27 | Category: hadoop | Author: 恒镭, 张

At present, two roles make the most use of Hadoop: developers and system administrators. Each naturally has different needs and requirements.

Let's look at Cloudera's official training descriptions to see what different standards apply to each of these roles.

1 Administrator

Cloudera Administrator Training for Apache Hadoop (Configuring, Deploying, and Maintaining a Hadoop Cluster). Configuring, deploying, and maintaining a cluster are the Hadoop administrator's main responsibilities.

Audience & Prerequisites

This course is best suited to systems administrators and IT managers who have basic Linux experience. Prior knowledge of Apache Hadoop is not required.

Prerequisites: basic Linux experience. Prior Hadoop experience is not required.

The training objectives:

Hands-On Hadoop

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:

  • The internals of MapReduce and HDFS and how to build Hadoop architecture
  • Proper cluster configuration and deployment to integrate with systems and hardware in the data center
  • How to load data into the cluster from dynamically generated files using Flume and from relational databases using Sqoop (see the HDFS write sketch after this list)
  • Configuring the FairScheduler to provide service-level agreements for multiple users of a cluster
  • Installing and implementing Kerberos-based security for your cluster
  • Best practices for preparing and maintaining Apache Hadoop in production
  • Troubleshooting, diagnosing, tuning, and solving Hadoop issues
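
Flume and Sqoop are driven from the command line, but both ultimately deliver data through the same HDFS client API. As a minimal sketch of that underlying write path (the path and file contents are illustrative, and the cluster settings are assumed to come from the usual core-site.xml/hdfs-site.xml), writing a file into HDFS with Hadoop's Java FileSystem API looks like this:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
        public static void main(String[] args) throws IOException {
            // Reads core-site.xml / hdfs-site.xml from the classpath, so
            // fs.defaultFS determines which cluster this client talks to.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path target = new Path("/data/incoming/sample.txt"); // illustrative path
            FSDataOutputStream out = fs.create(target); // creates parent dirs as needed
            try {
                out.writeBytes("hello, hdfs\n");
            } finally {
                out.close(); // the file's full length becomes visible on close
            }
        }
    }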

Below is the outline of the Cloudera course:

Introduction
The Case for Apache Hadoop
> Why Hadoop?
> A Brief History of Hadoop
> Core Hadoop Components
> Fundamental Concepts
HDFS
> HDFS Features
> Writing and Reading Files
> NameNode Considerations
> Overview of HDFS Security
> Using the NameNode Web UI
> Using the Hadoop File Shell
Getting Data into HDFS
> Ingesting Data from External Sources with Flume
> Ingesting Data from Relational Databases with Sqoop
> REST Interfaces
> Best Practices for Importing Data
MapReduce
> What Is MapReduce?
> Features of MapReduce
> Basic Concepts
> Architectural Overview
> MapReduce Version 2
> Failure Recovery
> Using the JobTracker Web UI
Planning Your Hadoop Cluster
> General Planning Considerations
> Choosing the Right Hardware
> Network Considerations
> Configuring Nodes
> Planning for Cluster Management
Hadoop Installation and Initial Configuration
> Deployment Types
> Installing Hadoop
> Specifying the Hadoop Configuration
> Performing Initial HDFS Configuration
> Performing Initial MapReduce Configuration
> Log File Locations
Installing and Configuring Hive, Impala, and Pig
> Hive
> Impala
> Pig
Hadoop Clients
> What is a Hadoop Client?
> Installing and Configuring Hadoop Clients
> Installing and Configuring Hue
> Hue Authentication and Configuration
Cloudera Manager
> The Motivation for Cloudera Manager
> Cloudera Manager Features
> Standard and Enterprise Versions
> Cloudera Manager Topology
> Installing Cloudera Manager
> Installing Hadoop Using Cloudera Manager
> Performing Basic Administration Tasks Using Cloudera Manager
Advanced Cluster Configuration
> Advanced Configuration Parameters
> Configuring Hadoop Ports
> Explicitly Including and Excluding Hosts
> Configuring HDFS for Rack Awareness
> Configuring HDFS High Availability
Hadoop Security
> Why Hadoop Security Is Important
> Hadoop’s Security System Concepts
> What Kerberos Is and How it Works
> Securing a Hadoop Cluster with Kerberos
Managing and Scheduling Jobs
> Managing Running Jobs
> Scheduling Hadoop Jobs
> Configuring the FairScheduler
Cluster Maintenance
> Checking HDFS Status
> Copying Data Between Clusters
> Adding and Removing Cluster Nodes
> Rebalancing the Cluster
> NameNode Metadata Backup
> Cluster Upgrading
Cluster Monitoring and Troubleshooting
> General System Monitoring
> Managing Hadoop’s Log Files
> Monitoring Hadoop Clusters
> Common Troubleshooting Issues
Conclusion

2 Developer

Cloudera Developer Training for Apache Hadoop

(Building Powerful Data Applications with MapReduce): writing powerful MapReduce data-processing programs.

Hands-On Hadoop

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:

  • The internals of MapReduce and HDFS and how to write MapReduce code (see the WordCount sketch after this list)
  • Best practices for Hadoop development, debugging, and implementation of workflows and common algorithms
  • How to leverage Hive, Pig, Sqoop, Flume, Oozie, Mahout, and other Hadoop ecosystem projects
  • Optimal hardware configurations and network considerations for integrating a Hadoop cluster with the data center
  • Writing and executing joins to link data sets in MapReduce
  • Advanced Hadoop API topics required for real-world data analysis
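
Since the first bullet above is about writing MapReduce code, here is the canonical WordCount pair as a minimal sketch (new org.apache.hadoop.mapreduce API; the class names are the usual textbook ones):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {

        // Map phase: emit one (word, 1) pair per token in the input line.
        public static class TokenizerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                for (String token : line.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE);
                    }
                }
            }
        }

        // Reduce phase: the framework groups pairs by word; sum the counts.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text word, Iterable<IntWritable> counts,
                                  Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable c : counts) {
                    sum += c.get();
                }
                context.write(word, new IntWritable(sum));
            }
        }
    }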


The developer course outline:

Introduction
The Motivation for Hadoop
> Problems with Traditional Large-Scale Systems
> Requirements for a New Approach
> Introducing Hadoop
Hadoop: Basic Concepts
> The Hadoop Project and Hadoop Components
> The Hadoop Distributed File System
> Hands-On Exercise: Using HDFS
> How MapReduce Works
> Hands-On Exercise: Running a MapReduce Job
> How a Hadoop Cluster Operates
> Other Hadoop Ecosystem Projects
Writing a MapReduce Program
> The MapReduce Flow
> Basic MapReduce API Concepts
> Writing MapReduce Drivers, Mappers and Reducers in Java
> Writing Mappers and Reducers in Other Languages Using the Streaming API
> Speeding Up Hadoop Development by Using Eclipse
> Hands-On Exercise: Writing a MapReduce Program
> Differences Between the Old and New MapReduce APIs
Unit Testing MapReduce Programs
> Unit Testing
> The JUnit and MRUnit Testing Frameworks
> Writing Unit Tests with MRUnit
> Hands-On Exercise: Writing Unit Tests with the MRUnit Framework
Delving Deeper into the Hadoop API
> Using the ToolRunner Class
> Decreasing the Amount of Intermediate Data with Combiners
> Hands-On Exercise: Writing and Implementing a Combiner
> Setting Up and Tearing Down Mappers and Reducers by Using the Configure and Close Methods
> Writing Custom Partitioners for Better Load Balancing
> Hands-On Exercise: Writing a Partitioner
> Accessing HDFS Programmatically
> Using the DistributedCache
> Using the Hadoop API's Library of Mappers, Reducers and Partitioners
Practical Development Tips and Techniques
> Strategies for Debugging MapReduce Code
> Testing MapReduce Code Locally by Using LocalJobRunner
> Writing and Viewing Log Files
> Retrieving Job Information with Counters
> Determining the Optimal Number of Reducers for a Job
> Creating Map-Only MapReduce Jobs
> Hands-On Exercise: Using Counters and a Map-Only Job
Data Input and Output
> Creating Custom Writable and WritableComparable Implementations
> Saving Binary Data Using SequenceFile and Avro Data Files
> Implementing Custom Input Formats and Output Formats
> Issues to Consider When Using File Compression
> Hands-On Exercise: Using SequenceFiles and File Compression
Common MapReduce Algorithms
> Sorting and Searching Large Data Sets
> Performing a Secondary Sort
> Indexing Data
> Hands-On Exercise: Creating an Inverted Index
> Computing Term Frequency-Inverse Document Frequency
> Calculating Word Co-Occurrence
> Hands-On Exercise: Calculating Word Co-Occurrence
> Hands-On Exercise: Implementing Word Co-Occurrence with a Custom WritableComparable
Joining Data Sets in MapReduce Jobs
> Writing a Map-Side Join
> Writing a Reduce-Side Join
Integrating Hadoop into the Enterprise Workflow
> Integrating Hadoop into an Existing Enterprise
> Loading Data from an RDBMS into HDFS by Using Sqoop
> Hands-On Exercise: Importing Data with Sqoop
> Managing Real-Time Data Using Flume
> Accessing HDFS from Legacy Systems with FuseDFS and HttpFS
Machine Learning and Mahout
> Introduction to Machine Learning
> Using Mahout
> Hands-On Exercise: Using a Mahout Recommender
An Introduction to Hive and Pig
> The Motivation for Hive and Pig
> Hive Basics
> Hands-On Exercise: Manipulating Data with Hive
> Pig Basics
> Hands-On Exercise: Using Pig to Retrieve Movie Names from Our Recommender
> Choosing Between Hive and Pig
An Introduction to Oozie
> Introduction to Oozie
> Creating Oozie Workflows
> Hands-On Exercise: Running an Oozie Workflow
Conclusion
Appendix: Graph Processing in MapReduce
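
The "Unit Testing MapReduce Programs" module above centers on MRUnit. As a minimal sketch, assuming the WordCount mapper from the sketch earlier in this post, an MRUnit test drives the mapper in isolation, with no cluster required:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Before;
    import org.junit.Test;

    public class WordCountMapperTest {
        private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

        @Before
        public void setUp() {
            // Wraps the mapper under test; inputs and expected outputs
            // are declared on the driver, then checked by runTest().
            mapDriver = MapDriver.newMapDriver(new WordCount.TokenizerMapper());
        }

        @Test
        public void emitsOneCountPerWord() throws Exception {
            mapDriver.withInput(new LongWritable(0), new Text("cat cat dog"))
                     .withOutput(new Text("cat"), new IntWritable(1))
                     .withOutput(new Text("cat"), new IntWritable(1))
                     .withOutput(new Text("dog"), new IntWritable(1))
                     .runTest();
        }
    }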


The official certification requirements:

Cloudera Certified Developer for Apache Hadoop (CCDH)
Establish yourself as a trusted and valuable resource by completing the certification exam for Apache Hadoop
developers. CCDH certifies your technical knowledge, skill, and ability to write, maintain, and optimize Apache
Hadoop development projects. The exam can be demanding and will test your fluency with concepts and
terminology in the following areas:
Core Hadoop Concepts
Recognize and identify Apache Hadoop daemons and how they function both in data storage and processing. Understand how Apache
Hadoop exploits data locality. Determine the challenges to large-scale computational models and how distributed systems attempt to
overcome various challenges posed by the scenario.
Storing Files in Hadoop
Analyze the benefits and challenges of the HDFS architecture, including how HDFS implements file sizes, block sizes, and block
abstraction. Understand default replication values and storage requirements for replication. Determine how HDFS stores, reads, and
writes files.
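A quick worked example (assuming the common defaults of the period, 64 MB blocks and a replication factor of 3): a 200 MB file splits into four blocks, three of 64 MB plus one of 8 MB, and each block is stored on three DataNodes, so the file consumes roughly 600 MB of raw disk across the cluster. The last block occupies only its actual 8 MB, since HDFS does not pad blocks to full size.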
Job Configuration and Submission
Construct proper job configuration parameters, including using JobConf and appropriate properties. Identify the correct procedures for
MapReduce job submission.
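The exam text names JobConf, which belongs to the old org.apache.hadoop.mapred API. A minimal driver sketch, in which the mapper and reducer class names are hypothetical placeholders for your own old-API implementations:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(WordCountDriver.class);
            conf.setJobName("wordcount");

            // Types of the job's final (reducer) output keys and values.
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            // Hypothetical classes implementing the old-API
            // Mapper and Reducer interfaces.
            conf.setMapperClass(OldApiWordMapper.class);
            conf.setReducerClass(OldApiSumReducer.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf); // submits the job and waits for completion
        }
    }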
Job Execution Environment
Determine the lifecycle of a Mapper and the lifecycle of a Reducer in a MapReduce job. Understand the key fault tolerance principles
at work in a MapReduce job. Identify the role of Apache Hadoop Classes, Interfaces, and Methods. Understand how speculative
execution exploits differences in machine configurations and capabilities in a parallel environment and how and when it runs.
Job Lifecycle
Analyze the order of operations in a MapReduce job, how data moves from place to place, how partitioners and combiners
function, and the sort and shuffle process.
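To make the partitioner's role concrete, here is a minimal custom Partitioner sketch (new API; the routing rule is invented for illustration). Between map and reduce, getPartition decides which reducer receives each intermediate key:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Illustrative rule: route keys by their first character, so each
    // reducer (and thus each output file) covers a range of the alphabet.
    public class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String s = key.toString();
            if (s.isEmpty()) {
                return 0; // send empty keys to the first reducer
            }
            // A char is never negative, so this is a valid partition index.
            return s.charAt(0) % numPartitions;
        }
    }

It is registered on the job with setPartitionerClass; a combiner plugs in the same way through setCombinerClass.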
Data Processing
Analyze and determine the relationship of input keys to output keys in terms of both type and number, the sorting of keys, and the
sorting of values. Identify the number, type, and value of emitted keys and values from the Mappers as well as the emitted data from
each Reducer and the number and contents of the output file.
Key and Value Types
Analyze and determine which of Hadoop’s data types for keys and values are appropriate for a job. Understand common key and value
types in the MapReduce framework and the interfaces they implement.
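As a sketch of what "the interfaces they implement" means in practice, here is a minimal composite key implementing WritableComparable (essentially the textbook TextPair pattern):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;

    // A composite key of two strings. It qualifies as a MapReduce key
    // because it can serialize itself and defines a total sort order.
    public class TextPair implements WritableComparable<TextPair> {
        private final Text first = new Text();
        private final Text second = new Text();

        public void set(String f, String s) {
            first.set(f);
            second.set(s);
        }

        @Override
        public void write(DataOutput out) throws IOException {
            first.write(out);
            second.write(out);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            first.readFields(in);
            second.readFields(in);
        }

        @Override
        public int compareTo(TextPair other) {
            int cmp = first.compareTo(other.first);
            return cmp != 0 ? cmp : second.compareTo(other.second);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof TextPair
                && first.equals(((TextPair) o).first)
                && second.equals(((TextPair) o).second);
        }

        @Override
        public int hashCode() {
            // Equal keys must hash identically so the default
            // HashPartitioner sends them to the same reducer.
            return first.hashCode() * 163 + second.hashCode();
        }
    }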
Common Algorithms and Design Patterns
Evaluate whether an algorithm is well-suited for expression in MapReduce. Understand implementation, limitations, and strategies
for joining datasets in MapReduce. Analyze the role of DistributedCache and Counters.
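Counters are the easiest of these to show in code. A minimal sketch (the three-field validity rule is invented for illustration):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ValidatingMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        // Counters are declared as an enum; Hadoop aggregates them across
        // all tasks and reports the totals in the job UI and to the driver.
        public enum Quality { WELL_FORMED, MALFORMED }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Hypothetical rule: a valid record has exactly three
            // tab-separated fields.
            if (value.toString().split("\t").length == 3) {
                context.getCounter(Quality.WELL_FORMED).increment(1);
                context.write(value, NullWritable.get());
            } else {
                // Count and skip the bad record rather than failing the task.
                context.getCounter(Quality.MALFORMED).increment(1);
            }
        }
    }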
The Hadoop Ecosystem
Analyze a workflow scenario and determine how and when to leverage ecosystems projects, including Apache Hive, Pig, Sqoop, and
Oozie. Understand how Hadoop Streaming might apply to a job workflow.

Statement: this article was originally written by 恒镭, 张. Please keep a link to it when reposting: What Does Hadoop Require of Developers and Administrators?
