
Hadoop Documentation in Translation, Part 2: Single-Node Setup (CDH3u6)

Date: 2013-12-03 | Category: hadoop | Author: 恒镭, 张



Single Node Setup


Purpose

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

Prerequisites


Supported Platforms
  • GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
  • Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.


Required Software

Required software for Linux and Windows includes:

  1. Java™ 1.6.x, preferably from Sun, must be installed.
  2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
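A quick way to verify both prerequisites on Linux (a sanity check only; the exact way to inspect sshd varies by distribution):

$ java -version
$ ssh -V
$ ps -e | grep sshd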

Additional requirements for Windows include:

  1. Cygwin - Required for shell support in addition to the required software above.


Installing Software

If your cluster doesn't have the requisite software you will need to install it.

For example on Ubuntu Linux:

$ sudo apt-get install ssh
$ sudo apt-get install rsync

On Windows, if you did not install the required software when you installed cygwin, start the cygwin installer and select the packages:

  • openssh - the Net category



Download

To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.


Prepare to Start the Hadoop Cluster

Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.

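For example, if your JDK is installed under /usr/lib/jvm/java-6-sun (an illustrative path; substitute the root of your own Java installation), the relevant line in conf/hadoop-env.sh would look like:

export JAVA_HOME=/usr/lib/jvm/java-6-sun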

Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.


Now you are ready to start your Hadoop cluster in one of the three supported modes:

  • Local (Standalone) Mode
  • Pseudo-Distributed Mode
  • Fully-Distributed Mode



Standalone Operation

By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.


$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
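Note that Hadoop will not overwrite an existing output directory, so remove it before rerunning the example:

$ rm -r output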


Pseudo-Distributed Operation

Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.



Configuration

Use the following:
conf/core-site.xml:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>

conf/hdfs-site.xml (this sets the block replication factor to 1):

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>

conf/mapred-site.xml:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands (the -P '' option creates the key with an empty passphrase; you can set one if you prefer):
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
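If passphraseless login still fails, sshd on many systems requires strict permissions on the key files; tightening them is a common fix (an addition to the original text):

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys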

Execution

Format a new distributed-filesystem:
$ bin/hadoop namenode -format

Start the hadoop daemons:
$ bin/start-all.sh
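To confirm the daemons actually started, one quick check is the JDK's jps tool (assuming it is on your PATH); in this pseudo-distributed setup it should list NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker:

$ jps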


The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

  • NameNode - http://localhost:50070/
  • JobTracker - http://localhost:50030/

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
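The same jar ships with other examples as well; for instance, the classic word count (the output directory name below is only illustrative):

$ bin/hadoop jar hadoop-examples-*.jar wordcount input wordcount-output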

There are two ways to examine the results, as shown below.

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

or

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you're done, stop the daemons with:
$ bin/stop-all.sh

Fully-Distributed Operation

For information on setting up fully-distributed, non-trivial clusters see Cluster Setup.

Java and JNI are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.

Note: This article was translated and compiled by 恒镭, 张; please retain the link when reposting: hadoop翻译文档之二-单节点安装(cdh3u6)
