Hadoop

Installing the Ubuntu environment
192.168.1.64    HNClient
192.168.1.65    HNName
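On SUSE and Ubuntu, vi cannot delete text with the Backspace key.
To delete, press ESC and then X (X deletes the character before the cursor, x the one under it).
To insert text, press i.
To open a new line below the current one, press o.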
On HNClient:
norman@HNClient:~$ sudo vi /etc/hostname
HNClient
norman@HNClient:~$ sudo apt-get install openssh-server
norman@HNClient:~$ sudo vi /etc/hosts
192.168.1.64    HNClient
192.168.1.65    HNName
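Editing /etc/hosts by hand on every node is easy to get wrong, and repeating a setup step leaves duplicate lines. The sketch below is one way to add the two entries idempotently; the function name add_host_entry is hypothetical. It matches host names case-insensitively because hosts-file lookups ignore case (this walkthrough uses both HNName and hnname).

```shell
# add_host_entry FILE IP NAME   (hypothetical helper)
# Appends "IP<TAB>NAME" to FILE unless NAME already appears on a
# non-comment line, so running the setup twice does not duplicate entries.
add_host_entry() (
    hosts_file=$1 ip=$2 name=$3
    # -i: case-insensitive, -w: whole word, -F: literal text (dots are not regex)
    if grep -v '^[[:space:]]*#' "$hosts_file" 2>/dev/null | grep -qiwF -- "$name"; then
        echo "$name already present in $hosts_file"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
    fi
)
```

Because /etc/hosts is owned by root, call it from a root shell (for example after `sudo -s`): `add_host_entry /etc/hosts 192.168.1.64 HNClient` and `add_host_entry /etc/hosts 192.168.1.65 HNName`.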
norman@HNClient:~$ ssh-keygen     (press Enter at each prompt to accept the defaults)
Generating public/private rsa key pair.
Enter file in which to save the key (/home/norman/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/norman/.ssh/id_rsa.
Your public key has been saved in /home/norman/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:rj3kM5OeqxceqGP6DcofXa+hZFReLQmKqksqoYL+YH4 norman@HNClient
The key's randomart image is:
+---[RSA 2048]----+
|        .        |
|     . . . o     |
|    . . . + .    |
|   .   o . .     |
|  .   ..S        |
|..   o.o+.       |
|+=  o.++o+.      |
|Xo.E+ +X+       |
|oo.=+  |
+----[SHA256]-----+
norman@HNClient:~$ ssh localhost (ssh localhost still asks for a password)
norman@localhost's password:
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)
251 packages can be updated.
79 updates are security updates.
New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Wed Oct 31 23:14:08 2018 from 192.168.1.65
norman@HNClient:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
norman@HNClient:~$ ssh localhost  (ssh localhost no longer asks for a password)
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)
251 packages can be updated.
79 updates are security updates.
New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Wed Oct 31 23:18:02 2018 from 127.0.0.1
norman@HNClient:~$ ssh HNName (ssh HNName still asks for a password)
norman@hnname's password:
norman@HNClient:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub norman@HNName
norman@HNClient:~$ ssh HNName (ssh HNName now logs in to HNName without a password)
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)
254 packages can be updated.
79 updates are security updates.
New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Wed Oct 31 23:23:21 2018 from 192.168.1.64
norman@HNName:~$
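The `cat ... >> ~/.ssh/authorized_keys` step appends the key every time it runs, and a login can still prompt for a password if ~/.ssh or authorized_keys is writable by other users, because sshd's default StrictModes check then ignores the file. A minimal sketch of an idempotent local version, assuming the default key location; `authorize_key` is a hypothetical helper, and for remote hosts `ssh-copy-id` (used above) already does the equivalent.

```shell
# authorize_key PUBKEY_FILE [SSH_DIR]   (hypothetical helper)
# Appends the public key to SSH_DIR/authorized_keys (default ~/.ssh) only if
# that exact line is not already present, and sets the permissions that
# sshd's StrictModes check expects: 700 on the directory, 600 on the file.
authorize_key() (
    pub=$1
    ssh_dir=${2:-$HOME/.ssh}
    keys="$ssh_dir/authorized_keys"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || exit 1
    touch "$keys" && chmod 600 "$keys" || exit 1
    # -x: whole-line match, -F: literal text, -f: read the key line(s) from PUBKEY_FILE
    if grep -qxF -f "$pub" "$keys"; then
        echo "key already authorized in $keys"
    else
        cat "$pub" >> "$keys"
    fi
)
```

`authorize_key ~/.ssh/id_rsa.pub` reproduces the `ssh localhost` step; the key pair itself can be created without prompts using `ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa`.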
On HNName:
norman@HNName:~$ sudo vi /etc/hosts
192.168.1.64    HNClient
192.168.1.65    HNName
norman@HNName:~$ ssh-keygen               (press Enter at each prompt to accept the defaults)
Generating public/private rsa key pair.
Enter file in which to save the key (/home/norman/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/norman/.ssh/id_rsa.
Your public key has been saved in /home/norman/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:YXrPGdhKYkPsAroDlIZJ4sYdbrpHyvaMQccMV3GJn9I norman@HNName
The key's randomart image is:
+---[RSA 2048]----+
|.. . oo..        |
|.+ oo..         |
|oO.=  = +        |
|+.B. + E +       |
|oo =. B S o      |
|+.=  o = + o     |
|o .    . +      |
|..*              |
| . o             |
+----[SHA256]-----+
norman@HNName:~$ ssh localhost          (ssh localhost still asks for a password)
norman@localhost's password:
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)
251 packages can be updated.
79 updates are security updates.
New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Wed Oct 31 22:55:29 2018 from 127.0.0.1
norman@HNName:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
norman@HNName:~$ ssh localhost  (ssh localhost no longer asks for a password)
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)
254 packages can be updated.
79 updates are security updates.
New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Wed Oct 31 23:00:28 2018 from 127.0.0.1
norman@HNName:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub norman@hnclient
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/norman/.ssh/id_rsa.pub"
The authenticity of host 'hnclient (192.168.1.64)' can't be established.
ECDSA key fingerprint is SHA256:w5dwBrXor00JfFtpGXc0G/+deJJwmAxKmjXE32InhgA.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
norman@hnclient's password:
Number of key(s) added: 1
Now try logging into the machine, with:   "ssh 'norman@hnclient'"
and check to make sure that only the key(s) you wanted were added.
norman@HNName:~$ ssh hnclient  (ssh hnclient now logs in to HNClient without a password)
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-36-generic i686)
251 packages can be updated.
79 updates are security updates.
New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Wed Oct 31 23:05:13 2018 from 192.168.1.58
norman@HNClient:~$ exit
norman@HNName:~$ sudo apt-get install openjdk-7-jdk
[sudo] password for norman:
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package openjdk-7-jdk is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
E: Package 'openjdk-7-jdk' has no installation candidate
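This happens because Ubuntu 16.04's default repositories no longer include OpenJDK 7, so the repository has to be added manually:
norman@HNName:~$ sudo add-apt-repository ppa:openjdk-r/ppa   (adds the openjdk-r PPA, which provides OpenJDK builds; "add-apt-repository ppa:xxx/ppa" adds the given Personal Package Archive to the APT sources and imports its signing key automatically)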
norman@HNName:~$ sudo apt-get update
norman@HNName:~$ sudo apt-get install openjdk-7-jdk
norman@HNName:~$ java -version
java version "1.7.0_95"
OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-3)
OpenJDK Client VM (build 24.95-b01, mixed mode, sharing)
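The JAVA_HOME value set in hadoop-env.sh has to match where the package actually installed the JDK. On Ubuntu, /usr/bin/java is a chain of symlinks through /etc/alternatives, so one way to find the directory is to resolve that chain and strip the trailing bin/java (and jre/ for a JDK layout). A sketch, assuming GNU readlink; `java_home_from` is a hypothetical name.

```shell
# java_home_from JAVA_BIN   (hypothetical helper)
# Resolves every symlink behind a java launcher, then strips "/bin/java"
# and, if present, a trailing "/jre" to get a JAVA_HOME candidate.
java_home_from() (
    real=$(readlink -f "$1") || exit 1
    home=${real%/bin/java}
    home=${home%/jre}
    printf '%s\n' "$home"
)
```

`java_home_from "$(command -v java)"` should print a directory under /usr/lib/jvm; compare it with the path used for JAVA_HOME before starting any daemon.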
norman@HNName:~$ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.0/hadoop-1.2.0-bin.tar.gz
norman@HNName:~$ tar -zxvf hadoop-1.2.0-bin.tar.gz
norman@HNName:~$ sudo cp -r hadoop-1.2.0 /usr/local/hadoop
norman@HNName:~$  dir /usr/local/hadoop
bin          hadoop-ant-1.2.0.jar          hadoop-tools-1.2.0.jar  NOTICE.txt
build.xml    hadoop-client-1.2.0.jar       ivy                     README.txt
c++          hadoop-core-1.2.0.jar         ivy.xml                 sbin
CHANGES.txt  hadoop-examples-1.2.0.jar     lib                     share
conf         hadoop-minicluster-1.2.0.jar  libexec                 src
contrib      hadoop-test-1.2.0.jar         LICENSE.txt             webapps
norman@HNName:~$ sudo vi $HOME/.bashrc           (append the following at the end)
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin
norman@HNName:~$ exec bash
norman@HNName:~$ echo $PATH
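To confirm that $HADOOP_PREFIX/bin really is on the search path after `exec bash`, comparing whole colon-separated entries of PATH is more reliable than reading the variable by eye. A sketch; `path_contains` is a hypothetical helper.

```shell
# path_contains DIR   (hypothetical helper)
# Returns 0 if DIR is one of the colon-separated entries of $PATH.
# Wrapping both sides in colons avoids false matches on prefixes,
# e.g. /usr/local/hadoop matching /usr/local/hadoop/bin.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

`path_contains "$HADOOP_PREFIX/bin" && command -v hadoop` should print /usr/local/hadoop/bin/hadoop.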
norman@HNName:~$ sudo vi /usr/local/hadoop/conf/hadoop-env.sh
# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386
# Extra Java runtime options.  Empty by default.  (Make Java use IPv4, effectively disabling IPv6)
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
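The same two hadoop-env.sh settings are needed on both machines, so they can also be appended without opening an editor. Hadoop's scripts source hadoop-env.sh, so exports appended at the end take effect over the commented-out defaults earlier in the file. Note that the JVM property is spelled java.net.preferIPv4Stack; a misspelled -D property is silently ignored. A sketch with a hypothetical helper name:

```shell
# set_hadoop_env FILE JAVA_HOME_DIR   (hypothetical helper)
# Appends JAVA_HOME and HADOOP_OPTS exports to hadoop-env.sh, refusing to
# run twice so the file does not collect conflicting JAVA_HOME lines.
set_hadoop_env() {
    if grep -q '^export JAVA_HOME=' "$1"; then
        echo "JAVA_HOME is already set in $1; edit it by hand" >&2
        return 1
    fi
    cat >> "$1" <<EOF
export JAVA_HOME=$2
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
EOF
}
```

From a root shell: `set_hadoop_env /usr/local/hadoop/conf/hadoop-env.sh /usr/lib/jvm/java-1.7.0-openjdk-i386`.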
Installing Apache Hadoop (Single Node)
norman@HNName:~$ sudo vi /usr/local/hadoop/conf/core-site.xml
norman@HNName:~$ sudo vi /usr/local/hadoop/conf/mapred-site.xml
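The contents of core-site.xml and mapred-site.xml are not shown here. The sketch below writes a minimal version of each, assuming values inferred from the rest of this walkthrough: fs.default.name = hdfs://HNName:10001 (the URI that later `hadoop fs` commands print) and hadoop.tmp.dir = /usr/local/hadoop/tmp (the directory created next and used by `namenode -format`). The JobTracker address HNName:10002 is an assumption, not something the transcript shows. Because fs.default.name names HNName rather than localhost, the same files can also serve HNClient, where `hadoop fs` has to reach the NameNode over the network. The helper name is hypothetical.

```shell
# write_hadoop_site CONF_DIR   (hypothetical helper)
# Overwrites core-site.xml and mapred-site.xml in CONF_DIR with minimal
# single-node settings. ASSUMPTION: the JobTracker port 10002.
write_hadoop_site() {
    cat > "$1/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://HNName:10001</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
EOF
    cat > "$1/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>HNName:10002</value>
  </property>
</configuration>
EOF
}
```

Run `write_hadoop_site /usr/local/hadoop/conf` from a root shell; it replaces both files entirely.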
norman@HNName:~$ sudo mkdir /usr/local/hadoop/tmp
norman@HNName:~$ sudo chown norman /usr/local/hadoop/tmp
norman@HNName:~$ hadoop namenode -format     (the following line indicates success)
18/11/01 19:07:36 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
norman@HNName:~$ hadoop-daemons.sh start namenode  (fails with the following errors)
localhost: mkdir: cannot create directory '/usr/local/hadoop/libexec/../logs': Permission denied
localhost: chown: cannot access '/usr/local/hadoop/libexec/../logs': No such file or directory
localhost: starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out
localhost: /usr/local/hadoop/bin/hadoop-daemon.sh: line 137: /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out: No such file or directory
localhost: head: cannot open '/usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out' for reading: No such file or directory
localhost: /usr/local/hadoop/bin/hadoop-daemon.sh: line 147: /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out: No such file or directory
localhost: /usr/local/hadoop/bin/hadoop-daemon.sh: line 148: /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out: No such file or directory
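These errors come from hadoop-daemon.sh trying to create its log directory, which defaults to the logs directory under the Hadoop installation (here /usr/local/hadoop/libexec/../logs) unless HADOOP_LOG_DIR is set, inside a tree that `sudo cp` left owned by root. Changing the owner of /usr/local/hadoop, as done next, fixes that; pointing HADOOP_LOG_DIR in hadoop-env.sh at a directory the user owns would be an alternative. A preflight check along these lines, with a hypothetical name, catches the problem before any daemon is started:

```shell
# can_create_logs HADOOP_DIR   (hypothetical helper)
# Succeeds if the daemon log directory (HADOOP_LOG_DIR, else HADOOP_DIR/logs)
# exists and is writable, or does not exist yet but its parent is writable.
# Conservative: it does not consider creating missing grandparent dirs.
can_create_logs() (
    logs=${HADOOP_LOG_DIR:-$1/logs}
    if [ -d "$logs" ]; then
        [ -w "$logs" ]
    else
        parent=$(dirname "$logs")
        [ -d "$parent" ] && [ -w "$parent" ]
    fi
)
```

As the norman user, `can_create_logs /usr/local/hadoop || echo "fix ownership first"` before `start-all.sh`; run as root it always succeeds, so it says nothing useful there.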
norman@HNName:~$ ll /usr/local
total 44
drwxr-xr-x 11 root root 4096 Nov  1 02:02 ./
drwxr-xr-x 11 root root 4096 Feb 28  2018 ../
drwxr-xr-x  2 root root 4096 Feb 28  2018 bin/
drwxr-xr-x  2 root root 4096 Feb 28  2018 etc/
drwxr-xr-x  2 root root 4096 Feb 28  2018 games/
drwxr-xr-x 15 root root 4096 Nov  1 20:05 hadoop/
drwxr-xr-x  2 root root 4096 Feb 28  2018 include/
drwxr-xr-x  4 root root 4096 Feb 28  2018 lib/
lrwxrwxrwx  1 root root    9 Jul 26 23:29 man -> share/man/
drwxr-xr-x  2 root root 4096 Feb 28  2018 sbin/
drwxr-xr-x  8 root root 4096 Feb 28  2018 share/
drwxr-xr-x  2 root root 4096 Feb 28  2018 src/
norman@HNName:~$ sudo chown norman /usr/local/hadoop
norman@HNName:~$ ll /usr/local
total 44
drwxr-xr-x 11 root   root 4096 Nov  1 02:02 ./
drwxr-xr-x 11 root   root 4096 Feb 28  2018 ../
drwxr-xr-x  2 root   root 4096 Feb 28  2018 bin/
drwxr-xr-x  2 root   root 4096 Feb 28  2018 etc/
drwxr-xr-x  2 root   root 4096 Feb 28  2018 games/
drwxr-xr-x 15 norman root 4096 Nov  1 20:05 hadoop/
drwxr-xr-x  2 root   root 4096 Feb 28  2018 include/
drwxr-xr-x  4 root   root 4096 Feb 28  2018 lib/
lrwxrwxrwx  1 root   root    9 Jul 26 23:29 man -> share/man/
drwxr-xr-x  2 root   root 4096 Feb 28  2018 sbin/
drwxr-xr-x  8 root   root 4096 Feb 28  2018 share/
drwxr-xr-x  2 root   root 4096 Feb 28  2018 src/
norman@HNName:~$ hadoop-daemons.sh start namenode
localhost: starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-norman-namenode-HNName.out
norman@HNName:~$ start-all.sh
norman@HNName:~$ jps
23297 DataNode
23610 TaskTracker
23484 JobTracker
23739 Jps
23102 NameNode
23416 SecondaryNameNode
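A single-node Hadoop 1.x setup started with start-all.sh should show the five daemons listed above. When one is absent, its log under /usr/local/hadoop/logs usually explains why. A sketch that lists any missing daemon from `jps` output; the helper name is hypothetical.

```shell
# missing_daemons JPS_OUTPUT   (hypothetical helper)
# Prints each Hadoop 1.x daemon whose name is not the exact second field of
# some jps line, so NameNode is not mistaken for SecondaryNameNode.
missing_daemons() (
    for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        printf '%s\n' "$1" |
            awk -v want="$d" '$2 == want { found = 1 } END { exit !found }' ||
            printf '%s\n' "$d"
    done
)
```

`missing_daemons "$(jps)"` prints nothing when everything is up.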
norman@HNName:~$ dir /usr/local/hadoop/bin
hadoop            hadoop-daemon.sh   rcc        start-all.sh       start-dfs.sh               start-mapred.sh  stop-balancer.sh  stop-jobhistoryserver.sh  task-controller
hadoop-config.sh  hadoop-daemons.sh  slaves.sh  start-balancer.sh  start-jobhistoryserver.sh  stop-all.sh      stop-dfs.sh       stop-mapred.sh
NameNode web UI: http://192.168.1.65:50070/dfshealth.jsp
JobTracker web UI: http://192.168.1.65:50030/jobtracker.jsp
TaskTracker web UI: http://192.168.1.65:50060/tasktracker.jsp
Managing HDFS
http://www.gutenberg.org/files/2600/2600-0.txt  (download the text file)
Copy the page contents into war_and_peace.txt
https://www.ncdc.noaa.gov/orders/qclcd/ (download any data)
Download QCLCD201701.zip and QCLCD201702.zip, then unzip them to get 201701hourly.txt and 201702hourly.txt
On HNClient:
Put war_and_peace.txt in /home/norman/data/book
Put 201701hourly.txt and 201702hourly.txt in /home/norman/data/weather
norman@HNClient:~$ sudo mkdir -p /home/norman/data/book
norman@HNClient:~$ sudo mkdir -p /home/norman/data/weather
norman@HNClient:~$ sudo chown norman /home/norman/data/weather
norman@HNClient:~$ sudo chown norman /home/norman/data/book
norman@HNClient:~$ sudo add-apt-repository ppa:openjdk-r/ppa
norman@HNClient:~$ sudo apt-get update
norman@HNClient:~$ sudo apt-get install openjdk-7-jdk
norman@HNClient:~$ java -version
norman@HNClient:~$ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.0/hadoop-1.2.0-bin.tar.gz
norman@HNClient:~$ tar -zxvf hadoop-1.2.0-bin.tar.gz
norman@HNClient:~$ sudo cp -r hadoop-1.2.0 /usr/local/hadoop
norman@HNClient:~$ sudo vi $HOME/.bashrc           (append the following at the end)
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin
norman@HNClient:~$ exec bash
norman@HNClient:~$ echo $PATH
norman@HNClient:~$ sudo vi /usr/local/hadoop/conf/hadoop-env.sh
# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386
# Extra Java runtime options.  Empty by default.  (Make Java use IPv4, effectively disabling IPv6)
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
norman@HNClient:~$ sudo vi /usr/local/hadoop/conf/core-site.xml
norman@HNClient:~$ sudo vi /usr/local/hadoop/conf/mapred-site.xml
norman@HNClient:~$ hadoop fs -mkdir test
norman@HNClient:~$ hadoop fs -ls
Found 1 items
drwxr-xr-x   - norman supergroup          0 2018-11-02 01:17 /user/norman/test
norman@HNClient:~$ hadoop fs -mkdir hdfs://hnname:10001/data/small
norman@HNClient:~$ hadoop fs -mkdir hdfs://hnname:10001/data/big
Open http://192.168.1.65:50070 in a browser; the HDFS file browser is at:
http://192.168.1.65:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=/
norman@HNClient:~$ hadoop fs -rmr test             (test deletion)
Deleted hdfs://HNName:10001/user/norman/test
norman@HNClient:~$ hadoop fs -moveFromLocal /home/norman/data/book/war_and_peace.txt hdfs://hnname:10001/data/small/war_and_peace.txt
The file can now be seen in the HDFS file browser.
norman@HNClient:~$ hadoop fs -copyToLocal hdfs://hnname:10001/data/small/war_and_peace.txt /home/norman/data/book/war_and_peace.bak.txt (test copying back to the local filesystem)
norman@HNClient:~$ hadoop fs -put /home/norman/data/weather hdfs://hnname:10001/data/big
The weather files can now be seen in the HDFS file browser.
norman@HNClient:~$ hadoop dfsadmin -report
Configured Capacity: 19033165824 (17.73 GB)
Present Capacity: 13114503168 (12.21 GB)
DFS Remaining: 12005150720 (11.18 GB)
DFS Used: 1109352448 (1.03 GB)
DFS Used%: 8.46%
Under replicated blocks: 19
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 1 (1 total, 0 dead)
Name: 192.168.1.65:50010
Decommission Status : Normal
Configured Capacity: 19033165824 (17.73 GB)
DFS Used: 1109352448 (1.03 GB)
Non DFS Used: 5918662656 (5.51 GB)
DFS Remaining: 12005150720(11.18 GB)
DFS Used%: 5.83%
DFS Remaining%: 63.07%
Last contact: Fri Nov 02 01:49:43 GMT-08:00 2018
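The two DFS Used% figures in this report differ because they use different denominators: the cluster summary divides DFS Used by Present Capacity (DFS Used + DFS Remaining, which also equals Configured Capacity minus Non DFS Used), while the per-DataNode section divides by Configured Capacity. A small helper, with a hypothetical name, reproduces the printed values:

```shell
# pct PART WHOLE: prints PART / WHOLE as a percentage with two decimals.
# awk is used because shell arithmetic is integer-only.
pct() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.2f%%\n", 100 * part / whole }'
}

# Present Capacity, computed two ways from the report above
echo $(( 1109352448 + 12005150720 ))   # DFS Used + DFS Remaining: 13114503168
echo $(( 19033165824 - 5918662656 ))   # Configured - Non DFS Used: 13114503168
```

`pct 1109352448 13114503168` prints 8.46% (cluster DFS Used%), `pct 1109352448 19033165824` prints 5.83% (node DFS Used%), and `pct 12005150720 19033165824` prints 63.07% (DFS Remaining%).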
norman@HNClient:~$ hadoop dfsadmin -safemode enter    (safe mode is needed, for example, during an upgrade)
Safe mode is ON
norman@HNClient:~$ hadoop dfsadmin -safemode leave
Safe mode is OFF
On HNName:
norman@HNName:~$ hadoop fsck -blocks
Status: HEALTHY
Total size:    1100586452 B
Total dirs:    13
Total files:   4
Total blocks (validated):      19 (avg. block size 57925602 B)
Minimally replicated blocks:   19 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       19 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.0
Corrupt blocks:                0
Missing replicas:              38 (200.0 %)
Number of data-nodes:          1
Number of racks:               1
FSCK ended at Fri Nov 02 01:54:46 GMT-08:00 2018 in 1049 milliseconds
The filesystem under path '/' is HEALTHY
norman@HNName:~$ hadoop fsck /data/big
Status: HEALTHY
Total size:    1097339705 B
Total dirs:    2
Total files:   2
Total blocks (validated):      17 (avg. block size 64549394 B)
Minimally replicated blocks:   17 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       17 (100.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     1.0
Corrupt blocks:                0
Missing replicas:              34 (200.0 %)
Number of data-nodes:          1
Number of racks:               1
FSCK ended at Fri Nov 02 19:33:55 GMT-08:00 2018 in 14 milliseconds
The filesystem under path '/data/big' is HEALTHY
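Both fsck runs report every block as under-replicated even though the filesystem is HEALTHY: the default replication factor is 3, but with a single DataNode each block can have only one replica. fsck counts the shortfall per block, so 19 blocks × (3 - 1) = 38 missing replicas, which is 200% of the 19 replicas that exist; for /data/big it is 17 × 2 = 34. The average block size is the total size divided by the block count. A worked version of that arithmetic, with a hypothetical helper name:

```shell
# missing_replicas BLOCKS TARGET ACTUAL   (hypothetical helper)
# Replicas fsck reports as missing when each of BLOCKS blocks has ACTUAL
# copies but the replication factor asks for TARGET.
missing_replicas() {
    echo $(( $1 * ($2 - $3) ))
}

missing_replicas 19 3 1            # whole filesystem: 38
missing_replicas 17 3 1            # /data/big: 34
echo $(( 1100586452 / 19 ))        # average block size in bytes: 57925602
echo $(( 1097339705 / 17 ))        # /data/big: 64549394
```

On a single-DataNode setup, one option is to lower the target for existing files with `hadoop fs -setrep -R 1 /data`, and to set `dfs.replication` to 1 in hdfs-site.xml so that new files are written with one replica.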