Tech, November 6, 2022

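This post continues from "Running the official wordcount on a local hadoop 2.7.3 environment". We keep testing in local mode, this time using HDFS.

2 Running wordcount through hadoop fs in local mode

The previous run worked directly on the Linux filesystem. This time we go through hadoop fs. In local mode, hadoop fs is in fact also backed by the Linux filesystem, as the following example shows:

2.1 Verifying the FS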

cd /home/jungle/hadoop/hadoop-local
ls -l
total 116
drwxr-xr-x. 2 jungle jungle 4096 Jan 6 15:06 bin
drwxrwxr-x. 4 jungle jungle 31 Jan 6 16:53 dataLocal
drwxr-xr-x. 3 jungle jungle 19 Jan 6 14:56 etc
drwxr-xr-x. 2 jungle jungle 101 Jan 6 14:56 include
drwxr-xr-x. 3 jungle jungle 19 Jan 6 14:56 lib
drwxr-xr-x. 2 jungle jungle 4096 Jan 6 14:56 libexec
-rw-r--r--. 1 jungle jungle 84854 Jan 6 14:56 LICENSE.txt
-rw-r--r--. 1 jungle jungle 14978 Jan 6 14:56 NOTICE.txt
-rw-r--r--. 1 jungle jungle 1366 Jan 6 14:56 README.txt
drwxr-xr-x. 2 jungle jungle 4096 Jan 6 14:56 sbin
drwxr-xr-x. 4 jungle jungle 29 Jan 6 14:56 share

hadoop fs -ls /
Found 20 items
-rw-r--r-- 1 root root 0 2016-12-30 12:26 /1
dr-xr-xr-x - root root 45056 2016-12-30 13:06 /bin
dr-xr-xr-x - root root 4096 2016-12-29 20:09 /boot
drwxr-xr-x - root root 3120 2017-01-06 18:31 /dev
drwxr-xr-x - root root 8192 2017-01-06 18:32 /etc
drwxr-xr-x - root root 19 2016-11-05 23:38 /home
dr-xr-xr-x - root root 4096 2016-12-30 12:29 /lib
dr-xr-xr-x - root root 81920 2016-12-30 13:04 /lib64
drwxr-xr-x - root root 6 2016-11-05 23:38 /media
# ...

# Same as ls -l /home/jungle/hadoop/hadoop-local
hadoop fs -ls /home/jungle/hadoop/hadoop-local
Found 11 items
-rw-r--r-- 1 jungle jungle 84854 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/LICENSE.txt
-rw-r--r-- 1 jungle jungle 14978 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/NOTICE.txt
-rw-r--r-- 1 jungle jungle 1366 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/README.txt
drwxr-xr-x - jungle jungle 4096 2017-01-06 15:06 /home/jungle/hadoop/hadoop-local/bin
drwxrwxr-x - jungle jungle 31 2017-01-06 16:53 /home/jungle/hadoop/hadoop-local/dataLocal
drwxr-xr-x - jungle jungle 19 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/etc
drwxr-xr-x - jungle jungle 101 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/include
drwxr-xr-x - jungle jungle 19 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/lib
drwxr-xr-x - jungle jungle 4096 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/libexec
drwxr-xr-x - jungle jungle 4096 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/sbin
drwxr-xr-x - jungle jungle 29 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/share

As the listings show, hadoop fs -ls /home/jungle/hadoop/hadoop-local and the Linux command ls /home/jungle/hadoop/hadoop-local are equivalent.
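
The equivalence comes from the default filesystem setting: when core-site.xml does not override it, fs.defaultFS is file:///, so hadoop fs paths resolve against the Linux filesystem. A minimal sketch for checking this, assuming the hdfs command is on the PATH; the helper name is_local_fs is hypothetical and only inspects the URI scheme:

```shell
#!/bin/sh
# Hypothetical helper: succeeds when an fs.defaultFS value uses the file: scheme,
# i.e. when hadoop fs operates on the local Linux filesystem.
is_local_fs() {
  case "$1" in
    file:*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Query the effective setting only when Hadoop is installed.
if command -v hdfs >/dev/null 2>&1; then
  default_fs=$(hdfs getconf -confKey fs.defaultFS)
  if is_local_fs "$default_fs"; then
    echo "local filesystem: $default_fs"
  else
    echo "non-local filesystem: $default_fs"
  fi
fi
```

On a pseudo-distributed or cluster setup the same check would typically report an hdfs:// URI instead, and the two listings above would no longer match.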

2.2 Preparing the data

Starting from the raw data of the previous example, copy it to HDFS.

hadoop fs -mkdir -p ./dataHdfs/input

hadoop fs -ls .
Found 12 items
drwxrwxr-x - jungle jungle 18 2017-01-06 18:44 dataHdfs
drwxrwxr-x - jungle jungle 31 2017-01-06 16:53 dataLocal
# ...

hadoop fs -ls ./dataHdfs/
Found 1 items
drwxrwxr-x - jungle jungle 6 2017-01-06 18:44 dataHdfs/input

hadoop fs -put
-put: Not enough arguments: expected 1 but got 0
Usage: hadoop fs [generic options] -put [-f] [-p] [-l] <localsrc> ... <dst>

# Put the local files onto HDFS; in effect this is the same as a copy on Linux
hadoop fs -put dataLocal/input/ ./dataHdfs/
ls -l dataHdfs/
total 0
drwxrwxr-x. 2 jungle jungle 80 Jan 6 18:51 input

ls -l dataHdfs/input/
total 8
-rw-r--r--. 1 jungle jungle 37 Jan 6 18:51 file1.txt
-rw-r--r--. 1 jungle jungle 70 Jan 6 18:51 file2.txt

hadoop fs -ls ./dataHdfs/
Found 1 items
drwxrwxr-x - jungle jungle 80 2017-01-06 18:51 dataHdfs/input

hadoop fs -ls ./dataHdfs/input/
Found 2 items
-rw-r--r-- 1 jungle jungle 37 2017-01-06 18:51 dataHdfs/input/file1.txt
-rw-r--r-- 1 jungle jungle 70 2017-01-06 18:51 dataHdfs/input/file2.txt
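
Because the put lands on the Linux filesystem in local mode, the copy can be checked with ordinary tools. A minimal sketch, assuming the directory names used above; the helper name same_tree is hypothetical. Hadoop's local filesystem may write hidden .crc checksum files next to the data, so the comparison excludes names matching .*.crc as a precaution:

```shell
#!/bin/sh
# Hypothetical helper: succeeds when two directory trees hold identical files,
# ignoring hidden checksum sidecar files such as .file1.txt.crc.
same_tree() {
  diff -r -x '.*.crc' "$1" "$2" >/dev/null
}

# Compare the original input with the copy made by hadoop fs -put.
if [ -d dataLocal/input ] && [ -d dataHdfs/input ]; then
  if same_tree dataLocal/input dataHdfs/input; then
    echo "dataHdfs/input matches dataLocal/input"
  else
    echo "the two directories differ"
  fi
fi
```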

2.3 Running wordcount

hadoop jar /home/jungle/hadoop/hadoop-local/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount dataHdfs/input/ dataHdfs/output
# The input and output directories here can be read either as HDFS directories or as Linux directories.

cat dataHdfs/output/part-r-00000
I	1
am	1
bye	2
great	1
hadoop.	3
hello	3
is	1
jungle.	2
software	1
the	1
world.	2

md5sum dataLocal/outout/part-r-00000 dataHdfs/output/part-r-00000
68956fd01404e5fc79e8f84e148f19e8 dataLocal/outout/part-r-00000
68956fd01404e5fc79e8f84e148f19e8 dataHdfs/output/part-r-00000

As the matching checksums show, the result is identical to the one under dataLocal/ from the previous post.
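
For intuition about what the job computes, similar counts can be produced with standard tools. This is a minimal sketch, not the Hadoop implementation: it splits each line on whitespace, counts every token, and prints token and count separated by a tab in byte order, which is the layout of part-r-00000. The function name wordcount_local is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: whitespace-delimited word count over the given files
# (or stdin when no file is given). Output is "token<TAB>count", one per line.
wordcount_local() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) printf "%s\t%d\n", w, count[w] }' "$@" |
    LC_ALL=C sort   # byte-order sort, so uppercase tokens come before lowercase
}
```

Calling wordcount_local dataLocal/input/*.txt gives a listing that can be compared by eye with the job output. Punctuation stays attached to tokens, which is why entries such as hadoop. and world. keep their trailing period.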
