沉沙
2018-09-25
Abstract: This tutorial walks through running the wordcount example on Hadoop. We hope it leaves you with a deeper understanding of big data technology.
1. Check the Hadoop version
[hadoop@ltt1 sbin]$ hadoop version
Hadoop 2.6.0-cdh5.12.0
Subversion //github.com/cloudera/hadoop -r dba647c5a8bc5e09b572d76a8d29481c78d1a0dd
Compiled by jenkins on 2017-06-29T11:33Z
Compiled with protoc 2.5.0
From source with checksum 7c45ae7a4592ce5af86bc4598c5b4
This command was run using /home/hadoop/hadoop260/share/hadoop/common/hadoop-common-2.6.0-cdh5.12.0.jar
2. The JAR files that ship with Hadoop make it easy to try out some features.
List the MapReduce programs supported by hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar
[hadoop@ltt1 sbin]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
3. Create a directory on HDFS
hadoop fs -mkdir /input
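The plain -mkdir above reports an error if /input already exists. A minimal sketch of a repeatable variant follows, assuming the hadoop client is on the PATH. The function name ensure_hdfs_dir is hypothetical; `hadoop fs -test -d` and `hadoop fs -mkdir -p` are standard FsShell options.

```shell
#!/bin/sh
# Create an HDFS directory only when it is missing.
# ensure_hdfs_dir is a hypothetical helper name, not part of Hadoop.
ensure_hdfs_dir() {
    dir="$1"
    # `hadoop fs -test -d` exits 0 when the path exists and is a directory.
    if hadoop fs -test -d "$dir"; then
        echo "exists: $dir"
    else
        # -p also creates any missing parent directories.
        hadoop fs -mkdir -p "$dir" && echo "created: $dir"
    fi
}

# Touch the cluster only when the hadoop client is actually installed.
if command -v hadoop >/dev/null 2>&1; then
    ensure_hdfs_dir /input
fi
```

Calling the helper a second time should print "exists: /input" instead of failing, which makes it safe to keep in a setup script.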
4. List the HDFS root directory
[hadoop@ltt1 ~]$ hadoop fs -ls /
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2017-09-17 08:11 /input
drwx------ - hadoop supergroup 0 2017-09-17 08:07 /tmp
5. Upload local files to HDFS
hadoop fs -put $HADOOP_HOME/*.txt /input
6. List the files in the input directory on HDFS
[hadoop@ltt1 ~]$ hadoop fs -ls /input
Found 3 items
-rw-r--r-- 2 hadoop supergroup 85063 2017-09-17 08:15 /input/LICENSE.txt
-rw-r--r-- 2 hadoop supergroup 14978 2017-09-17 08:15 /input/NOTICE.txt
-rw-r--r-- 2 hadoop supergroup 1366 2017-09-17 08:15 /input/README.txt
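In the listing above, the second column (2) is each file's replication factor and the fifth column is its size in bytes. As a small sketch, the script below totals the sizes with awk. It works on a copy of the listing pasted into a here-document, so it runs without a cluster; the variable names are illustrative.

```shell
#!/bin/sh
# Copy of the `hadoop fs -ls /input` listing shown above.
listing=$(cat <<'EOF'
-rw-r--r-- 2 hadoop supergroup 85063 2017-09-17 08:15 /input/LICENSE.txt
-rw-r--r-- 2 hadoop supergroup 14978 2017-09-17 08:15 /input/NOTICE.txt
-rw-r--r-- 2 hadoop supergroup 1366 2017-09-17 08:15 /input/README.txt
EOF
)

# Column 5 is the file size in bytes; sum it over all entries.
total_bytes=$(printf '%s\n' "$listing" | awk '{ sum += $5 } END { print sum }')
# Count regular files (lines whose permission string starts with "-").
file_count=$(printf '%s\n' "$listing" | awk '/^-/ { n++ } END { print n + 0 }')

echo "files=$file_count total_bytes=$total_bytes"
```

The total is 101407 bytes, the same figure the job in step 7 reports as its "Bytes Read" counter.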
7. A simple wordcount test.
[hadoop@ltt1 ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar wordcount /input /output
17/09/17 08:19:12 INFO input.FileInputFormat: Total input paths to process : 3
17/09/17 08:19:13 INFO mapreduce.JobSubmitter: number of splits:3
17/09/17 08:19:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1505605169997_0002
17/09/17 08:19:14 INFO impl.YarnClientImpl: Submitted application application_1505605169997_0002
17/09/17 08:19:14 INFO mapreduce.Job: The url to track the job: //ltt1.bg.cn:9180/proxy/application_1505605169997_0002/
17/09/17 08:19:14 INFO mapreduce.Job: Running job: job_1505605169997_0002
17/09/17 08:19:27 INFO mapreduce.Job: Job job_1505605169997_0002 running in uber mode : false
17/09/17 08:19:27 INFO mapreduce.Job: map 0% reduce 0%
17/09/17 08:19:39 INFO mapreduce.Job: map 33% reduce 0%
17/09/17 08:19:48 INFO mapreduce.Job: map 100% reduce 0%
17/09/17 08:19:50 INFO mapreduce.Job: map 100% reduce 100%
17/09/17 08:19:50 INFO mapreduce.Job: Job job_1505605169997_0002 completed successfully
17/09/17 08:19:50 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=42705
FILE: Number of bytes written=588235
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=101699
HDFS: Number of bytes written=30167
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=3
Launched reduce tasks=1
Data-local map tasks=2
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=47617
Total time spent by all reduces in occupied slots (ms)=8244
Total time spent by all map tasks (ms)=47617
Total time spent by all reduce tasks (ms)=8244
Total vcore-milliseconds taken by all map tasks=47617
Total vcore-milliseconds taken by all reduce tasks=8244
Total megabyte-milliseconds taken by all map tasks=48759808
Total megabyte-milliseconds taken by all reduce tasks=8441856
Map-Reduce Framework
Map input records=2035
Map output records=14239
Map output bytes=155828
Map output materialized bytes=42717
Input split bytes=292
Combine input records=14239
Combine output records=2653
Reduce input groups=2402
Reduce shuffle bytes=42717
Reduce input records=2653
Reduce output records=2402
Spilled Records=5306
Shuffled Maps =3
Failed Shuffles=0
Merged Map outputs=3
GC time elapsed (ms)=881
CPU time spent (ms)=22320
Physical memory (bytes) snapshot=690192384
Virtual memory (bytes) snapshot=10862809088
Total committed heap usage (bytes)=380243968
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=101407
File Output Format Counters
Bytes Written=30167
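Several of the counters above can be cross-checked against each other. The sketch below redoes the arithmetic with values copied from this run. The interpretations in the comments (for example, that each map container was allocated 1024 MB) are inferences from these numbers, not something the log states directly.

```shell
#!/bin/sh
# Values copied from the counter dump above.
map_output_records=14239
combine_output_records=2653
reduce_input_records=2653
spilled_records=5306
bytes_read=101407          # File Input Format Counters: Bytes Read
input_split_bytes=292
hdfs_bytes_read=101699
map_ms=47617               # Total time spent by all map tasks (ms)
map_mb_ms=48759808         # Total megabyte-milliseconds taken by all map tasks

# HDFS reads are consistent with input data plus split metadata.
sum=$((bytes_read + input_split_bytes))
echo "input + splits = $sum (HDFS bytes read = $hdfs_bytes_read)"

# The combiner collapsed map output before the shuffle; show what was kept.
kept_pct=$((combine_output_records * 100 / map_output_records))
echo "combiner kept about ${kept_pct}% of map output records"

# Spilled records are a whole multiple of the records that reached the reducer.
spill_ratio=$((spilled_records / reduce_input_records))
echo "spilled records / reduce input records = $spill_ratio"

# Megabyte-milliseconds divided by milliseconds gives memory per container.
mb_per_map=$((map_mb_ms / map_ms))
echo "memory per map container = ${mb_per_map} MB"
```

The combiner figure is the practical takeaway: only the combined records, not all 14239 map output records, had to cross the network to the single reducer.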
8. View the wordcount results (the output is long, so only part of it is shown)
[hadoop@ltt1 ~]$ hadoop fs -cat /output/*
worldwide, 4
would 1
writing 2
writing, 4
written 19
xmlenc 1
year 1
you 12
your 5
zlib 1
252.227-7014(a)(1)) 1
§ 1
“AS 1
“Contributor 1
“Contributor” 1
“Covered 1
“Executable” 1
“Initial 1
“Larger 1
“Licensable” 1
“License” 1
“Modifications” 1
“Original 1
“Participant”) 1
“Patent 1
“Source 1
“Your”) 1
“You” 2
“commercial 3
“control” 1
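The output lists "writing" and "writing," as separate keys, which is consistent with the example splitting each line on whitespace only and leaving punctuation attached. As a rough local approximation (not the MapReduce job itself), the same counting can be sketched with standard Unix tools. The sample text and variable names here are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input; on a real run this would be the uploaded .txt files.
sample='you may not use this file except in writing,
unless agreed to in writing you may use your file'

# 1. Split on whitespace so each token sits on its own line ("map").
# 2. Sort so identical tokens become adjacent ("shuffle").
# 3. Count each run of identical tokens ("reduce"), printing word<TAB>count.
counts=$(printf '%s\n' "$sample" \
    | tr -s '[:space:]' '\n' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2 "\t" $1 }')

printf '%s\n' "$counts"
```

To fold "writing," into "writing", punctuation would have to be stripped before counting, for example with an extra `tr -d '[:punct:]'` stage in this local pipeline.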
>> //www.cnblogs.com/tijun/ <<
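That completes a small wordcount example, which briefly walked through creating a directory on HDFS, uploading files, listing directories, and running the wordcount job.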