Big Data Technology Learning (3): YARN Cluster Deployment - Zhizuobiao (职坐标)


By 沉沙 (Chensha), 2018-10-08

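Summary: This tutorial walks through YARN cluster deployment, part 3 of the Big Data Technology Learning series. We hope it helps deepen your understanding of big data technology.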


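I. Overview
YARN is a framework for resource management and job scheduling. It uses a master/slave architecture and consists of three main components: ResourceManager (RM), NodeManager (NM), and ApplicationMaster (AM).
>ResourceManager monitors, allocates, and manages all cluster resources. It runs on the master node.
>NodeManager looks after a single node. One runs on each slave node.
>ApplicationMaster schedules and coordinates one specific application. It exists only while that application is running.
The RM has final authority over all applications and over how resources are allocated among them. Each AM negotiates resources with the RM. It also communicates with the NodeManagers to execute and monitor its tasks.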
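Here is a toy model of this division of labour; the class names are hypothetical. The RM is the only component that knows every node's free memory and hands out containers. An AM can only submit requests. The sketch assumes memory is the only resource and uses first-fit placement. YARN's real schedulers, such as the Capacity and Fair schedulers, are far more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyNodeManager:
    """Hypothetical stand-in for a NodeManager: only tracks free memory on its node."""
    name: str
    free_mb: int


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for the RM: the only component allowed to grant containers."""
    nodes: List[ToyNodeManager]
    allocations: Dict[str, List[str]] = field(default_factory=dict)  # app id -> granted nodes

    def allocate(self, app_id: str, memory_mb: int) -> Optional[str]:
        """Grant one container on the first node with enough free memory (first-fit)."""
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                self.allocations.setdefault(app_id, []).append(node.name)
                return node.name
        return None  # nothing fits: in a real cluster the request would stay pending


# An ApplicationMaster can only ask; the RM decides whether and where containers run.
rm = ToyResourceManager([ToyNodeManager("slave1", 2048), ToyNodeManager("slave2", 1024)])
print(rm.allocate("app_1", 1536))  # slave1
print(rm.allocate("app_1", 1024))  # slave2
print(rm.allocate("app_1", 1024))  # None
```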
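II. Workflow
1. The client submits an application to the RM. The submission includes what is needed to launch the application's ApplicationMaster: the ApplicationMaster program, the command that starts it, and the user program.
2. The ResourceManager starts a container in which the ApplicationMaster runs.
3. As it starts up, the ApplicationMaster registers itself with the ResourceManager. Once running, it keeps a heartbeat with the RM.
4. The ApplicationMaster asks the ResourceManager for the number of containers it needs.
5. The ResourceManager returns information about the containers allocated to the ApplicationMaster, and the AM initializes each container that was granted. Once a container's launch information is ready, the AM contacts the corresponding NodeManager and asks it to start the container. The AM then stays in contact with the NMs to monitor and manage the tasks running on them.
6. While a container runs, the ApplicationMaster monitors it. The container reports its progress and status to its AM over RPC.
7. While the application runs, the client can communicate directly with the AM to obtain status, progress updates, and similar information.
8. When the application finishes, the ApplicationMaster unregisters from the ResourceManager and allows its containers to be reclaimed.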
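From the client's side, these steps show up as the application state it polls from the RM. The sketch below encodes a simplified transition table over state names from YARN's `YarnApplicationState` enum. It assumes that intermediate states such as `NEW_SAVING`, and AM retry paths, can be left out. It then checks whether an observed sequence of states is plausible.

In this simplified picture, an application stays `ACCEPTED` until its AM has been launched and has registered (steps 2 and 3). It then shows `RUNNING`, and reaches `FINISHED` after the AM unregisters (step 8). The DistributedShell output later in this article follows that pattern. `TRANSITIONS` and `is_valid_lifecycle` are hypothetical names.

```python
from typing import Dict, List, Set

# Simplified transition table (assumption: NEW_SAVING, retries, etc. omitted).
TRANSITIONS: Dict[str, Set[str]] = {
    "NEW": {"SUBMITTED", "KILLED"},
    "SUBMITTED": {"ACCEPTED", "FAILED", "KILLED"},
    "ACCEPTED": {"RUNNING", "FAILED", "KILLED"},  # RUNNING once the AM registers (step 3)
    "RUNNING": {"FINISHED", "FAILED", "KILLED"},  # FINISHED after the AM unregisters (step 8)
    "FINISHED": set(),
    "FAILED": set(),
    "KILLED": set(),
}


def is_valid_lifecycle(states: List[str]) -> bool:
    """True if each consecutive pair is a repeat (polling) or an allowed transition."""
    for current, nxt in zip(states, states[1:]):
        if nxt != current and nxt not in TRANSITIONS.get(current, set()):
            return False
    return True


print(is_valid_lifecycle(["ACCEPTED", "ACCEPTED", "RUNNING", "FINISHED"]))  # True
print(is_valid_lifecycle(["ACCEPTED", "FINISHED"]))  # False under this simplified table
```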
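III. Managing the YARN Cluster
1. Configuring the YARN cluster
  >Switch to the master server. HDFS must already be running; see the previous article for how to start it: //www.cnblogs.com/1996swg/p/7286136.html
  >Specify the YARN master node. Edit the file "/usr/cstor/hadoop/etc/hadoop/yarn-site.xml" and insert the following between its <configuration> tags: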
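<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>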
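  yarn-site.xml is the configuration file for the YARN daemons. The first property sets the hostname of the ResourceManager. The second makes mapreduce_shuffle an auxiliary service run by each NodeManager, which is required before MapReduce programs can run.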
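You can also generate the file instead of editing it by hand. Below is a minimal sketch that renders the two properties above as a complete yarn-site.xml using only Python's standard library. The helper name `build_site_xml` is hypothetical, and writing the result to /usr/cstor/hadoop/etc/hadoop/yarn-site.xml is left to you. The output replaces the whole file, so any other properties you already have must be added to the dictionary as well.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_site_xml(properties: Dict[str, str]) -> str:
    """Render a Hadoop-style *-site.xml document with one <property> per name/value pair."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


yarn_site = build_site_xml({
    "yarn.resourcemanager.hostname": "master",  # the RM runs on the host named master
    "yarn.nodemanager.aux-services": "mapreduce_shuffle",  # needed for MapReduce jobs
})
print(yarn_site)
```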
　　　
  >Copy the configured YARN file to slaveX and client.
    The commands are as follows. List the other machines: cat ~/data/4/machines
    Copy the file to each of them: for x in `cat ~/data/4/machines` ; do echo $x ; scp /usr/cstor/hadoop/etc/hadoop/yarn-site.xml $x:/usr/cstor/hadoop/etc/hadoop/ ; done;
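The same copy loop can be written so that you can review it before running it. The sketch below assumes the machines file lists one hostname per line; the helper names are hypothetical. `distribute()` prints each command by default and only calls scp when `dry_run=False`.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List

CONF_FILE = "/usr/cstor/hadoop/etc/hadoop/yarn-site.xml"  # paths as used in this article
REMOTE_DIR = "/usr/cstor/hadoop/etc/hadoop/"


def parse_hosts(text: str) -> List[str]:
    """One host per line, blank lines ignored (assumed format of ~/data/4/machines)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def scp_commands(hosts: List[str], local_file: str = CONF_FILE,
                 remote_dir: str = REMOTE_DIR) -> List[List[str]]:
    """Build one `scp <file> <host>:<dir>` argument list per host."""
    return [["scp", local_file, f"{host}:{remote_dir}"] for host in hosts]


def distribute(machines_file: str = "~/data/4/machines", dry_run: bool = True) -> None:
    """Print each scp command; when dry_run is False, also run it and stop on the first failure."""
    hosts = parse_hosts(Path(machines_file).expanduser().read_text())
    for cmd in scp_commands(hosts):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Unlike the shell loop, which carries on after a failed `scp` because its commands are joined with `;`, `check=True` stops at the first copy that fails.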
  >Make sure the slaves file has been configured; check it on the master machine.
  >Start YARN for the whole cluster with /usr/cstor/hadoop/sbin/start-yarn.sh, as shown in the figure.
　　　　
  >Verify with the jps command on master and on each of the other machines. The figure shows a successful check.
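The `jps` check can also be scripted. Below is a minimal sketch; the helper names are hypothetical. Which daemons to expect on each machine is an assumption based on the roles in the Overview: ResourceManager on master and a NodeManager on each slave. `local_jps` assumes the JDK's `jps` tool is on PATH.

```python
import subprocess
from typing import Iterable, List, Set


def running_daemons(jps_output: str) -> Set[str]:
    """Collect main-class names from `jps` output, whose lines look like '<pid> <Name>'."""
    names: Set[str] = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str, expected: Iterable[str]) -> List[str]:
    """Return, sorted, the expected daemons that jps did not report."""
    return sorted(set(expected) - running_daemons(jps_output))


def local_jps() -> str:
    """Run the JDK's jps tool on this machine (assumes jps is on PATH)."""
    return subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout


# Example on master (assumed role): an empty list means the ResourceManager is up.
# print(missing_daemons(local_jps(), ["ResourceManager"]))
```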
　　　　
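2. Submitting a DistributedShell job from the client machine
    DistributedShell can be thought of as the "hello world" of YARN programming. Its main function is to run a user-supplied shell command or shell script in parallel.
    -jar specifies the jar file that contains the ApplicationMaster, and -shell_command specifies the shell command that the ApplicationMaster should have executed.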
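DistributedShell's job, running one command several times in parallel, can be mimicked on a single machine. The hypothetical `run_in_parallel` below runs a command concurrently with a thread pool. This is a local analogue for illustration, not how DistributedShell is implemented. On YARN, the ApplicationMaster requests containers from the RM, and each container on a NodeManager runs the command. The AM command line in the output further down shows `--num_containers 1`, so one copy was requested in that run.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_in_parallel(shell_command: str, copies: int) -> List[str]:
    """Run one shell command `copies` times concurrently on this machine; return each stdout.

    Local analogue only: in DistributedShell each copy runs in its own YARN container,
    launched by the ApplicationMaster on some NodeManager.
    """
    def run_once(_: int) -> str:
        done = subprocess.run(shell_command, shell=True, capture_output=True, text=True, check=True)
        return done.stdout.strip()

    with ThreadPoolExecutor(max_workers=max(1, copies)) as pool:
        return list(pool.map(run_once, range(copies)))


print(run_in_parallel("echo hello", 3))  # ['hello', 'hello', 'hello']
# run_in_parallel("uptime", 3) mirrors the -shell_command uptime example, where uptime exists.
```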
    Open another connection to client and run:
      /usr/cstor/hadoop/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar /usr/cstor/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar -shell_command uptime
    The output is:

17/08/05 02:51:34 INFO distributedshell.Client: Initializing Client
17/08/05 02:51:34 INFO distributedshell.Client: Running Client
17/08/05 02:51:34 INFO client.RMProxy: Connecting to ResourceManager at master/10.1.21.27:8032
17/08/05 02:51:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/05 02:51:34 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=3
17/08/05 02:51:34 INFO distributedshell.Client: Got Cluster node info from ASM
17/08/05 02:51:34 INFO distributedshell.Client: Got node report from ASM for, nodeId=slave1:42602, nodeAddressslave1:8042, nodeRackName/default-rack, nodeNumContainers0
17/08/05 02:51:34 INFO distributedshell.Client: Got node report from ASM for, nodeId=slave2:57070, nodeAddressslave2:8042, nodeRackName/default-rack, nodeNumContainers0
17/08/05 02:51:34 INFO distributedshell.Client: Got node report from ASM for, nodeId=slave3:38580, nodeAddressslave3:8042, nodeRackName/default-rack, nodeNumContainers0
17/08/05 02:51:34 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
17/08/05 02:51:34 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=SUBMIT_APPLICATIONS
17/08/05 02:51:34 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=ADMINISTER_QUEUE
17/08/05 02:51:34 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
17/08/05 02:51:34 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE
17/08/05 02:51:35 INFO distributedshell.Client: Max mem capabililty of resources in this cluster 8192
17/08/05 02:51:35 INFO distributedshell.Client: Max virtual cores capabililty of resources in this cluster 32
17/08/05 02:51:35 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment
17/08/05 02:51:35 INFO distributedshell.Client: Set the environment for the application master
17/08/05 02:51:35 INFO distributedshell.Client: Setting up app master command
17/08/05 02:51:35 INFO distributedshell.Client: Completed setting up app master command {{JAVA_HOME}}/bin/java -Xmx10m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --container_vcores 1 --num_containers 1 --priority 0 1>/AppMaster.stdout 2>/AppMaster.stderr
17/08/05 02:51:35 INFO distributedshell.Client: Submitting application to ASM
17/08/05 02:51:36 INFO impl.YarnClientImpl: Submitted application application_1501872322130_0001
17/08/05 02:51:37 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=//master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:38 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=//master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:39 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=//master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:40 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=slave2/10.1.32.41, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=//master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:41 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=slave2/10.1.32.41, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=//master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:42 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=slave2/10.1.32.41, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=//master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:43 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=slave2/10.1.32.41, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=//master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:44 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=slave2/10.1.32.41, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=FINISHED, distributedFinalState=SUCCEEDED, appTrackingUrl=//master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:44 INFO distributedshell.Client: Application has completed successfully. Breaking monitoring loop
17/08/05 02:51:44 INFO distributedshell.Client: Application completed successfully
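Most of the output above comes from the client polling the RM's ApplicationsManager (ASM) once per second. Read in order, the reports show three phases:

- `ACCEPTED` from 02:51:37 to 02:51:39, with `appMasterHost=N/A`.
- `RUNNING` from 02:51:40, once the AM was up on slave2.
- `FINISHED`/`SUCCEEDED` at 02:51:44.

Below is a minimal sketch for pulling that sequence out of such a log. The helper `state_changes` is hypothetical, and its regex is based on the field names visible above.

```python
import re
from typing import List, Tuple

# Matches the state fields of the "Got application report from ASM" lines.
REPORT = re.compile(r"yarnAppState=(\w+), distributedFinalState=(\w+)")


def state_changes(log_lines: List[str]) -> List[Tuple[str, str]]:
    """Collapse repeated polling lines into the distinct (yarnAppState, finalState) sequence."""
    changes: List[Tuple[str, str]] = []
    for line in log_lines:
        match = REPORT.search(line)
        if match:
            state = (match.group(1), match.group(2))
            if not changes or changes[-1] != state:
                changes.append(state)
    return changes


# Abbreviated excerpt of the report lines (other fields shortened to "..." for brevity).
excerpt = [
    "02:51:37 ... appMasterHost=N/A, ..., yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, ...",
    "02:51:39 ... appMasterHost=N/A, ..., yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, ...",
    "02:51:40 ... appMasterHost=slave2/10.1.32.41, ..., yarnAppState=RUNNING, distributedFinalState=UNDEFINED, ...",
    "02:51:44 ... appMasterHost=slave2/10.1.32.41, ..., yarnAppState=FINISHED, distributedFinalState=SUCCEEDED, ...",
]
print(state_changes(excerpt))
# [('ACCEPTED', 'UNDEFINED'), ('RUNNING', 'UNDEFINED'), ('FINISHED', 'SUCCEEDED')]
```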

3. Submitting a MapReduce job from the client machine
    (1) Configure MapReduce jobs to run on YARN
        First, on the master machine, rename the file "/usr/cstor/hadoop/etc/hadoop/mapred-site.xml.template" to "/usr/cstor/hadoop/etc/hadoop/mapred-site.xml".
　　　　　　　　　　　　　　
        Next, edit this file and insert the following between its <configuration> tags:
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
　　　　　　　　　　　　　　
        Finally, copy "/usr/cstor/hadoop/etc/hadoop/mapred-site.xml" from master to slaveX and client, the same way the YARN configuration was copied above. Then restart the cluster.
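Every node needs the same setting, so a quick way to confirm the copy worked is to check the property on each machine. Below is a minimal sketch with hypothetical helper names. It reads a Hadoop *-site.xml document and reports whether `mapreduce.framework.name` is `yarn`. The commented usage assumes the path used in this article.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union


def parse_site_xml(xml_text: Union[str, bytes]) -> Dict[str, str]:
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name", default="").strip(): prop.findtext("value", default="").strip()
        for prop in root.iter("property")
    }


def runs_on_yarn(mapred_site_xml: Union[str, bytes]) -> bool:
    """True if mapreduce.framework.name is set to yarn."""
    return parse_site_xml(mapred_site_xml).get("mapreduce.framework.name") == "yarn"


# On each node:
# from pathlib import Path
# print(runs_on_yarn(Path("/usr/cstor/hadoop/etc/hadoop/mapred-site.xml").read_bytes()))
```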
　　　　　　　　　　　　　　
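    (2) Submit a PI Estimator job from the client
        First change to the Hadoop installation directory, /usr/cstor/hadoop/, then submit the PI Estimator job.
        The last two arguments of the command work as follows. The first is the number of map tasks to run, 2 here. The second is the number of samples each map task takes, and the product of the two is the total number of samples. The PI Estimator computes Pi using a quasi-Monte Carlo method.
        bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 10
        The output is: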

Number of Maps  = 2
Samples per Map = 10
17/08/05 03:03:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
17/08/05 03:03:31 INFO client.RMProxy: Connecting to ResourceManager at master/10.1.21.27:8032
17/08/05 03:03:32 INFO input.FileInputFormat: Total input paths to process : 2
17/08/05 03:03:32 INFO mapreduce.JobSubmitter: number of splits:2
17/08/05 03:03:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1501872322130_0002
17/08/05 03:03:32 INFO impl.YarnClientImpl: Submitted application application_1501872322130_0002
17/08/05 03:03:32 INFO mapreduce.Job: The url to track the job: //master:8088/proxy/application_1501872322130_0002/
17/08/05 03:03:32 INFO mapreduce.Job: Running job: job_1501872322130_0002
17/08/05 03:03:39 INFO mapreduce.Job: Job job_1501872322130_0002 running in uber mode : false
17/08/05 03:03:39 INFO mapreduce.Job:  map 0% reduce 0%
17/08/05 03:03:45 INFO mapreduce.Job:  map 50% reduce 0%
17/08/05 03:03:46 INFO mapreduce.Job:  map 100% reduce 0%
17/08/05 03:03:52 INFO mapreduce.Job:  map 100% reduce 100%
17/08/05 03:03:52 INFO mapreduce.Job: Job job_1501872322130_0002 completed successfully
17/08/05 03:03:52 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=50
        FILE: Number of bytes written=347208
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=522
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=11
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=7932
        Total time spent by all reduces in occupied slots (ms)=3443
        Total time spent by all map tasks (ms)=7932
        Total time spent by all reduce tasks (ms)=3443
        Total vcore-seconds taken by all map tasks=7932
        Total vcore-seconds taken by all reduce tasks=3443
        Total megabyte-seconds taken by all map tasks=8122368
        Total megabyte-seconds taken by all reduce tasks=3525632
    Map-Reduce Framework
        Map input records=2
        Map output records=4
        Map output bytes=36
        Map output materialized bytes=56
        Input split bytes=286
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=56
        Reduce input records=4
        Reduce output records=0
        Spilled Records=8
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=347
        CPU time spent (ms)=2630
        Physical memory (bytes) snapshot=683196416
        Virtual memory (bytes) snapshot=2444324864
        Total committed heap usage (bytes)=603979776
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=236
    File Output Format Counters
        Bytes Written=97
Job Finished in 20.592 seconds
Estimated value of Pi is 3.80000000000000000000
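The estimate of 3.8 is coarse because the job used only 2 × 10 = 20 sample points. The estimator computes Pi ≈ 4 × (points inside the circle) / (total points), and 3.8 = 4 × 19/20.

Hadoop's `pi` example (the `QuasiMonteCarlo` class in hadoop-mapreduce-examples) draws its points from a Halton sequence with bases 2 and 3 rather than from a random number generator. Map i takes the slice of the sequence starting at offset i × (samples per map). The sketch below re-implements that idea in plain Python under those assumptions. It is an illustration, not Hadoop's code, and the function names are hypothetical. For `pi 2 10` it also finds 19 of the 20 points inside the circle and so reproduces 3.8.

```python
from fractions import Fraction
from typing import Tuple


def radical_inverse(index: int, base: int) -> Fraction:
    """Van der Corput radical inverse: mirror index's base-`base` digits around the radix point."""
    result, scale = Fraction(0), Fraction(1, base)
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result


def estimate_pi(num_maps: int, samples_per_map: int) -> Tuple[int, int, Fraction]:
    """Quasi-Monte Carlo Pi estimate from 2-D Halton points (bases 2 and 3).

    Assumption modelled on Hadoop's example: "map" m uses sequence indices
    m * samples_per_map + 1 through (m + 1) * samples_per_map.
    """
    inside = 0
    for m in range(num_maps):
        offset = m * samples_per_map
        for i in range(offset + 1, offset + samples_per_map + 1):
            x = radical_inverse(i, 2) - Fraction(1, 2)  # center the unit square on the origin
            y = radical_inverse(i, 3) - Fraction(1, 2)
            if x * x + y * y <= Fraction(1, 4):  # inside the inscribed circle of radius 1/2
                inside += 1
    total = num_maps * samples_per_map
    return inside, total, Fraction(4 * inside, total)


inside, total, pi = estimate_pi(2, 10)
print(inside, total, float(pi))  # 19 20 3.8
```

Exact fractions avoid any floating-point doubt near the circle's edge. Larger arguments, such as `pi 10 1000`, spread more points over the square and bring the estimate closer to Pi.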

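Summary:
    There is no need to study the YARN framework in depth at this point. It is enough to set up and configure the environment for the MapReduce material that follows.
    In newer versions of Hadoop, YARN is the resource management and scheduling framework, and it provides the runtime environment for MapReduce programs on Hadoop. MapReduce is not limited to YARN, however. It can also run on other scheduling frameworks such as Mesos and Corona, though each one requires Hadoop to be adapted to it.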