Big Data Technology: Loading and Reading Multiple Files at Once in MapReduce - 职坐标 (Zhizuobiao)

沉沙, 2018-10-12

Summary: This tutorial shows two ways a MapReduce job can load and read several files at the same time. We hope it helps deepen your understanding of big data technology.

Method 1: register several input paths and identify the source of each record

　　a. Step 1: In the job, register the locations of both files as input paths

         // Both directories become inputs of the same job
         FileInputFormat.setInputPaths(job,
                 new Path("hdfs://192.168.9.13:8020/gradeMarking"),
                 new Path("hdfs://192.168.9.13:8020/implyCount"));
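
To see why step b needs the directory name rather than the file name, here is a small local sketch that runs without Hadoop. It imitates how the two input directories expand into the files they contain, applying the same rule FileInputFormat uses by default to skip hidden files such as _SUCCESS (names starting with "_" or "."). The temporary layout and the class name InputListingDemo are illustrative assumptions; on a real cluster the listing is done by FileInputFormat against HDFS.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class InputListingDemo {

    // Same rule as FileInputFormat's default hidden-file filter:
    // names starting with "_" or "." (such as _SUCCESS) are not treated as input.
    static boolean isVisible(Path p) {
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    }

    // Local stand-in for expanding each input directory into its files.
    // Each entry is "parentDir/fileName", the information step b recovers in setup().
    static List<String> listInputs(Path... dirs) throws IOException {
        List<String> result = new ArrayList<>();
        for (Path dir : dirs) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
                for (Path f : files) {
                    if (Files.isRegularFile(f) && isVisible(f)) {
                        result.add(dir.getFileName() + "/" + f.getFileName());
                    }
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical local layout imitating two earlier job output directories.
        Path root = Files.createTempDirectory("inputs");
        root.toFile().deleteOnExit();
        for (String d : new String[] {"gradeMarking", "implyCount"}) {
            Path dir = Files.createDirectory(root.resolve(d));
            dir.toFile().deleteOnExit();
            for (String f : new String[] {"part-r-00000", "_SUCCESS"}) {
                Files.createFile(dir.resolve(f)).toFile().deleteOnExit();
            }
        }
        List<String> inputs = listInputs(root.resolve("gradeMarking"), root.resolve("implyCount"));
        System.out.println(inputs); // [gradeMarking/part-r-00000, implyCount/part-r-00000]
    }
}
```

Both inputs surface as a file called part-r-00000; only the parent directory distinguishes them.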

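　　b. Step 2: In the Mapper class, override the setup method and use the context object to find which file the current input split belongs to. If the inputs are outputs of earlier MapReduce jobs, every file is named part-r-00000, so take the name of the parent directory instead. This works because each map task processes one split, and with the default TextInputFormat a split (a FileSplit) never spans more than one file, so the name read in setup holds for every record the task's map method receives.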
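        // Name of the directory that holds the file of the current split
        private String parentName;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // FileSplit here is org.apache.hadoop.mapreduce.lib.input.FileSplit
            FileSplit fs = (FileSplit) context.getInputSplit();
            parentName = fs.getPath().getParent().getName();
        }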
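
Neither step shows what the Mapper does with parentName. A common pattern is to branch on it inside map() and tag each record with its source, so a reducer that receives records from both directories can tell them apart. The sketch below is plain Java so it runs without a cluster: parentName() reproduces fs.getPath().getParent().getName() for an HDFS URI string using java.net.URI and java.nio.file, while tagRecord() and the "G:"/"I:" prefixes are hypothetical stand-ins for the real map() logic.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SourceTagDemo {

    // Mirrors fs.getPath().getParent().getName(): the last component of the parent directory.
    static String parentName(String fileUri) {
        Path p = Paths.get(URI.create(fileUri).getPath());
        return p.getParent().getFileName().toString();
    }

    // Hypothetical map() body: prefix each record with a tag for its input directory.
    static String tagRecord(String parentName, String line) {
        switch (parentName) {
            case "gradeMarking":
                return "G:" + line;
            case "implyCount":
                return "I:" + line;
            default:
                throw new IllegalArgumentException("unexpected input directory: " + parentName);
        }
    }

    public static void main(String[] args) {
        String a = parentName("hdfs://192.168.9.13:8020/gradeMarking/part-r-00000");
        String b = parentName("hdfs://192.168.9.13:8020/implyCount/part-r-00000");
        System.out.println(a + " " + b); // gradeMarking implyCount
        System.out.println(tagRecord(a, "u1\t5"));
    }
}
```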

Method 2: ship a file to every task through the distributed cache

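　　a. Step 1: In the job, add the file to the distributed cache so that Hadoop copies it to the local disk of every node that runs a task of this job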

        // Ship the earlier job's output to every task of this job (new URI may throw URISyntaxException)
        job.addCacheFile(new URI("hdfs://192.168.9.13:8020/meanwhileFind(同现)_data/part-r-00000"));
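
Step b can open the file with the bare name part-r-00000 because the task does not read it from HDFS: the cached copy is placed on the task's node and linked into its working directory. The sketch below captures the link-naming rule assumed here for Hadoop 2.x MapReduce on YARN (treat it as an assumption, not a reference): the link takes the URI fragment after "#" if there is one, otherwise the last path component. The class name and the "#cooc" fragment are hypothetical; the \u escapes spell the Chinese word in the article's path.

```java
import java.net.URI;

public class CacheLinkNameDemo {

    // Assumed rule: fragment if present, otherwise the last component of the path.
    static String linkName(String cacheUri) {
        URI u = URI.create(cacheUri);
        if (u.getFragment() != null) {
            return u.getFragment();
        }
        String path = u.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String base = "hdfs://192.168.9.13:8020/meanwhileFind(\u540c\u73b0)_data/part-r-00000";
        System.out.println(linkName(base));           // part-r-00000
        System.out.println(linkName(base + "#cooc")); // cooc
    }
}
```

Under that assumption, appending a fragment such as #cooc to the URI passed to addCacheFile would let setup() open new FileReader("cooc") instead, which avoids a clash when two cached files are both named part-r-00000.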
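　　b. Step 2: In the Mapper, override the setup method and read the local copy with a BufferedReader. Because setup runs once per map task, the file is parsed once and the resulting in-memory map is available to every call of map in that task.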

        // Requires java.io.BufferedReader, java.io.FileReader, java.io.IOException,
        // java.util.HashMap and java.util.Map
        // Item A -> (item B -> value), loaded once per map task
        private final Map<String, Map<String, Double>> map = new HashMap<>();

        @Override
        protected void setup(Context context)
                throws IOException, InterruptedException {
            // The cached file is available in the task's working directory as part-r-00000;
            // try-with-resources guarantees the reader is closed
            try (BufferedReader br = new BufferedReader(new FileReader("part-r-00000"))) {
                String str;
                while ((str = br.readLine()) != null) {
                    // Each line has the form "A-B<TAB>value"
                    String[] datas = str.split("\t");
                    String[] sp = datas[0].split("-");
                    // Create the inner map for A on first sight, then record B -> value
                    map.computeIfAbsent(sp[0], k -> new HashMap<>())
                            .put(sp[1], Double.parseDouble(datas[1]));
                }
            }
        }
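
The parsing in setup() can be tried out without a cluster by reading from any java.io.Reader instead of the cached file. The sketch below builds the same A -> (B -> value) structure from lines of the form "A-B<TAB>value"; the class name CoOccurrenceLoader and the sample data are hypothetical, and the blank-line skip is an added tolerance not present in the original.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class CoOccurrenceLoader {

    // Parses "A-B<TAB>value" lines into A -> (B -> value).
    static Map<String, Map<String, Double>> load(Reader source) throws IOException {
        Map<String, Map<String, Double>> map = new HashMap<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // tolerate blank lines
                }
                String[] datas = line.split("\t");
                String[] sp = datas[0].split("-");
                map.computeIfAbsent(sp[0], k -> new HashMap<>())
                   .put(sp[1], Double.parseDouble(datas[1]));
            }
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample; in the job the content comes from the cached part-r-00000.
        String sample = "101-102\t2.0\n101-103\t1.0\n102-101\t2.0\n";
        Map<String, Map<String, Double>> map = load(new StringReader(sample));
        // map() can now look up values for each record without touching HDFS.
        System.out.println(map.get("101").get("103")); // 1.0
        System.out.println(map.get("102"));            // {101=2.0}
    }
}
```

This approach suits side files small enough to fit in each map task's memory; every task holds its own full copy of the map.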