Big Data Technology: Advanced Aggregation Queries with MongoDB aggregate
沉沙 2019-03-27

Abstract: This article explores advanced aggregation queries with MongoDB's aggregate, with the aim of giving readers a deeper understanding of the topic.

Crunching the Zipcode dataset
Please calculate the average population of cities in California (abbreviation CA) and New York (NY) (taken together) with populations over 25,000. 
For this problem, assume that a city name that appears in more than one state represents two separate cities. 
Please round the answer to a whole number. 
Hint: The answer for CT and NJ (using this data set) is 38177. 
Please note:

  • Different states might have the same city name.

  • A city might have multiple zip codes.

For purposes of keeping the Hands On shell quick, we have used a subset of the data you previously used in zips.json, not the full set. This is why there are only 200 documents (and 200 zip codes), and all of them are in New York, Connecticut, New Jersey, and California. 
If you prefer, you may download the handout and perform your analysis on your machine with

> mongoimport -d test -c zips --drop small_zips.json


Once you've generated your aggregation query and found your answer, select it from the choices below. 

  • 448055

  • 592167

  • 935718

  • 198242

  • 693777
                        /***********ANSWER**************/

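db.zips.aggregate([
{
    // Sum the population of every zip code in the same (state, city) pair
    $group:{
                "_id": {state: "$state",
                        city: "$city"},
                pop: {"$sum": "$pop"}
           }
},
{
    // Keep only CA and NY cities whose total population exceeds 25,000
    $match:{
                "pop":{"$gt":25000},
                "_id.state": {
                                "$in": ["CA", "NY"]
                              }
           }
},
{
    // Average the remaining city populations across both states
    $group:{
                "_id":null,
                avg: {"$avg": "$pop"}
           }
}
])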

c:\Program Files\MongoDB 2.6 Standard\bin>type aggregate_right_5_2.js | mongo

MongoDB shell version: 2.6.7
connecting to: test
{ "_id" : null, "avg" : 44804.782608695656 }
bye
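
The order of the stages matters: a city can span several zip codes, so populations have to be summed per (state, city) before the 25,000 threshold is applied. The following is a minimal plain-JavaScript sketch of the same three stages, run outside MongoDB on made-up documents. The sampleZips values and all variable names are assumptions for illustration, not data from small_zips.json. The last line also shows that the shell output above rounds to 44805.

```javascript
// Hypothetical documents in the shape of the zips collection (made-up values).
const sampleZips = [
  { _id: "00001", city: "SPRINGFIELD", state: "CA", pop: 20000 },
  { _id: "00002", city: "SPRINGFIELD", state: "CA", pop: 10000 },
  { _id: "00003", city: "SPRINGFIELD", state: "NY", pop: 40000 },
  { _id: "00004", city: "SMALLTOWN", state: "NY", pop: 5000 },
  { _id: "00005", city: "ELSEWHERE", state: "NJ", pop: 90000 },
];

// Stage 1 ($group): sum pop per (state, city) pair, so the same city name
// in two states stays two separate cities.
const cityPop = new Map();
for (const z of sampleZips) {
  const key = `${z.state}|${z.city}`;
  cityPop.set(key, (cityPop.get(key) || 0) + z.pop);
}

// Stage 2 ($match): keep CA/NY cities whose summed population exceeds 25,000.
const matched = [...cityPop.entries()].filter(
  ([key, pop]) => ["CA", "NY"].includes(key.split("|")[0]) && pop > 25000
);

// Stage 3 ($group with _id: null): average the remaining city totals.
const avg = matched.reduce((sum, [, pop]) => sum + pop, 0) / matched.length;

console.log(matched.length, avg);            // 2 35000
console.log(Math.round(44804.782608695656)); // 44805
```

In this sample neither CA zip code exceeds 25,000 on its own, so matching before grouping would drop the CA city entirely.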

/***************************************************************************/

HOMEWORK: HOMEWORK 5.3 (HANDS ON)

Who's the easiest grader on campus?
A set of grades is loaded into the grades collection. 
The documents look like this:

{
 "_id" : ObjectId("50b59cd75bed76f46522c392"),
 "student_id" : 10,
 "class_id" : 5,
 "scores" : [
  {
   "type" : "exam",
   "score" : 69.17634380939022
  },
  {
   "type" : "quiz",
   "score" : 61.20182926719762
  },
  {
   "type" : "homework",
   "score" : 73.3293624199466
  },
  {
   "type" : "homework",
   "score" : 15.206314042622903
  },
  {
   "type" : "homework",
   "score" : 36.75297723087603
  },
  {
   "type" : "homework",
   "score" : 64.42913107330241
  }
 ]
}

There are documents for each student (student_id) across a variety of classes (class_id). Note that not all students in the same class have the same exact number of assessments. Some students have three homework assignments, etc. 
Your task is to calculate the class with the best average student performance. This involves calculating an average for each student in each class of all non-quiz assessments and then averaging those numbers to get a class average. To be clear, each student's average includes only exams and homework grades. Don't include their quiz scores in the calculation. 
What is the class_id which has the highest average student performance? 
Hint/Strategy: You need to group twice to solve this problem. You must figure out the GPA that each student has achieved in a class and then average those numbers to get a class average. After that, you just need to sort. The class with the lowest average is the class with class_id=2. Those students achieved a class average of 37.6. 
If you prefer, you may download the handout and perform your analysis on your machine with

> mongoimport -d test -c grades --drop grades.json


Below, choose the class_id with the highest average student average.

/*******ANSWER**********/

db.grades.aggregate([
                        {
                          $unwind : "$scores"
                        },
                        {
                          $match: {
                                    $or: [
                                            {"scores.type":"exam"},
                                            {"scores.type":"homework"}
                                         ]
                                  }
                        },
                        {
                          $group: {
                                    _id:{
                                            class_id: "$class_id",
                                            student_id: "$student_id"
                                        },
                                    grade:{
                                             $avg: "$scores.score"
                                          }
                                   }
                        },
                        {
                          $group: {
                                    _id:{
                                            class_id: "$_id.class_id"
                                         },
                                   avg_grade:{
                                            $avg: "$grade"
                                             }
                                  }
                        },
                        {
                                 $sort: { avg_grade : -1 }
                        },
                        {
                                 $limit:1
                        }
           ]).pretty()

c:\Program Files\MongoDB 2.6 Standard\bin>type aggregate_grade_5_3.js | mongo

MongoDB shell version: 2.6.7
connecting to: test
{ "_id" : { "class_id" : 1 }, "avg_grade" : 64.50642324269174 }
bye
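
Grouping twice is not the same as averaging all scores in a class at once, because students have different numbers of assessments. The plain-JavaScript sketch below mimics the $unwind, $match, and two $group stages on two made-up grade documents. The sampleGrades values, the class_id of 7, and the helper mean are assumptions for illustration, not data from grades.json.

```javascript
// Hypothetical grade documents in the shape shown above (made-up scores).
const sampleGrades = [
  { student_id: 1, class_id: 7, scores: [
    { type: "exam", score: 90 },
    { type: "quiz", score: 10 },
    { type: "homework", score: 70 } ] },
  { student_id: 2, class_id: 7, scores: [
    { type: "exam", score: 40 },
    { type: "homework", score: 50 },
    { type: "homework", score: 60 },
    { type: "homework", score: 70 } ] },
];

// Illustrative helper: arithmetic mean of a non-empty array.
const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

// $unwind + $match: one row per score, quizzes dropped.
const rows = sampleGrades.flatMap((d) =>
  d.scores
    .filter((s) => s.type !== "quiz")
    .map((s) => ({ class_id: d.class_id, student_id: d.student_id, score: s.score }))
);

// First $group: collect scores per (class_id, student_id), then average them.
const perStudent = new Map();
for (const r of rows) {
  const key = `${r.class_id}|${r.student_id}`;
  if (!perStudent.has(key)) perStudent.set(key, []);
  perStudent.get(key).push(r.score);
}
const studentAverages = [...perStudent.values()].map(mean);

// Second $group: average of the student averages (only one class here).
const classAverage = mean(studentAverages);

// For contrast: pooling every score weights students by assessment count.
const pooledAverage = mean(rows.map((r) => r.score));

console.log(studentAverages, classAverage); // [ 80, 55 ] 67.5
console.log(pooledAverage < classAverage);  // true
```

Here the student with four assessments pulls the pooled figure below 67.5, which is why the per-student average has to be computed first.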

/***************************************************************************/

HOMEWORK: HOMEWORK 5.4

Removing Rural Residents
In this problem you will calculate the number of people who live in a zip code in the US where the city starts with a digit. We will take that to mean they don't really live in a city. Once again, you will be using the zip code collection, which you will find in the 'handouts' link in this page. Import it into your mongod using the following command from the command line:

> mongoimport -d test -c zips --drop zips.json


If you imported it correctly, you can go to the test database in the mongo shell and confirm that

> db.zips.count()


yields 29,467 documents. 
The $project operator can extract the first character from any string field. For example, to extract the first character from the city field, you could write this query:

db.zips.aggregate([
    {
      $project: {
                  first_char: {$substr : ["$city",0,1]}
                }
    }
])

Using the aggregation framework, calculate the sum total of people who are living in a zip code where the city starts with a digit. Choose the answer below. 
Note that you will probably need to change your projection to send more info through than just that first character. Also, you will need a filtering step to get rid of all documents where the city does not start with a digit (0-9).

  • 298015

  • 345232

  • 245987

  • 312893

  • 158249

  • 543282

/*******ANSWER**********/

db.zips.aggregate([
                   {
                     $project: {  _id:1,
                                  city:1,
                                  state:1,
                                  pop:1,
                                  city_first_char:{
                                                    $substr : ["$city",0,1]
                                                  }
                               }
                    },
                   
                   {
                     $match:{
                              city_first_char:
                             {$in:["0","1","2","3","4","5","6","7","8","9"]}
                            }
                    },
                   
                    {
                     $group:{
                              _id: null,
                               total_pop: {
                                            $sum: "$pop"
                                           }
                            }
                    }
                  ])

c:\Program Files\MongoDB 2.6 Standard\bin>type aggregate_ciry_5_4.js | mongo

MongoDB shell version: 2.6.7
connecting to: test
{ "_id" : null, "total_pop" : 298015 }
bye
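
The $project and $match stages together amount to one test: does the city string begin with a digit? The plain-JavaScript sketch below runs that test two ways on made-up documents, once by taking the first character and once with a regular expression, and then sums pop. The sampleDocs values and variable names are assumptions for illustration, not data from zips.json.

```javascript
// Hypothetical documents in the shape of the zips collection (made-up values).
const sampleDocs = [
  { city: "98791", state: "AK", pop: 5345 },
  { city: "ANCHORAGE", state: "AK", pop: 14436 },
  { city: "1ST STREET", state: "TX", pop: 100 },
];

const digits = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"];

// Mirrors $project ($substr of city from 0, length 1) followed by $match ($in).
const byFirstChar = sampleDocs.filter((d) =>
  digits.includes(d.city.substring(0, 1))
);

// Equivalent test written as an anchored regular expression.
const byRegex = sampleDocs.filter((d) => /^[0-9]/.test(d.city));

// Mirrors $group with _id: null and $sum over pop.
const totalPop = byFirstChar.reduce((sum, d) => sum + d.pop, 0);

console.log(byFirstChar.length === byRegex.length); // true
console.log(totalPop);                              // 5445
```

Since $match accepts ordinary query syntax, a pattern such as { city: /^[0-9]/ } could in principle replace the projection step; that variant was not run against the data set here.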

This article was published by @沉沙 on 职坐标. Reproduction without permission is prohibited.