摘要:本篇主要介绍了大数据开发基础:Hive的开窗函数,通过具体的内容展示,希望对大数据开发Hive的学习有一定的帮助。
本篇主要介绍了大数据开发基础:Hive的开窗函数,通过具体的内容展示,希望对大数据开发Hive的学习有一定的帮助。
Hive的开窗函数
一、窗口函数 ROW_NUMBER,RANK,DENSE_RANK
1、数据准备
cookie1,2021-06-10,1
cookie1,2021-06-11,5
cookie1,2021-06-12,7
cookie1,2021-06-13,3
cookie1,2021-06-14,2
cookie1,2021-06-15,4
cookie1,2021-06-16,4
cookie2,2021-06-10,2
cookie2,2021-06-11,3
cookie2,2021-06-12,5
cookie2,2021-06-13,6
cookie2,2021-06-14,3
cookie2,2021-06-15,9
cookie2,2021-06-16,7
CREATE TABLE it_t1 (
cookieid string,
createtime string, --day
pv INT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-- 加载数据:
load data local inpath '/export/data/hivedatas/it_t2.txt' into table it_t1;
2、ROW_NUMBER
ROW_NUMBER() 从1开始,按照顺序,生成分组内记录的序列
SELECT
cookieid,
createtime,
pv,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn
FROM it_t1;
3、RANK 和 DENSE_RANK
RANK() 生成数据项在分组中的排名,排名相等会在名次中留下空位
DENSE_RANK() 生成数据项在分组中的排名,排名相等会在名次中不会留下空位
SELECT
cookieid,
createtime,
pv,
RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn1,
DENSE_RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn2,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv DESC) AS rn3
FROM it_t1
WHERE cookieid = 'cookie1';
二、Hive分析窗口函数 SUM,AVG,MIN,MAX
1、数据准备
--建表语句:
create table it_t2(
cookieid string,
createtime string, --day
pv int
) row format delimited
fields terminated by ',';
--加载数据:
load data local inpath '/root/hivedata/ it_t2.txt' into table it_t2;
--开启智能本地模式
SET hive.exec.mode.local.auto=true;
2、SUM(结果和ORDER BY相关,默认为升序)
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime) as pv1
from it_t2;
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from it_t2;
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid) as pv3
from it_t2; --如果没有order by排序语句 默认把分组内的所有数据进行sum操作
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as pv4
from it_t2;
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5
from it_t2;
select cookieid,createtime,pv,
sum(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as pv6
from it_t2;
--pv1: 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号
--pv2: 同pv1
--pv3: 分组内(cookie1)所有的pv累加
--pv4: 分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号,
13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
--pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
--pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,
14号=14号+15号+16号=2+4+4=10
/*
- 如果不指定rows between,默认为从起点到当前行;
- 如果不指定order by,则将分组内所有值累加;
- 关键是理解rows between含义,也叫做window子句:
- preceding:往前
- following:往后
- current row:当前行
- unbounded:起点
- unbounded preceding 表示从前面的起点
- unbounded following:表示到后面的终点
*/
3、AVG,MIN,MAX
AVG,MIN,MAX和SUM用法一样
select cookieid,createtime,pv,
avg(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from it_t2;
select cookieid,createtime,pv,
max(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from it_t2;
select cookieid,createtime,pv,
min(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
from it_t2;
————————————————
我是小职,记得找我
✅ 解锁高薪工作
✅ 免费获取基础课程·答疑解惑·职业测评
您输入的评论内容中包含违禁敏感词
我知道了
请输入正确的手机号码
请输入正确的验证码
您今天的短信下发次数太多了,明天再试试吧!
我们会在第一时间安排职业规划师联系您!
您也可以联系我们的职业规划师咨询:
版权所有 职坐标-一站式IT培训就业服务领导者 沪ICP备13042190号-4
上海海同信息科技有限公司 Copyright ©2015 www.zhizuobiao.com,All Rights Reserved.
沪公网安备 31011502005948号