如何优雅的写sql 01 窗口函数的窗口别名复用

发现

最近偶然间,发现一个以前spark sql不知道的写法,窗口函数的窗口是可以起别名的。

本文就作为笔记罗列下窗口别名的写法。

哪里看?

要了解最官方的语法及解释,自然是看官方的解释了。
spark的官方为 Window Functions
hive的官方为LanguageManual WindowingAndAnalytics

本文以spark sql语法为准

窗口别名的基本语法

1
2
3
4
5
6
7
8
select 
window_function [ nulls_option ] OVER window_name

from xxx

window window_name as ( [ { PARTITION | DISTRIBUTE } BY partition_col_name = partition_col_val ( [ , ... ] ) ]
{ ORDER | SORT } BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ]
[ window_frame ] )

举例来说

1
2
3
4
5
6
7
8
9
10
with tmp as (
select '张三' as name,100 as score
union all
select '张三' as name,80 as score
)
select *
,lead(score) over w as lead_score
from tmp
window w as (partition by name order by score)
order by score

此时结果为

name score lead_score
张三 80 100
张三 100 null

有什么作用?

当我们同一个窗口,要取的字段比较多时,窗口的partition by xx order by xxx就会重复编写,此时,如要要修改窗口,那么就会要改很多地方,而使用别名,则只需修改一个地方,同时,代码也会简洁很多。

还是举个例子,比如有如下的订单表

id 用户名 商品 支付时间 购买数量 支付金额
1 张三 奶粉 2023-01-01 13:53:52 2 120
2 张三 尿不湿 2023-01-02 14:14:14 3 200
需求是加上上一次同一个用户购买的商品名称,支付时间,购买金额用于分析比较。

那么,正常的一个sql写法为

1
2
3
4
5
6
7
8
9
10
11
with item_order as (
select 1 as id,'张三' as name,'奶粉' as item_name,'2023-01-01 13:53:52' as pay_time,2 as item_cnt,120 as pay_amt
union all
select 2 as id,'张三' as name,'尿不湿' as item_name,'2023-01-02 14:14:14' as pay_time,3 as item_cnt,200 as pay_amt
)
select *
,lag(item_name) over (partition by name order by pay_time) as lag_item_name
,lag(pay_time) over (partition by name order by pay_time) as lag_pay_time
,lag(pay_amt) over (partition by name order by pay_time) as lag_pay_amt
from item_order
order by name,pay_time

查看上述sql,会发现写了三次 over (partition by name order by pay_time)

而采用窗口别名复用的方法,sql可以变为如下

1
2
3
4
5
6
7
8
9
10
11
12
13
with item_order as (
select 1 as id,'张三' as name,'奶粉' as item_name,'2023-01-01 13:53:52' as pay_time,2 as item_cnt,120 as pay_amt
union all
select 2 as id,'张三' as name,'尿不湿' as item_name,'2023-01-02 14:14:14' as pay_time,3 as item_cnt,200 as pay_amt
)
select *
,lag(item_name) over name_window as lag_item_name
,lag(pay_time) over name_window as lag_pay_time
,lag(pay_amt) over name_window as lag_pay_amt
from item_order
window name_window as (partition by name order by pay_time)
order by name,pay_time

可以看到对窗口name_window进行了复用,此时只需修改 name_window 一个地方就可以了。

参考资料