distribute by 和 partition by

发布日期: 2021-06-25

开窗函数
over（）开窗函数，其括号内部主要有两种形式，固定搭配，不能更改：
over(distribute by…sort by…) 和 over(partition by…order by…)

两种开窗方式区别
patition by是按照一个一个reduce去处理数据的，所以要使用全局排序order by
distribute by是按照多个reduce去处理数据的，所以对应的排序是局部排序sort by

select 
nid,ntime,sum(nid) over(partition by year(ntime) order by ntime )
from zxz_5

INFO  : Compiling command(queryId=hive_20210625110909_e1c79678-b33a-4e7f-be24-80f53eab913c): select 
nid,ntime,sum(nid) over(partition by year(ntime) order by ntime )
from zxz_5
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:nid, type:bigint, comment:null), FieldSchema(name:ntime, type:string, comment:null), FieldSchema(name:_wcol0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20210625110909_e1c79678-b33a-4e7f-be24-80f53eab913c); Time taken: 0.075 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20210625110909_e1c79678-b33a-4e7f-be24-80f53eab913c): select 
nid,ntime,sum(nid) over(partition by year(ntime) order by ntime )
from zxz_5
INFO  : Query ID = hive_20210625110909_e1c79678-b33a-4e7f-be24-80f53eab913c
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1576480232731_834434
INFO  : The url to track the job: http://emr-master.izhikang.com:8088/proxy/application_1576480232731_834434/
INFO  : Starting Job = job_1576480232731_834434, Tracking URL = http://emr-master.izhikang.com:8088/proxy/application_1576480232731_834434/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/bin/hadoop job  -kill job_1576480232731_834434
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2021-06-25 11:09:42,608 Stage-1 map = 0%,  reduce = 0%
INFO  : 2021-06-25 11:09:46,696 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.07 sec
INFO  : 2021-06-25 11:09:51,794 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.58 sec
INFO  : MapReduce Total cumulative CPU time: 4 seconds 580 msec
INFO  : Ended Job = job_1576480232731_834434
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.58 sec   HDFS Read: 16236 HDFS Write: 319 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 4 seconds 580 msec
INFO  : Completed executing command(queryId=hive_20210625110909_e1c79678-b33a-4e7f-be24-80f53eab913c); Time taken: 13.853 seconds
INFO  : OK

select 
nid,ntime,sum(nid) over(distribute by year(ntime) sort by ntime )
from zxz_5
INFO  : Compiling command(queryId=hive_20210625111111_6c4896d8-d7f6-4516-956c-ac78cb7d9094): select 
nid,ntime,sum(nid) over(distribute by year(ntime) sort by ntime )
from zxz_5
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:nid, type:bigint, comment:null), FieldSchema(name:ntime, type:string, comment:null), FieldSchema(name:_wcol0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20210625111111_6c4896d8-d7f6-4516-956c-ac78cb7d9094); Time taken: 0.075 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20210625111111_6c4896d8-d7f6-4516-956c-ac78cb7d9094): select 
nid,ntime,sum(nid) over(distribute by year(ntime) sort by ntime )
from zxz_5
INFO  : Query ID = hive_20210625111111_6c4896d8-d7f6-4516-956c-ac78cb7d9094
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1576480232731_834435
INFO  : The url to track the job: http://emr-master.izhikang.com:8088/proxy/application_1576480232731_834435/
INFO  : Starting Job = job_1576480232731_834435, Tracking URL = http://emr-master.izhikang.com:8088/proxy/application_1576480232731_834435/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/bin/hadoop job  -kill job_1576480232731_834435
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2021-06-25 11:11:53,319 Stage-1 map = 0%,  reduce = 0%
INFO  : 2021-06-25 11:11:57,405 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.98 sec
INFO  : 2021-06-25 11:12:02,507 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.61 sec
INFO  : MapReduce Total cumulative CPU time: 4 seconds 610 msec
INFO  : Ended Job = job_1576480232731_834435
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.61 sec   HDFS Read: 16232 HDFS Write: 319 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 4 seconds 610 msec
INFO  : Completed executing command(queryId=hive_20210625111111_6c4896d8-d7f6-4516-956c-ac78cb7d9094); Time taken: 14.864 seconds
INFO  : OK