首页 技术 正文
技术 2022年11月23日
0 收藏 961 点赞 2,411 浏览 3248 个字

The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query.

In general, distributing rows based on the hash will give you a even distribution(均匀分布) in the buckets.

set mapred.reduce.tasks = ;

set hive.enforce.bucketing = true;

CREATE TABLE user_info_bucketed(user_id BIGINT, firstname STRING, lastname STRING)

COMMENT ‘A bucketed copy of user_info’

PARTITIONED BY(ds STRING)

CLUSTERED BY(user_id) INTO BUCKETS;

INSERT into TABLE user_info_bucketed

PARTITION (ds=’2015-07-25′)

values

(100,’python’,’postgresql’), (101,’python’,’postgresql’), (102,’python’,’postgresql’), (103,’python’,’postgresql’), (104,’python’,’postgresql’), (105,’python’,’postgresql’), (106,’python’,’postgresql’), (107,’python’,’postgresql’), (108,’python’,’postgresql’), (109,’python’,’postgresql’), (111,’python’,’postgresql’), (112,’python’,’postgresql’), (113,’python’,’postgresql’), (114,’python’,’postgresql’), (115,’python’,’postgresql’), (116,’python’,’postgresql’), (117,’python’,’postgresql’), (118,’python’,’postgresql’), (119,’python’,’postgresql’), (120,’python’,’postgresql’), (121,’python’,’postgresql’), (122,’python’,’postgresql’), (2000,’R’,’Oracle’), (2001,’R’,’Oracle’), (2002,’R’,’Oracle’), (2003,’R’,’Oracle’), (2004,’R’,’Oracle’), (2005,’R’,’Oracle’), (2006,’R’,’Oracle’), (2007,’R’,’Oracle’), (2008,’R’,’Oracle’), (2009,’R’,’Oracle’), (2010,’R’,’Oracle’), (2011,’R’,’Oracle’), (2012,’R’,’Oracle’), (2013,’R’,’Oracle’), (2014,’R’,’Oracle’), (2015,’R’,’Oracle’), (2016,’R’,’Oracle’), (2017,’R’,’Oracle’), (2018,’R’,’Oracle’), (2019,’R’,’Oracle’), (2020,’R’,’Oracle’), (2030,’R’,’Oracle’), (2040,’R’,’Oracle’), (2050,’R’,’Oracle’);

[spark01 ~]$ hadoop fs -ls -R /user/hive/warehouse/test.db/user_info_bucketed
drwxrwxrwx   – huai supergroup          0 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25
-rwxrwxrwx   3 huai supergroup        266 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000000_0
-rwxrwxrwx   3 huai supergroup        288 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000001_0
-rwxrwxrwx   3 huai supergroup        266 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000002_0

[spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000000_0 |sort
102pythonpostgresql
105pythonpostgresql
108pythonpostgresql
111pythonpostgresql
114pythonpostgresql
117pythonpostgresql
120pythonpostgresql
2001ROracle
2004ROracle
2007ROracle
2010ROracle
2013ROracle
2016ROracle
2019ROracle
2040ROracle
[spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000001_0 |sort
100pythonpostgresql
103pythonpostgresql
106pythonpostgresql
109pythonpostgresql
112pythonpostgresql
115pythonpostgresql
118pythonpostgresql
121pythonpostgresql
2002ROracle
2005ROracle
2008ROracle
2011ROracle
2014ROracle
2017ROracle
2020ROracle
2050ROracle
[spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000002_0 |sort
101pythonpostgresql
104pythonpostgresql
107pythonpostgresql
113pythonpostgresql
116pythonpostgresql
119pythonpostgresql
122pythonpostgresql
2000ROracle
2003ROracle
2006ROracle
2009ROracle
2012ROracle
2015ROracle
2018ROracle
2030ROracle

相关推荐
python开发_常用的python模块及安装方法
adodb:我们领导推荐的数据库连接组件bsddb3:BerkeleyDB的连接组件Cheetah-1.0:我比较喜欢这个版本的cheeta…
日期:2022-11-24 点赞:878 阅读:9,440
Educational Codeforces Round 11 C. Hard Process 二分
C. Hard Process题目连接:http://www.codeforces.com/contest/660/problem/CDes…
日期:2022-11-24 点赞:807 阅读:5,857
下载Ubuntn 17.04 内核源代码
zengkefu@server1:/usr/src$ uname -aLinux server1 4.10.0-19-generic #21…
日期:2022-11-24 点赞:569 阅读:6,695
可用Active Desktop Calendar V7.86 注册码序列号
可用Active Desktop Calendar V7.86 注册码序列号Name: www.greendown.cn Code: &nb…
日期:2022-11-24 点赞:733 阅读:6,450
Android调用系统相机、自定义相机、处理大图片
Android调用系统相机和自定义相机实例本博文主要是介绍了android上使用相机进行拍照并显示的两种方式,并且由于涉及到要把拍到的照片显…
日期:2022-11-24 点赞:512 阅读:8,096
Struts的使用
一、Struts2的获取  Struts的官方网站为:http://struts.apache.org/  下载完Struts2的jar包,…
日期:2022-11-24 点赞:671 阅读:5,248