Technology, November 23, 2022

The CLUSTERED BY and SORTED BY clauses of CREATE TABLE do not affect how data is inserted into a table, only how it is read. Users must therefore take care to insert data correctly: set the number of reducers equal to the number of buckets, and use CLUSTER BY and SORT BY in the inserting query.
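
Why the reducer count matters can be seen in a simplified model. This is an illustrative assumption, not Hive's actual code: each reducer writes exactly one output file, and a row reaches reducer `key % num_reducers`, which stands in for hash partitioning on small non-negative keys. The helper names `write_files` and `files_match_buckets` are hypothetical.

```python
from collections import defaultdict


def write_files(keys, num_reducers):
    """Simplified model: reducer r writes one file with keys where key % num_reducers == r."""
    files = defaultdict(list)
    for key in keys:
        files[key % num_reducers].append(key)
    return dict(files)


def files_match_buckets(files, num_buckets):
    """True if there is one file per bucket and file i holds only bucket i's keys."""
    if sorted(files) != list(range(num_buckets)):
        return False
    return all(key % num_buckets == idx
               for idx, keys in files.items()
               for key in keys)


keys = range(100, 112)
print(files_match_buckets(write_files(keys, 3), 3))  # True: reducers == buckets
print(files_match_buckets(write_files(keys, 2), 3))  # False: reducers != buckets
```

With two reducers for a three-bucket table, the model produces two files whose contents do not line up with the declared buckets, which is the kind of mismatch the paragraph above warns about.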

In general, distributing rows by the hash of the bucketing column gives an even distribution across the buckets.
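
The bucket assignment can be sketched as follows. This is a minimal sketch, assuming the legacy BIGINT hashing of older Hive releases (fold the 64-bit value to a 32-bit integer, clear the sign bit, take the remainder by the bucket count); newer Hive releases may hash differently, so the exact bucket for a given ID is not guaranteed. The function names are hypothetical, and the bucket count of 3 is taken from the three bucket files in the listing later in this article.

```python
from collections import Counter


def legacy_bigint_hash(value: int) -> int:
    """Fold a 64-bit integer to a signed 32-bit hash, like (int)(v ^ (v >>> 32)) in Java."""
    u = value & 0xFFFFFFFFFFFFFFFF           # view the value as unsigned 64-bit
    folded = (u ^ (u >> 32)) & 0xFFFFFFFF    # XOR high and low halves, keep the low 32 bits
    return folded - (1 << 32) if folded >= (1 << 31) else folded


def bucket_for(user_id: int, num_buckets: int) -> int:
    """Clear the sign bit, then take the remainder by the bucket count."""
    return (legacy_bigint_hash(user_id) & 0x7FFFFFFF) % num_buckets


# The user_id values inserted in this article's example (110 is absent there too).
user_ids = [*range(100, 110), *range(111, 123), *range(2000, 2021), 2030, 2040, 2050]
counts = Counter(bucket_for(uid, 3) for uid in user_ids)
print(sorted(counts.items()))  # [(0, 15), (1, 16), (2, 15)]
```

The 15/16/15 split matches the number of rows in the three bucket files shown later in the article, and for these small positive IDs the bucket is simply `user_id % 3`.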

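-- One reducer per bucket; the listing below shows three bucket files.
set mapred.reduce.tasks = 3;
-- Let Hive pick the reducer count and clustering column from the table definition.
set hive.enforce.bucketing = true;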

CREATE TABLE user_info_bucketed(user_id BIGINT, firstname STRING, lastname STRING)
COMMENT 'A bucketed copy of user_info'
PARTITIONED BY(ds STRING)
CLUSTERED BY(user_id) INTO 3 BUCKETS;

INSERT INTO TABLE user_info_bucketed
PARTITION (ds='2015-07-25')
VALUES
(100,'python','postgresql'), (101,'python','postgresql'), (102,'python','postgresql'), (103,'python','postgresql'),
(104,'python','postgresql'), (105,'python','postgresql'), (106,'python','postgresql'), (107,'python','postgresql'),
(108,'python','postgresql'), (109,'python','postgresql'), (111,'python','postgresql'), (112,'python','postgresql'),
(113,'python','postgresql'), (114,'python','postgresql'), (115,'python','postgresql'), (116,'python','postgresql'),
(117,'python','postgresql'), (118,'python','postgresql'), (119,'python','postgresql'), (120,'python','postgresql'),
(121,'python','postgresql'), (122,'python','postgresql'),
(2000,'R','Oracle'), (2001,'R','Oracle'), (2002,'R','Oracle'), (2003,'R','Oracle'),
(2004,'R','Oracle'), (2005,'R','Oracle'), (2006,'R','Oracle'), (2007,'R','Oracle'),
(2008,'R','Oracle'), (2009,'R','Oracle'), (2010,'R','Oracle'), (2011,'R','Oracle'),
(2012,'R','Oracle'), (2013,'R','Oracle'), (2014,'R','Oracle'), (2015,'R','Oracle'),
(2016,'R','Oracle'), (2017,'R','Oracle'), (2018,'R','Oracle'), (2019,'R','Oracle'),
(2020,'R','Oracle'), (2030,'R','Oracle'), (2040,'R','Oracle'), (2050,'R','Oracle');

[spark01 ~]$ hadoop fs -ls -R /user/hive/warehouse/test.db/user_info_bucketed
drwxrwxrwx   - huai supergroup          0 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25
-rwxrwxrwx   3 huai supergroup        266 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000000_0
-rwxrwxrwx   3 huai supergroup        288 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000001_0
-rwxrwxrwx   3 huai supergroup        266 2015-07-20 22:46 /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000002_0

[spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000000_0 |sort
102pythonpostgresql
105pythonpostgresql
108pythonpostgresql
111pythonpostgresql
114pythonpostgresql
117pythonpostgresql
120pythonpostgresql
2001ROracle
2004ROracle
2007ROracle
2010ROracle
2013ROracle
2016ROracle
2019ROracle
2040ROracle
[spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000001_0 |sort
100pythonpostgresql
103pythonpostgresql
106pythonpostgresql
109pythonpostgresql
112pythonpostgresql
115pythonpostgresql
118pythonpostgresql
121pythonpostgresql
2002ROracle
2005ROracle
2008ROracle
2011ROracle
2014ROracle
2017ROracle
2020ROracle
2050ROracle
[spark01 ~]$ hadoop fs -cat /user/hive/warehouse/test.db/user_info_bucketed/ds=2015-07-25/000002_0 |sort
101pythonpostgresql
104pythonpostgresql
107pythonpostgresql
113pythonpostgresql
116pythonpostgresql
119pythonpostgresql
122pythonpostgresql
2000ROracle
2003ROracle
2006ROracle
2009ROracle
2012ROracle
2015ROracle
2018ROracle
2030ROracle

Article URL: https://www.zhankr.net/141812.html
