首页 技术 正文
技术 2022年11月13日
0 收藏 641 点赞 3,995 浏览 2255 个字

Spark中产生shuffle的算子

作用

算子名

能否替换,由谁替换

去重

distinct()

不能

聚合

reduceByKey()

groupByKey

groupBy()

groupByKey()

reduceByKey

aggregateByKey()

combineByKey()

排序

sortByKey()

sortBy()

重分区

coalesce()

repartition()

集合或者表操作

Intersection()

Substract()

SubstractByKey()

Join()

LeftOutJoin()

https://www.cnblogs.com/Alex-zqzy/p/9949117.html

去重

def distinct()def distinct(numPartitions: Int)

聚合

def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)]def groupBy[K](f: T => K, p: Partitioner):RDD[(K, Iterable[V])]def groupByKey(partitioner: Partitioner):RDD[(K, Iterable[V])]def aggregateByKey[U: ClassTag](zeroValue: U, partitioner: Partitioner): RDD[(K, U)]def aggregateByKey[U: ClassTag](zeroValue: U, numPartitions: Int): RDD[(K, U)]def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C): RDD[(K, C)]def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, numPartitions: Int): RDD[(K, C)]def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, partitioner: Partitioner, mapSideCombine: Boolean = true, serializer: Serializer = null): RDD[(K, C)]

排序

def sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.length): RDD[(K, V)]def sortBy[K](f: (T) => K, ascending: Boolean = true, numPartitions: Int = this.partitions.length)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]

重分区

def coalesce(numPartitions: Int, shuffle: Boolean = false, partitionCoalescer: Option[PartitionCoalescer] = Option.empty)def repartition(numPartitions: Int)(implicit ord: Ordering[T] = null)

集合或者表操作

def intersection(other: RDD[T]): RDD[T]def intersection(other: RDD[T], partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]def intersection(other: RDD[T], numPartitions: Int): RDD[T]def subtract(other: RDD[T], numPartitions: Int): RDD[T]def subtract(other: RDD[T], p: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]def subtractByKey[W: ClassTag](other: RDD[(K, W)]): RDD[(K, V)]def subtractByKey[W: ClassTag](other: RDD[(K, W)], numPartitions: Int): RDD[(K, V)]def subtractByKey[W: ClassTag](other: RDD[(K, W)], p: Partitioner): RDD[(K, V)]def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))]def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]def leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))]
下一篇: Idea 2017注册码
相关推荐
python开发_常用的python模块及安装方法
adodb:我们领导推荐的数据库连接组件bsddb3:BerkeleyDB的连接组件Cheetah-1.0:我比较喜欢这个版本的cheeta…
日期:2022-11-24 点赞:878 阅读:9,082
Educational Codeforces Round 11 C. Hard Process 二分
C. Hard Process题目连接:http://www.codeforces.com/contest/660/problem/CDes…
日期:2022-11-24 点赞:807 阅读:5,556
下载Ubuntn 17.04 内核源代码
zengkefu@server1:/usr/src$ uname -aLinux server1 4.10.0-19-generic #21…
日期:2022-11-24 点赞:569 阅读:6,406
可用Active Desktop Calendar V7.86 注册码序列号
可用Active Desktop Calendar V7.86 注册码序列号Name: www.greendown.cn Code: &nb…
日期:2022-11-24 点赞:733 阅读:6,179
Android调用系统相机、自定义相机、处理大图片
Android调用系统相机和自定义相机实例本博文主要是介绍了android上使用相机进行拍照并显示的两种方式,并且由于涉及到要把拍到的照片显…
日期:2022-11-24 点赞:512 阅读:7,815
Struts的使用
一、Struts2的获取  Struts的官方网站为:http://struts.apache.org/  下载完Struts2的jar包,…
日期:2022-11-24 点赞:671 阅读:4,898