Spark Large-Scale E-commerce Project in Practice and Its Improvements (2): The Real Reason RDD Optimization Results Are Unstable

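First, look at job 2, which runs without a map join:

The timeline is as follows:

Next is the timing table for the operators, listed by stage id: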

Stage Id	Description	Submitted	Duration	Tasks: Succeeded/Total	Shuffle Read	Shuffle Write
13	collect at AreaTop3ProductRDD.java:353	2019/01/29 11:19:02	59 ms	41/41	235.3 KB
12	mapToPair at AreaTop3ProductRDD.java:259	2019/01/29 11:19:02	0.1 s	41/41	383.2 KB	235.3 KB
11	mapToPair at AreaTop3ProductRDD.java:251	2019/01/29 11:19:02	95 ms	41/41	99.3 KB	246.2 KB
9	mapToPair at AreaTop3ProductRDD.java:230	2019/01/29 11:19:01	0.5 s	41/41	767.7 KB	99.3 KB
8	mapToPair at AreaTop3ProductRDD.java:128	2019/01/29 11:19:01	0.5 s	41/41		752.0 KB
7	mapToPair at AreaTop3ProductRDD.java:164	2019/01/29 11:19:01	0.3 s	1/1		15.7 KB
10	mapToPair at AreaTop3ProductRDD.java:248	2019/01/29 11:19:01	0.5 s	41/41		137.0 KB

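The city-area table (stage 10) and the product list (stage 7) hold little data, yet they still take a relatively long time to run on the cluster.

However, because the stages run in parallel, the processing of the click records (stage 8) finishes quickly.

Stage 9, which turns the data into pairs whose key is area + product id and whose value is the city information, does not run for long either.

So even when the program does nothing more than convert the data into RDDs, the optimization still takes effect.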
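For context, the idea behind a map join can be sketched in plain Java. This is only an analogy, not the code of AreaTop3ProductRDD.java: the class, the method, and the sample data are hypothetical. It assumes the small city-to-area table fits in memory, so each click record can be joined by a direct lookup instead of regrouping both sides by key. In Spark the small table would typically be shipped to the executors with `JavaSparkContext.broadcast(...)` and read inside the map function through `Broadcast.value()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogy of a map-side join; every name here is hypothetical.
class MapJoinSketch {

    // Joins each click {cityId, productId} with the small cityId -> area table
    // by direct lookup, the way a broadcast copy would be used inside a map step.
    static List<String> mapSideJoin(List<long[]> clicks, Map<Long, String> cityToArea) {
        List<String> out = new ArrayList<>();
        for (long[] click : clicks) {
            String area = cityToArea.get(click[0]);
            if (area != null) {                 // inner-join semantics: skip unknown cities
                out.add(area + "_" + click[1]); // key built from area + product id
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Long, String> cityToArea = new HashMap<>();
        cityToArea.put(1L, "North");
        cityToArea.put(2L, "East");

        List<long[]> clicks = new ArrayList<>();
        clicks.add(new long[] {1L, 100L});
        clicks.add(new long[] {2L, 200L});
        clicks.add(new long[] {9L, 300L}); // city 9 is missing from the small table

        System.out.println(mapSideJoin(clicks, cityToArea)); // prints [North_100, East_200]
    }
}
```

The lookup replaces the shuffle that a regular join would need, which is why the technique only pays off when one side is small enough to copy to every executor.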

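Compared with the program above, the speedUp version shows no great improvement in execution efficiency.

The timeline is as follows:

The timing table is as follows: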

Stage Id	Description	Submitted	Duration	Tasks: Succeeded/Total	Shuffle Read	Shuffle Write
17	collect at AreaTop3ProductRDDSpeedUp.java:371	2019/01/29 11:19:03	53 ms	41/41	246.7 KB
16	mapToPair at AreaTop3ProductRDDSpeedUp.java:284	2019/01/29 11:19:03	0.1 s	41/41	475.6 KB	246.7 KB
15	mapToPair at AreaTop3ProductRDDSpeedUp.java:218	2019/01/29 11:19:02	0.6 s	41/41		475.9 KB
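To put the two stage tables side by side, the small sketch below totals the Shuffle Write column of each job. The numbers are copied from the tables above; the class and method names are hypothetical. It only compares shuffle volume and says nothing about wall-clock time, because stages overlap on the cluster.

```java
import java.util.Locale;

// Totals the Shuffle Write figures copied from the two stage tables above.
class StageTotals {

    // Sums any number of values given in KB.
    static double sum(double... values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Job without map join: Shuffle Write of stages 12, 11, 9, 8, 7, 10 (KB)
        double plain = sum(235.3, 246.2, 99.3, 752.0, 15.7, 137.0);
        // speedUp job: Shuffle Write of stages 16 and 15 (KB)
        double speedUp = sum(246.7, 475.9);

        System.out.println(String.format(Locale.ROOT,
                "shuffle write: %.1f KB (7 stages) vs %.1f KB (3 stages)", plain, speedUp));
    }
}
```

With these figures the job without a map join writes about 1485.5 KB of shuffle data across 7 stages, and the speedUp job about 722.6 KB across 3 stages. At this data size both totals are tiny.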
