Description
From reading
http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
a few observations:
First, it appears you had fewer Kafka partitions than you had worker nodes. Because there's a 1:1 relationship between Spark partitions and Kafka partitions, you're not going to get full utilization of your workers without a repartition. As you noted, shuffling can be expensive, so you're better off doing the partitioning at the time you produce into Kafka (i.e., add more Kafka partitions).
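To make the 1:1 mapping concrete, here's a minimal sketch using the Spark 1.x direct Kafka API that was current at the time; the broker addresses, topic name, and partition counts are made up for illustration, not taken from the benchmark code:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("partition-sketch"), Seconds(1))

    // Hypothetical brokers and topic.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topics = Set("ad-events")

    // With the direct stream, each batch RDD gets exactly one Spark
    // partition per Kafka partition -- so with, say, 5 Kafka partitions
    // and 10 worker cores, half the cores sit idle unless you shuffle.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // repartition() restores parallelism but pays a full shuffle every
    // batch; adding Kafka partitions at produce time avoids that cost.
    // val widened = stream.repartition(10)

    ssc.start()
    ssc.awaitTermination()
  }
}
```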
Second, groupByKey is almost always a bad idea in Spark. You want reduceByKey (or a similar combining operation) to get aggregation work done before the shuffle; this is analogous to a Hadoop combiner.
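A toy standalone sketch of the difference (not the benchmark's actual code; the key/count data is invented):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReduceVsGroup {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("reduce-vs-group").setMaster("local[2]"))

    // Toy stand-in for the benchmark's (campaign_id, event) pairs.
    val events = sc.parallelize(
      Seq("a" -> 1L, "b" -> 1L, "a" -> 1L, "a" -> 1L, "b" -> 1L))

    // groupByKey ships every (key, value) pair across the shuffle,
    // then aggregates on the reduce side.
    val grouped = events.groupByKey().mapValues(_.sum)

    // reduceByKey pre-aggregates within each partition first, so only
    // one partial sum per key per partition crosses the network
    // (the Hadoop-combiner analogy).
    val reduced = events.reduceByKey(_ + _)

    println(grouped.collect().toMap) // Map(a -> 3, b -> 2)
    println(reduced.collect().toMap) // Map(a -> 3, b -> 2)
    sc.stop()
  }
}
```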
Finally, it's not clear from the code whether the JSON and Redis code paths are identical across the different benchmarks. For instance, there's a comment in the code indicating that Redis caching is not being done in the Spark case but is in the Storm case. It doesn't seem like a fair comparison can be made without controlling for variables like these.
If you want assistance with anything, let me know.