Recently I’ve started to use Spark more and more, so I’ve decided to read something about it. High Performance Spark by Holden Karau and Rachel Warren looked like an interesting book, and I already had it from a HIB bundle. The book is quite short, but it covers a lot of topics. It presents many techniques for making Spark faster and avoiding common bottlenecks, explains why they work, and sometimes even goes down into Spark internals. Although I mostly use PySpark and almost everything in the book is in Scala, it was still useful, as the API is mostly the same.
Some parts of High Performance Spark read like a list of config keys and parameters, a sort of documentation, but most of the book is ok.