I’m trying to sift through information on data technologies. I have data stored in S3 that I want to analyze using EMR. However, when I try to research the pros and cons of Presto, Hive, Spark, or any other tool, I end up drowning in company-sponsored benchmark reports or papers written by people with clear biases.

So, my ask: Am I better off just experimenting with each one, or can you suggest any resources that offer opinions with substance, and not just buzzwords?
