Index-Based SQL-on-Hadoop - An Architectural Comparison of Tools
Since the launch of Cloudera's Impala two years ago, multiple tools such as Hive/Tez, Presto, Drill, HAWQ and SharkSQL have emerged in the quest to improve interactive query performance on Hadoop. All of these tools share the basic architecture of full-scan MPP databases and bypass indexes altogether. Yet indexes have been used for decades to improve query performance.
In this session we'll dive deep into the architectural approaches taken by the range of SQL-on-Hadoop tools. We will assess the trade-offs of design choices such as full-scan vs. index access, shared-nothing vs. shared-everything, local processing vs. remote compute, and many more.
We will review common use cases such as interactive BI tools, exploratory SQL queries, reporting and live dashboards, and identify the design best suited to each. The session will include a live benchmark of Jethro vs. Impala and an open Q&A.
Boaz Raufman has over 25 years of software design and management experience. He is a leading expert in database architecture, information retrieval and search technologies, and has led numerous information retrieval projects for Israeli intelligence agencies as well as for commercial companies.
Boaz founded JethroData in 2010 with the idea of integrating database and search technologies to accelerate big data analytics. He holds a bachelor's degree in Computer Science and Philosophy from Tel Aviv University.