Building a High-Throughput Data Extract Architecture

Jun 23, 2022
20:35

Speaker: Bogdan Ghit, Tech Lead & Senior Software Engineer at Databricks

One of the most interesting use cases seen at Databricks is the integration and support for business intelligence (BI) tools. These tools are notoriously slow at extracting large query results from traditional data warehouses because they fetch the data over a single thread through a SQL endpoint, and that endpoint becomes a data transfer bottleneck. In this talk, Bogdan discusses how his team achieved high-throughput connectivity with BI tools using Cloud Fetch, a new mechanism that fetches data in parallel through cloud storage such as AWS S3 and Azure Data Lake Storage to deliver results to BI tools faster. In the team's experiments, Cloud Fetch delivered a 10x speed-up in extract performance due to parallelism.
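To illustrate the difference between a single-threaded fetch and a parallel one, here is a minimal client-side sketch. It assumes the server has already written the query result to cloud storage as an ordered list of chunks and handed the client one URL per chunk; the `fetch_chunk` function, the URL format, and the chunk contents are all hypothetical stand-ins (it sleeps to simulate network latency and fabricates rows instead of downloading anything). It is not the Cloud Fetch protocol or any Databricks driver API.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, int]


def fetch_chunk(url: str) -> List[Row]:
    """Hypothetical stand-in for downloading one result chunk.

    A real client would GET the object behind `url` and decode it; here we
    sleep to mimic network latency and derive rows from the chunk index.
    """
    time.sleep(0.01)  # simulated I/O wait
    chunk_id = int(url.rsplit("/", 1)[-1])
    return [(chunk_id, i) for i in range(3)]


def fetch_serial(urls: Sequence[str],
                 fetch: Callable[[str], List[Row]] = fetch_chunk) -> List[Row]:
    """Single-threaded baseline: chunks are downloaded one after another."""
    rows: List[Row] = []
    for url in urls:
        rows.extend(fetch(url))
    return rows


def fetch_parallel(urls: Sequence[str],
                   fetch: Callable[[str], List[Row]] = fetch_chunk,
                   max_workers: int = 8) -> List[Row]:
    """Download chunks concurrently, then reassemble them in order."""
    rows: List[Row] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the final row
        # order matches the serial version even though downloads overlap.
        for chunk in pool.map(fetch, urls):
            rows.extend(chunk)
    return rows


if __name__ == "__main__":
    urls = [f"https://storage.example.com/result/{i}" for i in range(4)]
    assert fetch_parallel(urls) == fetch_serial(urls)
    print(len(fetch_parallel(urls)))
```

Threads are a reasonable choice here because the work is I/O-bound: while one worker waits on a download, others can proceed. Preserving chunk order on reassembly matters because a BI tool expects rows in the same order the query produced them.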
