Back to Browse

Understanding and Improving Code Generation

4.6K views
Aug 26, 2020
24:02

Code generation is integral to Spark’s physical execution engine. When implemented, the Spark engine creates optimized bytecode at runtime improving performance when compared to interpreted execution. Spark has taken the next step with whole-stage codegen which collapses an entire query into a single function. However, as the generated function sizes increase, new problems arise. Complex queries can lead to code generated functions ranging from thousands to hundreds of thousands of lines of code. This can lead to many problems such as OOM errors due to compilation costs, exceptions from exceeding the 64KB method limit in Java, and performance regressions when JIT compilation is turned off for a function whose bytecode exceeds 8KB. With whole-stage codegen turned off, Spark is able to split these functions into smaller functions to avoid these problems, but then the improvements of whole-stage codegen are lost. This talk will go over the improvements that Workday has made to code generation to handle whole-stage codegen for various queries. We will begin with the differences between expression codegen and whole-stage codegen. Then we will discuss how to split the collapsed function from whole-stage codegen. We will also present the performance improvements that we have seen from these improvements in our production workloads. About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business. Read more here: https://databricks.com/product/unifie... Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc/ Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here. https://databricks.com/databricks-named-leader-by-gartner

Download

0 formats

No download links available.

Understanding and Improving Code Generation | NatokHD