Mastering ClickHouse® Schema Design
Who is this video for:
- Developers working with ClickHouse® for the first time
- Data engineers optimizing analytical workloads
- Teams building real-time analytics applications
- Anyone looking to understand ClickHouse schema best practices

0:00 Intro
Learn how ClickHouse schema design directly affects storage efficiency and query performance, and which key decisions turn terabytes of data into millisecond query responses.

0:23 Mastering ClickHouse schema design
Master the fundamental principles of ClickHouse schema design: smaller data types for better performance, sorting keys for sequential data access, specialized types such as LowCardinality for compression, and the balance between flexibility and stability in a columnar database.

4:30 Nail your schemas in ClickHouse
Explore four practical approaches to schema design: using a known schema for established data patterns, analyzing sample data with DESCRIBE, using AI-assisted schema design for rapid prototyping, and starting from predefined templates for common use cases such as OpenTelemetry.

9:30 Data types rules
Follow five essential rules for choosing ClickHouse data types: use strict types to avoid slow queries, prefer the minimal precision that fits your data to save storage, avoid Nullable columns when possible to reduce overhead, use LowCardinality to compress categorical data, and evaluate complex types such as enums, maps, and arrays carefully.

10:30 Codecs
Master ClickHouse compression codecs, including specialized codecs such as Delta for sequential data, general-purpose compression such as ZSTD, and the trade-off between CPU overhead and I/O reduction. Learn how the right codec choice can dramatically improve storage efficiency and query performance.

12:32 Data locality & sorting keys
Understand why data locality matters in ClickHouse and how the sorting key controls it. Learn how ordering the sorting key columns correctly can take a query that scans the full table from 300 ms down to 30 ms, and why filtering on columns outside the sorting key still requires an expensive full scan.

13:44 The right numeric type
Choose numeric types by picking the smallest integer type that fits your data range, using Decimal types for exact amounts, using unsigned types for better compression, and avoiding numeric overflows that can corrupt your results.

16:34 Strings
Optimize string handling with three strategies: use FixedString for fixed-length codes such as country codes, use LowCardinality for categorical data to get large storage savings and faster GROUP BY operations, and avoid LowCardinality on high-cardinality columns, where it degrades performance.

18:46 Date & Time
Design efficient date and time columns by choosing the right precision for the use case (Date for rollups, DateTime for events), using the lowest precision that works to avoid unnecessary storage, and storing data in UTC while converting to local time zones at query time.

20:07 Nullable columns
Understand that Nullable columns need extra storage for a separate null mask, use default values when NULL carries no special meaning, and fall back to Nullable only when necessary. Learn the storage and performance implications of Nullable columns versus default values.

20:57 Special data types
Keep schemas simple and structured for faster queries. Use arrays for lists of items and aggregate functions, and evaluate complex types such as JSON, enums, maps, tuples, and UUIDs carefully against your specific requirements.

21:41 Schema evolution: The elephant in the room
Tackle the challenges of schema evolution in ClickHouse: changing column types is complex, incompatible type changes require expensive full rewrites, and production schema changes need careful planning. Learn why schema design should be iterative yet still requires upfront planning.

24:10 ClickHouse for developers
Discover how the ClickHouse for Developers series explains the more complex aspects of ClickHouse while covering the basics and the right developer tools for using ClickHouse effectively in your projects.

- What is a columnar database? - https://tbrd.co/coldb
- When to use a columnar database? - https://tbrd.co/coldb-when
- Data Engineering for developers - https://tbrd.co/ch4devs-yt
- SQL best practices - https://tbrd.co/faster-sql-yt
- How to ingest 1 BILLION rows per second in ClickHouse - https://tbrd.co/1b-rows

Tinybird is not affiliated with, associated with, or sponsored by ClickHouse, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc.
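For the 10:30 Codecs chapter, here is a minimal Python sketch of why Delta encoding helps on sequential data: consecutive timestamps turn into a run of small, identical differences that a general-purpose compressor handles far better. The helpers `delta_encode`, `delta_decode`, and `pack_int64` are hypothetical, and zlib from the standard library stands in for ZSTD; this illustrates the idea behind ClickHouse's Delta codec, not its actual byte format.

```python
import struct
import zlib


def delta_encode(values: list[int]) -> list[int]:
    # Keep the first value, then store each value as the difference from the previous one.
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: list[int]) -> list[int]:
    # A running sum restores the original values.
    out: list[int] = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def pack_int64(values: list[int]) -> bytes:
    # Little-endian signed 64-bit integers, similar to a raw Int64 column buffer.
    return struct.pack(f"<{len(values)}q", *values)


if __name__ == "__main__":
    # One event per second: a typical monotonically increasing timestamp column.
    timestamps = [1_700_000_000 + i for i in range(10_000)]
    deltas = delta_encode(timestamps)
    assert delta_decode(deltas) == timestamps  # lossless round trip

    # zlib is only a stand-in for ZSTD here.
    plain = len(zlib.compress(pack_int64(timestamps)))
    encoded = len(zlib.compress(pack_int64(deltas)))
    print(f"zlib only: {plain} bytes, delta + zlib: {encoded} bytes")
```

In ClickHouse the same pairing is declared per column, for example as `CODEC(Delta, ZSTD)`; the extra CPU spent decoding is the trade-off against reduced I/O that the chapter describes.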
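For the 12:32 Data locality & sorting keys chapter, a simplified Python model of how a sorting key lets a query skip data. It assumes, as in ClickHouse's MergeTree engine, that rows are stored sorted by the key and that a sparse index keeps one mark (the first key value) per granule of 8,192 rows, the default `index_granularity`. The helper names and the `customer_id` column are hypothetical, and the sketch counts granules read rather than reproducing the timings quoted in the video.

```python
from bisect import bisect_left, bisect_right

# ClickHouse's default index_granularity: rows per granule.
GRANULE_ROWS = 8192


def build_sparse_index(sorted_keys: list[int], granule_rows: int = GRANULE_ROWS) -> list[int]:
    # One mark per granule: the first sorting-key value of each block of rows.
    return sorted_keys[::granule_rows]


def granules_for_key(index: list[int], key: int) -> range:
    # Granule g can hold `key` only if index[g] <= key and key <= index[g + 1]
    # (or g is the last granule). Binary search finds that contiguous range.
    first = max(bisect_left(index, key) - 1, 0)
    last = bisect_right(index, key) - 1
    return range(first, last + 1)


if __name__ == "__main__":
    # 100,000 rows sorted by a hypothetical customer_id, 100 rows per customer.
    customer_ids = [i // 100 for i in range(100_000)]
    index = build_sparse_index(customer_ids)
    hit = granules_for_key(index, 500)
    print(f"WHERE customer_id = 500 reads {len(hit)} of {len(index)} granules")
    # A filter on a column outside the sorting key has no index to consult,
    # so every granule has to be read.
    print(f"WHERE other_column = ... reads {len(index)} of {len(index)} granules")
```

Because the binary search only works on the column the data is sorted by, the order of columns in the sorting key decides which filters can skip granules and which fall back to a full scan.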
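For the 13:44 The right numeric type chapter, a minimal Python sketch that maps a known value range to the narrowest ClickHouse integer type, preferring unsigned types when no value is negative. The helper `smallest_int_type` is hypothetical, not part of any ClickHouse client; it covers integers only, since exact fractional amounts belong in Decimal types as the video recommends.

```python
# Bit widths of ClickHouse's integer types (Int8..Int256, UInt8..UInt256).
INT_WIDTHS = [8, 16, 32, 64, 128, 256]


def smallest_int_type(min_value: int, max_value: int) -> str:
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    for bits in INT_WIDTHS:
        if min_value >= 0:
            # UIntN covers 0 .. 2**N - 1
            if max_value <= 2**bits - 1:
                return f"UInt{bits}"
        else:
            # IntN covers -2**(N-1) .. 2**(N-1) - 1
            if min_value >= -(2 ** (bits - 1)) and max_value <= 2 ** (bits - 1) - 1:
                return f"Int{bits}"
    raise ValueError("range does not fit in a 256-bit integer")


if __name__ == "__main__":
    print(smallest_int_type(0, 150))     # e.g. age in years
    print(smallest_int_type(0, 599))     # e.g. HTTP status codes
    print(smallest_int_type(-40, 60))    # e.g. temperature in whole degrees
```

Size the range for the values the column will hold in the future, not just today: data that grows past the type's maximum is exactly the overflow risk the chapter warns about.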
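For the 16:34 Strings chapter, a sketch of dictionary encoding, the idea behind LowCardinality: each distinct string is stored once and every row keeps only a small integer index. The helpers are hypothetical and the byte counts are deliberately rough (they ignore string length prefixes, compression, and ClickHouse's real on-disk layout, and the index widths are an assumption), but they show why the savings disappear, and turn into overhead, on high-cardinality columns.

```python
def dictionary_encode(values: list[str]) -> tuple[list[str], list[int]]:
    # Each distinct string is stored once; each row holds its position in the dictionary.
    dictionary: list[str] = []
    positions: dict[str, int] = {}
    indexes: list[int] = []
    for v in values:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        indexes.append(positions[v])
    return dictionary, indexes


def approx_bytes(values: list[str]) -> tuple[int, int]:
    # Rough sizes: plain UTF-8 strings vs. dictionary plus one index per row.
    plain = sum(len(v.encode()) for v in values)
    dictionary, indexes = dictionary_encode(values)
    # Assumed index width: the smallest of 1, 2, or 4 bytes that can address the dictionary.
    width = 1 if len(dictionary) <= 2**8 else 2 if len(dictionary) <= 2**16 else 4
    encoded = sum(len(v.encode()) for v in dictionary) + width * len(indexes)
    return plain, encoded


if __name__ == "__main__":
    countries = ["United States", "Spain", "France"] * 1000  # low cardinality
    user_ids = [f"user-{i}" for i in range(1000)]            # every value distinct
    print("countries (plain, encoded):", approx_bytes(countries))
    print("user ids  (plain, encoded):", approx_bytes(user_ids))
```

With three distinct countries the encoded form is a small fraction of the plain size; with a distinct value per row the dictionary is as large as the data and the indexes are pure overhead.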
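For the 18:46 Date & Time chapter, a sketch of the "store in UTC, convert at query time" rule, plus the uncompressed width of a few date/time types. It assumes a DateTime-style column holding whole seconds since the Unix epoch, which is how ClickHouse's DateTime stores values, and uses fixed UTC offsets from the standard library to stay self-contained; real code would use named zones from `zoneinfo` so daylight-saving changes are handled. Helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Uncompressed bytes per value: Date counts days, DateTime whole seconds,
# DateTime64(3) milliseconds, all relative to the Unix epoch.
BYTES_PER_VALUE = {"Date": 2, "DateTime": 4, "DateTime64(3)": 8}


def raw_column_bytes(type_name: str, rows: int) -> int:
    return BYTES_PER_VALUE[type_name] * rows


def to_stored_seconds(event_time: datetime) -> int:
    # Reject naive datetimes: without a zone the stored instant would be a guess.
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    # timestamp() on an aware datetime returns seconds since the epoch in UTC.
    return int(event_time.timestamp())


def render(stored_seconds: int, tz: timezone) -> str:
    # Convert to the viewer's zone only when presenting the value.
    return datetime.fromtimestamp(stored_seconds, tz=tz).isoformat()


if __name__ == "__main__":
    cest = timezone(timedelta(hours=2))   # fixed offset standing in for a summer European zone
    edt = timezone(timedelta(hours=-4))   # fixed offset standing in for a summer US East zone
    stored = to_stored_seconds(datetime(2024, 7, 1, 12, 0, tzinfo=cest))
    print(stored)                         # the value a DateTime column would hold
    print(render(stored, timezone.utc))
    print(render(stored, edt))
    for name in BYTES_PER_VALUE:
        print(name, raw_column_bytes(name, 1_000_000_000), "bytes per billion rows")
```

The stored integer is the same no matter which zone the event arrived in; only the rendering step knows about local time, which is what keeps rollups and comparisons consistent.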
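For the 20:07 Nullable columns chapter, a sketch contrasting the two layouts. It models Nullable(T) as a values column plus a separate null mask and, as a simplifying assumption, counts one byte per row for that mask; the plain column replaces NULL with a default value and needs nothing extra. Helper names are hypothetical and the sizes are uncompressed estimates.

```python
from typing import Optional


def as_nullable(values: list[Optional[int]]) -> tuple[list[int], list[int]]:
    # Nullable(T): a placeholder value where NULL, plus a separate mask column.
    data = [0 if v is None else v for v in values]
    null_mask = [1 if v is None else 0 for v in values]
    return data, null_mask


def as_default(values: list[Optional[int]], default: int = 0) -> list[int]:
    # Plain T: NULL is replaced by a default, so there is no mask to store or read.
    return [default if v is None else v for v in values]


def raw_bytes(rows: int, value_bytes: int, nullable: bool) -> int:
    # Assumed: one extra byte per row for the null mask when the column is Nullable.
    return rows * value_bytes + (rows if nullable else 0)


if __name__ == "__main__":
    readings = [5, None, 7]
    print(as_nullable(readings))
    print(as_default(readings))
    print("UInt32 column, 1M rows:",
          raw_bytes(1_000_000, 4, nullable=False), "vs Nullable:",
          raw_bytes(1_000_000, 4, nullable=True))
```

The default-value path is only safe when NULL carries no special meaning: after conversion, a real 0 and a missing value can no longer be told apart.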