ClickHouse JSON Type for Observability: Faster Queries on Semi-Structured Data
Presented by Mike Shi, ClickStack Product Manager @ ClickHouse and founder of HyperDX, at our Los Gatos Meetup at the Netflix Theater. See what the future of ClickHouse holds for observability workloads. Come join us at one of our amazing meetup groups! This presentation is from our San Francisco meetup on July 10th: https://www.meetup.com/clickhouse-silicon-valley-meetup-group

#programming #clickhouse #database #observability #logs #events #traces

Don't forget to give us a ⭐ on GitHub! https://github.com/clickhouse/clickhouse

ClickStack is an open-source observability platform that unifies logs, metrics, and traces in a single ClickHouse database, enabling fast aggregations over high-cardinality, unsampled data. It provides end-to-end monitoring, from front-end user sessions to back-end infrastructure, eliminating the data silos common in traditional observability stacks. Because all telemetry lives in one system, ClickStack correlates signals directly: users can pivot from a specific log message to the full distributed trace and the corresponding Kubernetes infrastructure metrics without manual correlation. The platform is built on open standards and recommends OpenTelemetry for data ingestion, supporting monitoring for Kubernetes, serverless, and other cloud-native environments.

ClickStack delivers high-performance querying on petabytes of data with sub-second search. The underlying ClickHouse columnar database provides 10x to 100x data compression, reducing hardware footprint and cost. The architecture separates compute and storage, so ingestion and query workloads can be scaled independently for resilience and cost efficiency with high-volume telemetry.

For semi-structured, high-cardinality observability data, ClickStack uses the native ClickHouse JSON data type, which delivers better query performance than traditional map-based schemas.
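To make the difference between the two schemas concrete, here is a toy Python model. It is not ClickHouse's storage code. It assumes a Map-style column keeps parallel key and value arrays per row (with every value cast to a string), and that a JSON-style column keeps one column per property path. The function names are hypothetical, and the string length of each value is only a crude stand-in for bytes read.

```python
# Toy model only (not ClickHouse internals): compares how much data a single-key
# lookup touches under a Map(String, String)-style layout versus a
# one-column-per-path JSON-style layout. String length approximates bytes read.

def map_layout(rows):
    """Map-style: each row keeps parallel key and value arrays; values become strings."""
    return [(list(r.keys()), [str(v) for v in r.values()]) for r in rows]

def path_layout(rows):
    """JSON-style: one column per property path, None where a row lacks the path."""
    paths = sorted({k for r in rows for k in r})
    return {p: [r.get(p) for r in rows] for p in paths}

def map_lookup_cost(layout, key):
    """Finding `key` in a map means reading every key and value of every row.
    The `key` argument does not reduce the work: the scan still reads everything."""
    return sum(
        sum(len(k) for k in keys) + sum(len(v) for v in values)
        for keys, values in layout
    )

def path_lookup_cost(layout, key):
    """Reading one path touches only that path's column."""
    return sum(len(str(v)) for v in layout.get(key, []) if v is not None)

rows = [
    {"service": "checkout", "status": 200 + i % 3, "region": "us-east-1", "trace_id": f"t{i:04d}"}
    for i in range(1000)
]
m, p = map_layout(rows), path_layout(rows)
print(map_lookup_cost(m, "status"), path_lookup_cost(p, "status"))  # → 52000 3000
```

Even with only four keys per row, the map lookup reads every key and value to find `status`, while the path lookup reads just the `status` column. The gap grows as log lines carry more attributes. Note also that the map layout has already turned the integer status codes into strings.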
In performance comparisons, a query on the JSON type completed in 0.09 seconds and scanned only 9 MB of data, whereas the same query on a Map type took 0.45 seconds and scanned over 500 MB. The JSON type avoids costly linear scans by storing each property path as its own column, so a query reads only the paths it references instead of every key and value in the map. This design also preserves original data types (int, float, bool), avoiding the precision loss that comes from casting every value to a string, and natively supports deeply nested fields. To manage field explosion with high-cardinality data, the JSON type promotes the most common property-type combinations to dedicated columns at the part level and overflows less frequent combinations into a single binary `shared_data` column, balancing query performance against storage efficiency. This makes it a cost-effective, high-performance option for unsampled observability data. Data can also be exported to Parquet for archival purposes, and ClickHouse can read Parquet files back directly when that data is needed again.
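The promotion-and-overflow idea can be sketched as a small Python model of a single part. This is a simplified illustration, not ClickHouse's algorithm: it assumes promotion is decided purely by how often each (path, type) pair appears, with ties broken alphabetically. The `max_dynamic_paths` argument borrows the name of the real JSON type parameter but only approximates its role here, and `flatten` and `build_part` are hypothetical helpers. Values parsed with `json.loads` keep their int, float, and bool types throughout.

```python
import json
from collections import Counter

def flatten(obj, prefix=""):
    """Yield (dotted_path, value) pairs, recursing into nested objects."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value

def build_part(docs, max_dynamic_paths):
    """Toy model of one part: the most frequent (path, type) pairs get dedicated
    columns; every other pair goes to a per-row shared_data overflow list."""
    flat = [dict(flatten(d)) for d in docs]
    counts = Counter((p, type(v).__name__) for row in flat for p, v in row.items())
    # Rank by descending frequency, then alphabetically for determinism.
    ranked = sorted(counts, key=lambda pt: (-counts[pt], pt))
    promoted = set(ranked[:max_dynamic_paths])
    columns = {pt: [None] * len(flat) for pt in sorted(promoted)}
    shared_data = []
    for i, row in enumerate(flat):
        overflow = []
        for p, v in row.items():
            pt = (p, type(v).__name__)
            if pt in promoted:
                columns[pt][i] = v  # original type is kept, no string cast
            else:
                overflow.append((p, v))
        shared_data.append(overflow)
    return columns, shared_data

docs = [json.loads(s) for s in [
    '{"service": "api", "status": 200, "http": {"method": "GET", "latency_ms": 12.5}}',
    '{"service": "api", "status": 500, "http": {"method": "POST", "latency_ms": 80.0}}',
    '{"service": "db", "status": 200, "retry": true, "http": {"method": "GET", "latency_ms": 3.25}}',
]]
columns, shared_data = build_part(docs, max_dynamic_paths=4)
print(sorted(columns))
print(shared_data)  # → [[], [], [('retry', True)]]
```

The four paths present in every document become typed columns, while the rare `retry` flag lands in the overflow. A query on `status` or `http.latency_ms` can then read a single typed column, and only queries on rare paths pay the cost of decoding the shared overflow.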