Yeah, don't use TimescaleDB, use ClickHouse - I have 10 years of NOAA climate da...

mfreed · on April 16, 2024

Our experience is that Clickhouse and Timescale are designed for different workloads, and that Timescale is optimized for many of the time-series workloads people use in production:

- https://www.timescale.com/blog/what-is-clickhouse-how-does-i...

Sidenote: Timescale _does_ provide columnar storage. I don't believe that the blog author focused on this as part of insert benchmarks:

- Timescale columnar storage: https://www.timescale.com/blog/building-columnar-compression...

- Timescale query vectorization: https://www.timescale.com/blog/teaching-postgres-new-tricks-...

rkwasny · on April 16, 2024

Well, as a Co-founder and CTO of Timescale, would you say TimescaleDB is a good fit for storing weather data as OP does?

mfreed · on April 16, 2024

TimescaleDB primarily serves operational use cases: Developers building products on top of live data, where you are regularly streaming in fresh data, and you often know what many queries look like a priori, because those are powering your live APIs, dashboards, and product experience.

That's different from a data warehouse or many traditional "OLAP" use cases, where you might dump a big dataset statically, and then people will occasionally do ad-hoc queries against it. This is the big weather dataset file sitting on your desktop that you occasionally query while on holidays.

So it's less about "can you store weather data", but what does that use case look like? How are the queries shaped? Are you saving a single dataset for ad-hoc queries across the entire dataset, or continuously streaming in new data, and aging out or de-prioritizing old data?

In most of the products we serve, customers are often interested in recent data in a very granular format ("shallow and wide"), or longer historical queries along a well defined axis ("deep and narrow").

For example, this is where the benefits of TimescaleDB's segmented columnar compression emerges. It optimizes for those queries which are very common in your application, e.g., an IoT application that groups by or selected by deviceID, crypto/fintech analysis based on the ticker symbol, product analytics based on tenantID, etc.

If you look at Clickbench, what most of the queries say are: Scan ALL the data in your database, and GROUP BY one of the 100 columns in the web analytics logs.

- https://github.com/ClickHouse/ClickBench/blob/main/clickhous...

There are almost no time-predicates in the benchmark that Clickhouse created, but perhaps that is not surprising given it was designed for ad-hoc weblog analytics at Yandex.

So yes, Timescale serves many products today that use weather data, but has made different choices than Clickhouse (or things like DuckDB, pg_analytics, etc) to serve those more operational use cases.

dangoodmanUT · on April 16, 2024

Agreed, clickhouse is faster and has better features for this

RyanHamilton · on April 16, 2024

I agree. Clickhouse is awesomely powerful and fast. I maintain a list of benchmarks, if you know of any speed comparisons please let me know and I will add them to the lsit: https://www.timestored.com/data/time-series-database-benchma...

lyapunova · on April 16, 2024

Thanks for maintaining benchmarks here. Is there a github repo that might accompany the benchmarks that I could take a look at / reproduce?

anentropic · on April 16, 2024

Do you say that because its quicker to insert large batches of rows into Clickhouse, or because it's better in other ways?

(I'm currently inserting large batches of rows into MySQL and curious about Clickhouse...)

rkwasny · on April 16, 2024

It's better in insert speed, query speed and used disk storage.

gonzo41 · on April 16, 2024

So click house is a column db. Any thoughts on if the performance would be a wash if you just pivoted the timescale hypertable and indexed the time + column on timescale?

PolarizedPoutin · on April 16, 2024

Haha very cool use! Yeah reading up on TimescaleDB vs. Clickhouse it seems like columnar storage and Clickhouse will be faster and better compress the time series data. For now I'm sticking to TimescaleDB to learn Postgres and PostGIS, but might make a TimescaleDB vs. Clickhouse comparison when I switch!

rkwasny · on April 16, 2024

I can replicate your benchmark when I get a moment, the data, is it free to share if I wanted to make an open browser?

I have a feeling general public has very limited access to weather data, and graphs in news that state "are we getting hotter on colder" are all sensational and hard to verify.

cevian · on April 16, 2024

Please note that TimescaleDB also uses columnar storage for its compressed data.

Disclosure: I am a TimescaleDB engineer.