Firebolt vs Snowflake

Firebolt helped this client reach interactive, sub-second analytics at lower cost.
We reran queries the client had previously run on Snowflake, over a 0.5 TB data set. The results were great: huge performance gains on smaller, cheaper clusters (we dive into the PoC after the table):

Snowflake: Large warehouse, $16 / hour. Firebolt: engine of 1 x c5d.4xlarge, $1.54 / hour.

Query     Snowflake   Firebolt   Perf. boost
Query 1   62 sec      0.16 sec   X388
Query 2   72 sec      0.09 sec   X800
Query 3   90 sec      1.88 sec   X48
Query 4   60 sec      0.01 sec   X6000
Query 5   83 sec      1.6 sec    X52
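The Perf. boost column in the table is the Snowflake duration divided by the Firebolt duration. The minimal sketch below recomputes it from the table's numbers and also estimates a per-query compute cost by pro-rating each hourly price over the query's runtime. That pro-rating is our own simplifying assumption: it ignores billing minimums, idle time and storage, so treat the cost figures as rough illustrations rather than billing data.

```python
# Duration pairs copied from the table: (Snowflake seconds, Firebolt seconds).
RUNS = [(62, 0.16), (72, 0.09), (90, 1.88), (60, 0.01), (83, 1.6)]

SNOWFLAKE_USD_PER_HOUR = 16.0  # Large warehouse, per the table
FIREBOLT_USD_PER_HOUR = 1.54   # 1 x c5d.4xlarge engine, per the table


def speedup(snowflake_s: float, firebolt_s: float) -> float:
    """Perf. boost: Snowflake duration divided by Firebolt duration."""
    return snowflake_s / firebolt_s


def runtime_cost(usd_per_hour: float, seconds: float) -> float:
    """Hourly price pro-rated over the runtime.

    Simplifying assumption: ignores billing minimums, idle time and storage.
    """
    return usd_per_hour * seconds / 3600


for sf_s, fb_s in RUNS:
    sf_cost = runtime_cost(SNOWFLAKE_USD_PER_HOUR, sf_s)
    fb_cost = runtime_cost(FIREBOLT_USD_PER_HOUR, fb_s)
    print(f"{sf_s:>3} s vs {fb_s:>5} s: boost x{speedup(sf_s, fb_s):.1f}, "
          f"runtime cost ${sf_cost:.4f} vs ${fb_cost:.6f}")
```

Under this pro-rating assumption, the cost gap per query is the speedup multiplied by the ratio of hourly prices (16 / 1.54, roughly 10x), so it is wider than the speedup alone.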

Disclaimer: this is not a global benchmark. The results are based on real-world queries and run times reported by our users on Snowflake, compared with the run times of the equivalent queries over the same data in Firebolt after tuning and optimization.

Here's one of the Snowflake queries (fields/values are masked of course):

"
SELECT
  SUM(CASE WHEN type = 'initiate' THEN val ELSE 0 END) AS "initiate",
  SUM(CASE WHEN type = 'start'    THEN val ELSE 0 END) AS "start",
  SUM(CASE WHEN type = 'cached'   THEN val ELSE 0 END) AS "cached",
  SUM(CASE WHEN type = 'ack'      THEN val ELSE 0 END) AS "ack",
  SUM(CASE WHEN type = 'denied'   THEN val ELSE 0 END) AS "denied",
  day, country, domain, account
FROM (
  -- Filter first, then aggregate the surviving rows per day and dimension.
  SELECT type, val, DATE_TRUNC('day', date) AS day, country, domain, account
  FROM events
  WHERE type IN ('initiate', 'start', 'cached', 'ack', 'denied')
    AND date >= '1604217600' AND date < '1604235600'
    AND user = '******************'
) AS filtered
GROUP BY day, country, domain, account
"

It's a simple query: no joins, just one big table. The other queries weren't very different conceptually, so let's stick to this example. The query filters on a few dimensions (event type, a time range and a user), groups the results by day, country, domain and account, and uses CASE expressions to turn each event type into its own summed column.
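To make the query's logic concrete, here is a minimal Python sketch of what it computes over in-memory rows. It assumes, hypothetically, that each row is a dict and that `date` holds UTC epoch seconds (the masked literals look like epoch timestamps); `pivot_sums` and the sample rows are illustrative only and not part of either product.

```python
from collections import defaultdict

EVENT_TYPES = ("initiate", "start", "cached", "ack", "denied")
SECONDS_PER_DAY = 86_400


def pivot_sums(rows, start, end, user):
    """Hypothetical helper mirroring the SQL: filter, truncate, group, pivot.

    Assumes `date` is a UTC epoch-seconds integer.
    Returns {(day, country, domain, account): {event_type: summed val}}.
    """
    groups = defaultdict(lambda: dict.fromkeys(EVENT_TYPES, 0))
    for r in rows:
        # WHERE type IN (...) AND start <= date < end AND user = ...
        if r["type"] not in EVENT_TYPES:
            continue
        if not (start <= r["date"] < end) or r["user"] != user:
            continue
        # DATE_TRUNC('day', date): round down to midnight UTC.
        day = r["date"] - r["date"] % SECONDS_PER_DAY
        key = (day, r["country"], r["domain"], r["account"])
        # SUM(CASE WHEN type = 'x' THEN val ELSE 0 END) only grows for rows
        # of type x, so adding val to that single column is equivalent.
        groups[key][r["type"]] += r["val"]
    return dict(groups)


rows = [
    {"type": "start", "val": 3, "date": 1604217600, "country": "US",
     "domain": "example.com", "account": "a1", "user": "u1"},
    {"type": "ack", "val": 2, "date": 1604220000, "country": "US",
     "domain": "example.com", "account": "a1", "user": "u1"},
    {"type": "click", "val": 5, "date": 1604220000, "country": "US",
     "domain": "example.com", "account": "a1", "user": "u1"},  # dropped: type
    {"type": "start", "val": 7, "date": 1604220000, "country": "US",
     "domain": "example.com", "account": "a1", "user": "u2"},  # dropped: user
]
print(pivot_sums(rows, 1604217600, 1604235600, "u1"))
```

The work is a single pass over the filtered rows, which is why the cost of a query like this is dominated by how much data has to be read, not by the computation itself.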

Why can even simple queries be slow in cloud data warehouses?

Unlike in the on-premises world, modern cloud data engines have a much more complicated relationship with storage. In the cloud we enjoy virtually unlimited storage, which is great. But that storage layer, S3 (I use AWS terminology throughout this post, but the same concepts and challenges apply to all cloud providers), is far from optimal for performance.

When the data a query needs isn't already cached and the query engine has to scan it in S3, that's when you typically go and get coffee. Queries that involve large scans over S3 are too slow for interactive analytics. Most Athena/Presto users know this very well: because these engines scan data directly in S3, they often slow down once data volumes become significant.

Cloud data warehouses use various techniques to store data more efficiently and serve it to the query engine faster. This is why they are typically faster than Athena/Presto for non-cached queries.

But at the end of the day, at scale, even modern cloud data warehouses have to move too much data between S3 and the compute nodes' SSDs to stay performant.
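One widely used way to move less data is min/max pruning (often called zone maps): keep small per-file statistics, such as a column's minimum and maximum, and skip any file whose range cannot satisfy the filter. Applied to the `date >= ... AND date < ...` predicate in the query above, only files that overlap the time range need to be fetched from S3. The sketch below is a generic illustration with hypothetical names (`FileStats`, `files_to_scan`, `bytes_scanned`) and toy file sizes; it is not how Snowflake or Firebolt actually implement pruning or indexing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata: size plus min/max of the `date` column."""
    name: str
    min_date: int
    max_date: int
    size_bytes: int


def files_to_scan(files, start, end):
    """Keep files whose [min_date, max_date] range can overlap [start, end)."""
    return [f for f in files if f.max_date >= start and f.min_date < end]


def bytes_scanned(files, start, end):
    """Bytes that still have to be fetched from object storage after pruning."""
    return sum(f.size_bytes for f in files_to_scan(files, start, end))


# Toy example: three files covering consecutive date ranges.
files = [
    FileStats("p1", 1604100000, 1604200000, 100),
    FileStats("p2", 1604200001, 1604230000, 100),
    FileStats("p3", 1604230001, 1604300000, 100),
]
print([f.name for f in files_to_scan(files, 1604217600, 1604235600)])
print(bytes_scanned(files, 1604217600, 1604235600), "of",
      sum(f.size_bytes for f in files), "bytes")
```

Pruning only helps when the filtered column is well clustered across files: if every file spans the whole date range, nothing can be skipped and the full data set still has to move.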

Some key differences from the more detailed comparison between Firebolt and Snowflake:

Feature | Snowflake | Firebolt

Architecture
Elasticity (separation of storage and compute) | Yes | Yes
Supported cloud infrastructure | AWS, Azure, Google Cloud | AWS only
Isolated tenancy | VPS only | Yes

Scalability
Query scalability | Manual 1-click resize to a larger warehouse | 1-click resize to a larger EC2 type or number of nodes
User scalability (automatic or manual) | Automatic | Automate with scripting
User concurrency (maximum concurrency) | Up to 10 same-size (1-128 node) warehouses | Unlimited user concurrency
Write scalability (batch, continuous, upserts) | Limited: batch-centric, slow updates | Strong: batch and continuous, fast updates

Performance
Indexing | None | Sparse, aggregate, join
Query optimization | Limited | Extensive
Storage and partitioning | Micro-partition storage, separate RAM | Optimized across storage and RAM (F3)
Semi-structured data: native storage, functions, performance | Non-native storage, native functions, slow | Native storage, native lambda functions, fast

Use cases
Reporting and dashboards | Yes (static) | Yes, and ad hoc
Ad hoc, interactive analytics | Seconds to minutes | Sub-second
Operations, customer facing | Batch-centric, higher scale | Continuous, high scale

Cost
Administration | Easy to deploy and resize | Easy to deploy and resize
Choice of resources | Choice of fixed-size warehouses | Choice of EC2 instance types and engine sizes
Price | High: based on compute, storage | Low: based only on size of data

Curious to learn more?

Schedule a meeting with our solution architect

People say:

With Firebolt, our 1000 Looker users can now run any analytics against billions of rows and terabytes of data, in seconds or less.

Alexandra Sudilovski
Senior BI Expert & Looker Guild Master, AppsFlyer

© 2024 Firebolt Analytics Inc. All rights reserved
