Firebolt vs Snowflake

Firebolt helped this client reach interactive, sub-second analytics at lower cost.
We ran queries that had previously run on Snowflake over a 0.5 TB data set.
The results were great: huge performance gains on smaller, cheaper clusters
(we deep dive into the PoC after the table):

Snowflake: Large warehouse, $16 / hour
Firebolt: 1 x c5d.4xlarge engine, $1.54 / hour

Query   Snowflake duration   Firebolt duration   Perf. boost
#1      72 sec               0.09 sec            X800
#2      62 sec               0.16 sec            X388
#3      90 sec               1.88 sec            X48
#4      60 sec               0.01 sec            X6000
#5      83 sec               1.6 sec             X52

Disclaimer: This is not a global benchmark. The results are based on real-world queries and run times reported by our users on Snowflake, and the run times of the equivalent queries over the same data in Firebolt after tuning and optimization.
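The "Perf. boost" figures are simply the Snowflake run time divided by the Firebolt run time. The sketch below recomputes them from the durations and hourly prices quoted above, using `Decimal` so that a value like 62 / 0.16 = 387.5 rounds predictably. The per-query labels follow the table; `perf_boost` and `runtime_cost` are hypothetical helper names, and prorating the hourly price by run time is a simplifying assumption that ignores idle time, billing granularity and concurrency.

```python
from decimal import Decimal, ROUND_HALF_UP

# (Snowflake seconds, Firebolt seconds) per query, taken from the table
DURATIONS = {
    "#1": (Decimal("72"), Decimal("0.09")),
    "#2": (Decimal("62"), Decimal("0.16")),
    "#3": (Decimal("90"), Decimal("1.88")),
    "#4": (Decimal("60"), Decimal("0.01")),
    "#5": (Decimal("83"), Decimal("1.6")),
}
SNOWFLAKE_PER_HOUR = Decimal("16")   # Large warehouse
FIREBOLT_PER_HOUR = Decimal("1.54")  # 1 x c5d.4xlarge engine


def perf_boost(snowflake_s, firebolt_s):
    """Speedup = Snowflake run time / Firebolt run time, rounded half-up to an integer."""
    ratio = snowflake_s / firebolt_s
    return int(ratio.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def runtime_cost(seconds, price_per_hour):
    """Hourly price prorated to one query's run time (a simplification)."""
    return price_per_hour * seconds / Decimal(3600)


for query, (sf, fb) in DURATIONS.items():
    print(f"{query}: X{perf_boost(sf, fb)}, "
          f"${runtime_cost(sf, SNOWFLAKE_PER_HOUR):.4f} vs "
          f"${runtime_cost(fb, FIREBOLT_PER_HOUR):.7f}")

# How much more the Snowflake warehouse costs per hour than the Firebolt engine
price_ratio = (SNOWFLAKE_PER_HOUR / FIREBOLT_PER_HOUR).quantize(Decimal("0.1"))
print(f"Hourly price ratio: {price_ratio}x")
```

The hourly price ratio comes out at roughly 10.4x, so the speedups are achieved on an engine that is about ten times cheaper per hour.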

Here's one of the Snowflake queries (fields and values are masked, of course):

"
SELECT
  -- Pivot each event type into its own column (conditional aggregation)
  SUM(CASE WHEN type = 'initiate' THEN val ELSE 0 END) AS "initiate",
  SUM(CASE WHEN type = 'start'    THEN val ELSE 0 END) AS "start",
  SUM(CASE WHEN type = 'cached'   THEN val ELSE 0 END) AS "cached",
  SUM(CASE WHEN type = 'ack'      THEN val ELSE 0 END) AS "ack",
  SUM(CASE WHEN type = 'denied'   THEN val ELSE 0 END) AS "denied",
  day, country, domain, account
FROM (
  -- Filter the single large events table before aggregating
  SELECT type, val, DATE_TRUNC('day', date) AS day, country, domain, account
  FROM events
  WHERE type IN ('initiate', 'start', 'cached', 'ack', 'denied')
    AND date >= '1604217600' AND date < '1604235600'
    AND user = '******************'
)
GROUP BY day, country, domain, account

"

It's a simple query: no joins, just one big table. The other queries weren't conceptually very different, so let's stick with this example. The query aggregates over a few dimensions (day, country, domain and account) and uses conditional SUM(CASE ...) expressions to turn each event type into its own column of sums. It is also filtered on a few dimensions: event type, a time range and a user.
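To make the semantics of the query concrete, here is a minimal Python sketch of the same computation on in-memory rows. It is an illustration only, not how either engine executes the query. It assumes each row is a dict with hypothetical keys (`type`, `val`, `ts`, `user`, `country`, `domain`, `account`) and that `ts` is a Unix timestamp in seconds, interpreted in UTC; `pivot_events` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Event types the query pivots into columns
EVENT_TYPES = ("initiate", "start", "cached", "ack", "denied")


def pivot_events(rows, start_ts, end_ts, user):
    """Mimic the SQL: filter rows, then SUM(CASE ...) per (day, country, domain, account)."""
    # Every group starts with a 0 for each event type, like ELSE 0 in the CASE
    totals = defaultdict(lambda: dict.fromkeys(EVENT_TYPES, 0))
    for r in rows:
        # WHERE type IN (...): other event types are dropped
        if r["type"] not in EVENT_TYPES:
            continue
        # WHERE date >= start AND date < end AND user = ... (half-open time range)
        if not (start_ts <= r["ts"] < end_ts) or r["user"] != user:
            continue
        # DATE_TRUNC('day', ...), assuming ts is a Unix timestamp in UTC
        day = datetime.fromtimestamp(r["ts"], tz=timezone.utc).date()
        key = (day, r["country"], r["domain"], r["account"])
        # Each CASE WHEN type = '...' THEN val ELSE 0 END feeds exactly one column
        totals[key][r["type"]] += r["val"]
    return dict(totals)
```

With the time bounds from the query (start_ts=1604217600, end_ts=1604235600), every surviving row falls on 2020-11-01 UTC, so the result has one group per (country, domain, account) combination for that day.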

Why can even simple queries be slow in cloud data warehouses?

Unlike on-premises systems, modern cloud data engines have a much more complicated relationship with storage. In the cloud we enjoy virtually unlimited storage, which is great. But that storage layer, S3 (I'll use AWS terminology throughout the post, but the same concepts and challenges apply to all cloud providers), is far from optimal for performance.

When the data a query needs isn't already cached and the query engine has to scan data in S3, that's when you typically go and get coffee. If large scans over S3 are involved, queries are too slow for interactive analytics. Most Athena/Presto users know this very well: because these engines scan data directly in S3, they often slow down when data volumes are significant.

All cloud data warehouses use various techniques to store data more intelligently and serve it to the query engine faster. This is why cloud data warehouses are typically faster than Athena/Presto for non-cached queries.

But at the end of the day, at scale, even modern cloud data warehouses move too much data between S3 and the compute nodes' local SSDs, and performance suffers.
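One common way to store data "in a smarter way" is to keep min/max metadata for each storage block and skip blocks whose range cannot match the query's filter. The sketch below is a generic illustration of that kind of min/max pruning, not Snowflake's or Firebolt's actual implementation; `Block`, `blocks_to_fetch` and the 24 hourly 100 MB blocks are hypothetical. It also shows the limit described above: every block that overlaps the filter still has to be fetched from S3 in full.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A hypothetical storage block (for example, one file in S3) with min/max metadata."""
    min_ts: int
    max_ts: int
    size_mb: int


def blocks_to_fetch(blocks, start_ts, end_ts):
    """Keep only blocks whose [min_ts, max_ts] range can overlap the filter [start_ts, end_ts)."""
    return [b for b in blocks if b.max_ts >= start_ts and b.min_ts < end_ts]


def total_mb(blocks):
    """Megabytes that must be moved from object storage to the compute nodes."""
    return sum(b.size_mb for b in blocks)


# Hypothetical layout: 24 hourly blocks of 100 MB each for 2020-11-01 (UTC)
DAY_START = 1604188800
blocks = [
    Block(DAY_START + h * 3600, DAY_START + (h + 1) * 3600 - 1, 100)
    for h in range(24)
]

# Same time filter as the sample query (08:00 to 13:00 UTC)
needed = blocks_to_fetch(blocks, 1604217600, 1604235600)
print(f"Full scan: {total_mb(blocks)} MB, with pruning: {total_mb(needed)} MB")
```

Pruning only pays off when the data is clustered on the filtered column: if every block spanned the whole day, no block could be skipped and the engine would be back to a full scan.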

Curious to learn more?

Schedule a meeting with our solution architect

Some key differences from the more detailed comparison between Firebolt and Snowflake:

Scalability
Architecture
Query scalability
Elasticity - separation of storage and compute
Supported cloud infrastructure
Isolated tenancy
User scalability (automatic or manual)
User concurrency (maximum concurrency)
Write scalability (batch, continuous, upserts)
Performance
Indexing
Query optimization
Storage and partitioning
Semi-structured native data storage, functions, performance
Use cases
Cost
Reporting and dashboards
Administration
Price
Ad hoc, interactive analytics
Operations, customer facing
Choice of resources
Up to 10 same-size (1-128 node) warehouses
Unlimited user concurrency
AWS only
AWS, Azure, Google Cloud
Yes
VPS only
Yes
Yes
Manual 1-click resize to larger warehouse
Automatic
1-click resize to larger EC2 type, number of nodes
Automate with scripting
Limited - batch-centric, slow updates
Strong - batch and continuous, fast updates
None
Sparse, aggregate, join
Yes (static)
Yes, and ad hoc
Seconds to minutes
sub-second
Batch-centric, higher scale
Continuous, high scale
Easy to deploy and resize
Easy to deploy and resize
Choice of fixed size warehouses
Choice of EC2 instance types, engine sizes
Limited
Micro-partition storage, separate RAM
Optimized across storage and RAM (F3)
Non-native storage, native functions, and slow
Native storage, native lambda, and fast
Extensive
High: based on compute, storage
Low: based only on size of data

People say:

With Firebolt, our 1000 Looker users can now run any analytics against billions of rows and terabytes of data, in seconds or less.

Alexandra Sudilovski
Senior BI Expert & Looker Guild Master, AppsFlyer
