Optimize COUNT(*) Queries: A Performance Boost

by SLV Team

Hey guys! Today, we're diving deep into an exciting optimization journey focused on speeding up COUNT(*) queries. Specifically, we'll address a performance bottleneck where our current implementation is significantly slower than SQLite. Let's break down the issue, explore the root cause, and walk through the proposed solution.

The Performance Gap: A 125x Slowdown

In our benchmark tests, we discovered a pretty substantial performance difference between our system (NistMemSQL) and SQLite when executing SELECT COUNT(*) FROM table queries. The results were quite eye-opening:

  • SQLite: Clocking in at a super-fast ~1.8μs
  • NistMemSQL: Lagging behind at ~220μs

That's a whopping ~125x difference! Obviously, this performance gap is something we need to address ASAP. Efficient COUNT(*) operations are crucial for many applications, so optimizing this is a high priority. Let's dig into the root cause and the proposed fix to make things lightning-fast!

Root Cause Analysis: Materializing Rows Unnecessarily

The primary culprit behind this performance discrepancy is how our current system handles COUNT(*) queries. We've identified that we're materializing all rows before counting them, even though the actual count is readily available without needing to access the row data. This is like bringing every single book from the library to the counter just to know how many books there are in total – super inefficient!

Current Execution Flow: A Step-by-Step Breakdown

To illustrate this, let's trace the execution flow for a simple query like SELECT COUNT(*) FROM test_table:

  1. execute_from(): This step scans the entire table and materializes ALL 1000 rows into a Vec<Row>. This is an O(n) operation, meaning the time it takes increases linearly with the number of rows.
  2. apply_where_filter_combined(): This step iterates through all the materialized rows. Again, this is an O(n) operation, even if there's no WHERE clause!
  3. group_rows(): Here, we create a single group that contains all the rows. Another O(n) operation.
  4. evaluate_aggregate_function(): Finally, we get to the counting part. This step returns the group_rows.len(), which is an O(1) operation (constant time). However, the damage is already done – we've spent O(n) time materializing and processing rows we didn't need.

As you can see, even though the final counting operation is quick, the preceding steps involve unnecessary work. We're essentially doing a lot of extra processing before we can get to the core task of counting.
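To make the asymmetry concrete, here's a minimal Python sketch of the two paths. This is illustrative only, not the actual Rust executor: the function names mirror the stages above, but the data shapes are made up for the example.

```python
# Conceptual sketch contrasting the current O(n) pipeline with the proposed
# O(1) fast path. All names here are illustrative, not real NistMemSQL APIs.

def count_star_slow(table_rows):
    """Mimics the current flow: materialize, filter, group, then count."""
    materialized = [row for row in table_rows]        # execute_from: O(n) materialization
    filtered = [row for row in materialized if True]  # apply_where_filter_combined: O(n), no-op filter
    groups = {(): filtered}                           # group_rows: one group holding all rows, O(n)
    return len(groups[()])                            # evaluate_aggregate_function: O(1) at last

def count_star_fast(table_rows):
    """Mimics the proposed fast path: read the length directly."""
    return len(table_rows)                            # O(1), no rows touched

rows = [{"id": i} for i in range(1000)]
assert count_star_slow(rows) == count_star_fast(rows) == 1000
```

Both paths produce the same answer; the slow one just pays three linear passes to get there.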

SQLite's Approach: A Lesson in Efficiency

So, how does SQLite achieve its impressive ~1.8μs time? Well, SQLite is known for its performance, and it employs several strategies to optimize queries:

  • Avoiding Unnecessary Materialization: SQLite doesn't materialize row data unless it's absolutely necessary. This is a key difference compared to our current approach.
  • Highly Optimized C Implementation: SQLite is written in C, which allows for fine-grained control over memory and execution, resulting in highly efficient code.
  • Potential Use of Covering Indexes or Optimized Counting Loops: SQLite might be using covering indexes (indexes that contain all the data needed for a query) or optimized counting loops to further speed up the process.

Interestingly, SQLite doesn't cache row counts as metadata. This was confirmed using EXPLAIN QUERY PLAN, which shows that SQLite dynamically calculates the count each time. This makes their speed even more impressive!

Proposed Optimization: The Fast Path

To bridge this performance gap, we propose adding a fast path for COUNT(*) queries. This fast path will bypass the unnecessary row materialization and directly retrieve the count from the table metadata. Think of it as an express lane for specific types of queries.

Fast Path Detection: Identifying the Right Queries

We'll introduce a special-case detection mechanism within the execute_with_aggregation() function (located in crates/executor/src/select/executor/aggregation/mod.rs). This detection logic will identify opportunities to use the fast path.

Here's a snippet of the proposed code:

// Detect simple COUNT(*) with no filtering
if is_simple_count_star(stmt) {
    // Fast path: return table.len() directly without materializing rows
    let table = database.get_table(&table_name)?;
    return Ok(vec![Row::new(vec![SqlValue::Integer(table.len() as i64)])]);
}

This code checks if the query is a simple COUNT(*) query. If it is, it retrieves the table and directly returns the row count using table.len(), without materializing any rows.

Conditions for the Fast Path: Ensuring Safety

To ensure the fast path is used only in safe and appropriate scenarios, we'll apply it under the following conditions:

  • SELECT COUNT(*) FROM single_table: The query must be a COUNT(*) operation on a single table.
  • No WHERE clause: There should be no filtering conditions.
  • No GROUP BY clause: The query shouldn't involve grouping.
  • No HAVING clause: There should be no filtering on grouped results.
  • No DISTINCT: The query shouldn't be counting distinct values.
  • No JOIN: The query shouldn't involve joining multiple tables.
  • Single table reference: The table reference should be a direct table, not a subquery or Common Table Expression (CTE).

These conditions ensure that the fast path is only used when we can confidently retrieve the count directly from the table metadata without any side effects.
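The checklist above boils down to a single predicate. Here's a hedged Python sketch over a hypothetical dict-shaped statement; the real `is_simple_count_star` would inspect the parsed SQL AST in Rust, and every field name below is an assumption for illustration.

```python
# Illustrative sketch of the fast-path guard. A statement is modeled as a
# plain dict; all field names are assumptions, not the real AST types.

def is_simple_count_star(stmt):
    """Return True only when every fast-path condition holds."""
    projections = stmt.get("projections", [])
    return (
        len(projections) == 1
        and projections[0] == "COUNT(*)"      # a lone COUNT(*) projection
        and not stmt.get("distinct", False)   # no DISTINCT
        and stmt.get("where") is None         # no WHERE clause
        and stmt.get("group_by") is None      # no GROUP BY
        and stmt.get("having") is None        # no HAVING
        and not stmt.get("joins")             # no JOINs
        and stmt.get("from_kind") == "table"  # direct table, not a subquery or CTE
    )

simple = {"projections": ["COUNT(*)"], "from_kind": "table"}
filtered = {"projections": ["COUNT(*)"], "from_kind": "table", "where": "id > 5"}
assert is_simple_count_star(simple)
assert not is_simple_count_star(filtered)
```

The design choice here is deliberate conservatism: any condition we can't verify means falling through to the existing slow path, so a false negative costs only performance, never correctness.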

Expected Impact: A Dramatic Improvement

By implementing this fast path, we anticipate a significant performance improvement. We expect the COUNT(*) execution time to drop from ~220μs to <5μs. That's a potential 44x improvement! This will close the vast majority of the performance gap with SQLite, bringing us much closer to their level of efficiency.

And here's the best part: this optimization carries virtually zero risk. The fast path only kicks in for the specific, safe cases listed above; everything else falls through to the existing code, so it won't affect other queries or introduce any regressions. It's a win-win situation!

Implementation Details: Getting Our Hands Dirty

Alright, let’s talk specifics. If you're looking to contribute or just understand the nitty-gritty, here's where you need to focus:

  • File to modify: crates/executor/src/select/executor/aggregation/mod.rs
  • Key locations:
    • Lines 21-158: The execute_with_aggregation() function is where the magic happens.
    • Lines 94-112: This is the current aggregate evaluation loop, which we'll be modifying to include the fast path.

It's also worth noting that the Table struct already exposes an O(1) row count via Vec::len():

  • File: crates/storage/src/table.rs
  • The rows: Vec<Row> field provides O(1) .len() access, which is perfect for our fast path.
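To see why this makes the fast path O(1), here's a tiny Python sketch of a list-backed table mirroring the rows: Vec<Row> field. This is illustrative Python, not the actual crates/storage code.

```python
# A list-backed table, analogous to the Rust Table with rows: Vec<Row>.
# Python lists, like Vec, store their length, so len() never scans the rows.

class Table:
    def __init__(self):
        self.rows = []           # analogous to rows: Vec<Row>

    def insert(self, row):
        self.rows.append(row)

    def count_star(self):
        return len(self.rows)    # O(1): read the stored length, touch no rows

t = Table()
for i in range(1000):
    t.insert({"id": i})
assert t.count_star() == 1000
```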

Related Files: Exploring the Codebase

To get a broader understanding, you might also want to check out these related files:

  • crates/executor/src/select/executor/aggregation/evaluation.rs:93-111: This section contains the current COUNT(*) evaluation logic.
  • benchmarks/test_aggregates.py:238-246: This is where the COUNT(*) benchmark test resides.

Benchmark Test: Putting it to the Test

To validate our optimization, we'll be using a benchmark test similar to this:

def test_count_1k_nistmemsql(benchmark, nistmemsql_db):
    """Benchmark COUNT(*) on nistmemsql (1k rows)."""
    setup_test_table(nistmemsql_db, 1000, 'nistmemsql')
    
    def run_query():
        cursor = nistmemsql_db.cursor()
        cursor.execute("SELECT COUNT(*) FROM test_table")
        result = cursor.fetchone()[0]
        assert result == 1000
    
    benchmark(run_query)

This test measures the execution time of COUNT(*) on a table with 1000 rows. We'll run this benchmark before and after the optimization to quantify the performance improvement.

Success Criteria: Measuring Our Impact

To declare victory, we need to meet the following criteria:

  • COUNT(*) benchmark shows <10μs execution time: This confirms that our optimization has achieved the desired speedup.
  • All existing tests pass: We need to ensure that our changes haven't introduced any regressions or broken existing functionality.
  • Fast path is only applied to safe cases: We'll conduct comprehensive tests to verify that the fast path is only triggered under the specified conditions.

Let's Do This!

Optimizing COUNT(*) queries is a crucial step in enhancing our system's performance. By implementing this fast path, we can significantly reduce execution time and bring our performance closer to that of SQLite. This optimization is a low-risk, high-reward endeavor that will benefit a wide range of applications.

So, let's get our hands dirty, dive into the code, and make this happen! If you have any questions, suggestions, or just want to discuss the implementation, feel free to chime in. Let's make our database engine blazing fast! Thanks, guys! This is going to be awesome!