• DuckDB Parquet Benchmarking Results

    Benchmarking LONG Format File (long.parquet)

    Operation / Query Metric Value (seconds)
    Scan and count all rows
    SELECT COUNT(*) FROM 'long.parquet'; Cold Run 1 (Disk I/O + CPU) 0.007935
    Hot Reads (RAM I/O + CPU)
       Mean 0.000646
       StDev 0.000212
    Filter for existing appliance 'fridge'
    SELECT AVG(elec) FROM 'long.parquet' WHERE appliance = 'fridge'; Cold Run 1 (Disk I/O + CPU) 0.000899
    Hot Reads (RAM I/O + CPU)
       Mean 0.000870
       StDev 0.000074
    Filter for non-existent appliance 'andreas'
    SELECT AVG(elec) FROM 'long.parquet' WHERE appliance = 'andreas'; Cold Run 1 (Disk I/O + CPU) 0.000578
    Hot Reads (RAM I/O + CPU)
       Mean 0.001077
       StDev 0.000456

    Benchmarking WIDE (Pivoted) Format File (wide.parquet)

    Operation / Query Metric Value (seconds)
    Scan and count all rows
    SELECT COUNT(*) FROM 'wide.parquet'; Cold Run 1 (Disk I/O + CPU) 0.000553
    Hot Reads (RAM I/O + CPU)
       Mean 0.000563
       StDev 0.000187
    Select existing column 'fridge'
    SELECT AVG("fridge") FROM 'wide.parquet'; Cold Run 1 (Disk I/O + CPU) 0.000726
    Hot Reads (RAM I/O + CPU)
       Mean 0.000956
       StDev 0.000207
    Select non-existent column 'andreas'
    SELECT AVG("andreas") FROM 'wide.parquet'; Result Query fails as expected
    Time to fail 0.00065769

    Benchmarking On-the-fly Transformation

    Operation / Query Metric Value (seconds)
    PIVOT 'long.parquet' from long to wide
    PIVOT read_parquet('long.parquet') ON appliance USING AVG(elec) GROUP BY timestamp; Cold Run 1 (Disk I/O + CPU) 0.048430
    Hot Reads (RAM I/O + CPU)
       Mean 0.029494
       StDev 0.000991

    Schemas

    long.parquet Schema

    Column Name Data Type
    timestamp TIMESTAMP WITH TIME ZONE
    elec DOUBLE
    appliance VARCHAR

    wide.parquet Schema

    Column Name Data Type
    dishwasher DOUBLE
    fridge DOUBLE
    standby DOUBLE
    washingMachine DOUBLE
    timestamp TIMESTAMP WITH TIME ZONE
    Edited by Andreas Andriotis
  • 
    uv run test_duck.py 
    Found both long.parquet and wide.parquet files. Starting...
    
    ============================================================
                 Benchmarking LONG Format File
    ============================================================
    --- Benchmarking: Scan and count all rows ---
    Query: SELECT COUNT(*) FROM 'long.parquet';
      Cold Run 1 (primes OS cache): 0.007935 seconds
    
    --- Results Summary ---
    Cold Read (Disk I/O + CPU): 0.007935 seconds
    Hot Reads (RAM I/O + CPU, avg of 4 runs):
      Mean:   0.000646 seconds
      StDev:  0.000212 seconds
    -------------------------------------------------------
    
    --- Benchmarking: Filter for existing appliance 'fridge' ---
    Query: SELECT AVG(elec) FROM 'long.parquet' WHERE appliance = 'fridge';
      Cold Run 1 (primes OS cache): 0.000899 seconds
    
    --- Results Summary ---
    Cold Read (Disk I/O + CPU): 0.000899 seconds
    Hot Reads (RAM I/O + CPU, avg of 4 runs):
      Mean:   0.000870 seconds
      StDev:  0.000074 seconds
    -------------------------------------------------------
    
    --- Benchmarking: Filter for non-existent appliance 'andreas' ---
    Query: SELECT AVG(elec) FROM 'long.parquet' WHERE appliance = 'andreas';
      Cold Run 1 (primes OS cache): 0.000578 seconds
    
    --- Results Summary ---
    Cold Read (Disk I/O + CPU): 0.000578 seconds
    Hot Reads (RAM I/O + CPU, avg of 4 runs):
      Mean:   0.001077 seconds
      StDev:  0.000456 seconds
    -------------------------------------------------------
    
    ============================================================
                Benchmarking WIDE (Pivoted) Format File
    ============================================================
    --- Benchmarking: Scan and count all rows ---
    Query: SELECT COUNT(*) FROM 'wide.parquet';
      Cold Run 1 (primes OS cache): 0.000553 seconds
    
    --- Results Summary ---
    Cold Read (Disk I/O + CPU): 0.000553 seconds
    Hot Reads (RAM I/O + CPU, avg of 4 runs):
      Mean:   0.000563 seconds
      StDev:  0.000187 seconds
    -------------------------------------------------------
    
    --- Benchmarking: Select existing column 'fridge' ---
    Query: SELECT AVG("fridge") FROM 'wide.parquet';
      Cold Run 1 (primes OS cache): 0.000726 seconds
    
    --- Results Summary ---
    Cold Read (Disk I/O + CPU): 0.000726 seconds
    Hot Reads (RAM I/O + CPU, avg of 4 runs):
      Mean:   0.000956 seconds
      StDev:  0.000207 seconds
    -------------------------------------------------------
    
    --- Benchmarking: Select non-existent column 'andreas' ---
    Query (expected to fail): SELECT AVG("andreas") FROM 'wide.parquet';
    
    --- Results Summary ---
    SUCCESS: Query failed as expected.
    Error: Binder Error: Referenced column "andreas" not found in FROM clause!
    Candidate bindings: "standby", "dishwasher", "washingMachine", "fridge"
    
    LINE 1: SELECT AVG("andreas") FROM 'wide.parquet';
                       ^
    Time to fail: 0.00065769 seconds (measures parsing/planning time)
    -------------------------------------------------------
    
    ============================================================
                Benchmarking On-the-fly Transformation
    ============================================================
    --- Benchmarking: PIVOT 'long.parquet' from long to wide ---
    Query: PIVOT read_parquet('long.parquet')
                ON appliance USING AVG(elec) GROUP BY timestamp;
      Cold Run 1 (primes OS cache): 0.048430 seconds
    
    --- Results Summary ---
    Cold Read (Disk I/O + CPU): 0.048430 seconds
    Hot Reads (RAM I/O + CPU, avg of 4 runs):
      Mean:   0.029494 seconds
      StDev:  0.000991 seconds
    -------------------------------------------------------
    
    ============================================================
                Schemas
    ============================================================
    --- Exploring Schema of: long.parquet ---
    Column Name                    | Data Type
    ------------------------------ | --------------------
    timestamp                      | TIMESTAMP WITH TIME ZONE
    elec                           | DOUBLE
    appliance                      | VARCHAR
    -------------------------------------------------------
    
    --- Exploring Schema of: wide.parquet ---
    Column Name                    | Data Type
    ------------------------------ | --------------------
    dishwasher                     | DOUBLE
    fridge                         | DOUBLE
    standby                        | DOUBLE
    washingMachine                 | DOUBLE
    timestamp                      | TIMESTAMP WITH TIME ZONE
    -------------------------------------------------------
    
    Edited by Andreas Andriotis
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment