-
DuckDB Parquet Benchmarking Results
Benchmarking LONG Format File (
long.parquet
)Operation / Query Metric Value (seconds) Scan and count all rows SELECT COUNT(*) FROM 'long.parquet';
Cold Run 1 (Disk I/O + CPU) 0.007935 Hot Reads (RAM I/O + CPU) Mean 0.000646 StDev 0.000212 Filter for existing appliance 'fridge' SELECT AVG(elec) FROM 'long.parquet' WHERE appliance = 'fridge';
Cold Run 1 (Disk I/O + CPU) 0.000899 Hot Reads (RAM I/O + CPU) Mean 0.000870 StDev 0.000074 Filter for non-existent appliance 'andreas' SELECT AVG(elec) FROM 'long.parquet' WHERE appliance = 'andreas';
Cold Run 1 (Disk I/O + CPU) 0.000578 Hot Reads (RAM I/O + CPU) Mean 0.001077 StDev 0.000456 Benchmarking WIDE (Pivoted) Format File (
wide.parquet
)Operation / Query Metric Value (seconds) Scan and count all rows SELECT COUNT(*) FROM 'wide.parquet';
Cold Run 1 (Disk I/O + CPU) 0.000553 Hot Reads (RAM I/O + CPU) Mean 0.000563 StDev 0.000187 Select existing column 'fridge' SELECT AVG("fridge") FROM 'wide.parquet';
Cold Run 1 (Disk I/O + CPU) 0.000726 Hot Reads (RAM I/O + CPU) Mean 0.000956 StDev 0.000207 Select non-existent column 'andreas' SELECT AVG("andreas") FROM 'wide.parquet';
Result Query fails as expected Time to fail 0.00065769 Benchmarking On-the-fly Transformation
Operation / Query Metric Value (seconds) PIVOT 'long.parquet' from long to wide PIVOT read_parquet('long.parquet') ON appliance USING AVG(elec) GROUP BY timestamp;
Cold Run 1 (Disk I/O + CPU) 0.048430 Hot Reads (RAM I/O + CPU) Mean 0.029494 StDev 0.000991 Schemas
long.parquet
SchemaColumn Name Data Type timestamp TIMESTAMP WITH TIME ZONE elec DOUBLE appliance VARCHAR wide.parquet
SchemaColumn Name Data Type dishwasher DOUBLE fridge DOUBLE standby DOUBLE washingMachine DOUBLE timestamp TIMESTAMP WITH TIME ZONE -
uv run test_duck.py Found both long.parquet and wide.parquet files. Starting... ============================================================ Benchmarking LONG Format File ============================================================ --- Benchmarking: Scan and count all rows --- Query: SELECT COUNT(*) FROM 'long.parquet'; Cold Run 1 (primes OS cache): 0.007935 seconds --- Results Summary --- Cold Read (Disk I/O + CPU): 0.007935 seconds Hot Reads (RAM I/O + CPU, avg of 4 runs): Mean: 0.000646 seconds StDev: 0.000212 seconds ------------------------------------------------------- --- Benchmarking: Filter for existing appliance 'fridge' --- Query: SELECT AVG(elec) FROM 'long.parquet' WHERE appliance = 'fridge'; Cold Run 1 (primes OS cache): 0.000899 seconds --- Results Summary --- Cold Read (Disk I/O + CPU): 0.000899 seconds Hot Reads (RAM I/O + CPU, avg of 4 runs): Mean: 0.000870 seconds StDev: 0.000074 seconds ------------------------------------------------------- --- Benchmarking: Filter for non-existent appliance 'andreas' --- Query: SELECT AVG(elec) FROM 'long.parquet' WHERE appliance = 'andreas'; Cold Run 1 (primes OS cache): 0.000578 seconds --- Results Summary --- Cold Read (Disk I/O + CPU): 0.000578 seconds Hot Reads (RAM I/O + CPU, avg of 4 runs): Mean: 0.001077 seconds StDev: 0.000456 seconds ------------------------------------------------------- ============================================================ Benchmarking WIDE (Pivoted) Format File ============================================================ --- Benchmarking: Scan and count all rows --- Query: SELECT COUNT(*) FROM 'wide.parquet'; Cold Run 1 (primes OS cache): 0.000553 seconds --- Results Summary --- Cold Read (Disk I/O + CPU): 0.000553 seconds Hot Reads (RAM I/O + CPU, avg of 4 runs): Mean: 0.000563 seconds StDev: 0.000187 seconds ------------------------------------------------------- --- Benchmarking: Select existing column 'fridge' --- Query: SELECT AVG("fridge") FROM 'wide.parquet'; Cold Run 1 (primes OS cache): 0.000726 seconds --- Results Summary --- Cold Read (Disk I/O + CPU): 0.000726 seconds Hot Reads (RAM I/O + CPU, avg of 4 runs): Mean: 0.000956 seconds StDev: 0.000207 seconds ------------------------------------------------------- --- Benchmarking: Select non-existent column 'andreas' --- Query (expected to fail): SELECT AVG("andreas") FROM 'wide.parquet'; --- Results Summary --- SUCCESS: Query failed as expected. Error: Binder Error: Referenced column "andreas" not found in FROM clause! Candidate bindings: "standby", "dishwasher", "washingMachine", "fridge" LINE 1: SELECT AVG("andreas") FROM 'wide.parquet'; ^ Time to fail: 0.00065769 seconds (measures parsing/planning time) ------------------------------------------------------- ============================================================ Benchmarking On-the-fly Transformation ============================================================ --- Benchmarking: PIVOT 'long.parquet' from long to wide --- Query: PIVOT read_parquet('long.parquet') ON appliance USING AVG(elec) GROUP BY timestamp; Cold Run 1 (primes OS cache): 0.048430 seconds --- Results Summary --- Cold Read (Disk I/O + CPU): 0.048430 seconds Hot Reads (RAM I/O + CPU, avg of 4 runs): Mean: 0.029494 seconds StDev: 0.000991 seconds ------------------------------------------------------- ============================================================ Schemas ============================================================ --- Exploring Schema of: long.parquet --- Column Name | Data Type ------------------------------ | -------------------- timestamp | TIMESTAMP WITH TIME ZONE elec | DOUBLE appliance | VARCHAR ------------------------------------------------------- --- Exploring Schema of: wide.parquet --- Column Name | Data Type ------------------------------ | -------------------- dishwasher | DOUBLE fridge | DOUBLE standby | DOUBLE washingMachine | DOUBLE timestamp | TIMESTAMP WITH TIME ZONE -------------------------------------------------------
Please register or sign in to comment