Career Profile

I helped build DuckDB from a research prototype at CWI Amsterdam into one of the most widely adopted analytical databases in the world, with 36k+ GitHub stars, 28M+ monthly downloads, and integrations across the modern data stack. As one of the project's earliest contributors, I served as Chief Operating Officer (COO) of DuckDB Labs from 2021 to 2023, helping scale the company from a small research spinoff into a leading open-source database company. I now lead core infrastructure work on DuckDB and DuckLake as a Principal Engineer. Earlier, I was a research intern at Microsoft Research, one of the most selective industrial research programs. I have published at VLDB and ICDE, top peer-reviewed venues in database research, and hold a PhD from CWI/Leiden University.

Experiences

Principal Engineer

2025 - Present
DuckDB Labs, Amsterdam

Leading the development of core database infrastructure for DuckDB, the open-source analytical database with 36k+ GitHub stars and 28M+ monthly downloads, and for DuckLake.

Software Engineer

2023 - 2025
DuckDB Labs, Amsterdam

Built and maintained core DuckDB subsystems, including the parallel CSV reader and sniffer, ART persistent storage, Arrow and ADBC integration (with ADBC ~38x faster than ODBC), Enum/Dictionary encoding, the BIGNUM implementation, the Python client, index joins, and zone maps.

Chief Operating Officer (COO)

2021 - 2023
DuckDB Labs, Amsterdam

Helped grow DuckDB Labs during its early phase, from a small research spinoff at CWI into a leading open-source database company. Contributed to core DuckDB development including the Arrow integration, Enum/Dictionary encoding, ART Persistent Storage, BIGNUM, Python client, index joins, and zone maps.

Post-Doctoral Researcher

2021 - 2022
Centrum Wiskunde en Informatica (CWI), Amsterdam

Database Architectures group. Research on progressive indexing and analytical database systems, and contributions to early DuckDB development alongside the research team that created the project.

Research Intern

2019
Microsoft Research, Redmond

DMX group (4-month internship). Designed and implemented an experimental JIT-compiled execution engine for analytical query processing in SQL Server, working directly with the SQL Server team on next-generation query execution. The Microsoft Research internship program is one of the most selective in the world.

Talks & Media

Speaker at FOSDEM, EuroPython, and international data conferences.

2025

DuckLake (Portuguese) - Data Hackers Community

2024

2023

DuckDB: Bringing Analytical SQL Directly to Your Python Shell - EuroPython — one of the world's largest Python conferences
DuckDB: Bringing Analytical SQL Directly to Your Python Shell - FOSDEM — one of the largest open-source conferences in the world
DuckDB Extensions - DuckCon #2, Brussels
Data Analysis with DuckDB - J on the Beach

Open Source Projects

DuckDB - The open-source analytical in-process database (36k+ stars, 28M+ monthly downloads). Core contributor since inception.
DuckLake - An integrated data lake and catalog format (2.5k+ stars).
Scrooge McDuck - A DuckDB extension for financial data analysis — Yahoo Finance and Ethereum blockchain scanning.

Selected Publications

Published at VLDB, ICDE, and SIGMOD — the leading peer-reviewed venues in database systems research.

2021

Multidimensional Adaptive & Progressive Indexes
Progressive Indexes
Pedro Holanda
PhD Thesis @ Leiden University/CWI [PDF]

2020

Dissecting DuckDB: The internals of the SQLite for Analytics
Pedro Holanda and Mark Raasveldt
SBBD (Tutorial) [PDF] [HANDS-ON]

2019

Progressive Indexes: Indexing for Interactive Data Analysis

2018

Fair Benchmarking Considered Difficult: Common Pitfalls In Database Performance Testing
SIGMOD (DbTest) [PDF] [SOURCE CODE]

Blog Posts

2025

Arrow IPC Support in DuckDB
DuckDB Blog [Here]
DuckDB's CSV Reader and the Pollock Robustness Benchmark: Into the CSV Abyss
DuckDB Blog [Here]

2024

CSV Files: Dethroning Parquet as the Ultimate Storage File Format — or Not?
Pedro Holanda
DuckDB Blog [Here]
Driving CSV Performance: Benchmarking DuckDB with the NYC Taxi Dataset
Pedro Holanda
DuckDB Blog [Here]
Scrooge - DuckDB: Experimental Ethereum Blockchain Scanner
Pedro Holanda
Blog [Here]

2023

DuckDB's CSV Sniffer: Automatic Detection of Types and Dialects
Pedro Holanda
DuckDB Blog [Here]
DuckDB ADBC — Zero-Copy Data Transfer via Arrow Database Connectivity
Pedro Holanda
DuckDB Blog [Here]
From Waddle to Flying: Quickly Expanding DuckDB's Functionality with Scalar Python UDFs
Pedro Holanda, Thijs Bruineman, and Phillip Cloud
DuckDB Blog [Here]
Scrooge: Analyzing Yahoo Financial Data In DuckDB
Pedro Holanda
Blog [Here]

2022

Persistent Storage of Adaptive Radix Trees in DuckDB
Pedro Holanda
DuckDB Blog [Here]
Scrooge McDuck: A DuckDB Extension for Financial Data Analysis (Demo)
Pedro Holanda
Blog [Here]

2021

DuckDB quacks Arrow: A zero-copy data integration between Apache Arrow and DuckDB
Pedro Holanda and Jonathan Keane
DuckDB Blog [Here]
DuckDB — The Lord of Enums: The Fellowship of the Categorical and Factors
Pedro Holanda
DuckDB Blog [Here]