Dr. Pedro Holanda - Principal Engineer at DuckDB Labs

I've been building DuckDB's infrastructure since it was a research prototype at CWI. From the CSV engine to Arrow integration to DuckLake, I work on the systems that make analytical databases radically simpler.

PhD from CWI/Leiden University with publications at VLDB and ICDE. Former Microsoft Research intern.

DuckDB by the numbers

36k+ GitHub Stars
28M+ Monthly Downloads

About

I joined DuckDB at CWI Amsterdam when the project had two researchers, a handful of users, and a bet that analytical databases could be radically simpler. I have helped build its infrastructure ever since.

When DuckDB Labs spun out of CWI, I took on the COO role - helping hire the early team, organizing events, shaping the open-source strategy, and building the company while still writing code. That experience gave me a perspective on the full lifecycle of open-source infrastructure that informs every design decision I make today. When the company was ready for dedicated operations leadership, I chose to return to engineering full-time.

As Principal Engineer, I lead the development of DuckLake and drive query processing performance. I mentor new contributors, lead design reviews across subsystems, and help set technical direction for the engine.

Engineering Principles

The future of analytical databases is in-process. Moving data to the database is the wrong abstraction - the database should come to the data. That is the bet I made when I joined DuckDB.

CSV will never die. Instead of replacing messy formats, build systems that handle their full complexity transparently. That philosophy drives the work I do on DuckDB's data ingestion layer.

Databases should meet users where they are. That is why I built DuckDB's zero-copy Arrow integration and its ADBC support - the best data system fits into every tool in your stack rather than forcing the stack to adapt to it.

Engineering Contributions

DuckLake

Leading development of DuckDB's integrated data lake format. A SQL-native catalog that stores table metadata in any database while data lives in open formats like Parquet on object storage.
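The split between "metadata in a database" and "data in files" can be sketched with nothing more than Python's built-in `sqlite3`. This is a toy illustration of the idea, not DuckLake's actual metadata layout: the `tables` and `data_files` schema, the helper names, and the `s3://example-bucket/...` paths are all hypothetical.

```python
import sqlite3

# Hypothetical, heavily simplified schema for illustration only;
# it is not DuckLake's actual metadata layout.
SCHEMA = """
CREATE TABLE tables (
    table_id INTEGER PRIMARY KEY,
    name     TEXT UNIQUE NOT NULL
);
CREATE TABLE data_files (
    table_id  INTEGER NOT NULL REFERENCES tables(table_id),
    path      TEXT NOT NULL,      -- where the Parquet file lives
    row_count INTEGER NOT NULL
);
"""

def open_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog database with the toy schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def register_file(conn, table, path, row_count):
    """Record that `path` holds `row_count` rows of `table`."""
    conn.execute("INSERT OR IGNORE INTO tables (name) VALUES (?)", (table,))
    (table_id,) = conn.execute(
        "SELECT table_id FROM tables WHERE name = ?", (table,)
    ).fetchone()
    conn.execute(
        "INSERT INTO data_files VALUES (?, ?, ?)", (table_id, path, row_count)
    )

def files_for(conn, table):
    """Return the data file paths a reader would need to scan for `table`."""
    rows = conn.execute(
        "SELECT f.path FROM data_files f JOIN tables t USING (table_id) "
        "WHERE t.name = ? ORDER BY f.path",
        (table,),
    ).fetchall()
    return [path for (path,) in rows]

catalog = open_catalog()
register_file(catalog, "events", "s3://example-bucket/events/part-0.parquet", 1000)
register_file(catalog, "events", "s3://example-bucket/events/part-1.parquet", 250)
```

The point of the sketch is that plain SQL answers "which files make up this table?", while the files themselves never pass through the catalog database.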

CSV Engine

Designed and built DuckDB's parallel CSV parser with automatic type, delimiter, and dialect detection. Scores highest on the Pollock robustness benchmark (2025) across all tested systems.
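To make "dialect detection" concrete, here is a minimal sketch using Python's standard-library `csv.Sniffer`. It is not DuckDB's sniffer and does not reflect its algorithm; the sample data and the `detect_dialect` helper are hypothetical, chosen only to show what it means to infer a delimiter and a header row from the file contents.

```python
import csv

# Hypothetical sample: semicolon-delimited, with a header row.
SAMPLE = "id;name;amount\n1;alice;3.50\n2;bob;4.25\n"

def detect_dialect(text: str) -> dict:
    """Guess the delimiter and header presence from a text sample."""
    sniffer = csv.Sniffer()
    # Restrict the candidates to a few common delimiters.
    dialect = sniffer.sniff(text, delimiters=";,|\t")
    return {
        "delimiter": dialect.delimiter,
        "has_header": sniffer.has_header(text),
    }

detected = detect_dialect(SAMPLE)

# Parse the sample with the detected delimiter.
rows = list(csv.reader(SAMPLE.splitlines(), delimiter=detected["delimiter"]))
```

A production-grade sniffer also has to cope with quoting, escapes, inconsistent rows, and type inference, which is where robustness benchmarks such as Pollock come in.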

Arrow & ADBC Integration

Built the zero-copy integration between DuckDB and Python's data ecosystem via Apache Arrow. ADBC is a modern connectivity standard that eliminates the serialization overhead of ODBC.

ART Persistent Storage

Designed and implemented the persistent storage layer for DuckDB's Adaptive Radix Tree indexes. Keeps index data durable on disk without sacrificing in-memory lookup speed.

Python Client

Built DuckDB's Python client foundations and the UDF framework that lets users extend the engine in pure Python - no C++ required.

BIGNUM Implementation

Implemented arbitrary-precision integer arithmetic for DuckDB. Handles HUGEINT and DECIMAL types so that financial calculations and scientific measurements stay exact beyond 64-bit limits.
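As a small illustration of why exactness beyond 64 bits matters, the sketch below uses only Python's built-in arbitrary-precision `int` and the standard `decimal` module. It does not call DuckDB and says nothing about how DuckDB implements these types; the values are made up for the example.

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1  # largest signed 64-bit integer

# Python ints are arbitrary precision, so arithmetic past the
# 64-bit boundary stays exact.
big = INT64_MAX + 2
exact_square = big * big

# A 64-bit float cannot represent every integer at this magnitude:
# adjacent integers collapse onto the same value.
float_collapses = float(big) == float(big + 1)

# Decimal arithmetic keeps base-10 fractions exact where binary floats drift.
decimal_exact = Decimal("0.10") + Decimal("0.20") == Decimal("0.30")
float_exact = 0.10 + 0.20 == 0.30
```

The same trade-off applies inside a database: fixed-width and floating-point types are fast, while exact wide types keep results trustworthy when values outgrow them.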

Open Source Projects

DuckDB

36k+ GitHub stars

The open-source in-process analytical database. Contributor since the research prototype.

DuckLake

2.5k+ GitHub stars

Integrated data lake and catalog format designed to work with DuckDB.

Career Timeline

2025 - Present
Principal Engineer
DuckDB Labs

2023 - 2025
Software Engineer
DuckDB Labs
Returned to full-time engineering. Built the CSV engine, ART storage, and ADBC integration.

2021 - 2023
Chief Operating Officer
DuckDB Labs

2021 - 2022
Post-Doctoral Researcher
CWI Amsterdam
Concurrent with COO role at DuckDB Labs.

2019
Research Intern
Microsoft Research
DMX group. JIT-compiled execution engines for SQL Server.

2017 - 2021
PhD in Computer Science
CWI / Leiden University

Talks & Media

Speaker at developer conferences and data community events.

FOSDEM, EuroPython

Selected Publications

Peer-reviewed work on progressive indexing, database benchmarking, and analytical query processing.

2021
Multidimensional Adaptive & Progressive Indexes
Extending progressive indexing to multiple dimensions for faster analytical queries.

2021
Progressive Indexes
Dissertation combining adaptive indexing with workload-driven optimization for interactive analytical queries.
Pedro Holanda
PhD Thesis @ Leiden University/CWI

2020
Dissecting DuckDB: The internals of the SQLite for Analytics
A hands-on tutorial on DuckDB internals for analytical workloads.
Pedro Holanda and Mark Raasveldt
SBBD (Tutorial)

2019
Progressive Indexes: Indexing for Interactive Data Analysis
Indexes that adapt during query execution, enabling interactive data analysis.

2018
Fair Benchmarking Considered Difficult: Common Pitfalls In Database Performance Testing
Identifying common pitfalls in database performance testing.
SIGMOD (DbTest)

Blog Posts

Showing selected posts. Full list on the DuckDB Blog.

Introduced the zero-copy integration between DuckDB and Apache Arrow that became the default way to move data between DuckDB and the Python ecosystem.

Testing DuckDB's CSV parser against the Pollock robustness benchmark - the most adversarial collection of real-world CSV files available.

A benchmark comparison of CSV and Parquet ingestion performance. The results are more nuanced than the conventional wisdom suggests.

DuckDB's implementation of the Arrow Database Connectivity standard, providing a modern alternative to ODBC for high-throughput data transfer.

Get in Touch

I am always up for a conversation about database internals, query engine design, or open-source collaboration. Feel free to reach out.