
Dr. Pedro Holanda
Principal Engineer @ DuckDB Labs
PhD from CWI/Leiden University with publications at VLDB and ICDE. Former Microsoft Research intern.
About
I joined DuckDB at CWI Amsterdam when the project had two researchers, only a handful of users, and a bet that analytical databases could be radically simpler. I have helped build its infrastructure since those early days.
When DuckDB Labs spun out of CWI, I took on the COO role - helping hire the early team, organizing events, shaping the open-source strategy, and building the company while still writing code. That experience gave me a perspective on the full lifecycle of open-source infrastructure that informs every design decision I make today. When the company was ready for dedicated operations leadership, I chose to return to engineering full-time.
As Principal Engineer, I lead the development of DuckLake and drive query processing performance. I mentor new contributors, lead design reviews across subsystems, and help set technical direction for the engine.
Engineering Principles
The future of analytical databases is in-process. Moving data to the database is the wrong abstraction - the database should come to the data. That is the bet I made when I joined DuckDB.
CSV will never die. Instead of replacing messy formats, build systems that handle their full complexity transparently. That philosophy drives the work I do on DuckDB's data ingestion layer.
Databases should meet users where they are. That is why I built DuckDB's zero-copy Arrow integration and ADBC - the best data system works seamlessly with every tool in your stack, not the other way around.
Engineering Contributions
DuckLake
Leading development of DuckDB's integrated data lake format. A SQL-native catalog that stores table metadata in any database while data lives in open formats like Parquet on object storage.
CSV Engine
Designed and built DuckDB's parallel CSV parser with automatic type, delimiter, and dialect detection. Scores highest on the Pollock robustness benchmark (2025) across all tested systems.
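To give a feel for what delimiter and type detection involve, here is a minimal sketch built on Python's standard-library `csv.Sniffer`. It is not DuckDB's sniffer and leaves out parallelism entirely. The sample data, the helper names (`detect_delimiter`, `infer_type`, `sniff_csv`), the candidate delimiter set, and the assumption that the first row is a header are all illustrative.

```python
import csv
import io

# Made-up sample using ';' as the delimiter (illustrative data only).
SAMPLE = "id;name;score\n1;Ana;9.5\n2;Bo;7.0\n"


def detect_delimiter(text, candidates=",;\t|"):
    """Guess the delimiter from a text sample using the stdlib sniffer."""
    return csv.Sniffer().sniff(text, delimiters=candidates).delimiter


def infer_type(values):
    """Return the narrowest type name that every value in the column parses as."""
    for caster, type_name in ((int, "BIGINT"), (float, "DOUBLE")):
        try:
            for value in values:
                caster(value)
            return type_name
        except ValueError:
            continue  # at least one value failed; try the next, wider type
    return "VARCHAR"


def sniff_csv(text):
    """Detect the delimiter, then infer one type per column.

    Assumes the first row is a header, which a real sniffer would also detect.
    """
    delimiter = detect_delimiter(text)
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    columns = list(zip(*body))
    return delimiter, {name: infer_type(col) for name, col in zip(header, columns)}


if __name__ == "__main__":
    print(sniff_csv(SAMPLE))
```

The sketch tries types from narrowest to widest and falls back to text, which is one common way to frame the problem. A production parser also has to cope with quoting, escapes, inconsistent rows, and files too large to sample in full.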
Arrow & ADBC Integration
Built the zero-copy integration between DuckDB and Python's data ecosystem via Apache Arrow. ADBC provides a modern connectivity standard that eliminates the serialization overhead of ODBC.
ART Persistent Storage
Designed and implemented the persistent storage layer for DuckDB's Adaptive Radix Tree indexes. Keeps index data durable on disk without sacrificing in-memory lookup speed.
Python Client
Built DuckDB's Python client foundations and the UDF framework that lets users extend the engine in pure Python - no C++ required.
BIGNUM Implementation
Implemented arbitrary-precision integer arithmetic for DuckDB. Handles HUGEINT and DECIMAL types so that financial calculations and scientific measurements stay exact beyond 64-bit limits.
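The point about staying exact beyond 64-bit limits can be illustrated with a short sketch that uses Python's built-in arbitrary-precision `int` and the standard-library `decimal` module. This is not DuckDB code and says nothing about how the engine stores these values. The helper names (`fits_in_int64`, `exact_total`) and the sample amounts are hypothetical.

```python
from decimal import Decimal

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def fits_in_int64(value):
    """Report whether an integer is representable as a signed 64-bit value."""
    return INT64_MIN <= value <= INT64_MAX


def exact_total(amounts):
    """Sum decimal strings exactly, avoiding binary floating-point rounding."""
    return sum((Decimal(a) for a in amounts), Decimal("0"))


if __name__ == "__main__":
    big = INT64_MAX + 1  # one past the signed 64-bit range
    print(big, fits_in_int64(big))

    # Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
    print(0.1 + 0.2 == 0.3)
    # Decimal arithmetic keeps the same sum exact.
    print(exact_total(["0.1", "0.2"]) == Decimal("0.3"))
```

A fixed-width integer would overflow at `INT64_MAX + 1`, and a float would silently round, which is why exact types matter for money and measurements.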
Open Source Projects
DuckDB
36k+
The open-source analytical in-process database. Early contributor since the research prototype.
Career Timeline
Principal Engineer
DuckDB Labs
Software Engineer
DuckDB Labs. Returned to full-time engineering. Built the CSV engine, ART storage, and ADBC integration.
Chief Operating Officer
DuckDB Labs
Post-Doctoral Researcher
CWI Amsterdam. Concurrent with COO role at DuckDB Labs.
Research Intern
Microsoft Research. DMX group. JIT-compiled execution engines for SQL Server.
PhD in Computer Science
CWI / Leiden University
Talks & Media
Speaker at developer conferences and data community events.
Efficient CSV Parsing: On the Complexity of Simple Things
Selected Publications
Peer-reviewed work on progressive indexing, database benchmarking, and analytical query processing.
Multidimensional Adaptive & Progressive Indexes
Progressive Indexes
Dissecting DuckDB: The internals of the SQLite for Analytics
Progressive Indexes: Indexing for Interactive Data Analysis
Fair Benchmarking Considered Difficult: Common Pitfalls In Database Performance Testing
Blog Posts
Showing selected posts. Full list on the DuckDB Blog.
Introduced the zero-copy integration between DuckDB and Apache Arrow that became the default way to move data between DuckDB and the Python ecosystem.
Testing DuckDB's CSV parser against the Pollock robustness benchmark - the most adversarial collection of real-world CSV files available.
A benchmark comparison of CSV and Parquet ingestion performance. The results are more nuanced than the conventional wisdom suggests.
DuckDB's implementation of the Arrow Database Connectivity standard, providing a modern alternative to ODBC for high-throughput data transfer.
Get in Touch
I am always up for a conversation about database internals, query engine design, or open-source collaboration. Feel free to reach out.