Book Review: DuckDB in Action
What if I told you that there’s a tool that makes it incredibly easy to process datasets from various sources, analyze them using SQL, and export the results in different formats? And that it doesn’t require all the hassle you know from typical relational databases or custom implementations?
Sounds too good to be true, right?
I thought the same when Michael Simons asked me if I could provide feedback on some of the chapters of DuckDB in Action, the book he co-authored with Mark Needham and Michael Hunger.
Disclaimer: I received a free copy of that book to provide feedback on some of its chapters and to provide an honest review.
OK, so let’s dive right in.
A quick summary
DuckDB in Action is a fun and easy read that shows you what you can do with DuckDB. The book is packed with small code samples that you can follow along to get your feet wet and start experimenting.
In case you’re not familiar with DuckDB already (to be honest, I wasn’t when Michael reached out to me), it’s an embedded analytics database. What I found especially interesting about it is how easily it processes data from different sources, transforms it, analyzes it, and exports your results in various formats.
And the authors, of course, show you how to do all that in their book.
The content
At the time of this writing, the DuckDB in Action book is still part of the Manning Early Access Program (MEAP), which means it’s not officially finished. However, all 11 chapters and the first part of the appendix are already available.
Chapters 1 and 2 – Getting started
At the beginning of the book, the authors give you a quick introduction to DuckDB and the things you will learn in the book. After that, they show you how to install DuckDB, import your first data set, and perform a few simple queries.
Chapters 3 and 4 – Analyzing your data
Of course, these were the chapters I was most interested in when I got the book, and I liked them a lot. If you’re not already an SQL expert, these chapters will help you improve your skills.
In chapter 3, the authors teach the most important concepts of SQL. They show you how to create a table model in DuckDB, import your data, and perform some basic queries. You also learn about some DuckDB-specific extensions to SQL.
After that, the authors dive into more advanced SQL and DuckDB-specific query features that you can use to analyze your data. Here, you learn about window functions, PIVOT statements, table functions, lateral joins, and more.
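To give you a feel for the kind of query these chapters cover, here is a minimal sketch of my own (not code from the book) that runs a window function from Java. It assumes the DuckDB JDBC driver (Maven artifact `org.duckdb:duckdb_jdbc`) is on the classpath. The `sales` table, its sample rows, and the class and method names are hypothetical. PIVOT and lateral joins aren’t shown here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class WindowFunctionExample {

    // Computes a per-product running total with a window function and
    // returns "product:runningTotal" strings ordered by product and date.
    public static List<String> runningTotals() throws SQLException {
        List<String> result = new ArrayList<>();
        // "jdbc:duckdb:" without a file name opens a transient in-memory database
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
             Statement stmt = conn.createStatement()) {
            // Hypothetical sample data, for illustration only
            stmt.execute("CREATE TABLE sales (sale_date DATE, product VARCHAR, amount INTEGER)");
            stmt.execute("INSERT INTO sales VALUES "
                    + "('2024-01-01', 'book', 10), "
                    + "('2024-01-02', 'book', 15), "
                    + "('2024-01-01', 'pen', 2), "
                    + "('2024-01-03', 'book', 5)");
            // SUM(...) OVER (...) keeps every row and adds a running total per product;
            // the cast to BIGINT keeps the JDBC type simple to read with getLong
            String sql = "SELECT product, sale_date, "
                    + "CAST(SUM(amount) OVER (PARTITION BY product ORDER BY sale_date) AS BIGINT) "
                    + "AS running_total "
                    + "FROM sales ORDER BY product, sale_date";
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    result.add(rs.getString("product") + ":" + rs.getLong("running_total"));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws SQLException {
        // prints book:10, book:25, book:30, pen:2 (one per line)
        runningTotals().forEach(System.out::println);
    }
}
```

Unlike a GROUP BY, the window function doesn’t collapse the rows, which is what makes running totals, rankings, and similar analyses so convenient.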
Chapter 5 – Processing data without storing it
Of all the chapters in the book, this was the one I found most fascinating.
In the past, I worked on import and export use cases for various applications. Supporting different formats and implementing the necessary transformations was usually a complex task.
As the authors show in this chapter, DuckDB can be an interesting tool in these situations. Instead of storing and analyzing data in DuckDB, you can also use it to read data from different sources, transform it, and export it in your preferred format. These sources and targets can be common file formats like Parquet, JSON, and CSV, or even tables in a relational database.
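Here is a minimal sketch of that idea from a Java application. It’s my own example, not code from the book, and it assumes the DuckDB JDBC driver (`org.duckdb:duckdb_jdbc`) is on the classpath with its bundled Parquet support. The file names, sample data, and class and method names are hypothetical. It reads a CSV file with `read_csv_auto`, aggregates it, and writes the result directly to a Parquet file with `COPY ... TO`, without creating a table.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CsvToParquetExample {

    // Wraps a file path in a single-quoted SQL string literal, escaping embedded quotes.
    private static String literal(Path path) {
        return "'" + path.toString().replace("'", "''") + "'";
    }

    // Reads a CSV file, aggregates it, and writes the result straight to a
    // Parquet file. No table is created in DuckDB.
    public static void csvToParquet(Path csv, Path parquet) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
             Statement stmt = conn.createStatement()) {
            stmt.execute("COPY (SELECT product, CAST(SUM(amount) AS BIGINT) AS total "
                    + "FROM read_csv_auto(" + literal(csv) + ") "
                    + "GROUP BY product ORDER BY product) "
                    + "TO " + literal(parquet) + " (FORMAT PARQUET)");
        }
    }

    // Queries the Parquet file directly to check how many rows were written.
    public static long countRows(Path parquet) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT COUNT(*) FROM read_parquet(" + literal(parquet) + ")")) {
            rs.next();
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) throws IOException, SQLException {
        // Hypothetical input data, written to a temporary directory for the demo
        Path dir = Files.createTempDirectory("duckdb-demo");
        Path csv = dir.resolve("sales.csv");
        Files.writeString(csv, "product,amount\nbook,10\nbook,15\npen,2\n");
        Path parquet = dir.resolve("sales_by_product.parquet");

        csvToParquet(csv, parquet);
        System.out.println("Rows written: " + countRows(parquet));
    }
}
```

The whole transformation is a single SQL statement, so supporting another output format is mostly a matter of changing the `FORMAT` option (for example, to `CSV`). The paths are concatenated into the SQL to keep the sketch short; in a real application, you would want to validate them first.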
I will try DuckDB next time I work on an import or export that requires support for multiple formats.
Chapter 6 – Using DuckDB with Python
I’m a Java developer with no practical experience in Python, so I only skimmed this chapter.
If you’re familiar with Python or want to learn it, this chapter shows you how to use DuckDB in your application and integrate it with different libraries.
But if you’re a Java developer like myself, Appendix A will be much more interesting. It explains how to use DuckDB with different client APIs and is strongly focused on Java.
Chapter 7 – DuckDB in the cloud
You can run DuckDB locally on your machine. Or you can sign up for MotherDuck to run your database in the cloud.
In this chapter, the authors show you how to do that and how to use some of the DuckDB APIs.
Chapters 8 and 9 – Building pipelines and apps
Based on what you learned in the previous chapters, you can start building data pipelines and applications.
Chapter 8 discusses data pipelines, explains DuckDB’s role in different situations, and shows how to integrate it with commonly used tools like dbt and Dagster.
If you or your users want to avoid using SQL to analyze and interact with your data, you should read chapter 9. Here, the authors show you how to use DuckDB to build custom data apps with Streamlit, visualize your data with Plotly, and create dashboards with Apache Superset.
Chapter 10 – Working with large data sets
As we all know, the more data you have to process, the more problems you will have, because huge data sets add performance issues to your already complex processing logic. And that’s, of course, also the case when using DuckDB.
In chapter 10, the authors share many great tips and insights on handling huge data sets. Because the topic of this chapter is complex, it’s a little harder to read than some of the other chapters. But it is a must-read for every senior developer (and everyone who wants to become one).
Chapter 11 – Conclusion
The final chapter of the book gives you a quick overview of what you’ve learned so far, mentions topics not covered in the book, and provides links to further resources.
Appendix A – Client APIs (incl. Java)
As mentioned earlier, Appendix A is an interesting chapter for all Java developers. It shows you how to use DuckDB with different client APIs and includes a relatively long section about Java and the JDBC driver.
If you’re unfamiliar with Python, you should definitely read this appendix, and you might not want to wait until you’ve finished all the regular chapters of the book.
Summary
DuckDB is an interesting database, and the book DuckDB in Action provides a great introduction.
When I started reading the book, I was mostly interested in the SQL parts explained in chapters 3 and 4. However, I quickly realized that DuckDB has much more to offer and enjoyed reading the entire book.
If you want to learn about an interesting database or are looking for a new way to process and transform data from different sources, I highly recommend reading DuckDB in Action.