Session: From Files to Tables: Rethinking How We Store and Query Data
We’ve all worked with data files (CSV, JSON, Parquet), but turning those files into something reliable, queryable, and scalable is harder than it looks. Schema changes break things. Queries get slower over time. Deleting data is messy. And suddenly your “data lake” feels more like a swamp.
In this talk, we’ll rethink how we manage data by exploring what makes a table actually work. Starting from raw files, we’ll build up to the core ideas behind modern table formats like Apache Iceberg. Along the way, we’ll cover key concepts like metadata layers, time travel, schema evolution, and partitioning — explaining how they solve real-world problems you’ve probably already encountered.
Whether you’re new to data lakes or just tired of duct-taping Hive tables together, this session will give you a clear, open source–first mental model for building better data systems — with no cloud services or vendor lock-in required.
This session will be recorded.