Python support for Iceberg seems to be the biggest unrealized opportunity right now. SQL support is in good shape, with DuckDB and others, but Python support is still quite nascent.
I'm working on a project to do this with Iceberg and SQLMesh executed via Airflow at my job. SQLMesh seems really promising. I investigated multi-engine execution in dbt, and it seems you need to pay a lot of $$$ for it (multi-engine execution requires multiple dbt projects) and it is not included in dbt Core.
This article is about building an open data lakehouse with the new open table format, Iceberg.
For building a single-engine, AWS-based data lakehouse you can refer to this article [1], or just use Amazon SageMaker, which also supports Iceberg.
Fun Amazon AWS data storage dictionary:
S3: Data Lake
Glacier: Archival Storage
DocumentDB: NoSQL Document Database à la MongoDB
DynamoDB: NoSQL Key-Value and Wide-Column Database
RDS: SQL Database
Timestream: Time-Series Database
Neptune: Graph Database
Redshift: Data Warehouse
SageMaker: Data Lakehouse
Islander: Data Mesh (okay kidding, just made this up)
[1] Build a Lake House Architecture on AWS:
https://aws.amazon.com/blogs/big-data/build-a-lake-house-arc...
I am getting

    An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied

when trying to run

    aws s3 ls s3://mango-public-data/lakehouse-snapshots/peach-lake --recursive
This entire stack also now exists for arrays as well as for tabular data. It's still S3 for storage, but Zarr instead of Parquet, Icechunk instead of Iceberg, and Xarray for queries in Python.