Subsurface World Tour, Context for AI Agents, Data 3.0 in the Lakehouse era
Dremio is on the move this Fall — and you’re invited!
From hands-on workshops to our global Subsurface World Tour, we have plenty in store.
🌍 Subsurface 2025: The World’s Premier Lakehouse Community Event
This October and November, Subsurface returns with a focus on the agentic AI era — exploring how AI agents are reshaping enterprise data strategy.
📅 Mark Your Calendar
Paris — October 14
London — October 16
Nuremberg — October 29
San Francisco — November 6
New York City — November 13
⚡ Unlock the Future of Data Innovation with Free Workshops
Take your skills to the next level in our free, instructor-led, hands-on sessions — all in fully managed environments (no setup needed).
Apache Iceberg Lakehouse Workshop — Build and query Iceberg tables, run SQL in Dremio, and explore autonomous optimization for modern lakehouses. [Learn More]
Agentic AI Workshop — Connect AI agents like Claude to enterprise data, structure insights with Dremio’s semantic layer, and deliver sub-second GenAI responses. [Learn More]

Bigger models won’t win the AI race.
Context will. For the past few years, the conversation in AI has been dominated by scale. Bigger models, more GPUs, massive benchmarks.
But by 2026, I think the power of raw models will be widely accessible, through cloud providers, open source, and enterprise licensing.
Access won’t be the issue.
The real moat will shift to context.
Think about it:
- A banking agent without access to transaction histories and regulatory frameworks is just a toy. Add context (customer behaviors, historical trading patterns, compliance rules) and it becomes indispensable.
- A hospital agent without patient records is a liability. But with medical history, treatment protocols, and local guidelines, it becomes trusted.
- A retail agent without inventory and supplier data can’t help. Give it purchase trends, logistics feeds, and pricing rules, and suddenly it can optimize your supply chain.
Context turns raw intelligence into reliable decision-making.
And context isn’t one-dimensional. It’s a combination of four layers (see the sketch after this list):
- Corporate knowledge: documents, contracts, codebases, logs
- Domain expertise: the workflows and rules unique to each industry
- User memory: the ability to recall past interactions, preferences, and history
- Tool use: the capacity to pull the right information from the right system at the right time
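To make this concrete, here is a rough sketch of what assembling those four layers can look like in code. Everything below is illustrative: the data sources are stubs standing in for real systems, not any specific product’s API.

```python
# Illustrative only: every source below is a stub standing in for a real system.
from dataclasses import dataclass, field


@dataclass
class ContextBundle:
    corporate_knowledge: list = field(default_factory=list)  # documents, contracts, logs
    domain_rules: list = field(default_factory=list)          # industry workflows and policies
    user_memory: list = field(default_factory=list)           # past interactions, preferences
    tool_results: list = field(default_factory=list)          # live lookups from operational systems


# Stub retrieval layers; in practice these would hit a search index, a policy store,
# a memory store, and operational APIs respectively.
def search_documents(question):
    return ["Q3 supplier contract, clause 4.2"]

def load_policies(question):
    return ["Discounts above 15% require manager approval"]

def recall_history(user_id):
    return [f"{user_id} asked about supplier lead times last week"]

def call_tools(question):
    return ["Inventory for SKU-1042: 318 units"]


def build_context(question, user_id):
    """Assemble all four context layers before the model is ever called."""
    return ContextBundle(
        corporate_knowledge=search_documents(question),
        domain_rules=load_policies(question),
        user_memory=recall_history(user_id),
        tool_results=call_tools(question),
    )


if __name__ == "__main__":
    ctx = build_context("Can we restock SKU-1042 at a 20% discount?", user_id="analyst-7")
    # The assembled bundle, not the model, is what makes the answer trustworthy.
    print(ctx)
```

The point of the sketch: the model call at the end is the interchangeable part; the pipelines feeding the bundle are where the moat lives.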
By 2026, companies that control these context pipelines will define the next generation of AI. Models will be commodities. Context will be the currency.
We won’t be asking “who has the best model?”
We’ll be asking “who delivers the richest context at the moment of need?”

Let’s do this! I speak to so many leaders and gather so many insights into how the space is evolving.
Here’s my take on “Data 3.0 in the Lakehouse era,” using this map as a guide. Data 3.0 is composable: open formats anchor the system, metadata is the control plane, orchestration glues it together, and AI use cases shape the choices.
Ingestion & Transformation -
Pipelines are now products, not scripts. Fivetran, Airbyte, Census, dbt, Meltano and others standardize ingestion. Orchestration tools like Prefect, Flyte, Dagster and Airflow keep things moving, while Kafka, Redpanda and Flink show that streaming is no longer a sidecar but central to both analytics and AI.
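To show what “pipelines as products” can look like in practice, here is a minimal orchestration sketch, assuming Prefect 2.x is installed. The task logic and names are placeholders, not a real pipeline.

```python
# Minimal Prefect flow sketch (assumes Prefect 2.x); table names and data are made up.
from prefect import flow, task


@task(retries=2)
def extract_orders():
    # In practice this would pull from an API or a change-data-capture stream.
    return [{"order_id": 1, "amount": 42.0}, {"order_id": 2, "amount": 17.5}]


@task
def transform(rows):
    # Light cleanup; heavier modeling usually lives in dbt or similar.
    return [r for r in rows if r["amount"] > 0]


@task
def load(rows):
    # Stand-in for a write to object storage or a warehouse table.
    print(f"loaded {len(rows)} rows")


@flow(name="orders-pipeline")
def orders_pipeline():
    load(transform(extract_orders()))


if __name__ == "__main__":
    orders_pipeline()
```

Retries, named flows, and observable runs are what separate a product from a cron script.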
Storage & Formats -
Object storage has become the system of record. Open file and table formats—Parquet, Iceberg, Delta, Hudi—are the backbone. Warehouses (Snowflake, Firebolt) and lakehouses (Databricks, Dremio) co-exist, while vector databases sit alongside because RAG and agents demand fast recall.
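As a tiny illustration of why open file formats are the backbone, here is a sketch using PyArrow to write and read Parquet. The schema and file path are made up; in production the file would sit in object storage behind an Iceberg, Delta, or Hudi table.

```python
# Minimal Parquet round-trip with PyArrow (assumes pyarrow is installed).
import pyarrow as pa
import pyarrow.parquet as pq

# Columnar data lands as Parquet, the common denominator under open table formats.
events = pa.table({
    "event_id": [1, 2, 3],
    "user_id": ["a", "b", "a"],
    "amount": [9.99, 4.50, 12.00],
})

pq.write_table(events, "events.parquet")  # local path here; an object-store URI in production

# Reading back needs no server: any engine that speaks Parquet can scan the same file.
print(pq.read_table("events.parquet"))
```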
Metadata as Control -
This is where teams succeed or fail. Unity Catalog, Glue, Polaris and Gravitino act as metastores. Catalogs like Atlan, Collibra, Alation and DataHub organize context. Observability tools—Telmai, Anomalo, Monte Carlo, Acceldata—make trust scalable. Without this layer, you might have a modern-looking stack that still behaves like 2015.
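Whatever tool enforces it, the contract this layer guards boils down to checks like the toy sketch below. The thresholds and field names are illustrative, not any vendor’s API.

```python
# A toy freshness/quality check, vendor-agnostic; real deployments delegate this
# to a catalog or observability tool, but the contract being enforced looks like this.
from datetime import datetime, timedelta, timezone


def check_dataset(row_count: int, null_ratio: float, last_updated: datetime) -> list:
    """Return a list of violations; an empty list means the dataset can be trusted."""
    violations = []
    if row_count == 0:
        violations.append("dataset is empty")
    if null_ratio > 0.05:
        violations.append(f"null ratio {null_ratio:.1%} exceeds 5% threshold")
    if datetime.now(timezone.utc) - last_updated > timedelta(hours=24):
        violations.append("data is staler than 24 hours")
    return violations


if __name__ == "__main__":
    problems = check_dataset(
        row_count=1_204_311,
        null_ratio=0.002,
        last_updated=datetime.now(timezone.utc) - timedelta(hours=3),
    )
    print(problems or "all checks passed")
```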
Compute & Query Engines -
The right workload drives the choice: Spark and Trino for broad analytics, ClickHouse for throughput, DuckDB/MotherDuck for frictionless exploration, and Druid/Imply for real-time. ML workloads lean on Ray, Dask and Anyscale. Cost tools like Sundeck and Bluesky matter because economics matter more than logos.
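Here’s what “frictionless exploration” means in practice, assuming DuckDB is installed and reusing the hypothetical events.parquet file from the storage sketch above.

```python
# Ad-hoc exploration with DuckDB: no cluster, no server, SQL straight over Parquet.
import duckdb

result = duckdb.sql("""
    SELECT user_id, SUM(amount) AS total_spend
    FROM 'events.parquet'
    GROUP BY user_id
    ORDER BY total_spend DESC
""")
result.show()
```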
Producers vs Consumers -
The left half builds, the right half uses. Treat datasets, features and vector indexes as products with owners and SLOs. That mindset shift matters more than picking any single vendor.
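One lightweight way to make that mindset concrete is to describe every published dataset with an explicit owner and SLO, as in the illustrative sketch below. The fields and names are assumptions, not a standard.

```python
# "Datasets as products": a small descriptor that names an owner and an SLO
# for every published dataset. Field names are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    name: str
    owner: str              # a team, not an individual who might leave
    freshness_slo_hours: int
    consumers: tuple        # who breaks if this breaks


orders = DataProduct(
    name="analytics.orders_daily",
    owner="commerce-data-team",
    freshness_slo_hours=6,
    consumers=("finance-dashboards", "demand-forecast-model"),
)

print(f"{orders.name} is owned by {orders.owner}, SLO: fresh within {orders.freshness_slo_hours}h")
```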
Trends I see
• Batch and streaming are converging around open table formats.
• Catalogs are evolving into enforcement layers for privacy and quality.
• Orchestration is getting simpler while CI/CD for data is getting more rigorous.
• AI sits on the same foundation as BI and data science—not a separate stack.
This is my view of how the space is shaping up. Use it to reflect on your own stack: simplify, standardize, and avoid accidental complexity!
🔍 Stay Ahead in AI & Data! Join 137K+ Data & AI professionals who stay updated with the latest trends, insights, and innovations.
📢 Want to sponsor or support this newsletter? Reach out and let's collaborate! 🚀
Best,
Ravit Jain
Founder & Host of The Ravit Show