DES is a property of AIM Media House.

SCHEDULE

May 14–15, 2026
Hotel Radisson Blu, ORR, Bengaluru

We are in the process of finalizing the 2026 schedule. Please check this space again soon.

More than 60 speakers are expected at DES 2026. To explore speaking opportunities with DES, write to info@aim.media

  • Day 1 | Hall 1 - Thought Leadership and Strategic Insights


  • In a world where AI can build pipelines and infer schemas, the role of the data engineer is undergoing a fundamental shift. The skills that once defined the profession are being automated, and that raises an important question: where does a data engineer's value truly lie? Join this session to explore how a data engineer’s role is not being replaced, but rather elevated. Understand how you can move beyond manual execution and step into higher-impact roles: shaping business outcomes, driving adoption, and applying the human judgment that no model can match.

  • Creating a single source of truth for customer data is no longer optional. This session will dive into the strategies, architectures, and governance models needed to integrate fragmented data systems into a unified ecosystem that powers seamless experiences and data-driven growth.

  • Every enterprise is rushing to deploy AI. Few have fixed the foundation underneath it. Despite years of investment in data platforms, most enterprise AI systems still hallucinate, produce inconsistent outputs, and fail to reason reliably. The problem is not the model — it is a structural gap between how data platforms were built and what AI agents actually need. This session defines the AI Readiness Gap, diagnoses where enterprises are falling short across six critical dimensions, and introduces the Knowledge Layer — the most overlooked component in modern data architecture that bridges raw data assets with AI agent cognition.

  • Every enterprise wants AI, and most have more raw material than they realize. Industry research by IDC shows that while over 80% of enterprise content is unstructured, less than 1% of it is currently ready for AI consumption. This gap is what we call the "Context Crisis", which is the difference between data that's stored and data that's truly seen. Closing it is what turns models that guess into models that know, and agents that assist into agents that can be trusted with real decisions. This keynote argues that AI reliability is a data engineering problem before it is ever a model problem. Drawing from real-world failure patterns, like dark data, broken lineage, missing metadata, unversioned documents, and more, we'll explore how modern DataOps disciplines turn this crisis into a competitive edge. From intelligent ingestion and semantic layers to data products and governance as code, we'll walk through the engineering blueprint that transforms fragmented documents and unversioned information into the contextual fuel agentic AI demands.

  • Today’s data stacks are complex and fragile. AI is changing that. This talk explores how AI can automate pipelines, improve data quality, and create a more reliable, self-healing data ecosystem, offering a glimpse into the future of AI-powered data platforms.

  • This session explores how generative AI is reshaping product discovery into a more intuitive, predictive, and user-centric experience.

  • India’s DPDP Act is redefining how organizations handle personal data, making privacy and governance non-negotiable. Enterprises must move beyond compliance checkboxes to embed privacy into the core of their data architecture, and they must do so by May 2027. Skyflow enables this shift by isolating sensitive data, enforcing strong governance, and minimizing exposure. This foundation allows AI and analytics to run on protected data without compromising user trust. The result: a secure, compliant, and scalable AI data stack built for India’s privacy-first future.

  • As organizations rush to layer AI on top of their data warehouses, most are discovering an uncomfortable truth. AI-driven analytics is only as trustworthy as the metadata underneath it. Without lineage, ownership, and semantic context, even the best models confidently produce wrong answers, and no one can tell until a dashboard lies to an executive. In this talk, I'll share how we built a data catalog that serves as the backbone of our AI-driven analysis infrastructure. We'll walk through the pillars that make it work: end-to-end lineage from source systems to dashboards, blast-radius analysis that prevents silent breakage, documentation and accountability that AI agents can actually reason over, and a metrics layer that turns all of this into reliable, explainable answers. You'll leave with a concrete blueprint for why catalogs aren't documentation tools but infrastructure, and why getting this foundation right is the difference between AI that hallucinates and AI that works.

  • In this session, we’ll explore how large analytics enterprises can evolve from traditional data systems to AI-driven, modular architectures. The talk introduces five engineering pillars—Data Foundations, AI Agent Building (including persona-based “ask” capabilities), Agent Catalog & Monitoring, Internal Agent Marketplace, and Business Integration. Attendees will learn how data engineers can design semantic layers, build reusable AI agents for merchandising and customer analytics, and integrate them into live business workflows. The session highlights practical strategies for measuring impact, optimizing AI performance, and creating scalable solutions that drive measurable business outcomes.

  • What does data governance look like when your biggest data consumer doesn't sleep, doesn't follow SOPs, and makes thousands of decisions every second? When a majority of your users bypass official AI tools entirely? When a single ungoverned prompt can leak training data, customer PII, or regulated information across your entire enterprise? These aren't hypothetical questions — they're the operational reality of GenAI in 2026. And the governance frameworks most enterprises rely on were built for an era when data consumers were human, deterministic, and slow. This talk presents DataGovOps: governance as code, dynamic access policies, AI-assisted classification, and end-to-end lineage from source through agent output. We'll explore why legacy governance fails under agentic workloads, how to architect consent and access as programmable policy, the recursion problem of governing AI with AI, and practical patterns for staying compliant under regimes like GDPR & DPDPA — without strangling the velocity that makes AI worth deploying in the first place.

  • In the era of Generative AI and autonomous decisioning, enterprises are rapidly investing in AI models. However, the reality remains stark: most AI initiatives fail not due to model limitations, but due to fragmented, unreliable, and non-AI-ready data ecosystems. This session challenges the conventional narrative of AI being model-centric and repositions data as the true foundation of AI success.

  • In this high-impact session, Anurag Chaudhuri explores how modern enterprises are evolving from traditional data pipelines to intelligent, agent-driven systems that can autonomously make and execute business decisions. Using relatable e-commerce scenarios, the talk breaks down complex concepts like Agentic AI, decision automation, and system orchestration into simple, practical insights. Attendees will learn how organizations can move beyond dashboards and alerts to build systems that not only understand data but act on it in real time, driving faster decisions, improved customer experiences, and measurable business outcomes. This session offers a clear, future-focused perspective on how Agentic Operating Systems are redefining the way businesses operate at scale.

  • Day 1 | Hall 2 - Practical Insights and Best Practices


  • AI adoption often stalls between vision and execution, leaving many enterprises struggling to realize tangible value despite significant investments. This session explores how to operationalize AI by prioritizing high-impact use cases and embedding them into core workflows—while leveraging the right mix of talent, technology, and processes to drive sustained, measurable business outcomes.

  • Most enterprise AI initiatives struggle not because the models are inadequate, but because enterprise data, workflows, and execution systems were never designed for operational AI. This session explores why GenAI initiatives fail to scale beyond pilots, how enterprises must evolve from traditional data platforms to intelligent decision systems, and what operating models are required to operationalize AI successfully. The discussion will also cover the evolving role of GCCs and showcase real-world enterprise AI business outcomes.

  • This talk will introduce hybrid search with Apache Doris as a next-generation retrieval solution for generative AI and context engineering. It addresses the semantic-confusion problem by combining vector search, full-text search, and SQL to capture exact-match intent alongside semantic similarity, resulting in a more accurate and cost-effective solution. Matt will then touch on how the native real-time capability that Apache Doris brings to OLAP can be extended to real-time RAG, helping organizations think about future challenges in this space. (An illustrative fusion sketch appears after this hall's list.)

  • Data engineers are the backbone of reliable AI, yet too often their expertise is locked inside repetitive recontextualization, one-off scripts, and ad hoc workflows that don't scale. This tech talk explores how agent skills empower data engineers to break that cycle by packaging domain knowledge into portable, reusable building blocks that eliminate the need to rephrase the same logic across every new request. From ingestion to governance, attendees will walk away with practical steps and tools to build skills that give AI agents the context to act autonomously, consistently, and at production scale, so data engineers can stop repeating themselves and start multiplying their impact.

  • How L&T Finance is strengthening its data foundation to support scalable, reliable AI use cases—focusing on data quality, governance, and real-time accessibility.

  • In high-scale platform ecosystems, where a single backend powers diverse business verticals, the traditional boundaries between "Application Engineering" and "Data Engineering" are dissolving. When processing millions of transactions daily, data integrity cannot be a post-facto batch process; it must be an architectural first principle. This session explores the "Truth Platform" pattern, a blueprint for building near real-time (NRT) reconciliation and observability into the core of a multi-tenant business platform. We will move away from the "data as exhaust" mindset and dive into three transformative engineering patterns: Cohorted Observability, Recon as a Contract, and the Handshake Pattern.

  • Data evolution is no longer limited to migration—it requires intelligent modernization. By re-architecting data platforms with governance, scalability, and AI-readiness at the core, organizations can unlock advanced analytics, ensure trusted data, and accelerate innovation. This approach establishes a resilient foundation for enterprise-wide AI adoption and sustained business value.

  • RisingWave is redefining real-time data processing with its next-gen streaming database architecture. This session explores how organizations can move beyond batch systems to build low-latency, scalable pipelines that power instant insights, thereby reducing the operational complexity of traditional stream-processing stacks.

  • Traditional data lakes often struggle to deliver reliable and actionable insights at scale. This session shares how redBus evolved to a modern lakehouse architecture, built a unified knowledge layer for context and governance, and enabled AI-powered, natural language data access—transforming its data platform into a real-time, decision-centric system.

  • Traditional data pipelines were designed to move and transform data efficiently—but in today’s AI-driven world, that is no longer enough. This session explores the evolution from DataOps to AI-native pipelines, where data systems are not just enablers but active participants in intelligence generation. We will dive into how modern architectures integrate real-time data processing, feature engineering, model deployment, and feedback loops into a unified AI pipeline. Attendees will gain insights into designing scalable, reliable, and cost-efficient systems using technologies like Spark, Kafka, and cloud-native platforms.
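
A note on the hybrid-search session above: the core fusion idea, combining an exact-match (full-text) ranking with a semantic (vector) ranking, can be sketched in a few lines. The following is a minimal, framework-agnostic Python illustration using reciprocal rank fusion; it is not Apache Doris syntax or material from the talk, and the document IDs and damping constant are illustrative assumptions.

```python
from typing import Dict, List

def reciprocal_rank_fusion(
    keyword_ranking: List[str],
    vector_ranking: List[str],
    k: int = 60,
) -> List[str]:
    """Fuse a full-text ranking and a vector ranking into one hybrid ranking.

    Each document earns 1 / (k + rank) from every list it appears in;
    k = 60 is the damping constant conventional in the RRF literature.
    """
    scores: Dict[str, float] = {}
    for ranking in (keyword_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents that rank well in either signal rise to the top.
    return sorted(scores, key=scores.get, reverse=True)

# "d1" wins on keywords, "d3" wins on semantics; both surface early.
print(reciprocal_rank_fusion(["d1", "d2", "d3"], ["d3", "d4", "d1"]))
```

As the abstract describes, an engine like Apache Doris aims to perform this kind of combination natively alongside SQL, rather than in application code.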

  • Day 1 | Hall 3 - Closed Door Workshops


  • Your warehouse got you this far. Your lake promised flexibility. But the future lives in between — the lakehouse. This workshop tears apart the modern data stack and rebuilds it. You’ll go deep on Delta Lake, Apache Iceberg, ACID transactions, and schema evolution at scale. You’ll discover why semantic layers are quietly becoming the most important piece of the puzzle — turning data chaos into self-service analytics that teams actually trust. And because no data platform in 2026 is complete without AI, we’ll design architectures where feature stores, ML pipelines, and analytics don’t just coexist — they reinforce each other. Bring your SQL skills and curiosity. Leave with a blueprint you can use on Monday.

  • Your pipelines work — until they don’t. And when they break at 2 AM, nobody knows why. This workshop fixes that. You’ll learn how to bring real engineering discipline to data — CI/CD, automated testing, version control, and observability applied to pipelines the way software teams have done it for years. We’ll get hands-on with orchestration frameworks like Airflow and Dagster, build data quality checks that catch issues before your stakeholders do, and implement lineage tracking so you always know what went wrong and where. We’ll also connect the dots to LLMOps and ML pipelines, because reliable AI starts with reliable data. Bring your Python or SQL chops. Leave with pipelines that don’t page you at midnight.
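
For a concrete sense of the "data quality checks that catch issues before your stakeholders do" in the second workshop, here is a minimal, framework-agnostic Python sketch. The table, columns, and thresholds are illustrative assumptions, not workshop material.

```python
import pandas as pd

def check_orders(df: pd.DataFrame) -> list[str]:
    """Return human-readable failures; an empty list means healthy."""
    failures = []
    if df["order_id"].duplicated().any():
        failures.append("duplicate order_id values")
    if df["amount"].lt(0).any():
        failures.append("negative order amounts")
    null_rate = df["customer_id"].isna().mean()
    if null_rate > 0.01:  # tolerate at most 1% missing customer IDs
        failures.append(f"customer_id null rate {null_rate:.1%} exceeds 1%")
    return failures
```

In an orchestrator such as Airflow or Dagster, a function like this would run as a task that fails the pipeline (and alerts the owner) whenever it returns a non-empty list.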

  • Day 2 | Hall 1 - Thought Leadership and Strategic Insights


  • Every enterprise is racing to deploy AI agents. Very few have asked the harder question: can your data infrastructure actually support them? In this session, we'll explore what the agentic era actually demands from a data platform — and why most existing architectures hit a wall before they get there. We'll walk through the architecture of the Agentic Data Stack — how ClickHouse unifies OLTP, real-time analytics, observability, and agentic AI workloads on a single platform — and why eliminating the old trade-offs between speed and scale, simplicity and volume is no longer optional. It's the baseline.

  • Many organizations have made bold investments in AI, yet the journey from strategy to real-world execution remains a significant challenge. This session explores how enterprises can bridge the gap between ambition and impact, moving beyond pilots to scalable, production-ready AI. From aligning business and data teams to building the right infrastructure, governance, and talent, discover what it truly takes to operationalize AI and deliver measurable outcomes.

  • AI prototypes might take minutes to develop, yet making them enterprise-grade requires far more thought, rigour, process, and tooling investment. In this talk, we will explore the key building blocks to design, develop, deploy, and operate production-worthy AI use cases for the enterprise. We’ll look at how to think about the quality attributes, guard-rails, observability, and the many other critical elements essential for your AI projects to go from prototypes to production-worthy investments.

  • Enterprises are moving beyond traditional data foundations toward real-time, AI-ready ecosystems. This session explores how to modernize data architectures to enable faster insights and intelligent decision-making to solve business use cases at scale.

  • Internal data platforms are becoming a core competency. This panel explores how data engineering is borrowing from DevOps and platform engineering (self-serve infrastructure, developer portals, golden paths) and whether it's working in practice.

  • As GenAI and agentic systems are increasingly built on complex data pipelines, real‑time platforms, and shared data products, traditional governance models struggle to keep pace. For data engineering leaders, the challenge is no longer limited to data quality or access controls—it is about ownership across pipelines, platforms, and AI outputs. This session examines how modern data engineering organizations are redefining accountability, control, and trust for GenAI‑driven systems. Focusing on platform design, data contracts, lineage, observability, and privacy‑by‑design, the talk presents leadership‑level patterns that embed governance directly into data and AI pipelines at enterprise scale, enabling reliability and responsible AI adoption without slowing down engineering velocity.

  • Many data platforms fail to deliver impact due to fragmented data and a lack of clear business alignment. This session focuses on what truly moves the needle: use-case-driven strategies, better data quality, and integrated ecosystems that deliver real outcomes.

  • In the era of AI-driven decision-making, data quality has become more critical than ever. This session explores how poor-quality data can amplify errors at scale, leading to flawed insights and risky outcomes. It will highlight the importance of robust data governance, validation frameworks, and continuous monitoring to ensure AI systems are built on reliable, high-quality data.

  • What if your data pipeline could generate user intelligence before a single real user touches your product? This session introduces Synthetic Users (AI agent swarms that simulate real human behaviour at scale) and the data pipeline architecture that transforms their interactions into actionable product intelligence in hours, not months.

  • As AI rapidly reshapes how organizations build intelligence, this session explores what it truly means to go beyond the algorithm. Drawing from real enterprise experience at one of India’s most iconic consumer brands, the talk traces the evolution from classical analytics and ML to multimodal GenAI and agent‑driven decision systems. Using the end‑to‑end product lifecycle as a lens, from design and manufacturing to go‑to‑market and continuous intelligence, it demonstrates how AI delivers impact only when grounded in strong data foundations, business context, and governance. The session also reflects on the evolution of data scientists into AI orchestrators, highlighting why system thinking, judgement, and business storytelling matter more than ever in the age of AI.

  • As audio data grows exponentially, building scalable storage systems becomes critical for efficient search and retrieval. This session will explore architectures and strategies to handle high-volume audio data while ensuring performance, reliability, and cost efficiency.

  • By adopting a lakehouse architecture, BigBasket transformed its data ecosystem into a unified, cost-efficient, and near real-time platform built for scale and reliability. The lakehouse enables trusted, well-governed, end-to-end data lifecycle management—from streaming ingestion to governed analytics—shifting operations from D-1 reporting to near real-time decision-making. Built on principles of interoperability and incremental evolution, this foundation powers critical use cases such as on-time delivery, inventory optimization, personalization, and smart store intelligence—creating a future-ready, data-to-action platform that supports 4x business growth.

  • Day 2 | Hall 2 - Practical Insights and Best Practices


  • A story of taming fragmented, unstructured healthcare data to unlock scalable AI and measurable patient impact.

  • Modern data stacks often promise a simple idea: a single query layer that can unify access across databases, data lakes, and SaaS sources. Engines like Trino make this vision technically feasible—allowing teams to query MySQL, MongoDB, S3-based lakehouses, and even spreadsheets through one interface. This talk shares a practitioner’s perspective on what it actually takes to make that promise work inside a regulated fintech environment, where data residency, auditability, and access control are not optional. Drawing from real-world experience building a federated query layer over heterogeneous systems, it explores the gap between architectural elegance and operational reality. The session will highlight challenges that emerge at scale: inconsistent semantics across sources, unpredictable query performance, governance complexities with role-based access, and the risk of turning federation into “distributed chaos.” It will also cover the practical guardrails required to make such a system usable—data contracts, curated layers, controlled abstractions, and user-focused enhancements like custom functions. Rather than presenting federation as a silver bullet, this talk reframes it as a powerful but disciplined capability. Attendees will leave with a clearer understanding of when a unified query layer accelerates data access—and when it quietly amplifies complexity.

  • In a world of always-on connectivity, telecom systems generate massive streams of real-time data every second. This session explores how modern data platforms process, analyze, and act on this data instantly, powering everything from network optimization to personalized customer experiences at scale.

  • As data ecosystems evolve, building and operating reliable, maintainable, and scalable data pipelines becomes increasingly complex. This session introduces a modern shift in data engineering: a zero-code ETL platform where users define what the pipeline should do, and data engineers define how the platform should handle its execution at scale. It abstracts pipeline complexity behind an intuitive UI and standardised configurations. We extend this architecture with an LLM-powered segmentation layer on top of the data warehouse, turning raw data into actionable insights. It converts high-level user intent into SQL queries and downstream pipelines, allowing business users to run experiments on their own, without depending on engineering teams or facing bottlenecks. Zero-code ETL: users see magic, engineers maintain the illusion. (An illustrative intent-to-SQL sketch appears after this hall's list.)

  • As organizations transition to AI-first models, building scalable and reliable data pipelines becomes critical. This session explores how real-time data platforms enable faster decision-making, seamless data flow, and robust AI deployment at scale.

  • Every enterprise is building something expensive. Lakehouses, real-time pipelines, RAG systems, agentic workflows — the data stack has never been more sophisticated, more costly, or easier to build. Yet ask most data engineering teams what their platform costs per insight, per model inference, or per AI decision, and you will struggle to find the right answer. With AI advancing almost quarter by quarter, the next competitive advantage in data engineering is not a better pipeline framework or a smarter vector store — it is engineering economics and AI economics: the ability to measure, attribute, and optimise the cost of intelligence at scale. Whether you are a data engineer designing pipelines, a platform lead managing infrastructure, or a head of AI, data, or analytics justifying AI investment to the board, this session will give you a concrete language for the conversation every data team is avoiding: what does our intelligence layer actually cost, and is it worth it? Key takeaway: a reusable framework for unit economics on data and AI platforms.

  • Most startups bolt on privacy later. This session makes the case for designing systems where PII is isolated, tokenised, or abstracted early, spanning schema design, access control layers, and service boundaries.

  • DataOps gave us insights. DecisionOps makes them execute. The goal has shifted from data to decisions, and AI-native platforms are accelerating decision-making across the value chain. Hear how that is advancing clinical research, transporting life-saving medicines on time, and saving lives!

  • AI Agents for Data Platforms: Discover, Model, Analyze. Building AI-ready data platforms involves transforming traditional data systems into intelligent, self-service ecosystems powered by automation and AI. The foundation lies in creating an automated data catalog that continuously ingests, organizes, and enriches metadata, enabling better data discovery, governance, and contextual understanding through platforms. Complementing this, AI-driven data modeling agents automate the identification of fact and dimension tables, generate optimized analytical schemas, and improve performance through intelligent recommendations. On top of this, self-service analytics using Text-to-SQL empowers business users to interact with data in natural language, leveraging advanced LLMs such as Qwen, Gemini, and OpenAI GPT models to generate accurate, context-aware queries. Together, these components create a unified, feedback-driven architecture that accelerates analytics, reduces manual effort, and enables scalable, AI-powered decision-making across the organization.

  • This session covers how organizations can use Generative AI in CI/CD pipelines while maintaining the right balance between fast delivery, strong security, and cost efficiency. It will highlight simple ways to scale GenAI in production without compromising performance or control.
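
On the zero-code ETL session's intent-to-SQL layer: the pattern of translating high-level user intent into guarded, read-only SQL can be sketched as below. This is a hypothetical illustration, not the speaker's implementation; the prompt template, the events schema, and the injected llm_complete callable are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical warehouse schema, embedded in the prompt for grounding.
PROMPT_TEMPLATE = """You generate SQL for this warehouse schema:
  events(user_id BIGINT, event_name STRING, event_ts TIMESTAMP, country STRING)
Return exactly one read-only SELECT statement answering: {intent}"""

@dataclass
class SegmentationLayer:
    # Any text-completion function can be injected; this is not a real SDK call.
    llm_complete: Callable[[str], str]

    def intent_to_sql(self, intent: str) -> str:
        sql = self.llm_complete(PROMPT_TEMPLATE.format(intent=intent)).strip().rstrip(";")
        # Guardrail: business users must never mutate the warehouse,
        # so anything other than a SELECT is rejected outright.
        if not sql.lower().startswith("select"):
            raise ValueError(f"Refusing non-SELECT statement: {sql[:80]}")
        return sql

# Usage: layer = SegmentationLayer(llm_complete=my_model.complete)
#        layer.intent_to_sql("weekly active users in India, last 30 days")
```

A real deployment would add more: dialect checks, row-level access policies, query cost limits, and review for new segment definitions.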

  • Day 2 | Hall 3 - Closed Door Workshops


  • Everyone’s talking about prompts and models. Nobody’s talking about the pipes feeding them. Here’s the truth — your LLM is only as good as the data infrastructure behind it. This workshop puts data engineers where they belong: at the center of the AI stack. You’ll build the pipelines that power retrieval-augmented generation — from ingestion and transformation to embedding generation and vector store delivery, using tools you already know like Airflow and dbt. We’ll tackle the unsexy but essential stuff: dataset versioning, embedding lifecycle management, and keeping knowledge bases fresh so your LLM doesn’t hallucinate last quarter’s answers. Bring your Python or SQL skills and a basic understanding of embeddings. Leave knowing how to build the data backbone that makes AI actually work. (A minimal pipeline sketch appears after this list.)

  • You shipped your LLM app. Congratulations. Now the hard part starts. Because production AI isn’t a launch — it’s a living system that drifts, degrades, and surprises you in ways traditional monitoring never prepared you for. This workshop is about what happens after deployment. You’ll learn to instrument the full pipeline — prompt telemetry, retrieval accuracy, embedding quality, and model output evaluation — using tools like LangSmith and Weights & Biases. We’ll dig into the problems nobody warns you about: dataset drift, prompt regressions, and retrieval pipelines that silently rot. Most importantly, you’ll build automated feedback loops that don’t just detect issues but make your system smarter over time. Bring experience with AI pipelines and monitoring tools. Leave with the operational playbook that keeps your LLM reliable long after the demo hype fades.
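
To ground the first workshop's pipeline description, here is a minimal Airflow 2.x sketch of an ingest, embed, and load flow. The task bodies, the 768-dimension placeholder vectors, and the daily schedule are illustrative assumptions, not workshop material.

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2026, 1, 1), catchup=False)
def rag_ingestion():
    @task
    def extract() -> list[str]:
        # Pull new or changed documents from the source of record.
        return ["doc text 1", "doc text 2"]

    @task
    def embed(docs: list[str]) -> list[dict]:
        # Call an embedding model here; zero-vectors stand in for its output.
        return [{"id": i, "text": d, "vector": [0.0] * 768}
                for i, d in enumerate(docs)]

    @task
    def load(records: list[dict]) -> None:
        # Upsert into the vector store, versioning the dataset so the
        # LLM never serves last quarter's answers.
        print(f"upserting {len(records)} records")

    load(embed(extract()))

rag_ingestion()
```

In the workshop's framing, dbt would own the transformation layer upstream of the embed step, and the versioned upsert is where dataset and embedding lifecycle management come in.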


Our pricing will change soon!

  • Workshop Pass

    Everything in Standard, plus:
  • Exclusive half-day AI workshops during the conference (both Day 1 and Day 2)

  • VIP Pass

    Everything in Workshop, plus:
  • Dedicated WhatsApp support (before, during, and after the show)
  • VIP check-in
  • Exclusive Platinum Lounge access - a lounge for VIP pass holders and Speakers only!
  • Priority lunch area
  • Post-event recordings
  • Goodie bag with exclusive merchandise
  • 1-year digital subscription to AIM
  • 30000

1000+ Attendees

50+ Speakers

5th Edition

Explore the frontiers of data engineering.

Focused on data engineering innovation, this two-day conference gives attendees direct access to top leaders and innovators from leading tech companies, who will speak on the software deployment architecture of AI systems and how to build the latest data frameworks and solutions for business use cases.


The Finkelstein Awards for Data Engineering Excellence 2026

Secure Your Seat at the Frontier of Data Engineering.

May 14–15, 2026
Bengaluru