Data Engineering Summit 2024

India's first & only conference dedicated to the emerging field of Data Engineering


30th – 31st May 2024
Hotel Radisson Blu, Bengaluru


We are in the process of finalizing the schedule for 2024. Please check this space again soon. Expect more than 50 speakers at DES 2024. To explore speaking opportunities with DES, write to

  • Day 1 | HALL 2 - Practical Insights and Best Practices

  • Data is essential, but the whole ecosystem is fragile, and as we move from the era of consent to the cookieless world, data is also becoming scarce. As consumers become more tech-savvy and regulations tighten, for good reasons, legitimate questions arise about the ability of brands to target and personalize experiences for customers. In this era of ever-changing regulations that address data privacy while brands still need to connect with users, Data Engineers have an opportunity to solve for:
    • Creating a first-party data platform
    • Developing the ecosystem of tools, technologies, and data-flow pipelines that enable targeting under stringent data-governance regulations
    • Creating privacy-safe APIs across self-owned and partner-owned data-sharing platforms
    • Helping to measure impact both effectively and accurately
    In this session we will cover first-party data platforms, data clean rooms, and conversion APIs, along with identity-mapping algorithms, to build a future-proof data ecosystem that empowers organizations to target, personalize, and measure their strategies in the evolving data-governance landscape.
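A common building block behind clean rooms and conversion APIs is joining datasets on hashed identifiers instead of raw PII. A minimal sketch of the idea (illustrative only; the emails are made up, and real clean rooms add salting, k-anonymity thresholds, and contractual controls):

```python
import hashlib

def hash_email(email: str) -> str:
    # Normalize, then hash, so raw PII never needs to leave either party.
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

# Each side hashes its own identifiers before any data is shared.
brand_side = {hash_email(e) for e in ["Alice@example.com", "bob@example.com"]}
partner_side = {hash_email(e) for e in ["alice@example.com ", "carol@example.com"]}

# The overlap is computed on hashes only.
matched = brand_side & partner_side
print(len(matched))  # 1 (alice matches after normalization)
```

The normalization step matters: without lowercasing and trimming, the same person would hash to different values on each side and the match would be lost.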

  • The abundance of data in today's world has transformed the way businesses operate, and MarTech has become a crucial part of a business's success. In this session, we will explore how data is essential to making AI and ML effective in the field of MarTech and provide real-world examples of how data-driven marketing strategies have led to increased customer engagement, higher conversion rates, and greater revenue growth. Discover how retail customers are leveraging data to drive their MarTech strategies and take their marketing efforts to the next level.

  • Modern businesses have vast amounts of data at their fingertips and are acutely aware of how enterprise data strategies positively impact business outcomes. Despite this, only a handful of organisations interact with all stages of the data life cycle process to truly distill information that distinguishes future-ready businesses from the rest. Companies have not treated the collection, movement, and tracking of data as a first-class problem resulting in an unmanageable number of disparate point-to-point integrations. It’s time to learn how to tame your streaming data pipelines and simplify your data movement.

  • We will be discussing scalable data lake solutions that can accelerate the decision-making process for descriptive or advanced analytics. We will be discussing two use cases. The first use case will be about a streaming solution that can efficiently ingest, process, and score prospective customers/leads to enhance the overall customer experience. The second use case discusses the creation of a semantic layer within Azure Synapse that can facilitate reporting on leads, target marketing, and conversion funnel metrics.

  • Traditionally, data landscapes in an enterprise have been tech-heavy, catering to business needs as and when they evolved. However, with the higher rate of data generation through various mediums and the shrinking time available to make a data-driven decision, the approach to data management has also been in constant change. The industry is slowly converging on a trend of generating more “closer to business” data assets. I’ll be sharing my first-hand encounter with the evolving data ecosystem, and would love to have you in my audience.

  • This is a hands-on, interactive technical workshop on real-time analytics using SingleStoreDB on AWS. Ramanuj Vidyanta from AWS and SingleStore’s Sr. Solutions Consultant Rakesh Puttaswamy will guide you through the development of an operational real-time dashboard for a digital marketing use case. We’ll use SingleStoreDB Cloud on AWS, Amazon S3, and Amazon QuickSight to drive real-time analytics from streaming data.
    Discussion topics:
    • How to build an operational database based on a real-time digital marketing use case
    • How to use SingleStoreDB Cloud on AWS, Amazon S3, and Amazon QuickSight
    *Pre-requisites: Laptop with Wi-Fi connectivity (Wi-Fi is available onsite)

  • What are we building?
    • A logistics service using event-driven architecture
    • Event sourcing for better inter-process communication in a microservice-based architecture
    • A data lake built using a message broker to decouple application dependencies from the data lake pipelines
    Why are we building it?
    • To reduce operating costs
    • To improve ease of management by using Kafka infrastructure
    • To denormalize real-time data in motion for quick, comprehensive decisions, while storing this data in the data lake for analytical purposes at reduced query cost
    How are we building it?
    • Democratizing data pipelines and event sourcing with the Kafka Connect ecosystem
    • Change data capture using a Kafka Connect source connector
    • Using Kafka state stores for data denormalization in kstreams
    • Doing the entire data lake integration using an S3 sink connector
    • Setting up inter-process communication using various Kafka Connect connectors such as HTTP, SNS, DynamoDB, etc.
    What benefits will it bring?
    • Ease of managing the entire Kafka Connect framework, especially in a large-scale setup with observability
    • Low overhead of running partition redistribution, auditing, etc. on the Kafka cluster
    • Real-time data aggregation and denormalization for comprehensive real-time decision-making
    • Ease of integration in a microservice-based architecture, especially around event sourcing, to help take decisions in near real time
    Why are we unique?
    • Real-time computation through our Addfix and Geocoder engines to correctly identify the address with lat-long and deliver the shipment to a defined location, all of it happening through Kafka
    • Operational dashboards empowered to take decisions in near real time and plan ground operations accordingly
    • A near-real-time data lake that enables quicker, faster decisions
    How can others benefit?
    • An extendable and developer-friendly ecosystem for onboarding and getting started with real-time data very quickly
    • We have powered multiple applications, third-party applications, and the data lake, all through event sourcing in Kafka, without the need to worry about scale
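The state-store denormalization described above can be pictured with a plain dict standing in for a Kafka Streams state store. The event shapes and field names below are hypothetical; the real pipeline does this inside kstreams, fed by Kafka Connect:

```python
# Toy stand-in for a kstreams state store: latest known address per shipment.
address_store = {}

def on_address_event(event):
    # Change-data event from the source connector, keyed by shipment id.
    address_store[event["shipment_id"]] = event["address"]

def on_shipment_event(event):
    # Denormalize in motion: enrich the shipment event with its latest address
    # so downstream consumers (dashboards, the S3 sink) get a complete record.
    enriched = dict(event)
    enriched["address"] = address_store.get(event["shipment_id"])
    return enriched

on_address_event({"shipment_id": "S1", "address": "12 MG Road, Bengaluru"})
out = on_shipment_event({"shipment_id": "S1", "status": "OUT_FOR_DELIVERY"})
print(out["address"])  # 12 MG Road, Bengaluru
```

The point of doing this in the stream, rather than joining at query time, is that every record landing in the data lake is already comprehensive, which keeps query cost low.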

  • Day 1 | Main Hall - Thought Leadership and Strategic Insights

  • As the data stack evolves into a sophisticated mesh of technologies (no pun intended), and AI is reaching the hands of billions of people with mobile devices, ChatGPT, Midjourney, etc., data platforms and pipelines have become the heart and blood vessels of organizations. Data engineering ensures that data that is stored, transformed, and consumed is reliable. Let us talk about data engineering in this bold new AI age.

  • In today's business landscape, data is a key differentiator that separates thriving companies from those falling by the wayside. Despite increasing investments in D&A capabilities to support business needs, multiple industry surveys from McKinsey & NewVantage Partners highlight the gap between the worlds of business & analytics: while 92% of data leaders say they deliver real business value, only 39% of business leaders agree. Something’s not working! As a D&A leader, if you want to make your business leaders successful, or if you don’t want to be among the 85% of companies that fail to realize any value from AI in their day-to-day operations, you must think differently. In this session, we present the shifts in data & solution engineering that the industry must make to bring value to your business clients in what is going to be an exciting but chaotic 24 months.

  • The definition of batch is fast changing. With the quickly maturing cloud technology landscape, batch is no longer about processing while the business is away; it is about ingesting as transactions occur and structuring the data to be ready for consumption quickly. The ETL paradigm has shifted to ELT, and many enterprise tools have demonstrated the benefits of ELT over ETL. Over the last 10 years, as the transactional-systems ecosystem has matured, the T in ETL/ELT is no longer the focus of the conversation; instead, significant investments are going into changing the way data is extracted, loaded, and managed. With low-code, no-code, and zero-ETL frameworks, E & L may also meet the fate of T in the years to come.

  • Hundreds of millions of people around the world consume Condé Nast’s content across the company’s digital assets, generating trillions of data points and potential insights into their wants and needs. Correctly unlocking this rich data would allow the business to deliver powerful, personalized and relevant experiences that inform and delight — while opening up new, sustainable revenue streams. Join this session to find out more.

  • Data engineering and Generative AI's synergies can create powerful solutions that enable enterprises to make better decisions, automate processes, and improve customer experiences. Join us for an interactive session and demo to explore how enterprises can build, customize, and deploy these powerful solutions anywhere.

  • Recently, a lot of intelligence is being built into data engineering processes and platforms. If machines can become intelligent with the advent of the IoT, why not data platforms? This session will discuss the radical shift we are seeing in embedding intelligence into data engineering. Given the broad scope of this topic, we will focus on the key use cases that the industry is deploying today to drive efficiencies in the creation /management of data platforms and thereby improve confidence in the data provided to business users.

  • Explore how technology has evolved in sports, especially badminton, and the role data plays in this evolution.

  • Day 2 | HALL 2 - Practical Insights and Best Practices

  • In today's fast-paced world, where every millisecond counts, the ability to predict outcomes quickly and accurately is crucial. This is particularly true in fields such as finance, healthcare, and e-commerce, where timely decisions can mean the difference between success and failure. In this talk, we will explore the challenges associated with achieving millisecond prediction latencies, and discuss strategies for cost-effective training of predictive models. We will also examine techniques for ensuring the sanity of offline versus online performance of prediction models, as well as ways to enable faster experimentation through multiple versions of prediction models. Additionally, we will discuss the problem of model degradation due to data drifts or assumption changes, and explore solutions for addressing this issue. Finally, we will delve into the complexities of solving for batch and real-time use cases of prediction models, and discuss best practices for achieving optimal performance in both scenarios. Join us for an informative and thought-provoking discussion on the latest developments in predictive modelling, and learn how to stay ahead of the curve in this rapidly evolving field.
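One standard way to quantify the data drift mentioned above is the population stability index (PSI) over binned feature distributions, comparing training-time counts against live traffic. A minimal sketch with made-up bin counts (the 0.25 threshold is a common rule of thumb, not a universal standard):

```python
import math

def psi(expected_counts, actual_counts):
    """Population stability index between two binned distributions."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, 1e-6)  # floor avoids log(0) on empty bins
        a_pct = max(a / a_total, 1e-6)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score

stable = psi([100, 100, 100], [100, 100, 100])   # identical shape -> 0.0
drifted = psi([100, 100, 100], [10, 40, 250])    # mass shifted to one bin
print(stable, drifted)
```

Running a check like this per feature on a schedule is a cheap early-warning signal that a retrain or assumption review is due, well before offline/online metrics visibly diverge.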

  • The global market for cloud migration services is expected to grow at a CAGR of 24.3%, reaching USD 340.7 billion by 2028 from USD 92.4 billion in 2021. However, traditional data management methods are often inefficient, costly, and error-prone, and organizations must keep up with evolving trends in cloud services. Major challenges include streamlining the end-to-end data lifecycle process, collaboration between different teams, creating a conceptualized framework for Ops-Governance of data, managing and scaling infrastructure, securing data and ensuring compliance with regulations, monitoring and alerting systems, and adapting to new technologies and evolving business needs. Interestingly, data only accounts for 49% of enterprise decisions, while insights account for 10%; there is clearly room for growth in both categories. In this session, we walk you through how to manage enormous amounts of data effectively and proactively: identifying relevant opportunities, driving visibility, building reliability, and achieving scalability in an organization.

  • The data access layer, which plays a crucial role in enterprise applications reliant on relational databases, has evolved significantly over the past two decades. Various innovative solutions have emerged, enhancing the security, efficiency, and productivity of applications and relational databases. Notably, SQL Mappers such as Spring JDBC and MyBatis, as well as JPA (Hibernate ORM), have dominated the market for an extended period. However, in today's cloud-native landscape, fresh challenges have emerged, and Compile Time ORM has emerged as a novel perspective on these established solutions. In this talk, we will explore the latest trends in the data access layer, taking into account the changing requirements of modern application development.

  • The ‘T’ in the ETL or ELT pipeline: a tool that equips the business to build its own data marts using SQL/dbt and create dashboards to drive insights much faster, with almost no dependency on data engineers. A data transformation pipeline built using dbt (data build tool) running on serverless containers (AWS Fargate), with Airflow as the workflow orchestrator.
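The core idea that makes a dbt-plus-orchestrator setup self-service is that transformations declare their upstream dependencies and are run in dependency order. A toy sketch of that scheduling idea, using Python's standard-library topological sorter and hypothetical model names:

```python
from graphlib import TopologicalSorter

# Hypothetical dbt-style models: each entry maps a model to the
# upstream models its SQL selects from (its dependencies).
deps = {
    "stg_orders": set(),
    "stg_customers": set(),
    "fct_orders": {"stg_orders", "stg_customers"},
    "mart_revenue": {"fct_orders"},
}

# Materialize in an order where every dependency runs before its consumers.
run_order = list(TopologicalSorter(deps).static_order())
print(run_order)  # staging models first, the mart last
```

In the real pipeline, dbt derives this graph from `ref()` calls in the SQL itself, so an analyst adding a new mart never has to hand-edit the execution order.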

  • Probabilistic data structures are a type of statistical algorithm designed to optimize the use of memory in storing and querying large datasets. These structures employ probabilistic algorithms to estimate the presence of elements in a dataset with a high degree of accuracy while minimizing the amount of memory required for storage. In this session, we will explore some fundamental analytical questions that, when answered accurately for very large datasets, require substantial resources and cost. This is particularly relevant in streaming data use cases such as real-time monitoring, fraud detection, social media analytics, and online advertising, where the timely availability of analytics takes precedence over their accuracy.
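A Bloom filter is one such structure: it answers "have we seen this element?" with no false negatives and a small, tunable false-positive rate, in constant memory regardless of stream size. A minimal, illustrative sketch (sizes and hash counts here are arbitrary; production filters size these from the target error rate):

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive k positions by hashing the item with k different prefixes.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # False means definitely absent; True means probably present.
        return all(self.bits >> p & 1 for p in self._positions(item))

bf = BloomFilter()
bf.add("user-42")
print(bf.might_contain("user-42"))   # True
print(bf.might_contain("user-99"))   # almost certainly False
```

This trade of exactness for memory is exactly what makes such structures attractive in the streaming use cases above: deduplicating events or flagging previously seen users without holding the full ID set in memory.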

  • By implementing a semantic data layer, new businesses can achieve several benefits:
    - Improved data quality
    - Faster time-to-insight: a semantic layer can reduce the time it takes to extract insights from data by making it easier to access and use
    - Greater agility
    - Improved collaboration
    Overall, a semantic data layer can be a valuable investment for new businesses that want to make the most of their data and gain a competitive advantage in their industry.

  • Day 2 | Main Hall - Thought Leadership and Strategic Insights

  • The talk will focus on the key components of digital transformation; business agility and innovation; business innovation and the three facets of digital transformation; the five principles of business agility; and the role of data engineering in digital transformation.

  • Competing in the real-time economy requires instant insights – and you can’t get them with manual data integration. Many businesses are now implementing data modernization, a strategy that enables them to take advantage of the latest innovations in the cloud without disrupting business-critical legacy processes and applications. Forward-thinking businesses want to leverage today's most advanced analytics platforms as well as affordable, scalable cloud services, and modernizing their legacy systems is essential for doing that. For example, providing a 360-degree view of customers to the front-line support team requires real-time data replication from the various source systems. And traditional batch-oriented integration approaches won’t meet those needs. Qlik Data Integration offers an automated data fabric that delivers reliable, analytics-ready data in near-real time with a low-code approach.

  • Data and analytics leaders seek more efficiency in the data management assets across their organizations, figuring out approaches to unlock better business insights and make their data assets a differentiator from the competition. Organizations are looking at decentralized, low-point-of-failure, data-product-based approaches to minimize risk and deliver better, quicker ROI.
    Data fabric is a design concept that serves as an integrated layer (fabric) of data and connecting processes. The fabric presents enterprise-wide coverage of data across applications, unconstrained by any single platform or tool. Data mesh is a socio-technical concept that addresses the common failure modes of traditional centralized data warehouse or lake architectures, with a shift from the centralized paradigm to a distributed architecture: treating domains as a first-class concern, applying platform thinking to create self-service data infrastructure, treating data as a product, and implementing open standardization to enable an ecosystem of interoperable distributed data products.
    Which approach best suits building a future-proofed data platform? How can organizations use these concepts to transform legacy data ecosystems? How mature are the tools and technologies in the market that enable these concepts?

  • The talk will focus on:
    • What does data engineering mean in the ambit of digital transformation?
    • A business-first approach to data engineering
    • The ROI of data engineering: how DE can enable business agility and drive innovation, illustrated with a use case

  • Industrialization of analytics can be achieved by implementing DataOps to get the data ready, followed by institutionalizing MLOps at scale, and finally applying AIOps in the post-production environment to run the business. Of these three dimensions, DataOps continues to play the foundational and most crucial role in applying analytics at scale in a business context. To realize the maximum value of DataOps, we need to monitor and control a set of metrics aligned with each phase of the data lifecycle. We would like to demonstrate our learnings from working on a set of client business problems and solving them with DataOps at the core.

  • This talk entails:
    • A typical batch data pipeline
    • A use case for real-time ingestion
    • A typical real-time data pipeline
    • Optimizing a batch pipeline versus migrating to real-time

  • This talk will cover the challenges of dealing with mistakes, inconsistencies, and subjectivity in human-labeled datasets. We will discuss how to build, use, and secure representative datasets for AI problems, paying special attention to crowdsourced data and data obtained from in-house annotation teams. We will start with typical issues of crowdsourced and human-labeled datasets, such as annotator biases and differences in their backgrounds. Then we will focus on the annotator-disagreement and answer-subjectivity problems. We will present business case studies of how these problems are addressed in practice, leading to the creation of useful training datasets. We will also discuss Web-scale dataset-poisoning problems and ways to ensure the sustainability of a dataset once created. Finally, we will tackle the problem of learning from such data, showing convenient open-source tools for improving machine learning model quality.
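The simplest baseline for the annotator-disagreement problem is majority voting over redundant labels (stronger methods, such as Dawid-Skene, additionally weight annotators by estimated skill). A toy sketch with hypothetical item IDs and labels:

```python
from collections import Counter

def majority_vote(labels_per_item):
    """Aggregate redundant crowdsourced labels: the most common label wins."""
    return {
        item: Counter(votes).most_common(1)[0][0]
        for item, votes in labels_per_item.items()
    }

# Three annotators labeled each item; img_001 shows disagreement.
raw = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "dog", "dog"],
}
print(majority_vote(raw))  # {'img_001': 'cat', 'img_002': 'dog'}
```

Tracking how often the vote is split (as on `img_001`) is itself a useful signal: persistently split items often indicate a genuinely subjective question rather than annotator error.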

Grab your ticket for a unique experience of inspiration, meetings, and networking for the data engineering industry

Book your tickets at the earliest; we have a hard stop at 500 passes.
Note: Ticket pricing may change at any time.

  • Regular Pass

    Available from 12th Apr 2024 to 10th May 2024
    • All-access, 2-day pass
    • Group discount available
    • 4999

  • Late Pass

    Available from 11th May 2024 onwards
    • All-access, 2-day pass
    • No group discount available
    • 6999