Data Engineering Summit 2024

[WHEN]

May 30-31, 2024

DES is organized by AIM

Hotel Radisson Blu
Bengaluru, India

Focused on data engineering innovation, this 2-day conference gives attendees direct access to top engineers and innovators from leading tech companies, who will talk about the software deployment architecture of ML systems and how to build the latest data frameworks and solutions for business use cases.

1000+ Attendees

50+ Speakers

3rd Edition

India's first & only conference dedicated to the emerging field of Data Engineering.

2024 SCHEDULE

We are in the process of finalizing the schedule for 2024. Please check this space again. Expect more than 50 speakers at DES 2024. To explore speaking opportunities with DES, write to info@analyticsindiamag.com

  • Day 1 | HALL 2 - Practical Insights and Best Practices


  • The data landscape is in a constant state of flux, demanding ever more agile and scalable data processing solutions. While the Lambda architecture has served well, the Kappa architecture emerges as a powerful evolution. This session, designed for senior data leaders, explores the fundamentals and nuances of the Kappa architecture and its potential to revolutionize data pipelines. We'll go beyond the Lambda architecture and unpack the core principles of Kappa. You'll gain insights into how Kappa streamlines data processing by unifying batch and real-time processing into a single, continuous flow. This approach eliminates the complexity of managing separate Lambda layers, fostering a more agile and maintainable data pipeline. The session dives deep into the technical aspects of Kappa, including:
    - Real-time Stream Processing: Leveraging powerful stream processing engines for low-latency data ingestion and transformation.
    - Stateful Stream Processing: Enabling complex event processing and state management within the streaming pipeline itself.
    - Simplified Data Pipelines: Reducing code duplication and operational overhead through a unified processing approach.
    - Tailored Delivery Options: Providing flexibility to deliver data in real time, near real time, or in batches based on specific use cases.
    This session is ideal for data leaders, data architects and data engineers seeking to push the boundaries of data processing agility. We'll explore real-world applications and best practices, empowering you to architect data pipelines that are not only scalable but also adaptable to the ever-changing needs of your organization.

  • Discover the strategic advantages of modernizing data architecture on AWS in the age of big data. Explore best practices for architecting scalable data pipelines, optimizing storage and processing, and harnessing advanced analytics and machine learning. Gain insights into emerging trends and technologies and learn how to unlock new opportunities for data-driven growth on AWS. The talk will cover:
    - Introduction to Modern Data Architecture on AWS
    - Rationales for Modernizing Data Architecture on AWS
    - Architecting Scalable Data Pipelines on AWS
    - Optimizing Data Storage and Processing
    - Future Trends and Considerations

  • Over the past couple of years, we have seen analytics and machine learning come into action in sectors such as e-commerce, health, education, finance, and agriculture, and organisations have seen tremendous value from data-driven decision-making. Although we adopted distributed computing platforms to build analytics services, the approach to unlocking insights from analytics has remained traditional, that is, building data warehouses and data marts. Enterprise Data Warehouse (EDW) technologies were able to integrate and harmonize data, enabling BI analysts and users to extract information reliably, but flexibility and addressing evolving data needs have been a constant challenge. In this talk, I would like to show how hidden patterns can be extracted by modelling problems as graphs. I'll briefly state some of the limitations of the existing EDW and how these could be addressed through OLAP graph technologies. OLAP graph technologies and their implementation, known as knowledge graphs, can link various heterogeneous data sources. I would also like to take you through some of the real-world challenges addressed by embedding Graph + AI design principles into our strategy. The real data challenges in our day-to-day work revolve around entities and their respective attributes. The talk will include:
    1. Why Graph + analytics?
    2. What problems could we realise as a Graph?
    3. Graph Technology for Data Integration
    4. Defining consumable patterns for analysts and business stakeholders
    5. Use-cases
       - Entity resolution in E-commerce
       - 360-degree view of customers
       - Improve enterprise decision-making by enabling cross-channel communication
       - Deduplicating entities
    6. OLAP Graph Data Warehouse

  • How Aquaconnect leverages satellite imagery to monitor coastal regions, enabling us to make data-driven decisions that drive sustainable aquaculture development. This bird's-eye view allows for precise and timely decision-making, ensuring optimal conditions for aquatic species. The approach integrates advanced remote sensing technologies with artificial intelligence to provide unparalleled insights into aquatic ecosystems.

  • How can the integration of Gen AI & Data Engineering help accelerate the growth of any technology organisation? This session covers what it takes to adopt the Gen AI ecosystem and how to work towards a successful discovery & delivery model by combining two very different approaches.

  • Embeddings are a ubiquitous term in discussions surrounding AI models, particularly in GenAI. While numerous resources explore the mathematical and theoretical underpinnings of embeddings and their significance in training Transformers or ML models in general, there remains a scarcity of material exploring their practical applications. Embeddings serve as data structures containing contextual information essential for executing intelligent tasks. Among the myriad applications of AI, semantic search stands out as one of the most prominent, directly influenced by the quality and scale of embeddings. The selection of embeddings significantly impacts search capabilities and the supported modalities, ranging from text-only to image search or even a fusion of both, known as multi-modal search. Because embeddings are pivotal data structures, devising efficient storage mechanisms is imperative. Vector databases emerge as specialized solutions optimized for storing and retrieving embeddings. In this presentation, I will outline a reference architecture using vector databases and embeddings for intelligent search, encompassing options for text, image, and multi-modal searches. I will also highlight a few applications leveraging this architecture and outline potential avenues for future exploration.

  • Day 1 | Main Hall - Thought Leadership and Strategic Insights


  • As GenAI revolutionizes our world, data engineers often wonder: 'Will I still matter?' The resounding answer: yes, and more than ever! Uncover the secrets to not just staying relevant but thriving in the GenAI era as a data engineer. Journey through the evolution of data engineering, uncover the power of AI for data engineers, and learn indispensable strategies for maintaining your edge in this ever-changing landscape. Brace yourself for inspiring real-world case studies and cutting-edge insights into embracing continuous learning and agility.

  • As the field of data engineering continues to evolve at a rapid pace, professionals must anticipate and prepare for the future demands of the industry. This panel discussion will explore the skills that are expected to become crucial for data engineers over the next five years. Experts will delve into emerging technologies and methodologies, such as advancements in cloud computing, real-time data processing, and the integration of AI and machine learning in data pipelines. They will also discuss the increasing importance of security practices, data governance, and ethical considerations in data management. This forward-looking dialogue aims to equip current and aspiring data engineers with insights on how to stay relevant and excel in their careers as the technological landscape shifts.

  • Day 2 | HALL 2 - Practical Insights and Best Practices


  • The evolution of data engineering encompasses transformative concepts driving advancements in knowledge graphs, data discovery, synthetic data generation, text-to-SQL transformations, data quality management and generative AI datasets. Knowledge graphs facilitate semantic data modelling and interconnected data querying. Data discovery empowers insights extraction through metadata analysis and automated profiling. Synthetic data generation mitigates data scarcity using generative AI for realistic datasets. Text-to-SQL bridges natural language queries with structured databases, enhancing accessibility. Data quality management ensures integrity, consistency, and reliability through profiling and error remediation. Generative AI datasets fuel innovation and creativity in generative models. This talk will take the audience through real-world cases of the above-mentioned concepts.

  • MaaS, or Metrics as a Service, is a cloud-based model that provides organisations access to many metrics and analytics capabilities without building and maintaining complex data pipelines or infrastructure. In a MaaS platform, users can define, compute, visualise, and manage metrics and key performance indicators (KPIs) using a centralised and scalable solution.

  • Booking Holdings is the world's leading provider of online travel and related services and operates through 5 primary consumer-facing brands: Booking.com, Priceline, Agoda, Kayak and OpenTable, as well as through a network of subsidiary brands. The next phase of growth at Booking will be driven by providing a unified connected trip experience and by modernisation, which will make it easier for everyone to experience the world. In this talk, Ankur and Sucharita will highlight how Booking is modernizing its data platform to provide a modern, governed, self-service Data and ML platform that facilitates data-driven decision making and is future-ready to support experimentation and AI/ML use cases while supporting the large Booking business.

  • Data observability has garnered considerable interest in recent times. It has evolved from Observability, which relied on monitoring, logging, tracing, etc. However, the expectations associated with data observability and its key tenets have shifted significantly from traditional ways of thinking and implementing it. This session will delve into more practical aspects of Data Observability and how individuals, organisations, and platforms should think differently to gain maximum value from it in a cost-efficient manner.

  • Day 2 | Main Hall - Thought Leadership and Strategic Insights


  • "Revolutionizing NoSQL Analytics: Real-Time Analytics & Data Warehousing with Couchbase" delves into the transformative potential of utilizing Couchbase, a popular NoSQL database, for real-time analytics and data warehousing purposes. This discussion explores how Couchbase's unique NoSQL Columnar and MPP architecture, along with Zero ETL real-time data ingestion capabilities, enables organizations to efficiently process and analyze large volumes of data in real time, leading to actionable insights and enhanced decision-making capabilities. The discussion will also explore how leveraging the NoSQL Data Model for analytics enables organizations to respond to rapidly changing business requirements more readily than with traditional relational data warehouses.

  • Data storage and management are pivotal in the realm of data engineering, with data lakes and data warehouses representing two fundamentally different approaches. This panel will contrast these two architectures, discussing their respective advantages and ideal use cases. Panelists will cover data lakes' ability to store vast amounts of unstructured data, offering flexibility and scalability, particularly beneficial for big data analytics. Conversely, the discussion will highlight data warehouses' structured environment, which is optimized for efficiency and speed in querying, making it suitable for business intelligence and reporting. The debate will also touch on considerations such as cost, complexity, data integrity, and future-readiness of each architecture, providing a comprehensive understanding of when and how each should be implemented in a business context.


Schedule from last year

  • Future of LLMops: Deployment and Scaling
  • Engineering Practices for Data Resilience
  • Building Resilient Data Pipelines
  • Real-time Streaming for Enterprise Data Lakes
  • Conceptualizing Data as a Product
  • Real-time Dashboards and Unified Databases on Cloud
  • The Emergence of Data Mesh Architecture
  • Real-Time Data Processing with Apache Kafka
  • From Data Chaos to Organized Value Generation
  • Architecting Data Pipelines for Generative AI Models
  • Generative AI’s Role in Data Engineering
  • Designing Real-time Data Stream Processing Architectures
  • Data Modeling and Schema Design for Business Efficiency
  • Ensuring Data Trust and Quality in Modern Data Stacks
  • The Value of Operations Data
  • Modernizing the Data Access Layer
  • Monetizing Data at Scale
  • Evolving ETL Practices for Modern Data Integration
  • Operationalizing Foundational Models
  • Optimizing Big Data with Probabilistic Data Structures
  • The Importance of a Data Semantic Layer 
  • Data Fabric vs. Data Mesh for Future-Proofed Platforms
  • Building Efficient Data Lakes and Warehouses
  • Leveraging DataOps for Effective Data Management
  • Transitioning to Real-Time Data with Streaming Pipelines
  • Managing Data Noise and Subjectivity

Register for DES 2024

  • Early Bird Passes

    Expired
    - All access, 2-day passes
    - Group discount available
  • Late Pass

    Available from 11 May 2024 onwards
    - All access, 2-day passes
    - No group discount available
    - 20000

What to expect?

Get ready for the 3rd Edition of the Data Engineering Summit in 2024, a not-to-be-missed event spanning two action-packed days, featuring two distinct tracks designed to cater to a wide array of interests and expertise in the field of data engineering. This summit promises to be an immersive experience, combining enlightening keynote speeches, interactive workshops, and in-depth panel discussions led by renowned industry leaders and innovators.

Alongside the learning tracks, attendees will have the unique opportunity to explore exhibitions showcasing the latest technologies, tools, and services in data engineering.

Topics covered

The Data Engineering Summit will feature a range of presentations, panel discussions, and workshops. Our speakers at the Data Engineering Summit 2024 will cover a wide array of vital topics, including the complexities of big data architectures and the best practices for managing streaming data pipelines.

Topics to be explored encompass the entire lifecycle of data pipelines, the journey of data models from experimentation to production, the integration and utility of data fabric, and the critical aspects of Data Provenance & Governance.