Data Engineering Summit 2024

May 30–31, 2024

DES is Organized by AIM

Hotel Radisson Blu
Bengaluru, India

Focused on data engineering innovation, this 2-day conference will give attendees direct access to top engineers & innovators from leading tech companies, who will talk about the software deployment architecture of ML systems, the latest data frameworks, and solutions for business use cases.

1000+ Attendees

50+ Speakers

3rd Edition

India's first & only conference dedicated to the emerging field of Data Engineering.

2024 SCHEDULE

We are in the process of finalizing the schedule for 2024; please check this space again soon. More than 50 speakers are expected to speak at DES 2024. To explore speaking opportunities with DES, write to info@analyticsindiamag.com.

  • Day 1 | HALL 2 - Practical Insights and Best Practices


  • Unstructured data such as customer conversations, meetings, social media content, and media files (images, audio, etc.) presents a significant analytics challenge for businesses. This presentation explores how Generative AI can transform such data into structured insights, enhancing decision-making and operational efficiency. We will discuss applications including real-time AI copilots and bots that automate tasks and improve services. Key highlights include case studies from NoBroker.com and ConvoZen AI, showcasing how these innovations lead to efficiency gains and better experiences. Attendees will leave with actionable strategies to harness Generative AI in their own organisations, turning data chaos into a competitive advantage.

  • During the session, Varun will talk about how the process of ETL has evolved into ELT, as well as the different types of data storage and warehouse solutions.

  • This session will delve into our Gen AI capabilities seamlessly integrated into Microsoft Fabric. You will learn more about the hurdles confronting organizations and explore customer scenarios showcasing our accelerators and solutions on Microsoft's cutting-edge SaaS platform. Get ready to discover the tangible value we deliver to our global clients through these innovations.

  • This session will discuss how adopting a product mindset towards data encourages organizations to improve data accessibility and interpretability, ensuring that it meets customer needs much like any other product. It will cover the roles and responsibilities of data product managers, the professionals who oversee the lifecycle of data products from creation to deployment and refinement, and examine case studies from leading companies that have successfully implemented this strategy, highlighting how they manage data as a product to enhance decision-making and customer satisfaction. It will also address the challenges of this approach, including the need for cultural shifts within organizations, the importance of cross-functional collaboration, and the continuous investment required to maintain and improve data products, before concluding with actionable insights on how attendees can start thinking about and treating their data as a product, setting the stage for enhanced innovation and efficiency in their processes. This shift not only improves service delivery but also ensures a competitive edge in the data-driven marketplace.

  • The data landscape is in a constant state of flux, demanding ever-more agile and scalable data processing solutions. While the Lambda architecture has served well, the Kappa architecture emerges as a powerful evolution. This session, designed for senior data leaders, explores the fundamentals and nuances of the Kappa architecture and its potential to revolutionize data pipelines. We'll embark on a journey beyond the Lambda architecture, unpacking the core principles of Kappa. You'll gain insights into how Kappa streamlines data processing by unifying batch and real-time processing into a single, continuous flow. This approach eliminates the complexity of managing separate Lambda layers, fostering a more agile and maintainable data pipeline. The session dives deep into the technical aspects of Kappa, including:
      - Real-time stream processing: leveraging powerful stream processing engines for low-latency data ingestion and transformation.
      - Stateful stream processing: enabling complex event processing and state management within the streaming pipeline itself.
      - Simplified data pipelines: reducing code duplication and operational overhead through a unified processing approach.
      - Tailored delivery options: providing flexibility to deliver data in real time, near real time, or in batches based on specific use cases.
    This session is ideal for data leaders, data architects, and data engineers seeking to push the boundaries of data processing agility. We'll explore real-world applications and best practices, empowering you to architect data pipelines that are not only scalable but also adaptable to the ever-changing needs of your organization.
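
To make the single-path idea concrete, here is a minimal, self-contained sketch (illustrative only, not material from the talk) of the Kappa principle: one stateful processing function consumes an append-only event log, and a "backfill" is just replaying the same log through the same code, rather than maintaining a separate batch layer.

```python
from collections import defaultdict

# An append-only event log standing in for a Kafka topic (illustrative data).
event_log = [
    {"offset": 0, "user": "a", "amount": 10},
    {"offset": 1, "user": "b", "amount": 5},
    {"offset": 2, "user": "a", "amount": 7},
]

def process(event, state):
    """The single processing path: updates running totals per user."""
    state[event["user"]] += event["amount"]
    return state

def run(log, from_offset=0):
    """Real-time consumption and historical reprocessing share this code:
    a 'backfill' is simply a replay from offset 0."""
    state = defaultdict(int)
    for event in log:
        if event["offset"] >= from_offset:
            process(event, state)
    return dict(state)

print(run(event_log))                 # full replay (the "batch" view)
print(run(event_log, from_offset=2))  # tail consumption (the "real-time" view)
```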

  • Do you know about DataHub, the #1 Metadata Platform loved by data engineers? It's like a superpowered tool for finding and understanding data, keeping an eye on it, and making sure it follows the rules. And it’s open source! Big names like MYOB, DPG Media, Notion, PayPal, Airtel, Netflix, Expedia, and LinkedIn are already on board. DataHub plays nice with lots of other tools, like Snowflake, Redshift, BigQuery, S3, Airflow, NiFi, dbt, Great Expectations, Looker, Tableau, and more. It's super easy to plug into whatever setup you've got with a rich set of APIs for producing and consuming metadata. But of course, you have to run it yourself! With Acryl Cloud (the cloud hosted version of DataHub), DataHub gets even better with additional features for collaboration, governance and data quality. It's like having a Swiss Army knife for data management. We'll dive into why the Credit Saison team picked Acryl Cloud and how it's helping them. The talk will also cover cool new stuff coming to DataHub and Acryl Cloud soon! Think smarter ways to track where data comes from, searching using plain old English, a handy Slack bot, making sure data is what it says it is, using AI to make metadata even more useful, and a bunch more exciting features.
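
As a taste of those metadata APIs, the sketch below (not from the talk) pushes a description for a dataset to a DataHub instance using DataHub's documented Python REST emitter. The server address, platform, and table name are placeholder assumptions, and exact module paths can vary across DataHub versions.

```python
# pip install acryl-datahub  (module paths may differ across versions)
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

# Placeholder endpoint for a self-hosted DataHub GMS server.
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Identify a (hypothetical) Snowflake table by its URN.
urn = make_dataset_urn(platform="snowflake", name="analytics.public.orders", env="PROD")

# Attach human-readable documentation as a dataset-properties aspect.
properties = DatasetPropertiesClass(
    description="Orders fact table, loaded hourly by the ingestion pipeline.",
    customProperties={"owner_team": "data-platform"},
)

emitter.emit(MetadataChangeProposalWrapper(entityUrn=urn, aspect=properties))
```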

  • Over the past couple of years, we have experienced how analytics and machine learning have come into action in several sectors like e-commerce, health, education, finance, and agriculture, and organisations could see tremendous value out of data-driven decision-making. Although we adopted distributed computing platforms in building analytics services, the outlook on unlocking insights from analytics has still been traditional, that is, by building data warehouses and data marts. Enterprise Data Warehouse (EDW) technologies were able to integrate and harmonize data, enabling BI analysts and users to extract information reliably, but flexibility and addressing evolving data needs have been a constant challenge. In this talk, I would like to reveal how some hidden patterns can be extracted by realising the problems as graphs. I'll briefly state some of the limitations of existing EDWs and how these could be addressed through OLAP graph technologies. OLAP graph technologies and their implementation, known as knowledge graphs, can link various heterogeneous data sources. I would also like to take you through some of the real-world challenges addressed by embedding Graph + AI design principles into our strategy. The real data challenges in our day-to-day work revolve around entities and their respective attributes. The talk will include:
      1. Why graph + analytics?
      2. What problems could we realise as a graph?
      3. Graph technology for data integration
      4. Defining consumable patterns for analysts and business stakeholders
      5. Use cases: entity resolution in e-commerce, a 360-degree view of customers, improving enterprise decision-making by enabling cross-channel communication, and deduplicating entities
      6. OLAP graph data warehouse
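
As a toy illustration of entity resolution as a graph problem (illustrative data, not from the talk), records that share identifiers such as an email or phone number fall into the same connected component of an identity graph:

```python
# pip install networkx
import networkx as nx

# Illustrative customer records from two channels; shared identifiers
# (email/phone) become edges in an identity graph.
records = {
    "r1": {"email": "a@x.com", "phone": "111"},
    "r2": {"email": "a@x.com", "phone": "222"},  # same email as r1
    "r3": {"email": "b@y.com", "phone": "222"},  # same phone as r2
    "r4": {"email": "c@z.com", "phone": "333"},  # unrelated
}

G = nx.Graph()
for rid, attrs in records.items():
    for value in attrs.values():
        G.add_edge(rid, value)  # link each record to each identifier value

# Records in the same connected component resolve to one entity.
for component in nx.connected_components(G):
    entity = sorted(r for r in component if r in records)
    print(entity)  # ['r1', 'r2', 'r3'] and ['r4']
```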

  • How Aquaconnect leverages satellite imagery to monitor coastal regions, enabling data-driven decisions that drive sustainable aquaculture development. This bird's-eye view allows for precise and timely decision-making, ensuring optimal conditions for aquatic species. The approach integrates advanced remote sensing technologies with artificial intelligence to provide unparalleled insights into aquatic ecosystems.

  • How can the integration of Gen AI and data engineering help accelerate the growth of any technology organisation? What does it take to adopt the Gen AI ecosystem, and how can one work towards a successful discovery and delivery model using the combination of two very different approaches?

  • Embeddings are a ubiquitous term in discussions surrounding AI models, particularly in GenAI. While numerous resources explore the mathematical and theoretical underpinnings of embeddings and their significance in training Transformers or ML models in general, there remains a scarcity of material exploring their practical applications. Embeddings serve as data structures containing contextual information essential for executing intelligent tasks. Among the myriad applications of AI, semantic search stands out as one of the most prominent, directly influenced by the quality and scale of embeddings. The selection of embeddings significantly impacts search capabilities and the supported modalities, ranging from text-only to image search or even a fusion of both, known as multi-modality search. Recognizing embeddings as pivotal data structures, devising efficient storage mechanisms is imperative. Vector databases emerge as specialized solutions optimized for storing and retrieving embeddings. In this presentation, I will outline a reference architecture using a vector database and embeddings for intelligent search, encompassing options for text, image, and multi-modal searches. I will also highlight a few applications leveraging this architecture and outline potential avenues for future exploration.
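
As a minimal illustration of the pattern (illustrative only; a production setup would use a trained embedding model and a vector database rather than random vectors and brute-force search), semantic search reduces to nearest-neighbour lookup over embedding vectors:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in embeddings: in practice these come from an embedding model
# (text, image, or multi-modal), not from a random generator.
documents = ["return policy", "shipping times", "payment options"]
doc_vectors = rng.normal(size=(len(documents), 384))  # 384 dims is a common size

def embed(text: str) -> np.ndarray:
    """Placeholder for a real embedding-model call."""
    return rng.normal(size=384)

def search(query: str, k: int = 2):
    """Brute-force cosine-similarity search; a vector DB replaces this at scale."""
    q = embed(query)
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    top = np.argsort(-sims)[:k]
    return [(documents[i], float(sims[i])) for i in top]

print(search("when will my order arrive?"))
```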

  • Day 1 | Main Hall - Thought Leadership and Strategic Insights


  • The data engineering lifecycle has three broad stages: Data Input, Data Processing, and Data Output. Various problems need to be tackled at each stage of the data lifecycle; to name a few challenges: data input mismatch, data relationship management, data coding, data catalogue enrichment, etc. These challenges can be bucketed into the same three stages: input, process, and output. At Genpact, we have tried to create GenAI-based solutions to tackle such challenges and have showcased them to clients. While many more solutions and use cases are being thought through, this talk will share a perspective on a few important solutions and their potential impact in the data engineering space.

  • The world of Generative AI is evolving rapidly, yet its implementation into real-world applications lags behind. What could be the reasons for this, and could this trend reverse soon? You could be a Business Leader, Data Engineering Practitioner or Data Engineer/ML Engineer/Data Scientist; the developments in GenAI are poised to impact your field significantly. In my keynote, The All-In GenAI Gamble: Who Should Bet Big and Who Should Check, I will provide insights from a practitioner's perspective on GenAI. This session will help you understand how to stay relevant and adapt to this rapidly evolving world of GenAI.

  • Emerging trends and industry-wide innovation; developer productivity from an enterprise POV; productivity and experience as two sides of the same coin; and enterprise and developer considerations on data accuracy, privacy, and security.

  • This session covers how data engineering teams across industries can process data from multiple sources and of multiple types, simplify complex data pipelines, code in the language of their choice with unlimited processing power for meaningful insights, and build products and solutions on data to monetize, incorporating and supercharging the AI/ML story across the entire spectrum.

  • As GenAI revolutionizes our world, data engineers often wonder: 'Will I still matter?' The resounding answer: yes, and more than ever! Uncover the secrets to not just staying relevant but thriving in the GenAI era as a data engineer. Journey through the evolution of data engineering, uncover the power of AI for data engineers, and learn indispensable strategies for maintaining your edge in this ever-changing landscape. Brace yourself for inspiring real-world case studies and cutting-edge insights into embracing continuous learning and agility.

  • Moderator - Chiranjeev Singh Sabharwal
    As the field of data engineering continues to evolve at a rapid pace, professionals must anticipate and prepare for the future demands of the industry. This panel discussion will explore the skills that are expected to become crucial for data engineers over the next five years. Experts will delve into emerging technologies and methodologies, such as advancements in cloud computing, real-time data processing, and the integration of AI and machine learning in data pipelines. They will also discuss the increasing importance of security practices, data governance, and ethical considerations in data management. This forward-looking dialogue aims to equip current and aspiring data engineers with insights on how to stay relevant and excel in their careers as the technological landscape shifts.

  • In this talk, Sai will explore how Intuit improved the efficiency of its Datalake in response to the challenges caused by the lack of governance over table creation. With a massive 300K Hive tables and several thousand new tables added weekly, the Datalake had become inefficient, costly, and difficult to govern. He will delve into the challenges faced with data discovery, lack of ownership, and unused data, including the difficulty in finding necessary data due to repetitive table names and schemas. Intuit tackled these challenges by introducing Datalake Observability, a tool designed to provide usage statistics, access patterns, and recommendation algorithms for each table in the Datalake. This led to the removal of unused tables, streamlined discovery, and the establishment of ownership for many tables. Join him to explore how Intuit’s Datalake Observability improved the efficiency of the Datalake while boosting governance.

  • This topic explores the synergy between ClickHouse, an open-source column-oriented database management system, and Apache Kafka, a distributed event streaming platform. Together, they offer a robust solution for real-time data ingestion, processing, and analytics. Kafka efficiently handles high-throughput data streams, while ClickHouse excels in storing and analyzing vast amounts of data with lightning-fast query performance. This duo is a formidable combination for businesses aiming for rapid data processing and actionable insights.
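
A common integration pattern (sketched below under assumed broker, topic, and schema names; a real deployment would tune settings and add error handling) is to expose the Kafka topic as a ClickHouse table via the Kafka engine and continuously materialize it into a MergeTree table for fast queries:

```python
# pip install clickhouse-driver  (assumes a reachable ClickHouse server)
from clickhouse_driver import Client

client = Client(host="localhost")

# 1. A Kafka-engine table that tails the (hypothetical) 'page_views' topic.
client.execute("""
    CREATE TABLE IF NOT EXISTS page_views_queue (
        user_id UInt64, url String, ts DateTime
    ) ENGINE = Kafka
    SETTINGS kafka_broker_list = 'localhost:9092',
             kafka_topic_list = 'page_views',
             kafka_group_name = 'clickhouse_page_views',
             kafka_format = 'JSONEachRow'
""")

# 2. A MergeTree table optimized for analytical queries.
client.execute("""
    CREATE TABLE IF NOT EXISTS page_views (
        user_id UInt64, url String, ts DateTime
    ) ENGINE = MergeTree ORDER BY (ts, user_id)
""")

# 3. A materialized view that streams rows from Kafka into MergeTree.
client.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS page_views_mv
    TO page_views AS
    SELECT user_id, url, ts FROM page_views_queue
""")
```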

  • In a remarkably short period, Generative AI has emerged as a transformative technology. But like any other Analytics capability, Gen AI will need strong integration with and rely on Data Platforms for high-quality data, processing at scale, and a secure way of serving. In this session, we will share thoughts on strategy for evolving existing & traditional Data Foundation Architectures to enable Generative AI capabilities within an enterprise.

  • Discover the strategic advantages of modernizing data architecture on AWS in the age of big data. Explore best practices for architecting scalable data pipelines, optimizing storage and processing, and harnessing advanced analytics and machine learning. Gain insights into emerging trends and technologies and learn how to unlock new opportunities for data-driven growth on AWS. The talk will cover:
      - Introduction to Modern Data Architecture on AWS
      - Rationales for Modernizing Data Architecture on AWS
      - Architecting Scalable Data Pipelines on AWS
      - Optimizing Data Storage and Processing
      - Future Trends and Considerations

  • Day 2 | HALL 2 - Practical Insights and Best Practices


  • Join this immersive workshop, delving into the intricacies of managing vast datasets for training generative models. Learn advanced techniques in data preprocessing, storage, and manipulation tailored for large-scale Generative AI projects. Equip yourself with the essential skills needed to drive innovation in AI. Join us and harness the potential of managing massive datasets to propel AI innovation forward, ensuring you're ready to tackle the challenges of tomorrow's AI landscape with confidence and expertise.

  • This session will explore how data engineering practices come into play when operationalizing foundation models. We'll move beyond the model itself to understand how data engineers prepare and manage the data these models rely on, ensuring they function effectively in real-world applications. We will cover:
      - Foundation models as data representations: how foundation models can be seen as powerful representations of underlying data, and how data engineering is crucial for leveraging this potential.
      - Data pipelines for foundation models: the role of data engineers in building and maintaining data pipelines that continuously feed the model with high-quality data.
      - Fine-tuning foundation models: how data engineers can collaborate with data scientists to prepare data for fine-tuning foundation models for specific tasks.
      - Data management for foundation models: the data management challenges associated with foundation models, such as data versioning and monitoring.
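
As one concrete slice of that pipeline work (an illustrative sketch, not material from the session), preparing a fine-tuning dataset often comes down to cleaning, deduplicating, and serializing records into the JSONL format most fine-tuning workflows expect:

```python
import json

# Illustrative raw records; in practice these arrive from an upstream pipeline.
raw_records = [
    {"prompt": "Summarize: Q1 revenue rose 8%.", "completion": "Revenue grew 8% in Q1."},
    {"prompt": "Summarize: Q1 revenue rose 8%.", "completion": "Revenue grew 8% in Q1."},  # duplicate
    {"prompt": "   ", "completion": "noise"},  # empty prompt, should be dropped
]

def clean(records):
    """Drop empty prompts and exact duplicates; a real pipeline would also
    handle PII scrubbing, length limits, and near-duplicate detection."""
    seen = set()
    for rec in records:
        prompt = rec["prompt"].strip()
        key = (prompt, rec["completion"])
        if prompt and key not in seen:
            seen.add(key)
            yield {"prompt": prompt, "completion": rec["completion"]}

with open("finetune_data.jsonl", "w") as f:
    for rec in clean(raw_records):
        f.write(json.dumps(rec) + "\n")
```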

  • The evolution of data engineering encompasses transformative concepts driving advancements in knowledge graphs, data discovery, synthetic data generation, text-to-SQL transformations, data quality management and generative AI datasets. Knowledge graphs facilitate semantic data modelling and interconnected data querying. Data discovery empowers insights extraction through metadata analysis and automated profiling. Synthetic data generation mitigates data scarcity using generative AI for realistic datasets. Text-to-SQL bridges natural language queries with structured databases, enhancing accessibility. Data quality management ensures integrity, consistency, and reliability through profiling and error remediation. Generative AI datasets fuel innovation and creativity in generative models. This talk will take the audience through real-world cases of the above-mentioned concepts.

  • This talk will explore two key perspectives: what will be Generative AI's role in shaping data engineering, and how should data engineering practices evolve to support Generative AI. Data engineering is a critical discipline to build capabilities that source, model and transform data to create usable information. This enables users to derive actionable insights, drive automation and create business value through Analytics, Reporting, AIML and now Generative AI. Generative AI will have significant implications on the discipline across its life cycle. It'll play an instrumental role in enriching the existing data ecosystem and compel stakeholders to re-think end-to-end architecture and implementation. Primary focus areas include:
      - Data pipeline and workflow (Extract, Transform, Load) automation and modernization
      - Data governance: metadata (and data) enrichment, tagging, cataloguing, lineage, usage, and documentation
      - Synthetic data generation and augmentation
      - Feature engineering automation
      - Data labelling and annotation assistance
      - Data re-usability
      - Natural language-based analytics
    Additionally, data engineering practices need to evolve to support Generative AI adoption across organizations, expanding to unstructured, multi-modal, and diverse data sources. This would create unique opportunities and challenges that must be addressed:
      - Storage and compute scalability
      - Frameworks and best practices for data processing: indexing, chunking, embedding
      - Effective and efficient querying and retrieval
      - Standards for data quality, lineage, versioning (recency), drift (changes), and metadata requirements
      - Standards for governance, security, and regulatory compliance
      - Custom firm-specific labelling and annotation to fine-tune and/or evaluate Generative AI solutions
      - Unified data warehouse and feature store
    In summary, the data engineering discipline would require transformational changes to take advantage of Generative AI while also enabling organizations to leverage their data and processes to adopt Generative AI to derive efficiencies and create value.

  • MaaS, or Metrics as a Service, is a cloud-based model that provides organisations access to many metrics and analytics capabilities without building and maintaining complex data pipelines or infrastructure. In a MaaS platform, users can define, compute, visualise, and manage metrics and key performance indicators (KPIs) using a centralised and scalable solution.
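
To make the idea tangible (a toy sketch with made-up names, not a description of any particular MaaS product), the core of such a platform is a central registry of metric definitions that any consumer can compute on demand, instead of each team re-implementing its own pipeline logic:

```python
from dataclasses import dataclass
from typing import Callable
import pandas as pd

@dataclass
class Metric:
    """A centrally defined metric: one definition, reused by every consumer."""
    name: str
    description: str
    compute: Callable[[pd.DataFrame], float]

# The 'service' part: a shared registry rather than per-team pipeline code.
REGISTRY = {
    "revenue": Metric("revenue", "Total order value", lambda df: df["amount"].sum()),
    "aov": Metric("aov", "Average order value", lambda df: df["amount"].mean()),
}

orders = pd.DataFrame({"order_id": [1, 2, 3], "amount": [120.0, 80.0, 100.0]})

for name, metric in REGISTRY.items():
    print(f"{name}: {metric.compute(orders):.2f}")
```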

  • Booking Holdings is the world's leading provider of online travel and related services and operates through five primary consumer-facing brands: Booking.com, Priceline, Agoda, Kayak, and OpenTable, as well as through a network of subsidiary brands. The next phase of growth at Booking will be driven by providing a unified connected-trip experience and modernisation, which will make it easier for everyone to experience the world. As part of this talk, Ankur and Sucharita will highlight how Booking is modernizing its data platform to provide a modern, governed, self-service data and ML platform, facilitating data-driven decision-making and being future-ready to support experimentation and AI/ML use cases while supporting the large Booking business.

  • Data observability has garnered considerable interest in recent times. It has evolved from Observability, which relied on monitoring, logging, tracing, etc. However, the expectations associated with data observability and its key tenets have shifted significantly from traditional ways of thinking and implementing it. This session will delve into more practical aspects of Data Observability and how individuals, organisations, and platforms should think differently to gain maximum value from it in a cost-efficient manner.

  • Unlock the potential of a high-performance, scalable, and cost-effective data pipeline. Learn how Kafka enables real-time data ingestion, Airflow streamlines workflow orchestration, Parquet optimizes data storage, and Superset enhances data visualization. Discover practical insights and real-world examples to elevate your data infrastructure, boost performance, and save costs.
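
A minimal sketch of how these pieces can fit together (illustrative topic, paths, and schedule; a production DAG would add retries, partitioning, and schema management): an hourly Airflow task drains a Kafka topic into Parquet files, which Superset can then query through an engine such as Trino or DuckDB.

```python
# pip install apache-airflow kafka-python pandas pyarrow
import json
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator
from kafka import KafkaConsumer

def kafka_to_parquet():
    """Drain available messages from the (hypothetical) 'events' topic
    and land them as a Parquet file for downstream visualization."""
    consumer = KafkaConsumer(
        "events",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=10_000,  # stop once the topic is drained
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    rows = [msg.value for msg in consumer]
    if rows:
        pd.DataFrame(rows).to_parquet(f"/data/events_{datetime.utcnow():%Y%m%d%H}.parquet")

with DAG(
    dag_id="kafka_to_parquet_hourly",
    start_date=datetime(2024, 5, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    PythonOperator(task_id="kafka_to_parquet", python_callable=kafka_to_parquet)
```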

  • Day 2 | Main Hall - Thought Leadership and Strategic Insights


  • In today's data-driven landscape, the ability to extract actionable insights from vast volumes of data is crucial for organizations aiming to maintain a competitive edge. In this discussion, we explore how we harness the power of GenAI, Machine Learning models, and advanced Data Analytics techniques to unearth hidden patterns, trends, correlations, and influential features within extensive data repositories. We delve into how these technologies enhance Business, Customer and Operational Insights, enabling proactive strategies, personalized experiences, and operational efficiencies at scale. Through real-world examples, we illuminate the transformative potential of AI in driving innovation, uncovering new opportunities, and fostering a data-centric culture, where intelligence is democratized and accessible throughout an organization.

  • "Revolutionizing NoSQL Analytics: Real-Time Analytics & Data Warehousing with Couchbase" delves into the transformative potential of utilizing Couchbase, a popular NoSQL database, for real-time analytics and data warehousing purposes. This discussion explores how Couchbase's unique NoSQL columnar and MPP architecture, along with zero-ETL real-time data ingestion capabilities, enables organizations to efficiently process and analyze large volumes of data in real time, leading to actionable insights and enhanced decision-making capabilities. The discussion will also explore how leveraging the NoSQL data model for analytics enables organizations to respond to rapidly changing business requirements compared to traditional relational data warehouses.

  • The evolution of Web 2.0 and the modernization of Big Data have led to massive disruption in traditional ways of handling business in services, manufacturing, health care, and various other sectors. This, in turn, has led to massive opportunities in enabling business growth through technological innovations and transformations. As a result of this transformation, opex and capex have been increasing massively, thereby impacting profitability across sectors and organisations. Every organization now has the need to increase its revenue. This is where the monetization of data at scale comes in. Monetization of data provides three key advantages to organisations: a competitive advantage over competitors, new revenue streams, and streamlined operations that reduce cost. Successful implementation requires three stages: assess existing data to determine the revenue stream, decide the target segment and objective of the data asset, and adhere to compliance, governance, and cybersecurity requirements. While a lot more solutions and use cases are now being made available, with this talk, Arun wants to share his perspective on a few important use cases and their benefits.

  • CatExpert.ai, the self-service Assortment and Planogramming product developed by AB InBev, is significantly driving growth of beer as an industry by keeping retailers at the heart of everything we do. Through CatExpert.ai, we bring the power of data, analytics, and data science to retailers to help them unlock incremental sales from efficient shelf planning. As they grow, we grow. Today CatExpert.ai is being used in many countries, and this has been possible largely due to the highly scalable tech architecture and forward-thinking data engineering practices we have deployed. CatExpert.ai leverages cloud native infrastructure to achieve scalability, elasticity, and global reach. It is built upon a microservices architecture, which allows for modular development, scalability, and fault isolation. Each microservice handles a specific business function, enabling rapid updates and flexibility as per the needs of different markets. All our data gets organized in a readily usable form by utilizing feature stores and gets used for various data science applications. On top of its data warehouse, CatExpert.ai employs machine learning algorithms to optimize the best combination of SKUs that’d maximize the retailer’s revenues from beer as an overall category. As the scale of CatExpert.ai operations grows, a lot of focus is on data quality & standardization, model governance & reusability, and security of the platform.

  • Data storage and management are pivotal in the realm of data engineering, with data lakes and data warehouses representing two fundamentally different approaches. This panel will contrast these two architectures, discussing their respective advantages and ideal use cases. Panelists will cover data lakes' ability to store vast amounts of unstructured data, offering flexibility and scalability, particularly beneficial for big data analytics. Conversely, the discussion will highlight data warehouses' structured environment, which is optimized for efficiency and speed in querying, making it suitable for business intelligence and reporting. The debate will also touch on considerations such as cost, complexity, data integrity, and future-readiness of each architecture, providing a comprehensive understanding of when and how each should be implemented in a business context.

  • Renowned leadership coach Marshall Goldsmith explores the inspiration behind his book "What Got You Here Won’t Get You There," emphasizing the behavioral changes leaders need to achieve new success levels. He introduces LPR (Leadership Potential Recognition) and its role in creating MarshallBot, an AI coaching tool poised to revolutionize leadership development. Goldsmith shares insights from coaching industry titans like former Ford CEO Alan Mulally, highlighting a personalized approach. He reflects on how Buddhist principles have shaped his coaching, fostering mindfulness and self-awareness in leaders. Looking forward, Goldsmith envisions MarshallBot as a transformative tool in leadership training, offering personalized feedback and continuous improvement. He discusses integrating this AI tool into existing development programs and shares user feedback that has refined its functionality. Goldsmith also contemplates AI's broader impact on the future of work and leadership, identifying key skills leaders must develop to collaborate effectively with AI technologies, ensuring they stay agile in an ever-evolving landscape.

  • In an era where data serves as the compass guiding successful brands, the journey of Boat stands as a beacon of inspiration. Join Aman Gupta, Co-founder and Chief Marketing Officer of Boat, as he shares the captivating narrative of Boat's rise to prominence in the consumer electronics industry. From navigating uncharted waters to charting a course for success, Aman offers invaluable insights into leveraging data-driven strategies to craft a compelling consumer brand. Drawing from the rich tapestry of Boat's story, attendees will discover the transformative power of data in shaping marketing initiatives, driving business growth, and fostering enduring customer relationships. Prepare to set sail on a voyage of discovery, as Aman Gupta unveils the secrets behind Boat's triumphant journey and imparts actionable lessons for aspiring brand builders.


Schedule from Last Year

  • Future of LLMops: Deployment and Scaling
  • Engineering Practices for Data Resilience
  • Building Resilient Data Pipelines
  • Real-time Streaming for Enterprise Data Lakes
  • Conceptualizing Data as a Product
  • Real-time Dashboards and Unified Databases on Cloud
  • The Emergence of Data Mesh Architecture
  • Real-Time Data Processing with Apache Kafka
  • From Data Chaos to Organized Value Generation
  • Architecting Data Pipelines for Generative AI Models
  • Generative AI’s Role in Data Engineering
  • Designing Real-time Data Stream Processing Architectures
  • Data Modeling and Schema Design for Business Efficiency
  • Ensuring Data Trust and Quality in Modern Data Stacks
  • The Value of Operations Data
  • Modernizing the Data Access Layer
  • Monetizing Data at Scale
  • Evolving ETL Practices for Modern Data Integration
  • Operationalizing Foundational Models
  • Optimizing Big Data with Probabilistic Data Structures
  • The Importance of a Data Semantic Layer 
  • Data Fabric vs. Data Mesh for Future-Proofed Platforms
  • Building Efficient Data Lakes and Warehouses
  • Leveraging DataOps for Effective Data Management
  • Transitioning to Real-Time Data with Streaming Pipelines
  • Managing Data Noise and Subjectivity

Register for DES 2024

  • Early Bird Passes (Expired)
      - All-access, 2-day passes
      - Group discount available
  • Regular Pass (Available till 10th May 2024)
      - All-access, 2-day passes
      - Group discount available

What to expect?

Get ready for the 3rd Edition of the Data Engineering Summit in 2024, a not-to-be-missed event spanning two action-packed days, featuring two distinct tracks designed to cater to a wide array of interests and expertise in the field of data engineering. This summit promises to be an immersive experience, combining enlightening keynote speeches, interactive workshops, and in-depth panel discussions led by renowned industry leaders and innovators.

Alongside the learning tracks, attendees will have the unique opportunity to explore exhibitions showcasing the latest technologies, tools, and services in data engineering.

Topics covered

The Data Engineering Summit will feature a range of presentations, panel discussions, and workshops. Our speakers at the Data Engineering Summit 2024 will cover a wide array of vital topics, including the complexities of big data architectures and the best practices for managing streaming data pipelines.

Topics to be explored encompass the entire lifecycle of data pipelines, the journey of data models from experimentation to production, the integration and utility of data fabric, and the critical aspects of Data Provenance & Governance.

Thanks for being part of DES 2024

Check out our upcoming conference