How to use data and AI to cut costs and deliver personalized shopping experiences

How are you responding to today’s unprecedented macro volatility, fast-moving competitive landscape and constantly shifting customer expectations?

The Retail and CPG brands that survive this tsunami of disruption will be those that serve up personalized, engaging customer experiences at every turn. But to do that, they’ll have to adopt platforms and practices that leverage advanced analytics and AI — all while supporting exponential scalability and ever-changing scope.

In this article, you will learn:

  • How leading brands are leveraging advanced analytics and AI to thrive
  • Common use cases for retail and consumer goods brands
  • How to take advantage of proprietary data to increase revenues and cut costs

Content Summary

Retail’s rapidly changing landscape
The disrupted state of supply and demand
Leveraging data and AI on the front lines
Common challenges retail brands are facing today
The need for a retail lakehouse
Start delivering data and AI results with Solution Accelerators

Retail’s rapidly changing landscape

Real-time, personalized customer experiences have evolved from nice-to-have to absolute necessity over the past decade.

Personalization has become more than just smart recommendations. It is about influencing the perceived value that you deliver to a customer. But with an economy turned unpredictable by COVID-19, that shift has gone from slow burn to explosive change agent. Many retailers have had to pivot, often dramatically, to limit economic risks. Big retailers that already had a strong e-commerce presence have gained market share, while smaller brands that were already struggling to differentiate themselves are being forced either to move online at great cost or to face restructuring. With an astronomical surge in digital demand and more customers than ever relying on e-commerce, scaling operations to meet that demand has never been harder. It has also never been more important.

In 2020, e-commerce sales accelerated at breakneck speed, with growth that previously took 10 years to achieve taking place in just 10 weeks. Suddenly, retailers are having to rework their strategies and implement entirely new methods for analyzing data, engaging consumers, forecasting, and tracking inventory and supply chain checkpoints. The businesses that survive this tsunami of disruption will be those that serve up personalized, engaging customer experiences at every turn. But to do that, they’ll have to adopt platforms and practices that leverage advanced analytics and AI, all while supporting exponential scalability and ever-changing scope.

The disrupted state of supply and demand

With uncertainty at the core of today’s decision-making, retail and consumer goods companies must focus on speed and agility when it comes to how they ingest and interpret data. They need to be faster at collecting data so that customers can consistently get what they want. They need to be nimble about data analysis so that they can reduce costs and waste fewer resources throughout the supply chain. And they must be flexible because data is dynamic.

Consider SKU rationalization: the cross-functional process of optimizing inventory and making informed decisions by looking at data. In the past, using historical data to weigh the cost of production and stocking items against the potential benefit of selling might have happened once per month. Today, retail merchants need to optimize store-level SKUs every single day to strategically scale alongside growing demand. We estimate that this has increased the models necessary for retailers to run SKU rationalization from tens of thousands per month to over 1.7 billion per month.
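The cost-versus-benefit weighing described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `SkuStats` fields, the `net_contribution` formula and the zero threshold are hypothetical and are not taken from any retailer's actual process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkuStats:
    """Hypothetical per-store, per-period figures for one SKU."""
    sku: str
    units_sold: int       # units sold in the review period
    unit_price: float     # selling price per unit
    unit_cost: float      # cost to produce or buy one unit
    holding_cost: float   # cost of stocking the SKU for the period


def net_contribution(s: SkuStats) -> float:
    """Benefit of selling minus the cost of producing and stocking."""
    return s.units_sold * (s.unit_price - s.unit_cost) - s.holding_cost


def rationalize(stats, min_contribution=0.0):
    """Split SKUs into those to keep and those to review for delisting."""
    keep, review = [], []
    for s in stats:
        target = keep if net_contribution(s) >= min_contribution else review
        target.append(s.sku)
    return keep, review
```

Running this check once per store per day, rather than once per month, is what multiplies the number of evaluations so dramatically.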

In a pandemic-era marketplace, typical supply-and-demand patterns have been thrown off course. Waves of consumers are making different, sometimes erratic purchasing decisions that inaccurately reflect demand. Merchants struggle to understand what inventory should look like from one moment to the next, and suppliers are experiencing production slowdowns as workforces are cut for safety and financial reasons.

Companies can still reduce costs and win market share to drive stronger growth, but this requires new ways of understanding consumer behavior. Big data and AI allow retailers and consumer goods companies to assess buying behaviors in real time, so that they can refocus their efforts on areas that will rapidly deliver value and drive expansion well into the future.

How Starbucks stays agile with faster and smarter use of data

Starbucks serves millions of coffee lovers every day all across the globe. In order to maintain the highest level of customer service, they focus on building lasting customer connections, creating innovative products and accelerating the digital experience for customers. The key to their success lies in their data and in their migration from legacy on-premises software to a cloud-based platform. As a result, they are able to leverage fine-grained forecasts to substantially increase the accuracy of their inventory demand predictions.

With Databricks, they have put in place a data lakehouse that can be leveraged enterprise-wide and have built fast data pipelines at petabyte scale. These pipelines allow them to rapidly build ML models that improve inventory management and unlock new product and service innovations.

  • 1,000+ data pipelines
  • 50x-100x faster data processing
  • 15 minutes to deploy ML models

Already, retailers are rethinking how and where they serve customers while still observing social distancing guidelines. As a result, most have seen a surge in BOPIS/BORIS transactions (buy online, pick up/return in store). These curbside orders increased by 208% during the pandemic, and online sales grew by almost 50% as homebound consumers headed into lockdown and turned to digital storefronts for their shopping needs.

With Databricks, we can now take a strategic view into data analytics. Our teams can spend time focusing on business problems up the value chain, rather than simply moving data from point A to point B. – VISHWANATH SUBRAMANIAN, Director of Data Engineering and Analytics at Starbucks

How COVID-19 has impacted retail’s long-term growth

In the wake of COVID-19, we’ve seen many booming segments brought to their knees by huge, rapid declines in sales volume and the evaporation of partners. This loss of revenue and profits has led to the corresponding reduction of workforces across the supply chain, contributing to instability as consumers tighten their purse strings in response to the economic downturn. But some verticals have seen growth as a direct result of our new normal, and we expect expansion to continue for the next 12–24 months.

  • Grocery
  • Retail drug
  • Home improvement
  • E-commerce
  • Consumer goods (e.g., food, staples)
  • Restaurant e-commerce

Leveraging data and AI on the front lines

With retail and consumer goods markets in flux, accurate forecasting requires a solution that accounts for variations in day-to-day product demand and distribution, needs that are well beyond the capabilities of legacy tools built on data warehouses. Organizations working with ever-growing volumes of daily data need a centralized hub, a supply chain control tower, to orchestrate the technology, tools and processes they rely on to capture data across all stages of the supply chain. Innovations in time series forecasting help retailers generate more reliable demand forecasts and give them the power to develop timely, deeply granular forecasts so they can make precise adjustments to their inventories.

Databricks unifies all the information and tools that data teams at retail and consumer brands need to simplify data and AI, and accelerate omnichannel innovation.

Improving forecast accuracy with Databricks

Consumers expect personalized omnichannel experiences, whether they’re shopping on mobile or in store. Databricks gives retailers and consumer brands a 360-degree understanding of their customers and a real-time view of their supply chain.


Improve accuracy in inventory predictions by understanding customer demand, enabling you to reduce excess inventory while avoiding lost sales.

  • Supply Chain Control Tower
  • Time Series Forecasting
  • Causal Forecasting
  • Safety Stock Analysis
  • On-Shelf Availability (OSA)
  • SKU Rationalization


Drive incremental revenue through enhanced segmentation based on deeper behavioral insights

  • Consumer Segmentation
  • Customer Lifetime Value
  • Survival Analysis and Churn
  • Propensity To Buy
  • A/B Testing
  • Personalized Recommendations


Optimize pricing during key moments of the customer lifecycle or season

  • Dynamic Pricing
  • Price Optimization
  • Promotion Optimization
  • Promotion Effectiveness

Common challenges retail brands are facing today

Managing data at scale is hard. Managing AI is much harder. While expectations surrounding the customer experience have certainly risen to new heights, few retailers have invested in the right technology for meeting these new standards. First, privacy and security are the cornerstone of modern AI initiatives. Brand loyalty depends on brand trust: customers will trust you with their data so long as you provide value, and they will stop trusting you (and move on to a competitor) if they do not feel that their data is secure with you.

Another issue many companies face is the skills gap. The market for good data scientists is very competitive right now, so even large enterprises often struggle to find enough people relative to the needs across the business. Similarly, an enterprise needs people who are not just good data scientists but who also understand its industry. When you consider regulatory dynamics, competitive dynamics, consumer behaviors and more, data science isn’t just about getting a bunch of data and churning out algorithms. Data teams need to be solving business problems, like reducing costs or increasing revenue, to get the influence they deserve.

Finally, the speed of business has changed. Organizations can’t wait months or years for a PoC to be tested and then additional time to scale out. You need to deliver value in days or weeks. One of the primary blockers to rapid time-to-value is an IT architecture that is overly complex and not built for performance, agility and scale. Like many other verticals, retail brands have historically cobbled together their on-premises IT infrastructure environments from separate solutions with separate storage and separate servers, ultimately creating a labyrinth that’s too complex to access or manage. While some workarounds were viable for a little while, most of these environments have grown into zoos of technology, completely out of control, and require extensive resources to maintain. In fact, data engineering teams are often forced to spend more time on simply managing legacy infrastructure than preparing the data for analytics.

Essentially, legacy IT environments are far too inefficient at a foundational level. From the moment the data is gathered, it is almost entirely unusable for the real-time needs of the modern-day customer.

The need for a retail lakehouse

Today, customer data arrives fast and is constantly evolving. It seems like there are new channels cropping up every day, and the shelf life of the information they provide is constantly decreasing. Because the window for valuable insights is so small, brands need to unify data at the very start of its journey in order to use it to its full potential.

Why Databricks for retail and CPG is proven

DELIVER HIGH-IMPACT ANALYTICS IN TIGHT SLAs: Databricks enables you to deliver the most demanding analytics to the front line within your service windows. Need to allocate inventory or predict on-shelf availability for every store and SKU today? It’s not a problem.

POWER PERSONALIZED EXPERIENCES WITH REAL-TIME DATA: Real-time awareness drives higher relevancy in recommendations, leading to higher incrementality and stronger customer satisfaction. Databricks enables you to incorporate batch and real-time data of all types to power your e-commerce and mobile experiences.

GREATER AGILITY LEADS TO IMPROVED RESILIENCY: In this era of volatility, retailers and consumer goods companies need a platform that enables them to respond to changes in real time. Deliver new insights and analytics in days or weeks, not the months required with a traditional warehouse.

Databricks provides retail and CPG companies with a fully managed cloud platform that unifies all your data, analytics and AI workloads. It accelerates innovation by making all data actionable and by unlocking new ways to explore and analyze data. These drive new use cases, which enable hyperpersonalized experiences that build trust and create long-term value.

The Databricks platform brings the openness and simplicity of the lakehouse. A lakehouse is a new, open architecture that combines the best elements of data lakes and data warehouses. Lakehouses are enabled by a new open and standardized system design: implementing similar data structures and data management features to those in a data warehouse, directly on the kind of low cost storage used for data lakes.

Building performant and reliable data pipelines at scale

One of the most common problems organizations face when dealing with massive volumes of real-world data across disparate sources is that it can become unreliable, low quality and challenging to manage. Many organizations turn to data lakes to aggregate their big data cost-effectively, but this poses its own challenges.

Delta Lake is an open source technology that adds reliability and performance to your data lake and is natively integrated in Databricks. As part of The Linux Foundation, it has become one of the fastest-growing open source big data projects. Delta Lake is a software layer that sits on top of your data lake and enforces quality on the data that enters it. Retailers and CPG companies can ingest structured, semi-structured and unstructured data, both in batch and in real time, into a single Delta Lake to ensure that the supply of data is clean and usable.

Unlock the value of data lakes for BI, data science experimentation and ML

Databricks provides a Lakehouse Platform that helps retail and consumer brands simplify data and AI — accelerating omnichannel innovation.

Data Ingest
  • Data challenge: Processing batch and streaming data can be slow and error-prone, impacting downstream analytics.
  • Databricks solution: Connect real-time inventory data with real-time customer experience data.

Data Lake Management
  • Data challenge: Data silos can limit the ability to gain a complete view of the customer.
  • Databricks solution: Easily handle large volumes of data from multiple sources (clickstream, PoS, social, etc.) built on a strong privacy foundation.

Data Query
  • Data challenge: Fragmented, siloed and inconsistent data sources for BI and data science.
  • Databricks solution: Ability to rapidly and inexpensively experiment, manage and push out at scale from a single platform.

Extracting business insights for analytics

With data centralized for easy access, Databricks enables you and your team of data analysts to easily and directly connect and query your most complete and recent data in the data lake with Delta Lake and Spark SQL. Connectors with popular BI tools like Tableau and Power BI allow your analysts to use their preferred BI visualization and reporting tools for real-time customer insights.

For a completely seamless experience, you can leverage Redash, a natively integrated visualization tool, to easily visualize and share your data via intuitive dashboards and queries.

Redash makes it easy to explore, query, visualize and share data.

Bring data science and analytics teams together to accelerate innovation

Key to ensuring a rapid pace of data-driven innovation is to foster a collaborative environment that empowers data teams to work better together across the enterprise. When data teams work together effectively, they are able to more easily focus their ideas, skills and energy toward accomplishing amazing things. Through an interactive workspace, data engineers and scientists can easily collaborate on data, share models and code, and manage the entire machine learning lifecycle in one place. Databricks notebooks natively support Python, R, SQL and Scala so practitioners can work together with the languages and libraries of their choice and then push results to business stakeholders with built-in dashboards and visualizations.

Streamlining the machine learning lifecycle to create customer value

Successfully building and deploying a machine learning model can be difficult, but it is essential to retailers. MLflow is an open source framework that streamlines the machine learning lifecycle, allowing data scientists to reproduce a pipeline, compare the results of different versions, track what’s running where, and redeploy and roll back updated models. MLflow is natively integrated into the Databricks Lakehouse Platform, allowing your teams to seamlessly connect their data pipelines to their models in development and track them across the entire ML lifecycle.

All this is made possible by a unified data analytics platform that enables data scientists, data engineers and analysts to explore big data and draw on actionable insights that can deliver personalized and engaging customer experiences.

Start delivering data and AI results with Solution Accelerators

Based on best practices from our work with the leading brands, we’ve developed Solution Accelerators for common data analytics and machine learning use cases to save weeks or months of development time for your data engineers and data scientists.

To stay up to date on our latest solution releases, visit us at

Inventory optimization


The growth of e-commerce, volatility with suppliers, and the risk of global pandemics have shocked supply chains and accelerated the demands placed on them. Companies have found existing models and approaches to predicting demand and managing inventory insufficient for the new normal in retail. A company may have run weekly or monthly aggregate forecasts with limited data sets in the past, but competing in the era of e-commerce, where consumers can easily switch stores, requires the ability to predict demand for a SKU at a day and store level.

BLOG AND NOTEBOOK: New methods for improving supply chain demand forecasting

WEBINAR: Granular demand forecasting at scale


Improving the speed and accuracy of time series analyses in order to better forecast demand for products and services is critical to retailers’ success. In this notebook, we discuss the importance of time series forecasting, visualize some sample time series data, then build a simple model to show the use of Facebook Prophet. Once you’re comfortable building a single model, we’ll combine Prophet with the magic of Apache Spark™ to show you how to train hundreds of models at once, allowing us to create precise forecasts for each individual product-store combination at a level of granularity rarely achieved until now.

BLOG AND NOTEBOOK: Time series forecasting with Facebook Prophet and Apache Spark

CASE STUDY: Starbucks
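The idea of fitting one small model per product-store combination can be illustrated without Prophet or Spark. The sketch below is not the notebook's code: it swaps in simple exponential smoothing as a stand-in model and uses a plain dictionary where Spark would distribute the groups across a cluster. The row layout `(store, sku, date, units)` and the function names are assumptions made for the example.

```python
from collections import defaultdict


def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}
    """
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def forecast_per_store_sku(rows, alpha=0.5):
    """Fit one tiny model per (store, sku) group.

    rows: iterable of (store, sku, iso_date, units) in any order.
    Returns {(store, sku): next-period forecast}.
    """
    grouped = defaultdict(list)
    for store, sku, date, units in rows:
        grouped[(store, sku)].append((date, units))

    forecasts = {}
    for key, observations in grouped.items():
        observations.sort()  # ISO dates sort chronologically as strings
        history = [units for _, units in observations]
        forecasts[key] = exponential_smoothing_forecast(history, alpha)
    return forecasts
```

Because each group is modeled independently, the per-group loop is exactly the part that a distributed engine can run in parallel.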


Natural disasters, pandemics, societal unrest and other factors have all recently caused disruptions to our global supply chains. Ensuring that we have enough product to serve demand while not carrying too much inventory is a key challenge for every business. This solution provides a modern way of helping retailers and manufacturers identify the optimal safety stock to carry to prevent business disruption while freeing working capital.

BLOG AND NOTEBOOK: How a fresh approach to safety stock analysis can optimize inventory
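For orientation, here is the classic textbook safety stock calculation, which is a useful baseline and not necessarily the approach taken in the linked notebook. It assumes daily demand is independent and roughly normally distributed and that lead time is fixed; the function name and parameters are illustrative.

```python
import math
from statistics import NormalDist, stdev


def safety_stock(daily_demand, lead_time_days, service_level=0.95):
    """Textbook safety stock: z * sigma_daily * sqrt(lead time).

    Assumes independent, normally distributed daily demand and a fixed
    lead time. service_level is the target probability of not stocking
    out during the lead time.
    """
    if not 0 < service_level < 1:
        raise ValueError("service_level must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(service_level)   # standard normal quantile
    sigma = stdev(daily_demand)               # sample standard deviation
    return z * sigma * math.sqrt(lead_time_days)
```

Raising the service level increases the buffer, which is the trade-off between preventing disruption and freeing working capital that the solution addresses.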



Marketers want to invest their resources in the most engaged and valuable customers. Investing in these customers generates stronger growth and higher ROI. This solution focuses on how marketers can segment consumers by lifetime and value, and help to improve decisions on product development and personalized promotions.


  • Part 1: Estimating lifetime duration
  • Part 2: Estimating future spend

WEBINAR: Virtual workshop
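To make the two parts concrete, the sketch below combines a lifetime assumption with a spend assumption into a single value per customer and then picks out the top segment. It is a deliberately simple stand-in, assuming a constant retention rate and constant margin per period, and is far cruder than a fitted statistical model; all names and parameters are hypothetical.

```python
def customer_lifetime_value(margin_per_period, retention_rate,
                            discount_rate, periods):
    """Discounted expected margin over a fixed horizon.

    Assumes a constant per-period retention rate (the lifetime part) and
    a constant margin per active period (the spend part).
    """
    total = 0.0
    for t in range(1, periods + 1):
        survival = retention_rate ** t          # chance still active at t
        total += margin_per_period * survival / (1 + discount_rate) ** t
    return total


def top_segment(clv_by_customer, top_share=0.2):
    """Return the customers in the top share by value, highest first."""
    ranked = sorted(clv_by_customer, key=lambda c: (-clv_by_customer[c], c))
    count = max(1, round(len(ranked) * top_share))
    return ranked[:count]
```

A marketer could compute `customer_lifetime_value` for each customer and pass the resulting dictionary to `top_segment` to decide where to focus promotions.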


Retailers and direct-to-consumer brands are increasingly offering subscription services to consumers. These services provide the consumer with convenience while building a steady source of annuitized revenue. As membership in subscription models increases, keeping those customers becomes crucial to maintaining profitability. This solution offers new ways of analyzing customers to understand what factors lead to greater retention and to identify when and why customers churn.

BLOG AND NOTEBOOK: How to analyze customer attrition

WEBINAR: Virtual webinar
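One standard tool for asking when customers churn is the Kaplan-Meier survival estimator, sketched here in plain Python. This is offered as background on the technique, not as a reproduction of the linked notebook; the input layout (one subscription duration per customer plus a flag saying whether churn was observed) is an assumption.

```python
def kaplan_meier(durations, churned):
    """Kaplan-Meier estimate of the subscriber survival curve.

    durations: time each customer was observed (e.g., days subscribed)
    churned:   True if the customer churned at that time, False if the
               customer was still active when observation ended (censored)
    Returns [(time, survival_probability)] at each time with a churn.
    """
    events = sorted(zip(durations, churned))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        churn_count = 0
        leaving = 0
        # Group everyone whose observation ends at the same time t.
        while i < len(events) and events[i][0] == t:
            churn_count += int(bool(events[i][1]))
            leaving += 1
            i += 1
        if churn_count:
            survival *= 1 - churn_count / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # churned and censored customers both drop out
    return curve
```

Steep drops in the resulting curve point to the moments in the subscription lifetime where retention efforts are most needed.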


The key to effectively managing retention, and reducing your churn rate, is developing an understanding of how a customer lifetime should progress and examining where in that lifetime journey customers are likely to churn. Armed with more reliable predictions of churn risk, we can more carefully examine the residual CLV associated with individual customers and make more targeted decisions regarding when and how to intervene.

BLOG AND NOTEBOOK: Profit-driven retention management with machine learning


Recommenders are key to helping customers navigate through a sea of choice, whether it’s finding the right piece of content, or SKU, or personalizing a product with several customization options. To help our customers understand how they might use Databricks to develop various recommenders, we’ve made available a series of detailed notebooks leveraging a real-world data set to show how raw data may be transformed into one or more recommender solutions (Collaborative Filter vs. Content-based).

BLOG AND NOTEBOOK: Personalizing the customer experience with recommendations

WEBINAR: Virtual webinar
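As a rough illustration of the collaborative filtering side of that comparison, the sketch below scores unseen items by their cosine similarity to items a user has already rated. It is a minimal in-memory example with a hypothetical `{user: {item: rating}}` layout, not the notebooks' implementation, and it ignores the scaling concerns a real recommender has to handle.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse {user: rating} vectors."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, user, top_n=3):
    """Item-based collaborative filtering.

    ratings: {user: {item: rating}}
    Returns up to top_n unseen items, best first.
    """
    # Invert to item vectors: {item: {user: rating}}
    item_vectors = defaultdict(dict)
    for u, items in ratings.items():
        for item, rating in items.items():
            item_vectors[item][u] = rating

    seen = ratings.get(user, {})
    scores = {}
    for candidate, vector in item_vectors.items():
        if candidate in seen:
            continue
        score = sum(cosine(vector, item_vectors[item]) * rating
                    for item, rating in seen.items())
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]
```

A content-based recommender would replace the user-rating vectors with item attributes such as category or description, while keeping the same similarity-and-rank structure.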


When retail brands obtain an accurate, real-time picture of their customers from the diverse data sets that describe them, they can personalize the customer journey with relevant experiences based on the most recent behaviors, no matter how those behaviors differ from moment to moment. In short, unifying data engineering and data science is the key to realizing success in today’s fast-moving landscape.