Tech Insights

Data-driven marketing and data science: From machine learning to machine learning ops


Marketing activities such as online advertising, webinars, social media, and other campaigns have been digitized over the past decade, and the pace of this conversion has accelerated in response to the challenges presented by COVID-19. Digitization has become essential to our ability to reach and influence customers.

Digital marketing has made it possible to collect and analyze big data such as customer responses and behaviors in real time, and to provide precise product and service offerings that meet the needs of target customers. The key here is to build a hypothesis in line with business objectives, analyze the collected data, and link it to effective metrics. 

Data science is indispensable for such data-driven marketing activities. This blog post introduces some examples of how data science techniques, such as statistical analysis and machine learning (ML), can be applied to use cases in planning marketing strategies and tactics. We will also touch on ML Ops and why it's such a hot topic these days.

Data science in marketing

Over the long history of marketing, many companies have created and used a variety of frameworks and concepts. Typical examples include Segmentation, Targeting, and Positioning (STP); Product, Price, Place, and Promotion (the 4Ps); the funnel; and the customer journey. By applying data science in these areas, it is possible to build and verify effective hypotheses based on numerical data and to promote data-driven marketing.

Segmentation, Targeting, Positioning. This framework looks at which groups customers can be classified into based on similarities in their needs and preferences, and which of these groups will be most receptive to the company's products and services. STP also looks at how your company is perceived relative to similar companies and products, and how you differentiate yourself from competitors. Methods such as cluster analysis are useful for segmenting customers and understanding your company's positioning.

Product, Price, Place, Promotion. For products, association analysis can be used to determine which servers and storage devices are most frequently purchased together, and how to increase opportunities for cross-selling and up-selling between products. It may also be possible, and desirable, to feed the results into product planning and development. Product bundling is a good example of how this information can be leveraged.
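Association analysis is usually run with algorithms such as Apriori over large order histories. As a minimal sketch of the underlying measures, the standard-library Python below counts item pairs in a handful of hypothetical orders (the product categories and orders are made up) and reports support, confidence, and lift for each rule "customers who buy X also buy Y".

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders: each set lists the product categories bought together.
orders = [
    {"server", "storage"},
    {"server", "storage", "networking"},
    {"server", "networking"},
    {"storage"},
    {"server", "storage"},
]


def pair_rules(transactions, min_support=0.2):
    """Association rules X -> Y over item pairs, sorted by lift (highest first)."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of orders containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # P(y bought | x bought)
            lift = confidence / (item_counts[y] / n)  # > 1: bought together more than by chance
            rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


for x, y, s, c, lift in pair_rules(orders):
    print(f"{x} -> {y}: support={s:.2f} confidence={c:.2f} lift={lift:.2f}")
```

In this toy data, networking -> server has a lift of 1.25, while server and storage, although they are the most frequent pair, have a lift below 1 because each appears in most orders anyway. Lift is what separates genuine bundling candidates from items that are simply popular.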

A/B testing is often used to measure the effectiveness of promotions by assessing reactions to changes in the title of an email, the color and position of a button on a website, and so forth, to better understand how customers make choices. A statistical technique called the chi-square test can be used to judge whether the observed differences are significant.

Another popular tool in online shopping is recommendation. You may have noticed that items matching your particular tastes are displayed as recommendations while you browse the web. This approach to cross-selling and up-selling works either by showing what other customers with similar buying behavior and preferences have bought, or by showing items similar to those in your own past purchases. The former approach is called collaborative filtering and the latter content-based filtering; both are machine learning techniques.
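To make collaborative filtering concrete, here is a minimal user-based sketch in Python with NumPy. The purchase matrix and product names are hypothetical. Customers are compared by the cosine similarity of their purchase vectors, and products the target customer has not bought are ranked by how often similar customers bought them. A content-based filter would instead compare product attributes with the customer's own past purchases.

```python
import numpy as np

# Hypothetical purchase history: rows are customers, columns are products (1 = bought).
PRODUCTS = ["server", "storage", "switch", "backup software", "GPU"]
purchases = np.array([
    [1, 1, 0, 1, 0],  # customer 0
    [1, 1, 0, 1, 1],  # customer 1
    [0, 0, 1, 0, 1],  # customer 2
    [1, 1, 0, 0, 0],  # customer 3: the customer we want recommendations for
])


def cosine_similarity(M):
    """Pairwise cosine similarity between the rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    unit = M / np.where(norms == 0, 1, norms)  # avoid dividing by zero for empty rows
    return unit @ unit.T


def recommend(M, user, top_n=2):
    """User-based collaborative filtering: rank products the user hasn't bought
    by how often similar customers bought them (weighted by similarity)."""
    sim = cosine_similarity(M.astype(float))
    sim[user, user] = 0.0          # exclude the user's own history
    scores = sim[user] @ M         # similarity-weighted purchase counts per product
    scores[M[user] > 0] = -np.inf  # never recommend something already bought
    ranked = np.argsort(-scores)
    return [int(i) for i in ranked[:top_n] if scores[i] > 0]


print([PRODUCTS[i] for i in recommend(purchases, user=3)])
```

Customer 3 has bought a server and storage, like customers 0 and 1, so the sketch recommends what those customers also bought (backup software, then GPU). Production recommenders add refinements such as implicit-feedback weighting and matrix factorization, but the "similar customers" idea is the same.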

Other analytics can be applied to areas such as channel analysis, to determine the desired place of business, and to pricing.

Funnels and customer journeys. Digital marketing activities create many customer touch points. The prospect information generated at these touch points is nurtured, and its quality is checked through telemarketing follow-up. Respondents who meet certain criteria are then passed to the sales team, which converts them into deals by advancing them to the negotiation stage and, eventually, to an order. At each stage of the funnel and customer journey, you can analyze the rate at which leads and deals advance to the next stage (conversion analysis) and study how that rate relates to factors such as campaign type or customer attributes. In addition, once the sales team has converted a marketing lead into a deal, it is also possible to predict the final order size from information such as the probability of winning the order. Regression analysis can be used for this.
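Conversion analysis boils down to dividing the count at each funnel stage by the count at the stage before it. The sketch below does this per campaign type in plain Python; the stage names and counts are hypothetical, chosen only to show how comparing campaign types works.

```python
# Hypothetical funnel stages and counts per campaign type (not real data).
FUNNEL_STAGES = ["response", "qualified lead", "opportunity", "order"]
funnel = {
    "webinar": [1000, 300, 90, 30],
    "online ad": [4000, 400, 60, 15],
}


def conversion_rates(counts):
    """Rate at which records advance from each stage to the next (0.0 if a stage is empty)."""
    return [nxt / cur if cur else 0.0 for cur, nxt in zip(counts, counts[1:])]


for campaign, counts in funnel.items():
    rates = conversion_rates(counts)
    steps = [f"{a} -> {b}: {r:.1%}"
             for a, b, r in zip(FUNNEL_STAGES, FUNNEL_STAGES[1:], rates)]
    print(campaign, "|", "; ".join(steps), f"| end-to-end: {counts[-1] / counts[0]:.1%}")
```

Comparing the stage-by-stage rates side by side shows where each campaign type loses prospects. In this made-up data, the online ad generates more responses but converts far fewer of them into qualified leads.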

Based on the above, the following table summarizes the use cases where data science such as machine learning and statistical analysis can be used in marketing operations. 

Figure 1: Use cases for data science in marketing operations

Use case examples

I will take a few use cases from the table above and show you how they can actually be used.

Segmentation and targeting by cluster analysis

  • If you have data on the participants of a campaign such as a webinar, you can group customers with similar interests based on the theme of the session they attended.
  • Typical cluster analysis methods are the K-means method and hierarchical clustering, both of which are classified as unsupervised learning. They can also be performed with free tools such as Python and R.
  • In the example below, we first use principal component analysis (PCA) to estimate that the customers fall into about three groups, and then use the K-means method to divide them into three groups.
  • To understand the characteristics of the three groups, we use a radar chart in which each axis represents a webinar theme. The larger a group's value for a theme, the stronger that group's interest in it.
  • This enables targeting and personalization, such as understanding the interests of each group of customers and making future offers tailored to their needs.
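This post does not specify the data or tools behind this example. The following is a minimal sketch in Python with NumPy, using synthetic interest scores for three made-up audience profiles and hypothetical theme names. It computes how much variance each principal component explains (via SVD) and then runs K-means with k-means++ seeding. In practice you would more likely use scikit-learn's `PCA` and `KMeans` classes; the NumPy version simply makes each step visible.

```python
import numpy as np

# Hypothetical webinar themes; the real session themes are not listed in this post.
THEMES = ["AI", "Analytics", "Storage", "Networking", "Security"]


def make_synthetic_attendees(n_per_group=20, seed=42):
    """Synthetic 0-1 interest scores for three made-up audience profiles."""
    rng = np.random.default_rng(seed)
    profiles = np.array([
        [0.9, 0.9, 0.1, 0.1, 0.1],  # mostly AI and analytics sessions
        [0.1, 0.1, 0.9, 0.9, 0.1],  # mostly infrastructure sessions
        [0.1, 0.1, 0.1, 0.1, 0.9],  # mostly security sessions
    ])
    X = np.vstack([p + rng.normal(0, 0.05, size=(n_per_group, p.size)) for p in profiles])
    truth = np.repeat(np.arange(len(profiles)), n_per_group)  # known group of each row
    return np.clip(X, 0, 1), truth


def pca_explained_variance_ratio(X):
    """Share of total variance carried by each principal component (via SVD)."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return s**2 / np.sum(s**2)


def _nearest(X, centres):
    """Index of the closest centre for every row (squared Euclidean distance)."""
    return ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """K-means (Lloyd's algorithm) with k-means++ seeding; keeps the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best = None
    for _run in range(n_init):
        # k-means++: each new centre is drawn with probability proportional to D^2.
        centres = [X[rng.integers(len(X))]]
        while len(centres) < k:
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
            centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centres = np.array(centres)
        for _step in range(n_iter):
            labels = _nearest(X, centres)
            # Move each centre to the mean of its members (keep it if the cluster is empty).
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                            for j in range(k)])
            if np.allclose(new, centres):
                break
            centres = new
        labels = _nearest(X, centres)
        inertia = ((X - centres[labels]) ** 2).sum()
        if best is None or inertia < best[0]:
            best = (inertia, labels, centres)
    return best[1], best[2]


X, truth = make_synthetic_attendees()
ratio = pca_explained_variance_ratio(X)
print(f"Variance explained by PC1 + PC2: {ratio[:2].sum():.1%}")

labels, centres = kmeans(X, k=3)
for j, centre in enumerate(centres):
    # These per-theme averages are what a radar chart would plot for each group.
    print(f"Group {j}:", {t: round(float(v), 2) for t, v in zip(THEMES, centre)})
```

A high share of variance in the first two components is consistent with about three well-separated groups, which you would typically confirm on a PC1/PC2 scatter plot before choosing k = 3. The printed per-theme averages for each group are the values a radar chart would display.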

Figure 2: Customer groups from K-means clustering, shown as a radar chart of webinar themes

Determining the effectiveness of tactics through A/B testing

Suppose you have the following data on the open rates of newsletters that contain the same content but have two different titles. Let's say that Title A is longer and has a 12.1% open rate, while Title B is shorter and simpler and has a 13.4% open rate. Is this difference just a coincidence, or does the difference between Titles A and B significantly affect the open rate?

  • One tool that can be used in such a case is the Chi-square test.
  • The chi-square test, which can be calculated in Excel, gives a p-value of 4.5% for this data. In other words, if the title had no effect on the open rate, a difference at least this large would occur by chance only about 4.5% of the time.
  • Because 4.5% is below the commonly used 5% significance level, we can conclude that the difference in open rates between the titles is statistically significant. This suggests that the title affects the open rate and that readers preferred the shorter title.
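Outside Excel, the same Pearson chi-square test for a 2×2 table (title × opened/not opened) takes only a few lines of standard-library Python. This post gives only the open rates, so the send counts below are hypothetical values chosen to reproduce 12.1% and 13.4%; the p-value they produce therefore differs from the 4.5% above.

```python
import math


def chi_square_2x2(opened_a, sent_a, opened_b, sent_b):
    """Pearson chi-square test (no continuity correction) comparing two open rates.

    Returns (chi2 statistic, p-value) for a 2x2 table with 1 degree of freedom.
    """
    table = [
        [opened_a, sent_a - opened_a],  # Title A: opened, not opened
        [opened_b, sent_b - opened_b],  # Title B: opened, not opened
    ]
    n = sent_a + sent_b
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count expected if titles don't matter
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 6,000 newsletters per title, giving 12.1% and 13.4% open rates.
chi2, p = chi_square_2x2(opened_a=726, sent_a=6000, opened_b=804, sent_b=6000)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
print("Significant at 5%" if p < 0.05 else "Not significant at 5%")
```

The same open rates can be significant or not depending on how many newsletters were sent: with 5,000 per title, the p-value in this sketch rises to about 5.1%. If you use SciPy instead, `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2×2 tables by default; pass `correction=False` to get the uncorrected statistic computed here.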

Figure 3: Chi-square test of newsletter open rates for Title A and Title B

Forecast orders for opportunities generated from marketing leads

  • If the marketing team has a quarterly target for orders generated by its campaigns, it is useful to forecast the final order amount at the beginning of the quarter.
  • This becomes possible if an SFA (sales force automation) tool provides the win probability (high, medium, or low) of each opportunity in the pipeline generated from marketing campaigns.
  • Regression analysis, which is classified as supervised learning, learns from past data to build a model and estimate its parameters. For example:
    Predicted orders at the end of the period = 0.8 × (high-probability pipeline amount) + 0.5 × (medium-probability pipeline amount) + 0.2 × (low-probability pipeline amount) + (orders booked to date) + constant
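A model of this shape can be fitted with ordinary least squares. The sketch below uses NumPy's `np.linalg.lstsq` on synthetic history: past periods are generated from the example weights above (0.8, 0.5, 0.2), plus an assumed weight of 1.0 for orders to date and a hypothetical constant, with random noise added. The fit then recovers the weights from the data, as it would from real SFA snapshots.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, constant)."""
    A = np.column_stack([X, np.ones(len(X))])  # append a column of 1s for the constant
    params, *_ = np.linalg.lstsq(A, y, rcond=None)
    return params[:-1], params[-1]


# Synthetic history: each row is one past period, observed at the start of the period.
# Columns: pipeline $ with high, medium and low win probability, and orders booked so far.
rng = np.random.default_rng(7)
X_hist = rng.uniform(0, 1_000_000, size=(40, 4))
TRUE_COEF = np.array([0.8, 0.5, 0.2, 1.0])  # example weights; 1.0 for orders to date is assumed
TRUE_CONST = 50_000                         # hypothetical constant
y_hist = X_hist @ TRUE_COEF + TRUE_CONST + rng.normal(0, 10_000, size=40)  # final orders

coef, const = fit_linear_regression(X_hist, y_hist)
print("Fitted weights (high, medium, low, to date):", np.round(coef, 2))
print(f"Fitted constant: {const:,.0f}")

# Forecast end-of-period orders for a new period from its start-of-period snapshot.
new_period = np.array([300_000, 200_000, 400_000, 150_000])
forecast = new_period @ coef + const
print(f"Forecast end-of-period orders: ${forecast:,.0f}")
```

With real data the fitted weights will not be round numbers, and it is worth holding back a few recent periods to check the forecast against actual results before relying on it.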

Once the model is built, entering data for a new period lets us predict, at the start of that period, the order amount at its end. In the following figure, for example, the black line represents the actual orders at the end of each period, and the green line represents the value predicted each week during the period. The closer the dollar ($) amount predicted at the start of the period is to the black line, the more accurate the model.

Figure 4: Actual end-of-period orders (black) and weekly predictions (green)

  • You can also use correlation analysis to relate the cost invested in online advertising to impressions, clicks, and HVAs (high-value actions).
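Correlation analysis here typically means the Pearson correlation coefficient r, which ranges from -1 (perfect negative relationship) through 0 (no linear relationship) to +1 (perfect positive relationship). A minimal standard-library sketch follows; the weekly advertising spend and response figures are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant (r is undefined then).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical weekly figures for online advertising.
ad_cost = [1000, 1500, 2000, 2500, 3000, 3500]
metrics = {
    "impressions": [52000, 70000, 98000, 121000, 140000, 171000],
    "clicks": [410, 560, 800, 950, 1230, 1350],
    "HVAs": [12, 9, 15, 11, 14, 13],
}
for name, values in metrics.items():
    print(f"cost vs {name}: r = {pearson_r(ad_cost, values):+.2f}")
```

A strong correlation shows that two metrics move together, not that one causes the other, so it is a starting point for hypotheses about where ad spend pays off rather than proof.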

In addition to forecasting, it is also important to visualize the current situation. For example, the figure below is a box-and-whisker plot of the number of days it takes to close opportunities for each type of campaign. The thick solid line in the middle of each box represents the median, which is the middle value when the data is sorted in order. The bottom edge of each box is the first quartile (25% of the values lie below it), and the top edge is the third quartile (75% of the values lie below it), so the box contains the middle half of the data. A box-and-whisker plot therefore gives you an idea of how the data is spread. If the boxes for chat, search, and similar tactics sit lower than those for other tactics, we can say that opportunities from these tactics tend to close in fewer days, probably because these customers already have a clear intention to act.
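The three lines that make up each box are the first quartile, the median, and the third quartile. As a small sketch, Python's standard `statistics.quantiles` function computes them directly; the days-to-close values per tactic below are hypothetical. Quartile conventions differ slightly between tools, so the numbers may not match every plotting library exactly.

```python
from statistics import quantiles

# Hypothetical days-to-close for opportunities, grouped by campaign tactic.
days_to_close = {
    "Chat": [5, 8, 9, 12, 14, 15, 20],
    "Search": [7, 10, 11, 13, 18, 21, 25],
    "Webinar": [20, 28, 35, 41, 47, 55, 70],
}


def box_stats(values):
    """Return (Q1, median, Q3): the bottom edge, middle line and top edge of a box."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, median, q3


for tactic, days in days_to_close.items():
    q1, med, q3 = box_stats(days)
    print(f"{tactic:8s} Q1={q1:5.1f}  median={med:5.1f}  Q3={q3:5.1f}  IQR={q3 - q1:5.1f}")
```

Comparing medians and box positions across tactics in this way gives the same reading as the chart: lower boxes mean opportunities from that tactic tend to close faster.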

Figure 5: Days to close opportunities by campaign type (box-and-whisker plot)

Needs and benefits of ML Ops

If a single marketer or data scientist prepares data, builds and trains models, runs inference, and verifies the results on his or her own PC, there may not be much of an operational problem. However, if you want to develop a company-wide approach to machine learning that extends beyond marketing operations and involves many collaborators such as software engineers, operations staff, and data analysts, you need a solid process, structure, and appropriate tools in place.

In fact, in many companies, the data science team spends a lot of time building models that address specific business challenges. These models do not create business value until they are deployed in another application that uses the model to achieve the desired outcome. What is needed are the tools and processes to seamlessly migrate models into production environments, and ML Ops is getting a lot of attention for helping to make this happen.

ML Ops covers the entire machine learning lifecycle, supporting each stage from data preparation, model building, and model training to model deployment, collaboration, and monitoring. It can be seen as applying the concept of DevOps, which brings IT development and operations together so that changes can be delivered rapidly, to the machine learning lifecycle.

Some of the benefits of HPE Ezmeral ML Ops include:

Faster time-to-value. You can manage and provision development, test, or production environments in minutes as opposed to days, instantly onboarding new data scientists with the preferred tools and languages – without creating siloed development environments.

Improved productivity. Data scientists spend their time building models and analyzing results rather than waiting for training jobs to complete. HPE Ezmeral ML Ops helps ensure no loss of accuracy or performance degradation in multitenant environments. It increases collaboration and reproducibility with shared code, project, and model repositories.

Reduced risk. It provides enterprise-grade security and access controls on compute servers and data. Lineage tracking provides model governance and auditability for regulatory compliance. Integrations with third-party software provide interpretability. High availability deployments help ensure that critical applications do not fail.

Flexibility and elasticity. You can deploy on-premises, cloud, or in a hybrid model to suit your business requirements. HPE Ezmeral ML Ops automatically scales clusters to meet the requirements of dynamic workloads.

Figure 6

HPE also offers HPE Ezmeral ML Ops, a solution that brings DevOps-like agility to the entire machine learning lifecycle, with features such as model building, model training, model deployment, model monitoring, collaboration, security and control, and hybrid deployment.

HPE ML Ops - Operationalization for the ML Lifecycle

Summary and references

  • Data science techniques such as statistical analysis and machine learning support data-driven digital marketing across many use cases, from segmentation and targeting to A/B testing and order forecasting.
  • To implement machine learning enterprise-wide and get the desired, actionable results, you also need to implement ML Ops.

If you are interested in HPE Ezmeral ML Ops and want to know more about it, please watch the online video seminar "Self-service data science and GPU optimization."







About the Author


Our team of HPE and other technology experts shares insights about relevant topics related to artificial intelligence, data analytics, IoT, and telco.
