HPE Ezmeral: Uncut

Second project advantage: Lowering barriers to AI and machine learning

A second AI or machine learning project is usually very different from the first one you try. One difference is that the second (or third or fourth) should start with distinct advantages.

Why does that matter? AI and machine learning projects have huge potential value, but they are often speculative -- there’s no guarantee of the outcome. You’re more likely to reap the value they offer if you can lower the barriers and entry costs so you can try more things. Trying more things, in turn, makes it more likely that some of them will be winners.
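To make that reasoning concrete, here is a minimal sketch of the arithmetic. It assumes that every project has the same chance of success and that projects succeed or fail independently, which is a simplification. The budget, per-project costs, and 20% success rate are hypothetical numbers chosen only for illustration.

```python
def chance_of_at_least_one_win(p_success, budget, cost_per_project):
    """Probability that at least one project succeeds.

    Assumes independent projects with an identical success probability
    (an illustrative simplification, not a claim about real projects).
    """
    n_projects = int(budget // cost_per_project)  # how many attempts the budget allows
    return 1.0 - (1.0 - p_success) ** n_projects


# Hypothetical numbers: a fixed budget and a 20% chance that any one project pays off.
budget = 1_000_000
p_success = 0.2

for cost in (500_000, 250_000, 100_000):
    attempts = int(budget // cost)
    chance = chance_of_at_least_one_win(p_success, budget, cost)
    print(f"cost per project {cost:>7}: {attempts:2d} attempts, "
          f"chance of at least one winner = {chance:.2f}")
```

Under these assumptions, cutting the cost per project from half the budget to a tenth of it raises the chance of at least one winner from roughly one in three to almost nine in ten.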

The point is, experimentation comes with risk and costs. How can you bound the risk and costs, and how do they stack up against potential benefits?

The answers lie in both human choices and technology. Let’s examine in a kind of reverse journey why a second AI project starts with advantages over the initial attempt. Next, we’ll look at what makes a good choice for a first AI project, and finally we’ll explore how to lower barriers to getting started. 

Second AI project starts with advantages

An obvious advantage of the second AI project is sunk costs: you’ve already set up a system that can provide the data needed for the development of the project and on which your models will run in production. Furthermore, you may have already collected large-scale data needed to serve as the source for feature extraction for training data -- that’s a step ahead. Knowing which aspects of your available data are most predictive and what kinds of decisions can best be automated in the context of your business is another big step ahead.

One of the biggest advantages is a human one: experience. Even though each project has its own challenges and requirements, the basic skills carry over. For example, you will have learned how to plan a project that is set in the right context to have real impact, how to frame appropriate questions, how to think about what data will be useful for training models, and which machine learning tools are best suited for particular types of modelling.

Many people new to AI and machine learning are taken by surprise at how important data is or how much work it is to prepare training data. They also may not realize the process is iterative. It’s not just code up and train a model, run, and done. Ongoing model evaluation, retraining, and deployment of new models are all to be expected. And of course, my telling you this is not the same as experiencing it for yourself. It is a challenge that requires considerable effort, but you are a step ahead if you expect and plan for data preparation to be a substantial part of the overall effort.
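The iterative loop of evaluating, retraining, and redeploying can be sketched in a few lines. This is a toy illustration, not a recommended pipeline: the one-feature threshold "model", the 0.9 accuracy cutoff, the function names, and the data are all hypothetical, and retraining uses only the latest batch to keep the example short.

```python
from statistics import mean


def train(examples):
    """Fit a toy one-feature model: the midpoint between the two class means."""
    positives = [x for x, label in examples if label == 1]
    negatives = [x for x, label in examples if label == 0]
    return (mean(positives) + mean(negatives)) / 2


def accuracy(threshold, examples):
    """Fraction of examples where 'x > threshold' matches the label."""
    correct = sum((x > threshold) == bool(label) for x, label in examples)
    return correct / len(examples)


def run_lifecycle(batches, min_accuracy=0.9):
    """Train on the first batch, then evaluate on each new batch and
    retrain (on that batch only, for simplicity) when accuracy drops."""
    threshold = train(batches[0])
    retrain_count = 0
    for batch in batches[1:]:
        if accuracy(threshold, batch) < min_accuracy:
            threshold = train(batch)  # "deploy" the retrained model
            retrain_count += 1
    return threshold, retrain_count


# Hypothetical (feature value, label) pairs; the last batch has drifted upward.
batches = [
    [(1, 0), (2, 0), (8, 1), (9, 1)],
    [(1, 0), (2, 0), (8, 1), (9, 1)],
    [(6, 0), (7, 0), (13, 1), (14, 1)],
]
threshold, retrain_count = run_lifecycle(batches)
print(f"final threshold = {threshold}, retrained {retrain_count} time(s)")
```

Even in this toy version, most of the code concerns data and evaluation rather than the model itself, which mirrors where the effort goes in real projects.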

The advantage of experience also can extend beyond the data science and data engineering teams. A first AI project can raise awareness across your organization and, especially if it was successful, serve as a proof-of-concept to make adoption of new projects easier. 

So, the fastest way to get to the advantages of a second AI project is, obviously, to get started on the first one!

What makes a good first AI project?

Simplicity is key to a good first project. Even if you work for a really forward-thinking organization – one that makes room for innovation by accepting some risk of failure with speculative projects – it’s still helpful for a first AI project to deliver real value and to be feasible in a realistic time frame. Remember, simple can be powerful. Often the key to value is sophistication in domain knowledge. Knowing what processes would benefit from automation through machine-based decisions made by simple models, for instance, can be more effective than building a really complex model. 

Another tip for a good first project is to choose one that makes use of existing data. You’ll still have a lot of work to do in terms of data engineering, but you won’t have to wait to collect large-scale data.

Overall, the best guidance for getting started on an initial AI/ML project is to pick low-hanging fruit. Choose a project that focuses on some business process you already knew you wanted to do. It may be one for which you have the appropriate data and the right domain knowledge. Or it may be one where simple modelling techniques will be effective or where you have a way to take action based on the data insights your project reveals. Leave the broader experimentation of building a new line of business for subsequent projects.

You can make it easier to get started on that first AI project by lowering the entry costs. How? Give it second-project advantages. 

Give your first AI project the second-project advantage

There’s a widespread misconception that large-scale analytics and AI/machine learning projects must be built on separate systems. If you assume that, it’s a flag that your underlying data infrastructure is imposing unnecessary limitations. In a truly scale-efficient system, AI and data analytics can co-exist well on the same system, accessing the same data from the same cluster. By doing that, you’ve given even your first AI project many second-project benefits, including taking advantage of sunk costs in a system already built to run essential business processes. By lowering entry costs, you make it more likely that speculative machine learning and AI projects (even initial projects) will deliver value. 

Unifying data infrastructure supports AI and analytics together

An additional benefit of putting AI and analytics together is that you can develop a comprehensive data strategy. This helps foster collaboration between data scientists and others and allows you to add complexity and scale with less burden on IT resources. This is a special case of an overall strategy to build an organization that fosters innovation because risk has been bounded.


Data scientists and analysts benefit by sharing data on the same system.

What is required to make this level of multi-tenancy practical? A scale-efficient data platform should provide flexibility in direct data access by modern analytics applications, legacy applications, and machine learning/AI applications. The data infrastructure should also offer safeguards for data protection and for workload management with a high level of reliability, so innovative new applications do not interfere with essential basic processes. Data persistence for containerized applications is also useful in this type of multi-purpose, multi-tenant system. Furthermore, much of the data logistics needed for any of these projects should be handled in an automated way at the platform level, rather than having to be coded into applications. The HPE Ezmeral Data Fabric is an example of a data platform engineered to support AI and analytics on the same large-scale system.

Additional resources

To find out more about how to provide second-project advantages for your AI and machine learning projects, download a free PDF of the ebook, AI and Analytics at Scale: Lessons from Real World Production Systems. Chapter 3, titled “AI and Analytics Together”, includes several real-world customer stories that illustrate this approach.

Additional information about the impact of unifying data infrastructure that supports data sharing by AI and analytics projects is found in the blog post “The case for radical simplicity in data infrastructure”. 

You may also want to read the blog post “Budgeting time for AI/ML projects”.

Ellen Friedman

Hewlett Packard Enterprise




About the Author


Ellen Friedman is a principal technologist at HPE focused on large-scale data analytics and machine learning. Ellen worked at MapR Technologies for seven years prior to her current role at HPE, where she was a committer for the Apache Drill and Apache Mahout open source projects. She is a co-author of multiple books published by O’Reilly Media, including AI & Analytics in Production, Machine Learning Logistics, and the Practical Machine Learning series.