HPE Ezmeral: Uncut

Chat with your data without playing rag roulette

HPE Ezmeral Software lets you unlock the power of large language models without submitting a single IT ticket. Read the blog to understand how.


Amidst the social media storm surrounding large language models (LLMs), influencer hype often distracts from the crucial question every data team must ask: what's in it for the business? Stakes are high, and stakeholders rightfully balk at ambitious goals without practical budgets. Platforms, data SME salaries, and associated costs hit organizations where it hurts, and the bill always comes due. What MLOps teams see as semantic search enhancers, knowledge discovery accelerators, and end-user cognitive overhead reducers, stakeholders see as a risky gamble. Even with tools and strategies touted to improve success rates, the 80% ML project failure statistic looms large[1].

Organizational leaders want the benefits of generative AI but fear getting lost in another "tool rabbit hole" with no tangible return. I understand the allure of new tech; after all, I've been dubbed a "tool guy" and burned the midnight oil for Kubernetes certifications. But I've also seen projects succumb to the shine of flashy tech and charismatic sales teams, only to die before reaching their promised profitability.

It's déjà vu all over again. Remember the "cloud everything" craze? Then it was “run everything on Kubernetes” to solve the “run everything on the cloud” problem. Now LLMs are the darling, and the hype machine is cranking louder than ever, promising riches from every new tech buzzword. But before we get swept away, let's take a reality check.

We, as data professionals, must acknowledge this cycle. We need to meet stakeholders where they stand, understand their bigger business picture, and make a clear case for how our LLM and retrieval augmented generation (RAG) capabilities derive real value.

That's the mission behind this blog series and our accompanying workshops. We're here to answer the provocative questions we hear every day from customers embarking on new AI projects:

  • What's the bottom line on these fancy acronyms for business?
  • How can we use these technologies without falling victim to the "cool factor" trap?

Why RAG matters

While RAG isn't a fresh face, its ability to bridge the gap between LLMs and real-world applications is generating excitement. LLMs are adept at generating human-like text, but their output can be unreliable and prone to biases. RAG steps in to curate and filter information, ensuring that responses are grounded in factual, access-controlled, and up-to-date data.

Think of it this way: Imagine your LLM as a super-powered librarian, but one who needs a little guidance. RAG acts as the trusty knowledge curator, pointing the LLM to the most relevant and reliable sources within your own documentation or other trusted repositories. The result is accurate, source-backed responses that you can reliably build on.

No more garbage in, garbage out

The beauty of RAG lies in its focus on data quality, not just data quantity. We're moving beyond the “bigger is better” mentality of massive models trained on internet data that often include misinformation and biases. RAG puts the emphasis on smaller, more valuable models that use curated, trustworthy data sources.

Imagine this scenario: Instead of asking users to sift through mountains of documentation, you can empower them to ask the docs directly through a natural language interface. The LLM, guided by RAG, retrieves the relevant information from your trusted sources, ensuring the user receives accurate and actionable insights.
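The core pattern is simple enough to sketch in a few lines: retrieved passages are injected into the prompt before the question ever reaches the LLM. The function name and prompt wording below are illustrative, not the exact code from our workshop.

```python
# A minimal sketch of the core RAG pattern: ground the LLM's answer in
# retrieved passages by placing them in the prompt ahead of the question.
def build_rag_prompt(question: str, passages: list[str]) -> str:
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```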

Benefits beyond the buzzwords

By embracing RAG, you can unlock a range of benefits for your organization:

  • Improved decision-making: Access to the right, trustworthy information empowers better choices and strategies.
  • Enhanced customer experience: Delivering reliable answers and insights builds trust and satisfaction.
  • Reduced risk and stronger compliance: Curated data sources minimize the risk of misinformation and help ensure compliance with regulations.
  • Increased efficiency: Streamlining access to information saves time and resources.

RAG is not a magic bullet, but it offers a powerful tool for building trust and value in the age of information overload. By focusing on data quality and reliable sources, we can move beyond the hype and unlock the true potential of generative AI for businesses of all sizes.

Reading the river

Alright! Maybe you are calling my bluff here. Sure, I promised RAG would be your ace in the hole for search accuracy, document-backed responses, and user-speed boosts that feel like magic. It is a promising technology, but seasoned data pros like us know the real game isn't played with smooth patter and royal flushes. We're talking riverboat rides with rapids, bluffs to call, and hidden cards we need to expose before placing our chips.

Execution that matters

Not ALL execution matters, though. Just because you can execute on something, or have read a blog about it, doesn't mean you are ready to drive value for your business. We recently ran a "Chat with your Data" workshop that supplied real-world context for building a RAG prototype. The architecture we proved out is pictured below.

Figure 1. System architecture that details the steps from question to generated answer using LLMs.

The stacked deck

This blog is an abridged version of our technical brief, so I am only going to outline the workflow here. If you want to go deeper, check out the technical brief, the workshop recording, and the Ezmeral Software GitHub repo to learn more!

Our flow is as follows (a condensed code sketch of the transformer's role follows the list):

  1. User: Transform raw documents into sentence embeddings.
  2. User: Ingest the document embeddings, document data, and metadata into the vector store.
  3. User: Ask a new question.
  4. LLM ISVC Transformer: Intercept the request, extract the user's query, and create a new request to the vector store ISVC predictor passing the user's question in the payload.
  5. Vector Store ISVC Predictor: Extract the user's question from the request of the LLM ISVC Transformer and ask the Vector Store for the k most relevant documents.
  6. Vector Store: Respond to the Vector Store ISVC Predictor with the relevant context.
  7. Vector Store ISVC Predictor: Respond to the LLM ISVC Transformer with the relevant context.
  8. LLM ISVC Transformer: Get the most relevant documents from the Vector Store ISVC predictor response, create a new request to the LLM ISVC predictor passing the context and the user's question.
  9. LLM ISVC Predictor: Extract the user's question as well as the context and answer the user's question based on the relevant context.
  10. LLM ISVC: Respond to the user with the completion prediction.
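
To make steps 4 through 8 concrete, here is a condensed sketch of the LLM ISVC transformer's role. The class name, endpoint URL, and payload fields are assumptions for illustration; the actual implementation lives in the Ezmeral Software GitHub repo, and the KServe transformer API varies slightly between versions.

```python
# Hedged sketch of the LLM ISVC transformer (steps 4, 5, and 8): intercept the
# request, fetch context from the vector store ISVC, and forward both the
# context and the question to the LLM ISVC predictor. Names are illustrative.
import requests
from kserve import Model, ModelServer

# Assumed vector store ISVC predictor endpoint, not the workshop's actual URL.
VECTOR_STORE_URL = "http://vector-store-predictor/v1/models/docs:predict"

class RAGTransformer(Model):
    def __init__(self, name: str, predictor_host: str):
        super().__init__(name)
        self.predictor_host = predictor_host  # LLM ISVC predictor host
        self.ready = True

    def preprocess(self, payload, headers=None):
        # Step 4: extract the user's question from the incoming request.
        question = payload["instances"][0]["question"]
        # Step 5: ask the vector store ISVC predictor for the most relevant documents.
        context = requests.post(
            VECTOR_STORE_URL, json={"instances": [{"question": question}]}
        ).json()["predictions"][0]
        # Step 8: pass context and question along to the LLM ISVC predictor.
        return {"instances": [{"question": question, "context": context}]}

if __name__ == "__main__":
    model = RAGTransformer("llm-transformer", predictor_host="llm-predictor")
    ModelServer().start([model])
```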

For sentence embedding we used the all-MiniLM-L6-v2 model because it excels at turning text into compact embeddings that capture semantic similarity.
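
As a minimal sketch of step 1, this is roughly what the embedding step looks like with the sentence-transformers library; the sample text is invented for illustration.

```python
# Step 1 (sketch): turn raw text chunks into sentence embeddings
# with all-MiniLM-L6-v2 via the sentence-transformers library.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
chunks = [
    "KServe serves models as scalable inference services on Kubernetes.",
    "MLflow tracks experiments, parameters, and artifacts.",
]
embeddings = encoder.encode(chunks)  # one 384-dimensional vector per chunk
```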

For document ingestion and storage, we used a KServe inference service and Chroma. We also used KServe for the transformer and predictor services. We had to build a custom transformer; the details are in the workshop replay.
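
Here is a rough, self-contained sketch of the ingestion and retrieval steps (2 and 6) against the Chroma client; the collection name, IDs, and metadata schema are illustrative rather than the workshop's exact values.

```python
# Steps 2 and 6 (sketch): ingest documents, embeddings, and metadata into
# Chroma, then retrieve the most relevant documents for a question.
import chromadb
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "KServe serves models as scalable inference services on Kubernetes.",
    "MLflow tracks experiments, parameters, and artifacts.",
]

client = chromadb.Client()  # in-memory for the sketch; a deployment would use a persistent or HTTP client
collection = client.create_collection("workshop-docs")
collection.add(
    ids=["doc-1", "doc-2"],
    documents=docs,
    embeddings=encoder.encode(docs).tolist(),
    metadatas=[{"source": "docs"}, {"source": "docs"}],
)

# Return the k most relevant documents for a user question.
results = collection.query(
    query_embeddings=encoder.encode(["How do I serve a model?"]).tolist(),
    n_results=2,
)
print(results["documents"][0])
```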

For persistence we used MLflow.
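
As an illustration of that persistence step, a run of the prototype might log its parameters and artifacts to MLflow along these lines; the parameter names and artifact path are assumptions, not the workshop's exact tracking code.

```python
# Sketch: record the run's configuration and artifacts in MLflow so the
# prototype pipeline stays reproducible. Paths and parameter names are assumed.
import mlflow

with mlflow.start_run(run_name="rag-prototype"):
    mlflow.log_param("embedding_model", "all-MiniLM-L6-v2")
    mlflow.log_param("llm", "orca-mini-3b.ggmlv3.q4_0")
    mlflow.log_param("top_k", 4)
    mlflow.log_artifacts("chroma_store/")  # persisted vector store directory (assumed path)
```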

For our LLM, we used the orca-mini-3b.ggmlv3.q4_0 model to generate natural language responses.
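
To show the prompt and response shape, here is a minimal local-generation sketch using llama-cpp-python; in the workshop the model sits behind a KServe predictor instead, the model path is assumed, and ggmlv3 files require an older llama.cpp build.

```python
# Sketch only: run the quantized orca-mini model locally to see how a
# context-grounded prompt produces a completion. Model path is assumed.
from llama_cpp import Llama

llm = Llama(model_path="models/orca-mini-3b.ggmlv3.q4_0.bin", n_ctx=2048)
prompt = (
    "Answer the question using only the context below.\n\n"
    "Context: KServe serves models as scalable inference services on Kubernetes.\n\n"
    "Question: What is KServe?\nAnswer:"
)
output = llm(prompt, max_tokens=256, temperature=0.1)
print(output["choices"][0]["text"].strip())
```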

We also used HPE Ezmeral Software's ability to import applications: we imported a chat application to call our transformer model and let the platform handle the orchestration.
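
Under the hood, that chat application essentially POSTs the user's question to the transformer's inference endpoint, along these lines; the host name, model name, and payload shape here are assumptions rather than the exact workshop values.

```python
# Sketch: what the chat front end does for each user turn: send the question
# to the LLM ISVC transformer endpoint and display the returned completion.
import requests

ENDPOINT = "http://llm-isvc.example.com/v1/models/llm:predict"  # assumed endpoint
resp = requests.post(
    ENDPOINT,
    json={"instances": [{"question": "How do I scale a notebook server?"}]},
    timeout=60,
)
print(resp.json()["predictions"][0])
```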

Lastly, we built all of this by running Jupyter notebooks in order and executing the pipeline manually. Since this is a prototype, we opted out of building reusable Kubeflow components. Maybe in a future workshop we will show the modularity of Kubeflow pipelines!

HPE Ezmeral Software: The casino floor that keeps the game running

Our RAG game wouldn't be possible without a well-oiled casino floor, and that's where HPE Ezmeral Software shines. This platform is more than just fancy card tables and velvet drapes — it's the infrastructure that ensures smooth gameplay, from shuffling the deck to delivering the final winnings.

Imagine HPE Ezmeral as a seasoned casino manager, expertly orchestrating the entire play with Kubeflow, KServe, MLflow, and Jupyter notebooks at its disposal. All working together seamlessly on top of the solid foundation of Kubernetes. HPE Ezmeral takes care of all the practical details — networking, authentication, scaling — so you can focus on the art of the game, crafting the perfect RAG experience.

Open-source tools are the lifeblood of this casino, ensuring a vibrant community and constant innovation. No vendor lock-in here, just a healthy pool of talent and ability that's always ready to deal a new hand. Security is paramount, with APIs, authentication, and namespaces playing the role of vigilant bouncers, keeping the game fair and secure. Collaboration is encouraged, with notebooks and environments easily shared and replicated, letting everyone learn from each other's plays.

Cashing in our chips: The RAG gambit continues

Remember, not every RAG experiment strikes gold. But this technology holds immense promise for keeping your teams smarter, safer, and faster when it comes to navigating the information jungle. Building a RAG prototype requires more than just a deck of large language models — you need vector databases, containerization skills, distributed systems knowledge, and a hefty dose of resilience. Luckily, the open-source tools seamlessly integrated within HPE Ezmeral Unified Analytics Software dealt us a winning hand by abstracting away the heavy lifting.

We're not stopping here. We're pulling up a chair at the AI/ML table, pouring another cup of virtual coffee (something that I genuinely believe fuels any good tech demo), and inviting you to join us for our upcoming workshops. We'll dive deeper into various workloads, tackle real-world deployment scenarios, and show you how to run these systems without a constant state of pager anxiety.

We can't promise a glitch-free ride — HPE Ezmeral Software is a platform, a canvas for your creativity, not a magic spell against problems. But what we can offer is the flexibility to solve those problems on your terms, with the tools and insights to avoid technical debt avalanches. Consider these workshops your Avalanche Institute curriculum but for tech debt.

So, join us on this AI/ML journey! We'll share our knowledge, troubleshoot your challenges, and celebrate your successes together. The world of AI/ML is moving fast, and we're not slowing down. Come sharpen your skills and make the most of this exciting revolution!

Keep the conversation going! Share your thoughts on this blog post, ask questions about RAG, and let us know what topics you'd like to see covered in future workshops. The more we share, the smarter we all become.


Meet Chase Christensen, HPE Global Field CTO, Unified Analytics

Chase helps organizations unlock the potential of technology by building and integrating impactful solutions. He leverages his expertise in automation, Kubernetes, and ML to empower data teams through open-source tools and frameworks.

 

 

Ezmeral Experts
Hewlett Packard Enterprise

twitter.com/HPE_Ezmeral
linkedin.com/showcase/hpe-ezmeral
hpe.com/Ezmeral

[1] Keep your AI projects on track, Harvard Business Review, December 2023.
