A Data Scientist’s Dream: Zero to Hero in 15 minutes with Containerized AI
Agility, data-driven, self-service, business intelligence (BI), big data, artificial intelligence (AI), internet of things (IoT), (insert your favorite buzzword) are all popular candidates for buzzword bingo. Yet for a data scientist, two of these are more than just buzzwords: agility and self-service. They’ve each had their moment in the buzzword spotlight, but they represent fundamental capabilities that allow you to get stuff done (#GSD) with your data. Over my career, agility and self-service have been the nirvana every organization strives for; the only things that change are the tools and technologies to get there.
Early days of business intelligence
In my first real job out of college, at Syntricity, my colleagues and I served up data logs from thousands of testers to help fabless semiconductor companies improve their yields and manufacturing processes. First, we solved the data problem with a file system backend and served up a self-service GUI that let users select the lots, devices, program rev, etc., with the base statistics already pre-calculated via Oracle-based star schemas. Data was loaded within 24 hours, and users could get answers to known questions within minutes instead of fumbling with Excel spreadsheets limited to 65k rows.
But pre-canned statistics were not enough, and as BI tools became prevalent in the mid-2000s, Syntricity almost went out of business trying to build its own drag-and-drop BI tool. In hindsight, it probably should have just integrated with Spotfire or another up-and-coming BI tool from the start.
The focus on time to value
Next, on to Teradata, the gold standard of massively parallel processing (MPP) databases. The performance bottlenecks were removed, and the data was curated and modeled in 3NF so users could ask any question of their data. This solution was great, in theory, but proprietary front-end BI tools added limitations, and IT security typically had policies so strict that users couldn’t actually do what they wanted with the data.
So Teradata developed a sandbox concept, the data lab, for agility and self-service (shocking). They carved out performance-protected areas with read-only access to the underlying data and a personal scratch space, letting the high-level business analyst (we would call these folks data scientists today) access the data they wanted, use the SQL tools they wanted, and blend in their own personal data. This was remarkable because, as an analytics consultant, I could bring in new data and analytics capabilities in a matter of hours to prove out the value of new concepts.
Containerization delivers agility for data scientists
Then came big data and the concept of the data lake. The original concept was simple: throw all this data that we don't know what to do with into a big lake and let our data scientists have at it. Isilon was the quintessential data lake storage technology with its scale-out OneFS, global namespace, and concurrent read capabilities. Companies stored hundreds of terabytes (and in many cases, petabytes) of images, video, audio, and freeform text, but very few IT departments actually made this data available for analytics. There were just too many tools (many of them open source) for IT to keep up with the demands of the business, and traditional infrastructure procurement timelines measured in months didn't meet the agility needs of data science teams.
It was during this time that containerization was becoming popular, and a start-up named BlueData was focused on containerizing stateful analytics apps like Cloudera and Hortonworks, enabling Big Data as a Service (BDaaS) for the enterprise. The Isilon team quickly packaged BlueData along with Isilon, and overnight, data scientists could leverage existing infrastructure to spin up the open source tools they wanted and tap into their existing data lake sources with minimal IT involvement. What would normally take quarters to procure, configure, and deploy was now available in hours.
Modern state of agility and self-service: HPE Container Platform
Fast forward to today, and BlueData has matured into the full-fledged HPE Container Platform, complete with open-source Kubernetes to maximize application portability. Plus, the HPE Container Platform includes the MapR Data Platform (now known as HPE Data Fabric) to provide a native high-performance, scale-out data fabric. The HPE Container Platform is the modern data scientist's dream state of agility and self-service: an app store that comes out of the box with over 50 one-click application images, plus a self-service process to download, install, and add the latest open source images.
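To make the "one-click" idea a bit more concrete: since the platform builds on open-source Kubernetes (and BlueData open-sourced KubeDirector for running stateful app images as Kubernetes custom resources), launching a cluster from an App Store tile could be approximated programmatically along the lines below. This is a minimal sketch under stated assumptions; the API group/version, app name, and role fields are illustrative, not the platform's documented interface.

```python
# Rough sketch of what a one-click App Store tile boils down to on the Kubernetes side.
# The KubeDirector API group/version, app name, and role fields are assumptions for
# illustration only.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster

tf_cluster = {
    "apiVersion": "kubedirector.hpe.com/v1beta1",   # assumed KubeDirector API group/version
    "kind": "KubeDirectorCluster",
    "metadata": {"name": "tf-ngc-demo"},
    "spec": {
        "app": "tensorflow-ngc",                    # hypothetical App Store image name
        "roles": [
            {
                "id": "worker",
                "members": 1,
                "resources": {"limits": {"nvidia.com/gpu": "1"}},  # one GPU per member
            }
        ],
    },
}

# Create the custom resource; the operator then provisions the containers for the cluster
client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubedirector.hpe.com",
    version="v1beta1",
    namespace="default",
    plural="kubedirectorclusters",
    body=tf_cluster,
)
```

The App Store tile effectively fills in a spec like this from a form, which is what makes it self-service rather than a ticket to IT.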
As a proof point, I validated the self-service solution from the perspective of a new data scientist wanting to use the latest TensorFlow image available in the NVIDIA GPU Cloud (NGC). Following HPE's step-by-step tutorial on adding new NGC application images, I was able to download, configure, and install the NGC-sourced TensorFlow image in the HPE Container Platform App Store in under 15 minutes. Once configured, the application became a selectable tile in the App Store that allowed me to spin up a cluster on-premises or in the cloud, with the new app running, configured, and provisioned in under 2 minutes. This new cluster had immediate access to my data sources plus the native benefits of the HPE Data Fabric to connect my data from edge to core to cloud.
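As a quick sanity check once the cluster is up, a data scientist might confirm from the new environment that the NGC TensorFlow build is active, a GPU is visible, and the shared data is reachable. This is a minimal sketch; the mount path is a hypothetical placeholder for wherever the HPE Data Fabric volume is exposed inside the container.

```python
import tensorflow as tf

# Confirm the TensorFlow build is active and a GPU has been scheduled to this cluster
print("TensorFlow version:", tf.__version__)
print("GPUs visible:", tf.config.list_physical_devices("GPU"))

# Hypothetical path to a Data Fabric volume mounted into the container
dataset = tf.data.TextLineDataset("/mnt/datafabric/tenant1/sample.csv").skip(1)  # skip header row
for line in dataset.take(3):
    print(line.numpy())
```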
In approximately 15 minutes, I went from zero to data scientist hero, armed with my tool of choice, the infrastructure required to do my job, and access to all of my data. Best of all, the environment ran on infrastructure delivered by IT but required no IT involvement to provision. Needless to say, containerization is one of my new favorite buzzwords because it delivers the agility and self-service capabilities to #GSD.
Watch the HPE Container Platform demo to learn how the HPE Container Platform app store provides curated, pre-built application templates for a variety of use cases.
Matt Hausmann
Hewlett Packard Enterprise
twitter.com/HPE_Ezmeral
linkedin.com/showcase/hpe-ezmeral
hpe.com/ezmeral
Over the past decades, Matt has had the privilege to collaborate with hundreds of companies and experts on ways to constantly improve how to turn data into insights. This continues to drive him as the ever-evolving analytics landscape enables organizations to continually make smarter, faster decisions.