ServicesExperts

The missing step in designing and building enterprise-grade modern IT monitoring

Learn why many enterprises underestimate the power of metrics-based IT monitoring, what step they're missing, and how Thanos delivers the functionality they need.


Cloud native computing technologies are ubiquitous. They have become the de facto runtime environment for enterprise workloads, helping to drive business innovation, agility, and optimization at the same time. As enterprises adopt cloud native computing technologies, the demand for seamless, consolidated, enterprise-level monitoring is increasing, and many believe Prometheus is the answer. Prometheus is widely used for enterprise-grade, metrics-based monitoring, providing real-time monitoring of targeted components on enterprise IT platforms.

However, Prometheus alone is not sufficient to enable truly seamless, consolidated, enterprise-grade IT monitoring as defined by the Cloud Native Computing Foundation (CNCF). The CNCF Technical Advisory Group (TAG) defines IT monitoring in terms of three main functions: metrics, real-time monitoring and alerting, and trending and analysis. Prometheus provides the first two, metrics and real-time monitoring and alerting, but not trending and analysis. Achieving the full functionality of seamless, consolidated, enterprise-grade IT monitoring therefore requires one more step.

This article describes that missing final step of metrics-based monitoring, trending and analysis, drawing on our experience deploying and operating cloud native computing technologies such as Kubernetes and Prometheus for enterprises.

Prometheus-based monitoring systems are increasingly being deployed by many enterprises in parallel with the proliferation of Kubernetes. Prometheus is a cloud-native computing technology that provides monitoring and alerting capabilities for cloud-native environments, including Kubernetes. It can collect and store metrics as time series data by recording information with a timestamp. It can also collect and record labels, which are optional key-value pairs. Prometheus is a key component of a metrics-based monitoring platform, but often does not meet the needs of enterprise customers because its functionality and data storage are limited and, in our experience, do not meet trending and analysis requirements.
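To make the data model described above concrete, here is a minimal Python sketch (an illustration, not Prometheus code) of a time series: a metric name plus optional labels (key-value pairs) identifies the series, and each sample pairs a timestamp with a value. The class and field names are assumptions chosen for readability.

```python
from dataclasses import dataclass, field


@dataclass
class TimeSeries:
    """Illustrative model of one Prometheus-style time series (not Prometheus code)."""
    name: str                                    # metric name, e.g. "http_requests_total"
    labels: dict = field(default_factory=dict)   # optional key-value pairs
    samples: list = field(default_factory=list)  # (timestamp_ms, value) pairs

    def identity(self):
        # A series is identified by its name plus its full, sorted label set,
        # so the same metric name with different label values is a different series.
        return (self.name, tuple(sorted(self.labels.items())))

    def append(self, timestamp_ms, value):
        # Record a timestamped sample; this sketch rejects out-of-order samples.
        if self.samples and timestamp_ms <= self.samples[-1][0]:
            raise ValueError("timestamps must increase")
        self.samples.append((timestamp_ms, float(value)))


ok = TimeSeries("http_requests_total", {"method": "GET", "code": "200"})
err = TimeSeries("http_requests_total", {"method": "GET", "code": "500"})
ok.append(1_700_000_000_000, 1027)
ok.append(1_700_000_015_000, 1043)
print(ok.identity() != err.identity())  # True: same name, different labels
```

The point of the sketch is that labels are part of a series' identity, which is what later makes it possible to query, aggregate, and deduplicate series by label.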

Prometheus focuses on real-time data display and alerting, and by default it retains data for 15 days. The retention period can be extended through configuration (for example, with the `--storage.tsdb.retention.time` flag), but because Prometheus stores its data on local disk, this requires designing and building local disk capacity in advance. When considering Prometheus for an enterprise IT production environment, the following functionalities need to be implemented alongside it:

  • Analyzing a long-term trend
  • Analyzing a cyclical trend
  • Backup of monitoring data
  • Availability of data
  • Long-term log retention
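The local disk pre-work that longer Prometheus retention requires can be estimated with the rule of thumb from the Prometheus storage documentation: disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, where Prometheus stores roughly 1 to 2 bytes per sample on average. The Python sketch below applies that formula; the 500,000 active series and 15-second scrape interval are hypothetical inputs, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_second(active_series, scrape_interval_seconds):
    # Each active series contributes one sample per scrape.
    return active_series / scrape_interval_seconds


def estimate_disk_bytes(retention_days, ingest_rate, bytes_per_sample=2.0):
    """Rough local-disk estimate:
        disk = retention_seconds * ingested_samples_per_second * bytes_per_sample
    bytes_per_sample defaults to 2, the upper end of the documented 1-2 bytes.
    WAL, head chunks, and compaction headroom are NOT included.
    """
    retention_seconds = retention_days * SECONDS_PER_DAY
    return retention_seconds * ingest_rate * bytes_per_sample


# Hypothetical environment: 500,000 active series scraped every 15 seconds.
rate = samples_per_second(active_series=500_000, scrape_interval_seconds=15)
for days in (15, 90, 365):
    gib = estimate_disk_bytes(days, rate) / 2**30
    print(f"{days:>3} days: ~{gib:,.0f} GiB")
```

Because disk size grows linearly with retention, keeping a year of data locally needs roughly 24 times the capacity of the 15-day default, which is one reason long-term retention is better handled in external storage.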

To address these requirements, Prometheus needs a companion: Thanos, which provides the missing functionality.

Thanos is open-source software for organizations that use Prometheus and need high availability and virtually unlimited storage of historical data. Adding Thanos makes Prometheus more scalable and turns it into a complete enterprise IT monitoring platform. The Thanos project describes its goals as follows:

  • Global query view of metrics
  • Unlimited retention of metrics
  • High availability of components, including Prometheus

Many companies have these requirements for compliance and governance reasons. Thanos plays the key role in meeting them, transferring metric data to external storage in a flexible, adaptable way.

The integration architecture of Thanos and Prometheus is shown below. The main considerations are the storage design and the data storage process.

[Figure: Thanos and Prometheus integration architecture]

The first consideration is the design of the external storage. One of the following types of object storage can be used:

  • Public cloud: Google Cloud Platform, Amazon Web Services, Azure, etc.
  • On-premises: S3-compatible storage

From a data intelligence perspective, our design and implementation experience leads us to recommend on-premises external storage. If you already run an on-premises Kubernetes cluster, you can easily use Ceph with Rook, or MinIO. If not, you can choose Scality or Cohesity.
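Thanos components that write to or read from object storage are pointed at a small YAML file passed with the `--objstore.config-file` flag. As a minimal sketch, assuming an on-premises MinIO deployment reached through the S3-compatible API, the Python function below renders such a file using the field names of the S3 provider in the Thanos object storage documentation (`type`, `bucket`, `endpoint`, `access_key`, `secret_key`, `insecure`). The bucket name, endpoint, and credentials are hypothetical placeholders.

```python
import json


def render_s3_objstore_config(bucket, endpoint, access_key, secret_key, insecure=False):
    """Render a Thanos object storage configuration (type S3) as YAML text.

    json.dumps() is used to quote string values; a JSON string is also a valid
    YAML double-quoted scalar, so special characters are escaped safely.
    """
    lines = [
        "type: S3",
        "config:",
        f"  bucket: {json.dumps(bucket)}",
        f"  endpoint: {json.dumps(endpoint)}",      # host:port, no URL scheme
        f"  access_key: {json.dumps(access_key)}",
        f"  secret_key: {json.dumps(secret_key)}",
        f"  insecure: {'true' if insecure else 'false'}",  # true = plain HTTP
    ]
    return "\n".join(lines) + "\n"


config = render_s3_objstore_config(
    bucket="thanos-metrics",                # hypothetical bucket name
    endpoint="minio.monitoring.svc:9000",   # hypothetical in-cluster MinIO address
    access_key="CHANGE_ME",                 # placeholder credentials
    secret_key="CHANGE_ME",
    insecure=True,                          # assumption: no TLS inside the cluster
)
print(config)
```

In practice the credentials would come from a Kubernetes Secret rather than being written inline; the sketch only shows the shape of the configuration.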

The second consideration is how metrics are stored on external storage. Thanos can be deployed in two ways, "deployment with Receive" or "deployment with Sidecar". The choice depends on architecture guidelines and IT requirements; from a purely technical perspective, however, deployment with Sidecar is recommended.

  • Deployment with Receive: Prometheus sends its data through its remote write API, and the Receive component of Thanos ingests the metric data from that API. As the name implies, the remote read/write API was originally designed for reading from and writing to external storage, and it is used by many solutions, such as Elasticsearch and InfluxDB.
  • Deployment with Sidecar: The Thanos Sidecar runs alongside Prometheus and accesses the data stored in the Prometheus TSDB directly. The Prometheus TSDB determines where metric data is stored, writing it to local disk as blocks (two hours each by default). The Sidecar periodically uploads newly created blocks to external storage, where the data can be analyzed.
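The Sidecar behavior described above can be illustrated with a simplified Python model: scan the Prometheus TSDB directory for completed blocks (directories that contain a `meta.json` file, which excludes the `wal` directory), upload each block that has not been uploaded yet, and remember it. This is a sketch of the idea only, not the Thanos implementation; the `upload` callback and the block names are hypothetical, and the real Sidecar also handles ordering, failures, and block metadata.

```python
import json
import os
import tempfile


def find_new_blocks(tsdb_dir, already_uploaded):
    """Return names of completed blocks in tsdb_dir that were not uploaded yet."""
    new_blocks = []
    for entry in sorted(os.listdir(tsdb_dir)):
        if not os.path.isfile(os.path.join(tsdb_dir, entry, "meta.json")):
            continue  # not a completed block (for example, "wal")
        if entry not in already_uploaded:
            new_blocks.append(entry)
    return new_blocks


def sync_once(tsdb_dir, already_uploaded, upload):
    """One pass of a sidecar-style loop: upload each new block, then record it."""
    uploaded_now = []
    for block in find_new_blocks(tsdb_dir, already_uploaded):
        upload(os.path.join(tsdb_dir, block))  # hypothetical uploader callback
        already_uploaded.add(block)
        uploaded_now.append(block)
    return uploaded_now


# Usage: a fake TSDB directory with two completed blocks and a WAL directory.
with tempfile.TemporaryDirectory() as tsdb:
    for name in ("01HBLOCKA", "01HBLOCKB", "wal"):  # placeholder names, not real ULIDs
        os.makedirs(os.path.join(tsdb, name))
    for name in ("01HBLOCKA", "01HBLOCKB"):
        with open(os.path.join(tsdb, name, "meta.json"), "w") as f:
            json.dump({"ulid": name}, f)

    seen, sent = set(), []
    first = sync_once(tsdb, seen, sent.append)   # uploads both blocks
    second = sync_once(tsdb, seen, sent.append)  # nothing new on the second pass
    print(first, second)
```

Because only completed, immutable blocks are shipped, the Sidecar does not add load to the Prometheus write path, which is consistent with the resource-usage argument for the Sidecar configuration.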

With either configuration, the same data is uploaded to and saved in external storage. The Sidecar configuration is recommended from the perspective of availability and resource usage.

Thanos also has deduplication and data lifecycle features to optimize data volume and delete data whose retention period has expired.
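To illustrate what deduplication and retention mean in practice, the Python sketch below assumes two highly available Prometheus replicas whose series differ only by a replica label (hypothetically named `replica`). It merges such series into one, and it selects blocks whose newest sample falls outside a retention window. Both functions are simplified stand-ins: Thanos Query uses a more elaborate penalty-based algorithm to choose between replicas (the replica label is configured with `--query.replica-label`), and the Thanos Compactor applies retention per resolution through flags such as `--retention.resolution-raw`.

```python
REPLICA_LABEL = "replica"  # assumption: each HA Prometheus sets this external label


def deduplicate(series_list, replica_label=REPLICA_LABEL):
    """Merge series that differ only by the replica label.

    series_list: list of (labels dict, [(timestamp_ms, value), ...]).
    Simplified rule: for each timestamp, keep the first value seen.
    """
    merged = {}
    for labels, samples in series_list:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        points = merged.setdefault(key, {})
        for ts, value in samples:
            points.setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}


def expired_blocks(blocks, now_ms, retention_ms):
    """Return IDs of blocks whose newest sample is older than the retention window.

    blocks: dict of block_id -> max_time_ms (newest sample in the block).
    """
    cutoff = now_ms - retention_ms
    return sorted(block_id for block_id, max_time in blocks.items() if max_time < cutoff)


a = ({"job": "node", "replica": "A"}, [(1000, 1.0), (2000, 2.0)])
b = ({"job": "node", "replica": "B"}, [(2000, 2.0), (3000, 3.0)])
print(deduplicate([a, b]))
# {(('job', 'node'),): [(1000, 1.0), (2000, 2.0), (3000, 3.0)]}

DAY_MS = 24 * 60 * 60 * 1000
blocks = {"old": 0, "recent": 30 * DAY_MS}
print(expired_blocks(blocks, now_ms=40 * DAY_MS, retention_ms=15 * DAY_MS))
# ['old']
```

Note how deduplication also fills gaps: replica A missed the sample at 3000 and replica B missed the one at 1000, but the merged series has all three.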

Thanos seamlessly delivers holistic IT monitoring

Besides its main goal of enabling trending and analysis, the main advantage of integrating Thanos is that it requires minimal changes to configuration and operation. From the user's point of view, little changes: the only difference is that the user's endpoint moves from Prometheus to Thanos, so users send their metric queries to Thanos instead of Prometheus. Many users visualize metrics with Grafana, which can display data on its dashboards after the data source is simply pointed at Thanos, because the Thanos API is compatible with the Prometheus API.
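The endpoint switch described above can be seen in a minimal Python client for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`), which Thanos Query also serves. The sketch assumes hypothetical host names and the default HTTP ports (9090 for Prometheus, 10902 for Thanos components); only the base URL differs between the two backends.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://prometheus.example.internal:9090"       # hypothetical host
THANOS_QUERY_URL = "http://thanos-query.example.internal:10902"  # hypothetical host


def instant_query_url(base_url, promql):
    # Same path and parameters for both backends: /api/v1/query?query=<PromQL>
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def instant_query(base_url, promql, timeout=10):
    # Sends the HTTP GET and returns the "result" list from the JSON response.
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "query failed"))
    return body["data"]["result"]


# Switching from Prometheus to Thanos only changes the base URL.
for base in (PROMETHEUS_URL, THANOS_QUERY_URL):
    print(instant_query_url(base, "up"))
```

Calling `instant_query(THANOS_QUERY_URL, "up")` against a reachable Thanos Query instance would return results in the same JSON shape as Prometheus, which is why existing Grafana dashboards keep working after the data source change.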

As cloud native computing adoption rises, it is increasingly important to have a holistic IT monitoring capability that covers infrastructure, platforms, applications, and services, both in real time and over longer-term trends. Without both viewpoints, modern IT monitoring is not possible, and your enterprise IT environment is at risk.

Every enterprise needs a deeper understanding of modern IT monitoring to manage and operate its IT securely. HPE GreenLake Advisory & Professional Services has deep expertise and experience in modern IT monitoring, backed by hundreds of cloud native computing projects, and can help improve the overall customer experience with modern tools and infrastructure.

Learn more about HPE Advisory and Professional Services.


Meet Shinji Arai, Chief Solution Architect, Hybrid Cloud Practice

Shinji Arai has been involved in technology since his school years. From the start of his career, he has worked on a number of key software development projects, such as the Hebrew localization and IA64 porting of OpenVMS. After moving to the HPE services business unit, he has worked on many cloud transformation projects as a chief solution architect, drawing on his strong software development expertise and experience. His current focus is running mission-critical workloads on cloud native computing platforms built on Kubernetes, bringing mission-critical flexibility and functionality to them.


Services Experts
Hewlett Packard Enterprise

twitter.com/HPE_Pointnext
linkedin.com/showcase/hpe-pointnext-services/
hpe.com/pointnext

 

About the Author

ServicesExperts

HPE Services Team experts share their insights on the topics and technologies that matter most for your business.
