HPE Storage Tech Insiders

Advanced Data Services in CI/CD pipelines using Docker Datacenter

mmattsson

In typical multi-tier applications, there are challenges performing tests with high quality data. Data is either copied out of production, mocked, stubbed, or the tests run on empty datasets. In the former case, copying data out of production may impact production performance, which often forces teams to resort to backed-up data instead. The further data moves from the production environment, the more its quality degrades for testing. In many cases backups are not available for CI/CD pipelines to test against, as they are just that: backups. In the latter case, test effectiveness degrades because running tests on stubs or mocked datasets might not reveal problems present on a fully populated production system.

In today’s highly efficient CI/CD systems, code changes are integrated, tested and deployed to production multiple times per day. In order to achieve high quality and confidence, tests need to be accurate and performed often. Problems need to be discovered fast and mitigated even faster. Discovering issues in production needs to be avoided at all costs. Bugs will always be introduced, simply because new features need to be added continuously to stay competitive and relevant. Recovering gracefully and correcting errors and issues faster than anyone notices them could very well be a business differentiator that allows small teams to iterate fast and deliver a high-quality experience for their customers.

Containers are extremely efficient at improving the quality of software delivery due to their ability to package the entire application runtime in a format that runs verbatim on any platform. Transporting stateful, persistent data in container images is not very practical, as containers are ephemeral in nature. This means that data needs to be accessible independently of the containers across multiple environments to serve production, staging and development. There also needs to be a secure and clear separation between these environments to ensure they are capable of running autonomously without cross-dependencies.

Organizations handling sensitive data that is regulated or restricted in some way face even greater challenges performing tests on accurate datasets. These could be credit card details, medical records, social security numbers or confidential financial records. Making these datasets available for testing requires intermediary steps, such as scrambling, discarding or masking the sensitive parts so they cannot be accessed by developers or test teams.

Solution

In the following scenario, we’ll discuss an artificial build pipeline that uses Git, Jenkins, Ansible and Docker to build, ship and run a containerized Python application accessing an 850GB containerized MySQL database. Docker Datacenter is the centerpiece of the container solution, utilizing the Nimble Storage Docker Volume plug-in to clone the production database from a production cluster to a development cluster where the application is built and tested. The application is then deployed to a staging area, where it is kept running after successful tests, and in the final phase it is deployed to production.


A ten-minute narrated screencast that gives an overview of the details outlined below is available on YouTube.

Fig. The TL;DR version

Environment

The pattern being assumed is that dev, stage and prod are isolated islands without any cross-dependencies. Each environment is a nine-node Docker Datacenter cluster, but depending on the level of sophistication required, this could easily fit onto a single Docker Datacenter cluster by using teams and labels for resources and creating nodes with label affinity towards dev, stage and prod. A partial goal of this exercise is to demonstrate application multi-tenancy and the ability to isolate Docker Volumes to certain clusters while being served from the same Nimble array.

Docker Datacenter advantages

Having a secure and reliable platform for any container orchestration is paramount to allow the right abstractions. Docker Datacenter provides Active Directory/LDAP integration, central syslog support, a trusted registry and, foremost, the Universal Control Plane (UCP). UCP allows users to access the Docker environment without having access to the nodes themselves. Through object labeling it’s also possible to achieve role-based access control (RBAC) for users and teams. Developers use the native Docker client on their laptops to remotely build, ship and run their applications, including external resources such as networks and volumes. Docker Datacenter and the Docker CS Engine are used exclusively throughout this example.
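
To make the remote workflow concrete, here is a minimal sketch of how a developer would point a local Docker client at one of the UCP controllers, assuming a client bundle has been downloaded from UCP (the bundle filename and directory are hypothetical):

$ unzip ucp-bundle-dev.zip -d ~/ucp-dev
$ cd ~/ucp-dev && eval "$(<env.sh)"   # exports DOCKER_HOST, DOCKER_TLS_VERIFY and DOCKER_CERT_PATH
$ docker info                         # all subsequent commands now run against the dev cluster

The same mechanism is what later allows Jenkins and Ansible to target each environment in the pipeline simply by switching environment variables.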

Nimble Storage advantages and nomenclature

Nimble provides the capability to manage thousands of volumes and over one hundred thousand snapshots per group of arrays. While it may not be practical to expose that many volumes to the Docker environment, the Docker plug-in may be scoped to different folders and pools, which allows for application multi-tenancy. Metaphorically, a pool is like a hard drive, a folder is a directory, and the files correspond to the volumes you may expose over a block target protocol. The system administrator then has the means to lock the Nimble Storage Docker Volume plug-in into a certain pool or folder. This enables having multiple Docker environments for different purposes or tenants.
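
As a hedged sketch of what that looks like from the Docker side (the volume name and the size option are illustrative assumptions; consult the plug-in documentation for the exact option names), creating a volume through the plug-in lands it in whatever pool or folder the plug-in has been scoped to:

# Assumed example: create a Nimble-backed Docker Volume; the plug-in's scoping
# configuration determines which pool/folder on the array it is placed in.
$ docker volume create --driver nimble -o size=1000 populous-db.prod
$ docker volume inspect populous-db.prod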

Fig. Architecture Overview

In this exercise, three different environments are being used, separating development/build, staging and production. It’s also possible to clone and move resources around in the folders directly from the Docker interface, which yields the capability of cloning production data to the build environment and later importing it to the staging environment.

Infrastructure cluster

The infrastructure used in this exercise features a plethora of standard tools used for various tasks. Most of the applications use standard Docker images from Docker Hub, some with very little modification. Apps requiring persistent storage are served by Nimble Docker Volumes off the same Nimble array that serves the CI/CD pipeline. In no particular order:

  • Git - Version control system used for the skeleton application.
  • Jenkins - Continuous integration and continuous delivery/deployment application framework orchestrating the build, ship and run aspect of the entire workflow.
  • Ansible - Application and infrastructure management used to define the build, ship and run steps for the entire pipeline.
  • InfluxDB - Time-series database used to track application metrics to measure performance.
  • Grafana - Data visualization of the skeleton application.
  • Docker Registry - Insecure local registry used for the shipping steps (Docker Trusted Registry is encouraged for production deployments).
  • generatedata - Used to randomly generate 850GB of dummy MySQL data. Hosted locally for performance reasons.
  • nginx - Webserver used as a reverse HTTP proxy for all the web applications (uses a custom image, no persistent storage).

The dev, stage and prod clusters

In the lab setup, all three environments are identical: each is best described as a nine-node Docker Datacenter cluster built on top of KVM virtual machines, running CentOS 7.2 guests on CentOS 7.2 hosts. The Jenkins application has its own separate credentials for all three environments when deploying applications.

The skeleton application: Populous

There are no good “Hello World” applications for data management at scale, so a custom Python application was made up to fit this exercise, where the amount of data is the most critical point of the demonstration.

The application consists of two container images:

  • app - Gunicorn Python WSGI serving a custom application using the minimalist Falcon REST server framework. Exposes a number of REST resources used to populate the database, gather application performance metrics and generate a 64KB BLOB used to speed up filling of the database.
  • db - Uses the stock Docker Hub MySQL image with a custom initialization statement. The database itself is a single table with a few columns and is best described by its create statement, shown below. The database is roughly 850GB with 13 million rows.

CREATE TABLE main (
      id int unsigned NOT NULL auto_increment,
      guid varchar(36) NOT NULL,
      pid bigint default NULL,
      street varchar(255) default NULL,
      zip varchar(10) default NULL,
      city TEXT default NULL,
      email varchar(255) default NULL,
      name varchar(255) default NULL,
      imprint longblob,
      PRIMARY KEY (id)
) AUTO_INCREMENT=10000000;

ALTER TABLE main ADD INDEX guid (guid);

Fig. Database create statement
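
For reference, the stock MySQL image executes any .sql files placed in /docker-entrypoint-initdb.d the first time it starts against an empty data directory. Below is a minimal sketch of how the create statement could be wired in; the image tag, file and volume names are assumptions, and the actual db image bakes this into its own layer:

# Assumed example: initialize the schema on first start against an empty volume.
$ docker run -d --name populous-db \
    -e MYSQL_ROOT_PASSWORD=secret \
    -v populous-db.prod:/var/lib/mysql \
    -v $PWD/create.sql:/docker-entrypoint-initdb.d/create.sql:ro \
    mysql:5.7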

The pipeline

Jenkins is installed from the stock Docker Hub image with the default set of plug-ins. A few custom layers were added, such as Ansible and Docker, to allow building the application. The only custom plug-ins used are the Ansible and AnsiColor plug-ins (Ansible produces colored logs which are easy to read). The Jenkins job is fairly simple: it has a build trigger invoked by a git post-receive hook, which essentially kicks off the build after each successful push to master.
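
A minimal sketch of such a post-receive hook is shown below; the Jenkins URL, job name and trigger token are hypothetical and assume the job has “Trigger builds remotely” enabled:

#!/bin/sh
# Hypothetical post-receive hook: kick off the Jenkins job on every push to master.
while read oldrev newrev refname; do
    if [ "$refname" = "refs/heads/master" ]; then
        curl -s -X POST "http://jenkins.lab.example.com:8080/job/populous/build?token=BUILD_TOKEN"
    fi
done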

Fig. Jenkins job overview

Three separate build steps are defined which essentially run the same Ansible playbook against each of the environments: dev, stage and prod. Different roles are honored depending on the target environment. A cheap Ansible inventory trick is used to execute against each of these environments: ‘localhost’ is where the playbook is executed, and the Docker commands only care about certain environment variables that point them at the specific environment’s Docker Datacenter.

# Populous Docker Datacenter environments
dev ansible_host=localhost ansible_connection=local ucp_host=tme-lnx1-dev.lab.nimblestorage.com
stage ansible_host=localhost ansible_connection=local ucp_host=tme-lnx1-stage.lab.nimblestorage.com
prod ansible_host=localhost ansible_connection=local ucp_host=tme-lnx1-prod.lab.nimblestorage.com

Fig. Ansible inventory configuration

The ‘populous.yml’ playbook and its roles live with the source code of the application, and therefore all the build processes and tests are version controlled and potentially peer reviewed. At a high level, the steps to build, ship and run the application from source code to production encompass these three commands:

$ ansible-playbook --vault-password-file=$ANSIBLE_VAULT -l dev -e build_number=$BUILD_NUMBER populous.yml
$ ansible-playbook --vault-password-file=$ANSIBLE_VAULT -l stage -e build_number=$BUILD_NUMBER populous.yml
$ ansible-playbook --vault-password-file=$ANSIBLE_VAULT -l prod -e build_number=$BUILD_NUMBER populous.yml

Fig. Actual Ansible commands executed by Jenkins

As part of the host variables there is a "secrets.yml" file which is encrypted with Ansible Vault. This allows for safekeeping of the Docker UCP credentials for the Jenkins account. In Jenkins, we create a binding that exposes the vault password to the build workspace when the build job executes ($ANSIBLE_VAULT in the commands above). The encrypted "secrets.yml" file is safely stored in git.
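
For reference, the file is encrypted and inspected with the standard Ansible Vault commands, using the same vault password file that Jenkins exposes as $ANSIBLE_VAULT (the file location is an assumption):

$ ansible-vault encrypt secrets.yml
$ ansible-vault view --vault-password-file=$ANSIBLE_VAULT secrets.yml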

---
- hosts: all
  tasks:
    - name: Ensure only one host is targeted
      fail: >
        msg="More than one host specified, use -l to limit to either dev, stage
        or prod"
      when: "{{ play_hosts|length }} != 1"

    - name: Determine build_number
      set_fact:
        build_number: 0
      when: build_number is undefined

    - name: Determine build_version
      set_fact:
        build_version: "{{ lookup('file', 'VERSION') }}"

    - name: Set build_string
      set_fact:
        build_string: "{{ build_version }}-{{ build_number }}"

- include: util_docker_env.yml

- hosts: dev
  environment: "{{ local_docker_env }}"
  roles:
    - build
    - ship

- hosts: stage
  environment: "{{ local_docker_env }}"
  roles:
    - destroy

- hosts: all
  environment: "{{ local_docker_env }}"
  roles:
    - run

- hosts: none
  environment: "{{ local_docker_env }}"
  roles:
    - mask

- hosts: all
  environment: "{{ local_docker_env }}"
  roles:
    - smoke

- hosts: dev
  environment: "{{ local_docker_env }}"
  roles:
    - destroy

Fig. populous.yml

Examining the build steps more closely, each phase conducts the steps outlined below. Assume as the status quo that the production application is up and running. At a high level, the “prod” database volume is cloned to the “dev” environment, built and tested against, and then imported (moved) to the “stage” environment, where the application continues to run until the next build.

Fig. Build, ship and run phases across dev, stage and prod

Dev

In the “dev” environment the application has a short lifespan; it builds, ships, runs and is tested. In the run phase, the Nimble Docker Volume plug-in clones the production volume. After that, the application is removed and the cloned volume is removed from Docker and off-lined on the array.

  • Build - Calls docker build on the app and db containers’ Dockerfiles with the current tags.
  • Ship - Tags and pushes the Docker images to the Docker Registry.
  • Run - Creates a clone of the production volume. Runs docker service create to deploy the app as a global service and the db container as a single-instance service (see the sketch after this list).
  • Mask (optional) - This step is not performed. Please see the section below for discussion.
  • Smoke - Verifies a correct JSON response from the application and passes once the correct (current) version is returned.
  • Destroy - Issues a docker service rm and also removes the Docker Volume. This simply removes the volume from Docker control and off-lines the volume on the array.
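
A hedged sketch of what the “run” role boils down to in “dev” is shown below. The volume, network, registry and image names are assumptions, and the clone option name depends on the plug-in version; the intent is to show that the clone is just another Docker Volume handed to docker service create:

# Clone the production database volume on the array (option name assumed)...
$ docker volume create --driver nimble -o cloneOf=populous-db.prod populous-db.dev
# ...then deploy the db as a single-instance service and the app as a global service.
$ docker service create --name populous-db --replicas 1 --network populous \
    --mount type=volume,source=populous-db.dev,target=/var/lib/mysql,volume-driver=nimble \
    registry.lab.example.com/populous/db:1.0.1-28
$ docker service create --name populous-app --mode global --network populous \
    registry.lab.example.com/populous/app:1.0.1-28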

Stage

The “stage” environment’s purpose is mainly to provide a sandbox where the application continues to run for exploratory and manual testing. Depending on the confidence placed in automated testing, some might pause the build pipeline here and only deploy to production after manual testing has been approved by a human.

  • Destroy - Issues docker service rm and permanently destroys the previous clone.
  • Run - Imports the off-lined “dev” cloned volume and runs the app (see the sketch after this list).
  • Smoke - Same tests performed as in the “dev” step.
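
A hedged sketch of the corresponding “stage” run step (names and the import option are assumptions; the exact option depends on the plug-in version) shows the off-lined clone being adopted by the “stage” Docker environment before the services are started as in “dev”:

# Import the existing, off-lined clone into the stage environment's Docker scope.
$ docker volume create --driver nimble -o importVol=populous-db.dev populous-db.stage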

Prod

In “prod”, only the “app” container is updated to demonstrate how disruption is minimized. The application service impact for continuous deployment is discussed in the next section.

  • Run (update) - Issues a docker service update and bulk updates the running images to the new version.
  • Smoke - Same tests performed as in the previous steps to ensure consistency.

Production impact analysis

Having a fully automated continuous integration, delivery and deployment method brings enormous gains to the software supply chain in terms of speed, agility and quality. If code is being pushed several times per day, is it reliable? What are the risks and what is the cost of updating a containerized application during business hours?

The Populous app is a global service that runs in Swarm mode, meaning each node in the cluster runs exactly one instance of the application. When an update occurs, containers are restarted with the new image in bulk, at a configurable parallelism. With the built-in load balancer in Docker Swarm, no outage occurs, as requests are always served by a running container.
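
A minimal sketch of that update, assuming hypothetical service and image names, uses nothing beyond the standard Docker CLI flags for rolling updates:

# Replace running app containers in bulk, two at a time, pausing between batches.
$ docker service update \
    --image registry.lab.example.com/populous/app:1.0.1-29 \
    --update-parallelism 2 \
    --update-delay 5s \
    populous-app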

The following screenshot displays the container replacement process during the last “run” step in the “prod” environment.  Notice the versioning in the “Image” column.

Fig. View from Docker Datacenter UI while performing a docker service update

As mentioned previously, application response times are measured by one of the REST calls. The time being measured is that of retrieving a random row from the 13 million records, which represents our potential user’s application response time. This is the REST response being retrieved every five seconds:

{
  "version": "1.0.1-28",
  "served_by": "e380bca895a7",
  "response_time_ms": 37.60504722595215
}

Fig. JSON output from the _ping REST call
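
A simple way to reproduce this measurement (the endpoint URL is an assumption) is to poll the resource on the same five-second interval used for the dashboard below:

$ while true; do curl -s http://populous.lab.example.com/_ping; echo; sleep 5; done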

The “served_by” key signifies which container served the request. In the Grafana dashboard screenshot below, it can be observed that cutting over between containers has zero impact on the end user’s experience, and the preceding cloning and importing steps do not impact production response times whatsoever. There is also evidence that the entire pipeline executes in about five minutes.

Fig. Grafana dashboard

A note on data masking

As an optional step, it’s quite trivial to insert transformation DML into the database in the “dev” step. This is useful for masking sensitive data residing in the database that should not be part of the “stage” environment, as it may be exposed to developers and users without clearance to access such data. Doing this type of operation on a terabyte-sized transactional database is not practical from a CI/CD perspective; it’s purely left here as an example of what is possible in such a scenario. It would be completely feasible with nightly builds instead, where the impact would not leave developers waiting for results on their code push.
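
Purely as an illustration of what such transformation DML could look like against the schema above (the database name, host and credentials are assumptions), the personally identifiable columns could be overwritten in the cloned database before it is handed over to “stage”:

# Assumed example: mask PII columns in the clone; prompts for the MySQL password.
$ mysql -h populous-db -u root -p populous <<'EOF'
UPDATE main SET
    name   = CONCAT('user-', id),
    email  = CONCAT('user-', id, '@example.com'),
    street = 'REDACTED',
    zip    = '00000';
EOF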

Comparison

For this particular database, a baseline copy using mysqldump <source> | mysql <destination> took roughly five hours. The build pipeline normally executes in four to five minutes, which is roughly a 60x improvement! This translates into more productive developers, as they get answers sooner with real data, and problems get addressed before they reach production and impact users. There is also potential risk in doing full database dumps, as they might have a negative impact on performance. Those operations may be scheduled for off-hours, but in today’s day and age all systems are expected to perform optimally around the clock.

Summary

Whether adopting CI/CD for data-intensive or data-sensitive applications, using Nimble Storage arrays and Docker Datacenter ensures that containerized applications and their data are secure, reliable and available. It also provides all the right abstractions for the development teams. Regardless of the tools being used, Nimble Storage caters well to any standards-based CI/CD system that may interact with REST APIs or use our application-specific integrations such as the Docker Volume plug-in or the Oracle Application Data Manager. Improving quality in CI/CD build pipelines with real data, without disrupting the production environment, has never been easier. Please let us know below if you want to gain a 60x improvement in your software supply chain today!

About the Author

mmattsson

Data & Storage Nerd, Containers, DevOps, IT Automation
