How to Play with HPE Diagnostics

MichaelProcopio ‎03-07-2016 04:15 PM - edited ‎03-08-2016 12:50 PM

Guest post by Piotr Findeisen

Introducing the Play framework


The Play framework’s main goal is to increase developer productivity, but it is also characterized by a number of novel approaches to application architecture.

One of them is a complete departure from the J2EE framework standards and conventions. For the purpose of this blog, however, the most interesting and most challenging one is its widespread use of functional programming principles and asynchronous I/O.

This post will discuss the Play framework and how to configure HPE Diagnostics to monitor applications using Play.

Traditional execution

The traditional execution model for business applications written in Java devoted a thread to each server request instance. The same thread was responsible for reading in the HTTP request, including any payload, analyzing and validating the request, invoking all necessary back-end functionality, including the execution of SQL statements, and finally writing the response back to the invoking party.

Most often the threads were part of a thread pool and were reused. However, even with a dynamically adjusted pool size that reacted to the application load, the scalability of these applications was limited. Threads are expensive resources. They consume a lot of memory (each thread has its own stack) and put pressure on the operating system kernel and the Java Virtual Machine. A huge number of threads, even if they are mostly idle, affects the performance of thread scheduling and of Java runtime operations such as garbage collection.

Most of the application monitoring tools, including Diagnostics, have no difficulty supporting the traditional execution model. The thread which starts execution of a server request is the same thread which performs all actions on its behalf, and is the same thread which drives the server request to its completion. Therefore measuring all necessary latencies and coordinating all captured artifacts becomes straightforward.
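To make the traditional model concrete, here is a minimal sketch using only JDK classes. It is not Play or Diagnostics code: the class BlockingRequestDemo, its methods, and the sleep that stands in for a blocking SQL call are all assumptions made for illustration. It shows why measurement is simple in this model: the thread that starts the request is also the thread that finishes it, so latency is just the difference of two timestamps taken on that thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration of the thread-per-request model; not Diagnostics or Play code.
class BlockingRequestDemo {

  // What a monitor would record for one request.
  static final class Outcome {
    final String startThread;
    final String endThread;
    final long latencyNanos;

    Outcome(String startThread, String endThread, long latencyNanos) {
      this.startThread = startThread;
      this.endThread = endThread;
      this.latencyNanos = latencyNanos;
    }
  }

  // Handles one request from start to finish on the calling thread.
  static Outcome handle(long backendMillis) {
    String startThread = Thread.currentThread().getName();
    long start = System.nanoTime();
    try {
      Thread.sleep(backendMillis); // stands in for a blocking SQL call
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    long latency = System.nanoTime() - start;
    return new Outcome(startThread, Thread.currentThread().getName(), latency);
  }

  // Submits one request to a small pool and waits for its outcome.
  static Outcome runOnPool(long backendMillis) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      Callable<Outcome> task = () -> handle(backendMillis);
      return pool.submit(task).get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    Outcome o = runOnPool(20);
    System.out.println("started on " + o.startThread + ", finished on " + o.endThread
        + ", latency " + o.latencyNanos / 1_000_000 + " ms");
  }
}
```

The pool thread is blocked for the whole duration of the simulated back-end call, which is exactly the cost discussed above.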

New programming techniques

To improve application scalability and at the same time increase its performance, new programming techniques emerged. With the new approach, a thread executing a server request abandons the request whenever it is expected that nothing needs to be done for it for a while; for example, after sending a query to a database.

Since the application needs the query result to proceed, the executing thread can be used more efficiently by serving another server request instead of waiting for the database response to arrive. Naturally, the execution of the previously abandoned server request must continue when the result is ready, but this can be done by a different thread. This technique, when skillfully applied, can handle the same load as the traditional application with a thread pool an order of magnitude smaller than originally required.
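The idea can be sketched with JDK classes alone. This is an illustration under assumptions, not Play code: NonBlockingRequestDemo, queryAsync, and serve are hypothetical names, and a timer stands in for a database that replies after a delay. A single worker thread sends the query for each request and is then free for other requests; the second half of each request runs when its reply arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of releasing the thread while a query is outstanding.
class NonBlockingRequestDemo {

  // Simulates a database that answers after delayMillis without occupying a worker thread.
  static CompletableFuture<String> queryAsync(
      ScheduledExecutorService timer, String sql, long delayMillis) {
    CompletableFuture<String> reply = new CompletableFuture<>();
    timer.schedule(() -> { reply.complete("rows for " + sql); },
        delayMillis, TimeUnit.MILLISECONDS);
    return reply;
  }

  // Serves the given number of concurrent requests with ONE worker thread.
  static List<String> serve(int requests, long delayMillis) {
    ExecutorService worker = Executors.newSingleThreadExecutor();
    ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    try {
      List<CompletableFuture<String>> pending = new ArrayList<>();
      for (int i = 0; i < requests; i++) {
        final int id = i;
        // Step 1 runs on the worker: send the query and return at once.
        CompletableFuture<CompletableFuture<String>> sent =
            CompletableFuture.supplyAsync(
                () -> queryAsync(timer, "select " + id, delayMillis), worker);
        // No thread is held while the reply is outstanding.
        CompletableFuture<String> reply = sent.thenCompose(f -> f);
        // Step 2 runs on the worker again once the reply has arrived.
        pending.add(reply.thenApplyAsync(rows -> "response[" + rows + "]", worker));
      }
      List<String> responses = new ArrayList<>();
      for (CompletableFuture<String> f : pending) {
        responses.add(f.join());
      }
      return responses;
    } finally {
      worker.shutdown();
      timer.shutdown();
    }
  }

  public static void main(String[] args) {
    long start = System.nanoTime();
    List<String> responses = serve(10, 50);
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println(responses.size() + " responses in about " + elapsedMs + " ms");
  }
}
```

Because the waits overlap, the ten requests should finish in roughly the time of one simulated query rather than ten, even though only one worker thread executes application code.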

Monitoring applications based on these new techniques is not always easy, but HPE Diagnostics provides a powerful API that lets the end user configure the Diagnostics agent to “understand” the logic of server requests that switch threads. More on that later …

The Play framework

Let's start with a simple example of a Play framework based application. You are welcome to look up more advanced examples on the web.

Here is a very basic web application using Play:

package controllers;

import play.*;
import play.mvc.*;
import views.html.*;

public class Application extends Controller {

  // Play Action
  public static Result index() {
    return ok(main.render());
  }
}


After pointing a web browser to the root URL of this test application, the default application configuration (not shown here) causes the "index" action to be invoked. The action just displays the configured application name ("HP Diagnostics Test Application" in this case).

 

[Screenshot: browser showing the application name (piotr blog1 p1.jpg)]

 

Play goes to a new level


Play has pushed the asynchronous model to a new level. One of the goals of the Play framework is that the application code never blocks (or waits). At the same time, the framework makes extensive use of "futures" or "promises". These programming constructs are especially useful for asynchronous programming. They represent placeholders for values that will be available in the future, but can be passed around and manipulated before that. With the advent of Java 8 and lambda expressions, it became really easy to write code that combines a functional approach with asynchronous programming. This will be demonstrated by the second example.

Let's look at this slightly more advanced example.

     1    package controllers;
       
     2    import play.*;
     3    import play.libs.F;
     4    import play.libs.ws.*;
     5    import play.mvc.*;
     6    import views.html.*;
       
     7    public class Application extends Controller {
       
     8      // Play Action
     9      public static Result index() {
    10        sleep(5);
    11        return ok(main.render());
    12      }
       
    13      // Play Action - make a web service call to url and count characters received
    14      public static F.Promise<Result> pageSize(final String url) {
    15        Logger.info("url = " + url);
       
    16        WSRequestHolder holder = WS.url(url);
    17        F.Promise<WSResponse> responsePromise = holder.get();
    18        F.Promise<Result> result = responsePromise.map(response -> processResponse(response.getBody()));
       
    19        return result;
    20      }
       
    21      private static Result processResponse(String responseBody) {
    22        Logger.info("Processing response");
    23        return ok("The response contains " + responseBody.length() + " characters.");
    24      }
       
    25      // Helper function to sleep the specified number of milliseconds
    26      private static void sleep(int milliseconds) {
    27        try {
    28          Thread.sleep(milliseconds);
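    29        } catch (InterruptedException e) {
    30          // Restore the interrupt flag instead of silently swallowing it
    31          Thread.currentThread().interrupt();
    32        }
    33      }
    34    }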

This example adds one more Play action called pageSize (line 14). This action takes a URL string as the argument, and makes an asynchronous call to it, using a built-in facility to make web service calls. Upon receiving a response, it simply counts the characters received and reports the result to the browser. Since the call is asynchronous, the action pageSize returns before the response is received. Instead of returning a value of Result type, it returns a promise of the result, i.e. F.Promise<Result> (lines 18-19). Similarly, the asynchronous call (line 17) returns a promise of a web service response instead of the response itself.

Promises, promises

Invoking the map method on this promise (line 18) creates another promise, but this time it is the promise of a Result. When the actual web service response for the call arrives, the responsePromise gets satisfied, and the processResponse() method is called. This method does the actual conversion of a web service response to a Result (to satisfy the result promise). Once the Result is available (line 23), the framework converts it to an HTTP response, which is returned to the browser. Keeping track of the pending promises is performed automatically by the framework; the above code is all I had to write to make the example work.
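For readers who know the JDK better than Play's F.Promise, the same shape can be sketched with java.util.concurrent.CompletableFuture, where thenApply plays the role that map plays above. This is only an analogy built from JDK classes: PromiseMapDemo and fetchBody are hypothetical, and fetchBody fabricates a body instead of making a real web service call.

```java
import java.util.concurrent.CompletableFuture;

// JDK-only analogy for mapping one promise into another; not Play code.
class PromiseMapDemo {

  // Hypothetical stand-in for the asynchronous web service call.
  static CompletableFuture<String> fetchBody(String url) {
    return CompletableFuture.supplyAsync(() -> "<html>body of " + url + "</html>");
  }

  // Same job as processResponse() in the example above, returning plain text.
  static String processResponse(String body) {
    return "The response contains " + body.length() + " characters.";
  }

  // Returns at once with a future of the result, like the pageSize action.
  static CompletableFuture<String> pageSize(String url) {
    CompletableFuture<String> responseFuture = fetchBody(url);
    // thenApply creates a second future that completes after the first one does.
    return responseFuture.thenApply(PromiseMapDemo::processResponse);
  }

  public static void main(String[] args) {
    System.out.println(pageSize("http://localhost:9000/").join());
  }
}
```

As in the Play example, pageSize never waits for the response itself; only the caller in main blocks, and only so that the demo can print something before exiting.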

For the sake of simplicity, I specified the URL used in the first example as the argument for the new action. Thus, the application is asked to make a call to itself, using the "index" action from the previous example as the call target. The result displayed by the browser is shown below.

[Screenshot: browser output of the pageSize action (piotr blog1 p2.jpg)]

 

The Play framework and Diagnostics

Given the complexities of the Play framework, you might guess that monitoring Play-based applications is very challenging. So let’s cut to the chase and see how Diagnostics can show the invocation from the last example. The captured Call Profile is shown below. More technical details of the Diagnostics configuration for Play framework support will be discussed in a later section.

[Screenshot: captured Call Profile (piotr blog1 p3.jpg)]

 

One of the first things to note is that this is a Cross-VM call profile, as the application made a call to itself (to make the index method invocation visible, I inserted a short sleep into this method). There are a number of things to point out in this Call Profile:

  • the outbound call itself is shown as Asynchronous, with the latency corresponding to the actual external call rather than the true internal latency of the method that started the call (which cannot be larger than the latency of any of its parents)
  • the big white gap in the middle of the Call Profile is not a bug or mistake - no code ran for the /pageSize server request at this time at all: the top bar shows the whole server request and the outbound call bar shows an asynchronous operation - these bars are somewhat artificial and none of them corresponds to a Java method execution; however, they represent the timing relationship between the server request and the asynchronous call
  • the I/O time, even for asynchronous operations, is correctly recognized and shown for the whole server request (125 ms)
  • the Play framework internally uses a lot of promises, futures and callbacks - some of them are captured by this Call Profile
  • the example uses the Play framework standard logging, which has been instrumented for the purpose of this example - you are welcome to find both logging calls in the application code and in the Call Profile

Configuring Java Agent to support the Play framework

Before jumping into Play-specific instrumentation, it is necessary to present some of the previously mentioned Diagnostics Java Agent features, which are fundamental for monitoring server requests executed by multiple threads.

To accommodate performance monitoring of the new type of applications, Diagnostics Java Agent offers two operations which can be executed by code snippets associated with selected instrumentation points:

  • park - to be executed whenever a server request ceases to be executed by a thread, but is not done yet (for example, after a query to a database is sent)
  • resume - to be executed when a pending server request is picked up by another thread to continue its execution (for example, when the result from the database arrives)

Execution with multiple queries

As the server request execution may involve multiple database queries, the above pair of operations may be executed multiple times for the same server request instance. The Diagnostics probe will keep track of the server request state and accumulate all performance data for the server request. When the server request is finally completed, it will get reported in very much the same way as the traditional server requests are.

However, the Call Profiles for the server request will show the characteristic white gaps, representing the time elapsed between park and resume operations. The gaps illustrate the fact that nothing was going on with the server request during this time - there was no thread associated with it, and there was no Java code that could be identified as currently running. See the Call Profile in the previous section.
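The bookkeeping behind those gaps can be modeled in a few lines. The sketch below is purely hypothetical and is not the Diagnostics agent API: ParkResumeTimeline and its methods are invented names, and timestamps are passed in explicitly so the arithmetic is easy to follow. Each park closes an active segment, each resume opens a new one, and whatever is left between segments is the white gap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of park/resume bookkeeping; not the Diagnostics agent API.
class ParkResumeTimeline {
  private final List<long[]> activeSegments = new ArrayList<>(); // each entry: {startMs, endMs}
  private long segmentStart;
  private boolean running;

  // The request starts executing on some thread at startMs.
  ParkResumeTimeline(long startMs) {
    segmentStart = startMs;
    running = true;
  }

  // The executing thread lets go of the request (for example, a query was sent).
  void park(long nowMs) {
    closeSegment(nowMs);
  }

  // Some thread picks the request up again (for example, the result arrived).
  void resume(long nowMs) {
    if (running) {
      throw new IllegalStateException("request is already running");
    }
    segmentStart = nowMs;
    running = true;
  }

  // The request is complete.
  void finish(long nowMs) {
    closeSegment(nowMs);
  }

  private void closeSegment(long nowMs) {
    if (!running) {
      throw new IllegalStateException("request is not running");
    }
    activeSegments.add(new long[] {segmentStart, nowMs});
    running = false;
  }

  // Time during which some thread was executing the request.
  long activeMs() {
    long sum = 0;
    for (long[] s : activeSegments) {
      sum += s[1] - s[0];
    }
    return sum;
  }

  // Wall-clock time from the first start to the last recorded end.
  long totalMs() {
    if (activeSegments.isEmpty()) {
      return 0;
    }
    return activeSegments.get(activeSegments.size() - 1)[1] - activeSegments.get(0)[0];
  }

  // Time during which no thread was associated with the request.
  long gapMs() {
    return totalMs() - activeMs();
  }

  public static void main(String[] args) {
    ParkResumeTimeline t = new ParkResumeTimeline(0);
    t.park(10);    // first query sent
    t.resume(60);  // first result arrived
    t.park(70);    // second query sent
    t.resume(120); // second result arrived
    t.finish(130); // response written
    System.out.println("active=" + t.activeMs() + " gaps=" + t.gapMs() + " total=" + t.totalMs());
    // prints: active=30 gaps=100 total=130
  }
}
```

The numbers in main are made up; the point is only that a request with two queries produces two gaps, and that the reported total covers both the active segments and the gaps.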

Between park and resume

So what exactly happens with a server request between the park and resume operations? There surely must be some references to it to allow the application to complete the request. The answer is slightly different depending on whether we look at the server request from the application perspective or from the Diagnostics probe perspective.

The application must retain in memory all data structures needed to continue the server request. It will always involve some kind of manager to keep track of those server requests that require continuation when the conditions allow it. Furthermore, if we continue with our database example, it must maintain a mapping from the possible query results to the server requests so it passes the right results to them, and continues the execution of those server requests that received their results.

The details of such a mapping will be different for each application. For example, one can imagine that the database connection object can be used here. Other possibilities include objects representing the web session, or session context, or incoming HTTP server request objects.
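As an illustration of such a mapping, the following sketch keys pending requests by a connection object. It is an assumption made for this post, not code from Play or from any particular application: PendingRequestRegistry, awaitResult, and deliver are hypothetical names, and a plain Object stands in for the database connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical application-side manager that routes results to waiting requests.
class PendingRequestRegistry {
  private final Map<Object, Consumer<String>> pending = new ConcurrentHashMap<>();

  // Called when a request has sent its query and gives up its thread.
  void awaitResult(Object connection, Consumer<String> continuation) {
    if (pending.putIfAbsent(connection, continuation) != null) {
      throw new IllegalStateException("connection already has a pending request");
    }
  }

  // Called when a result arrives; continues the request waiting on that connection.
  // Returns false if no request is waiting on it.
  boolean deliver(Object connection, String result) {
    Consumer<String> continuation = pending.remove(connection);
    if (continuation == null) {
      return false;
    }
    continuation.accept(result);
    return true;
  }

  int size() {
    return pending.size();
  }

  public static void main(String[] args) {
    PendingRequestRegistry registry = new PendingRequestRegistry();
    List<String> log = new ArrayList<>();
    Object connectionA = new Object();
    Object connectionB = new Object();
    registry.awaitResult(connectionA, rows -> log.add("request A got " + rows));
    registry.awaitResult(connectionB, rows -> log.add("request B got " + rows));
    // Results may arrive in any order; the connection decides who continues.
    registry.deliver(connectionB, "2 rows");
    registry.deliver(connectionA, "7 rows");
    System.out.println(log);
  }
}
```

In this picture the connection object is what ties a result back to its request, which is why objects of this kind are natural candidates when looking for something that identifies a parked request.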

Anchors

On the Diagnostics side, the application performance specialist who creates the instrumentation points and code snippets for the application must be sufficiently familiar with the application architecture to be able to identify the objects that the application may use for server request continuation. References to these objects (we call them anchors) must be passed as arguments to the park and resume calls. Thus the Diagnostics agent identifies the server request based on the anchors. However, the application may still identify the server request by different objects.

Using park and resume operations with the Play framework, however, poses a new challenge in finding suitable anchor objects. One reason is that the framework strongly promotes a RESTful execution model, so there are no session objects at all. The other reason is that thread switching is also triggered by very low-level layers of the framework, where any high-level execution context (like the incoming HTTP request) is simply not available.

Tokens

The solution was to create some artificial token objects, which are used only by the Diagnostics agent to identify the server request. These objects are created by a code snippet executed whenever a new HTTP request comes in. The actual instrumented method for this action is 

com.typesafe.netty.http.pipelining.HttpPipeliningHandler.messageReceived().

You may want to find it in the Call Profile shown in the previous section. These token objects are used later as anchors for park and resume operations.

Since the tokens are created only by Diagnostics instrumentation, the application code cannot maintain them. They are manipulated exclusively by the Diagnostics agent. The agent needs to make sure that the tokens do not get lost (i.e. garbage collected) prematurely and that they can be accessed by the park and resume operations. This goal has been achieved by following these rules:

  • while the server request is being executed by one of the Java threads, its identifying token is stored in the thread local storage
  • if the server request gets parked, the token is removed from the thread local storage (since the thread will no longer be executing this server request) and stored in one of the application objects (room for it is made by the instrumentation)
  • if the server request is to be resumed, the token is retrieved from the appropriate application object and stored in the current thread local storage; the reference within the application object is removed
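The three rules can be sketched as follows. Everything here is a hypothetical model, not the actual code snippets or the Diagnostics API: Token, Carrier, and the static methods are invented names, and the Carrier class with its slot field merely stands in for an application object in which the instrumentation has made room for the token. The sketch also does nothing when no token is present, which is one simple way to keep it harmless outside a monitored request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model of moving a request token between threads; not Diagnostics code.
class TokenHandoffDemo {

  static final class Token {
    final long id;
    Token(long id) { this.id = id; }
  }

  // Stands in for an application object; slot models the room made by instrumentation.
  static final class Carrier {
    volatile Token slot;
  }

  private static final ThreadLocal<Token> CURRENT = new ThreadLocal<>();

  // A new request arrives: create its token and bind it to the current thread.
  static Token begin(long id) {
    Token token = new Token(id);
    CURRENT.set(token);
    return token;
  }

  // Move the token from the current thread into the carrier object.
  static void park(Carrier carrier) {
    Token token = CURRENT.get();
    if (token == null) {
      return; // no monitored request on this thread: do nothing
    }
    CURRENT.remove();
    carrier.slot = token;
  }

  // Move the token from the carrier object to the current thread.
  static void resume(Carrier carrier) {
    Token token = carrier.slot;
    if (token == null) {
      return; // nothing was parked here: do nothing
    }
    carrier.slot = null;
    CURRENT.set(token);
  }

  static Token current() {
    return CURRENT.get();
  }

  // Starts a request on thread A, parks it, and resumes it on thread B.
  // Returns true if the same token arrived on B and no other reference remained.
  static boolean demo() {
    ExecutorService a = Executors.newSingleThreadExecutor();
    ExecutorService b = Executors.newSingleThreadExecutor();
    try {
      Carrier carrier = new Carrier();
      Token created = CompletableFuture.supplyAsync(() -> {
        Token token = begin(42);
        park(carrier);
        return token;
      }, a).join();
      boolean clearedOnA = CompletableFuture.supplyAsync(() -> current() == null, a).join();
      Token resumed = CompletableFuture.supplyAsync(() -> {
        resume(carrier);
        return current();
      }, b).join();
      return clearedOnA && resumed == created && carrier.slot == null;
    } finally {
      a.shutdown();
      b.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("handoff ok: " + demo());
  }
}
```

At every point outside park and resume, the token is reachable either from one thread's local storage or from one carrier object, never both.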

Strict rules for tokens

Thus, with the exception of brief periods while the Diagnostics code snippets run, the token is referenced by exactly one reference. At most one application object or one thread via its thread local storage can reference the token. This substantially simplifies the logic of the code snippets involved and makes it quite straightforward to determine whether to park or resume at a given execution point.

Since the Play framework uses several sub-frameworks (which also switch threads), these also had to be instrumented. When we instrumented our first real-life Play application at a big Telco customer, the instrumentation points had to be added for:

  • Scala Futures, Promises and Execution Context
  • Akka Actors, Messaging and Dispatching
  • Ning Http Client and WS Client
  • Netty HTTP server
  • DataStax Cassandra client

The added instrumentation handles the server request identifying tokens according to the rules outlined above. In particular, this means that if the token is not present (for example, when any of these frameworks is used outside of the Play framework context) then the instrumentation has no effect. This protects the robustness of the Diagnostics Java Agent in case of environment or application changes.

This exercise demonstrates that Diagnostics is a highly customizable product with unmatched flexibility, providing value even in the most challenging environments. All configuration changes to support the Play framework can be made by the end user.

PiotrF_blog.jpg

 

About the author: Piotr Findeisen

A Senior Java Engineer in the Diagnostics agent team, I'd like to share my experience with configuring the Diagnostics Java Agent to support applications built on top of the Play framework (https://www.playframework.com/). For this experiment, I've used Play version 2.2, but similar results are expected with all 2.x versions.

 

 

Diagnostics software monitors application transaction health in traditional, virtualized and cloud environments—allowing quick isolation and resolution of issues. It gives you a common tool to easily collaborate across the entire application lifecycle and release higher-quality applications.


About the Author

MichaelProcopio

HPE Software Product Marketing. Over 20 years in network and systems management.
