Saturday, December 1, 2018

Deep dive into distributed tracing with Spring Boot and Flight Recorder

At Oracle CodeONE 2018 Marcus Hirt held very interesting presentation called "Diagnose Your Microservices: OpenTracing/Oracle Application Performance Monitoring Cloud [DEV5435]". This presentation gave an audience pretty intensive quick inside view to the large scale cloud application tracing.
Although his talk was not recorded, he has created pretty nice example, Robotshop, that he open-sourced.  You can find all information in his recently published blog posts (CLICK HERE). 
  His post also touched the Application Performance Management (APM) solutions that has been formed before OpenTracing standard has born.
  All intensive work crystallized into the previously mentioned OpenTracing standard, but what is it ?  As the web-site says It's a vendor-neutral APIs and instrumentation standard for distributed tracing (note: it's more about instrumentation). Companies like Datadog, Skywalking, Jaeger, LightStep are actively contributing into this standard, which is great news, it shows that the topic is pretty hot and we should probably think about it.
  Let's get back to my post topic. Many developers, world wide, are using spring-boot libraries stack to create a Spring based applications.

In this post you will learn how to connect some tracers (Jaeger, Zipkin) with your project. You will also learn how to connect Java Mission Control/Java Flight Recorder for very deep service analysis.

The post also stress the question how worthful it is to think about the application logging strategies. Why ? To be honest, who is reading couple of gigabytes big log files in order to spot the root case ;).  Who is writing continually very sophisticated regular expressions to get such information ?.

The source code is available on my GitHub account: [here] branch: [vehiclefactory-jfr-tracer]

   In my previous post I've shown how to configure spring-boot project using opentracing [CLICK_HERE]. It was just  a simple producer consumer pattern based demo. In the following text we create a VehicleShop project with 4 independent micro-services : VehicleShopService, FactoryService, StorageService, CustomerService. All those services are spring-boot based ones.
The example project uses the Gradle build system. It does contain all services and you can start each of them separately, in docker or you can configure the docker-compose file.

  The monitoring is about to "extracting meaningful stories" from the defined Metrics (four golden signals, RED method etc.) or Logs (app events, stack-traces etc.), in other hand tracing is more about Spans, Spans analysis. Tracking gives the answer to individual request.
  What is such Span ? The Span is the primary building block of distributed trace. Such individual Span represents the work done within the distributed system. How we define such Span ?
By the opentracing standard Span has following properties:

  • an operation name
  • start and finish timestamp
  • key:value Span Tag
  • a SpanContext 
    • carries date across the process boundaries
Having defined Span we define the space from where such Span is coming from. Let's consider one individual Span,  we name it "Active Span". Such Span is responsible for the work accomplished by the surrounding code. Active Span has actually one very important property:
  • it can be only one Active Span inside the thread at time T.
Such property is managed by the Scope, which formalizes the activation and deactivation of a Span. 
OpenTracing standard defines the term Trace. Trace represents the interface which implementation creates the Spans.

The VehicleShop example 
Previously has been mentioned that VehicleShop project is the collection of micro-services. Those services are exchanging information between each other according to the schema (Img.1.)
Img.1.: VehicleShop project schema
The Customer service represents the set of the schedulers. Those schedulers generate a traffic that simulates the multiple customers behaviour like buying , searching or upgrading the car. The VehicleShop service sends requests to the vehicle elements storage (StorageService) in order to get information about available pieces. The VehicleShop sends requests to the FactoryService to check which cars are available to sell out.  The FactoryService before any car gets produces send the request to the StorageService whether the VehicleElement, necessary to build the car, are available (Img.2.)
Img.2.: example communication diagram
All services have enabled opentracing support by the following libraries:

when you add any of those libraries into the project class-path they will automatically configure the default tracer settings. You simple can start the tracer inside the docker container:
$docker run -d -p 6831:6831/udp -p 16686:16686 jaegertracing/all-in-one:latest
$docker run -d -p 9411:9411 openzipkin/zipkin:latest

Of course you can use the custom configuration of the tracer, but in such case you need properly re-configure spring @Configuration beans. The another alternative is to use the environment variables.
When the all services are up, you can open the Jaeger UI : http://localhost:16686/ (Img.3.)

Img.3.: JagerUI : VehicleShopService traces
The interface allows to analyze taken spans and observe a real application communication over the micro-services (Img.4.)

Img.4.: Tracing a specific request 
The Jaeger UI offers one very neat feature. It's ability to compare two spans by IDs. The span Id you can get from available traces (Img.5.)

Img.5.: Span comparison
As you can see the example architecture is very simple but it already shows the value of the tracing.

  Let's connect the example with Java Mission Control / Flight Recorder and see the traces from different perspectives.
  As the tracing provides you the ability to get the view into the application communication layer and here recognizing some potential issues, when the JFR is attached to the micro-service JVMs, you can directly analysize the potentially suspicious code.  It means that the trace may be very good initial signal to start the investigation.

Attaching Flight Recorder to the VehicleShop example
Marcus has recently published very neat library java-jfr-tracer . This library allows to record Scopes and Spans into the JDK Flight Recorder for very deep analysis.  To enable such feature it necessary to add following library into the project.
implementation "se.hirt.jmc:jfr-tracer:0.0.3"

After having the library we need to register a new JFR Tracer, which is the wrap of the OpenTracing tracer, in following way:

public CustomTracerConfig(Tracer tracer) {
     GlobalTracer.register(new DelegatingJfrTracer(tracer));

Now information provided by tracers will be available also through the Flight Recorder interface through recorder Events (Img.6.)

Img.6.: Flight Recorder with recorded spans
The Flight Recorder gives you a chance to recognize a suspicious trace ids which may bring the significant value during the issue solving. The Spring framework is not just a one thread framework and using it may bring/open potential challenges (Img.7.)

Img.7.: VehicleShop threads view

Distributed Tracing is great and challenging.  As you can see from the image (Img.7.) there are bunch of threads and by adding new libraries to the project a new are coming. Distributed tracing is opening the way how to understand to what is really happening across the micro-services.
It means distributed tracing allows/supports:
  • distributed transaction monitoring
  • performance and latency optimization 
  • root cause analysis
  • service dependencies analysis
  • distributed context propagation
The only OpenTracing standard implementation may not be enough. Often you may need to know what is happening inside the specific one and the trace may forward you directly to the issue.

Using Flight Recorder for Distributed Tracing is like using "weapon" from different galaxy, it's not the silver bullet but it may be very close to it .

Enjoy the Demo and happy tracing!

No comments: