This blog will explain the basics of the Plan Visualizer tool, which is the performance analysis tool inside HANA. The different sections show:
- How a plan can be created directly from HANA Studio,
- how and why you should invest in a consistency check before the analysis,
- what the overview page and the links to the analysis tools are good for,
- what is shown in the Tables Used list and how the synchronization into the plan works,
- what you should know about the selection and semi-join operators,
- how the timeline tool can be used to find the main processing bottlenecks,
- and where to get an overview of the executed plans.
Preface – The Example Statement
As an example, a rather simple statement is used: a join of two tables.
select a.mandt, b.rldnr, a.bukrs, a.gjahr, a.belnr, b.docln, a.buzei
from BSEG as a
inner join ACDOCA as b
on b.rclnt = a.mandt
and b.rbukrs = a.bukrs
and b.gjahr = a.gjahr
and b.belnr = a.belnr
where a.mandt = '715'
and b.rldnr = '0L'
and b.docln = '000001'
and a.buzei = '001'
and a.gjahr = '2018'
--and a.gjahr = '2017'
--and a.gjahr = '2019'
--order by a.mandt, b.rldnr, a.bukrs, a.gjahr, a.belnr, b.docln, a.buzei
Technical remarks on the statement: It is just complicated enough to show selection and join operators which are the most important operators in PlanViz. The two tables from financials are filled with some data in most systems. Please note, the displayed results and sizes are from a test system and not meaningful. The statement is artificial, but it can be adapted easily to other systems. The filter on gjahr controls the amount of processed data, the other filters produce a 1:1 join. The comments are statement alternatives which will be used also.
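If you want to play with the semantics of the statement, the following sketch reproduces it on invented toy rows in SQLite via Python (SQLite, not HANA, so the execution behavior differs; only the join and filter logic is illustrated):

```python
import sqlite3

# Toy reproduction of the example join in SQLite (not HANA); table and
# column names follow the blog's statement, the rows are invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE BSEG (mandt, bukrs, gjahr, belnr, buzei);
CREATE TABLE ACDOCA (rclnt, rldnr, rbukrs, gjahr, belnr, docln);
INSERT INTO BSEG VALUES ('715','0001','2018','0000000001','001'),
                        ('715','0001','2017','0000000002','001');
INSERT INTO ACDOCA VALUES ('715','0L','0001','2018','0000000001','000001'),
                          ('715','0L','0001','2017','0000000002','000001');
""")
rows = con.execute("""
SELECT a.mandt, b.rldnr, a.bukrs, a.gjahr, a.belnr, b.docln, a.buzei
FROM BSEG AS a
INNER JOIN ACDOCA AS b
  ON  b.rclnt  = a.mandt
  AND b.rbukrs = a.bukrs
  AND b.gjahr  = a.gjahr
  AND b.belnr  = a.belnr
WHERE a.mandt = '715'
  AND b.rldnr = '0L'
  AND b.docln = '000001'
  AND a.buzei = '001'
  AND a.gjahr = '2018'
""").fetchall()
print(rows)  # only the 2018 document qualifies: a 1:1 join
```

Switching the filter to a.gjahr = '2017' returns the other document instead, which is exactly how the commented statement alternatives control the amount of processed data.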
The real importance of the PlanVisualizer will become obvious, when the statements become more complicated, for example by CDS views, which are frequently used in the S/4HANA applications. The analysis of CDS views called from ABAP will follow in a future blog.
1. Using PlanViz – How to Create a Useful Plan
Creating a plan for a statement in the HANA Studio is rather straightforward. First, you execute the statement a few times to see whether the runtime becomes stable.
Fig 1.a: Executing the SQL statement to get behavior without PlanVisualizer.
The first execution is usually much slower as the execution plan must be determined. Subsequent executions are fast as the execution plan can be taken from the plan cache. This is visible in the HANA Studio result:
Fig 1.b: Three successive measurements without PlanVisualizer.
In the HANA Studio editor, you mark the statement and select Visualize Plan -> Execute from the context menu. The statement is executed again and creates the visualized plan.
Fig 1.c: Measurements with PlanVisualizer.
The PlanVisualizer is the visualization of the Executed Plan including measured execution data, i.e. the actual number of processed records and actual execution times. Be aware that there is also a Prepared Plan, which shows the determined execution plan before execution only with estimated values, i.e. the estimated records and the estimated percentage of runtime. As the Prepared Plan is much more difficult to interpret and sometimes insufficient to understand performance problems, we do not use it here.
Fig 1.d: Visualized Executed Plan with measured values (rows, times) and Save As Function.
Plans can be stored for later usage with the 'Save As' button, creating a plv-file. Please note that 'Save As' works only on the area of the executed plan and not on the overview plan or on the sub-tools area. With 'File' -> 'Open File' in the top left corner, stored plans can be loaded and displayed again.
For ABAP-based CDS views the plan creation from ABAP SQL Trace (ST05) will become interesting. There are also more influencing factors, like session variables and host variables, which can cause differences between the executions without plan and with plan. These extensions will be discussed in a later blog.
This section showed how the PlanViz plv-file can be created directly from HANA Studio. Display and analysis can only be done in the HANA Studio. Before jumping right into the analysis, you should invest a few minutes to evaluate the consistency of the plan.
2. The Consistency Check – Beware of Useless Plans
Before you dive into the analysis, I highly recommend a high-level check to confirm that the plan fits the measurement without the plan. You should compare the runtime and the number of records in the result set. Executing the example without the Plan Visualizer in three different variants (count, key, all fields) gives the reference numbers:
Fig 2.a: Measurements without PlanVisualizer showing server processing time and fetched rows.
Where do you find the corresponding numbers in the plan? You might think the runtime is the time on the Executed tab or the execution time on the overview page. But this is the time including the overhead of the plan-taking process and can be much higher.
The runtime without overhead is the Inclusive Time of the top operator box in the Executed Plan. The number of result records is next to the arrow leading to that top operator. The figures show the top operator for the three example variants:
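The notion of Inclusive Time can be illustrated with a small sketch: the inclusive time of an operator is its own (exclusive) time plus the inclusive times of the operators below it. The operator tree and times below are invented for illustration, not PlanViz's internal representation:

```python
# Hypothetical operator tree illustrating Inclusive vs Exclusive Time:
# the inclusive time of an operator is its own (exclusive) time plus the
# inclusive times of all operators below it. All values are invented.
tree = {
    'Project':       {'exclusive_ms': 1.0,  'children': ['JEStep4']},
    'JEStep4':       {'exclusive_ms': 54.7, 'children': ['JEStep2']},
    'JEStep2':       {'exclusive_ms': 70.5, 'children': ['JEEvalPrecond']},
    'JEEvalPrecond': {'exclusive_ms': 23.0, 'children': []},
}

def inclusive_ms(op):
    """Own time plus the inclusive times of all children, recursively."""
    node = tree[op]
    return node['exclusive_ms'] + sum(inclusive_ms(c) for c in node['children'])

# The inclusive time of the top operator is the runtime without overhead:
print(inclusive_ms('Project'))  # 149.2
```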
Fig 2.b: Top operation of the first example, count.
Fig 2.c: Top operator of the second example, reading the key columns.
Fig 2.d: Top operator of the third example, read all columns (note the high overhead in Executed).
The two numbers from executions without plan and with plan are collected here:
Fig 2.e: Collected numbers of the three examples.
You can see that the numbers of rows are always identical and the runtimes are similar (plus/minus 10%) for all examples. Even 20% deviations would be acceptable, but if the execution time deviates by factors, then it is hard to draw conclusions from the analysis. Analyzing a plan of a different task might overlook the actual problem, and analyzing a plan with very different times might lead to different bottlenecks and different interpretations. In case of deviations, you should check carefully whether the test executions were identical, invest a bit of time in statistics, or measure at a later time when there are fewer users in the system. Please note that this is the layout of a Column or Row Search. The recently added Hex engine (Hex Search operator) shows the time overhead also in the top operator, but not on the level below. The also new Esx Search shows the result rows not as the input of the Esx Search, but inside the Esx Search operator.
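The rule of thumb above can be written down as a small, hypothetical helper (the function and the tolerance value are my own, not part of PlanViz):

```python
# Hypothetical helper encoding the consistency rule of thumb: row counts
# must match exactly, runtimes should agree within a tolerance.
def plan_is_consistent(rows_ref, ms_ref, rows_plan, ms_plan, tolerance=0.2):
    """Compare a reference execution with the PlanViz execution."""
    if rows_ref != rows_plan:      # a different result set means a different task
        return False
    deviation = abs(ms_plan - ms_ref) / ms_ref
    return deviation <= tolerance  # up to 20% deviation is still acceptable

print(plan_is_consistent(353251, 100.0, 353251, 112.0))  # True: ~12% deviation
print(plan_is_consistent(353251, 100.0, 353251, 250.0))  # False: deviates by factors
```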
The starting point of every performance analysis should be a consistency check of the created plan against the original measurement. It confirms the relevance of the created plan file. This section gave some guidance on what to compare.
3. The Plan Overview – Some Guidance (More to Come?)
As a first analysis step, you might want to look at the overview tab, but unfortunately it does not yet contain much useful information. The shown execution time contains the plan creation overhead; the compilation is unavoidable and happens only for PlanViz executions. The dominant operators are interesting, but only if they need a high percentage of the total time (10% or more). And please note: long-running operations are consumers, but rarely the root cause of problems. The number of Tables Used links to another analysis tool, which comes in the next section.
Fig 3.a: Overview on PlanVisualizer.
Most importantly, you should be aware that besides the 'Executed Plan' tab there are additional analysis tools, which can help in the analysis. You should see them in the lower part of the page. The tools are
- Operator List
- Tables Used
- Performance Trace
Fig 3.b: Other analysis tools of the PlanVisualizer.
If the other tools do not appear on your screen, then you should restore the perspective of your HANA Studio (Window -> Perspective -> Restore Perspective).
The overview page is an entry point for the analysis. It collects some information on the plan, but there is room for more. PlanViz also offers other analysis tools: Tables Used list, Operator list, and Timeline tool, which are introduced in the next sections.
4. The Tables Used List Shows What Comes In
The Tables Used list is the natural starting point for the evaluation as it shows what comes in from the different used database tables. In our example, it contains three entries, the two tables ACDOCA and BSEG and an intermediate result, which is less interesting and can be ignored.
Fig 4.a: Tables Used list for the example. See the two database tables ACDOCA and BSEG.
The good thing about all these tools is the synchronization with the executed plan. From the Tables Used list, however, there is one step in between: if you double-click on a line in Tables Used, you jump to the Operator List showing all operators connected to that table. From there, you can jump to the executed plan.
Fig 4.b: Operator List for the example, see the two database tables.
Mark a line, for example, the Basic Predicate with the largest output, and double-click to jump to that operator in the Executed Plan. The next section shows how these selection operators can be interpreted.
This section introduced the Tables Used and Operator Lists and how the synchronization with the Executed Plan works. More on the interpretation of the Tables Used list comes in the next section. But first, the selection and join operators are explained.
5. The Selection Operators of the Join Engine and their Meaning
The next figure shows the synchronized Basic Predicate, highlighted by the red color. Inside the JEEvalPrecond are also the other selection operators. They are the starting point of the selection. This is also visible when you check the surrounding in the Executed Plan. There is no operator below and there is no input to the JEEvalPrecond or to this first Basic Predicate.
Fig 5.a: Selection from a table for the first join partner, only external filters can be used.
From that, you can learn how the filter conditions of the statement are evaluated in the column store and how the so-called semi-join reducer of the join engine (JE prefix!) is executed. The nine filters in the ON-clause and WHERE-clause are
on b.rclnt = a.mandt
and b.rbukrs = a.bukrs
and b.gjahr = a.gjahr
and b.belnr = a.belnr
where a.mandt = '715'
and b.rldnr = '0L'
and b.docln = '000001'
and a.buzei = '001'
and a.gjahr = '2018'
and the optimizer has decided to apply them in the following way:
- It starts with the table BSEG as the first join partner (= JEEvalPrecond) and applies the three WHERE-filters for table a = BSEG, i.e. a.mandt='715', a.buzei='001' and a.gjahr='2018', which reduces the result from 875.864 to 762.586 to 353.251 records. Note that the Tables Used list shows the maximum of the outputs (875.864), but the final result of the selection (353.251) would be more interesting.
- The next step in the processing is a selection from the table ACDOCA, see the next figure. The join engine uses the so-called semi-join reducer and applies filters on ACDOCA as the second join partner in JEStep2. Four filters can be applied directly: the two filters on table b (= ACDOCA), which are rldnr and docln, plus client and year by exploiting two of the ON-conditions, i.e. b.rclnt='715' via b.rclnt = a.mandt, b.rldnr='0L', b.docln='000001' and b.gjahr='2018' via b.gjahr = a.gjahr. This reduces the sizes from 1.834.600 to 1.629.114 to 704.344 to 360.344 records. The Tables Used list shows again the maximum, but more interesting would be the selection result, which is not the 360.344 rows.
- There are still two additional ON-conditions, b.rbukrs=a.bukrs and b.belnr=a.belnr, which must be fulfilled. They cannot be applied as direct filters but must be checked row by row on all 360.344 rows. This reduces the result to 353.251 rows for JEStep2, which is the actual selected result.
- Fig 5.b shows that the processing time of JEStep2 is 69,5ms. As the direct filters are simple and fast, they sum up to only 23ms, which leaves 46ms for checking the two ON-conditions on 360.344 rows. Unfortunately, this expensive ON-filter is not shown as a separate operator in the plan.
- Note that if the delta stores (before a Delta Merge) were filled in the test system, then the selection might also find rows in the delta.
Fig 5.b: Selection from a table which is a (second) join partner. There are again direct filters, but additionally you should note that there is an input from below and that it must be checked whether the ON-conditions are also fulfilled for these input rows.
The semi-join is a selection in JEStep1 and JEStep2, which is also the join of the key fields. The other join steps build up the full result set of the join including all other columns. It is not necessary for you to understand all the details about the join engine in the Column Search as you cannot influence that processing. You can only influence the performance by the number of rows which are processed!
Fig 5.c: The steps of a semi-join in the join engine.
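The two selection steps described above can be sketched in a few lines of Python; the toy rows, column positions and helper names are invented for illustration and only mimic the semi-join reducer's behavior:

```python
# Sketch of the semi-join reducer steps described above (invented toy data):
# JEEvalPrecond applies the direct WHERE filters on BSEG; JEStep2 applies
# direct filters on ACDOCA plus a row-by-row check of the remaining
# ON-conditions against the rows from the first step.
bseg = [  # (mandt, bukrs, gjahr, belnr, buzei)
    ('715', '0001', '2018', '1', '001'),
    ('715', '0001', '2017', '2', '001'),
    ('715', '0002', '2018', '3', '002'),
]
acdoca = [  # (rclnt, rldnr, rbukrs, gjahr, belnr, docln)
    ('715', '0L', '0001', '2018', '1', '000001'),
    ('715', '0L', '0009', '2018', '1', '000001'),  # fails rbukrs = bukrs
    ('715', '0L', '0001', '2017', '2', '000001'),
]

# JEEvalPrecond: the three WHERE filters on BSEG
step1 = [r for r in bseg if r[0] == '715' and r[4] == '001' and r[2] == '2018']

# JEStep2, part 1: direct filters on ACDOCA (rldnr, docln, plus rclnt and
# gjahr derived from the ON-conditions and the WHERE filters)
cand = [r for r in acdoca
        if r[1] == '0L' and r[5] == '000001' and r[0] == '715' and r[3] == '2018']

# JEStep2, part 2: the remaining ON-conditions checked row by row
keys = {(b[1], b[3]) for b in step1}              # (bukrs, belnr) of step1 rows
step2 = [r for r in cand if (r[2], r[4]) in keys]
print(len(step1), len(cand), len(step2))  # 1 2 1
```

The second candidate row survives all direct filters but is dropped by the row-by-row ON-check, just like the reduction from 360.344 to 353.251 rows in the example.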
Please note, that the Tables Used list is only a simple selection from the operator list and has some shortcomings:
- It does not show repeated accesses to the same table. A table can appear only once in the Tables Used list (but CDS views have often repeated accesses to the same tables).
- The column ‘Max entries processed’ is the largest output of all operators on that table and not the final result of the selection.
- The ‘Number of Accesses’ is the sum of all operators on that table and not the number of physical accesses.
- The column ‘Max Processing Time’ shows the largest exclusive time of all operators on those tables.
There are changes and enhancements planned to transform the Tables Used list into a more useful Table Selection list.
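How these columns relate to the operator list can be illustrated with a small, hypothetical sketch (invented operator rows and values; not how PlanViz computes the list internally):

```python
# Hypothetical illustration of how the Tables Used columns relate to the
# operator list: 'Max entries processed' is the maximum output of all
# operators on a table, 'Number of Accesses' the count of such operators,
# 'Max Processing Time' the largest exclusive time. Values are invented.
operators = [  # (table, output_rows, exclusive_ms)
    ('BSEG',    875864, 12.0),
    ('BSEG',    762586,  5.0),
    ('BSEG',    353251,  6.0),   # the final selection result, NOT shown as such
    ('ACDOCA', 1834600, 20.0),
    ('ACDOCA',  360344, 46.0),
]

def tables_used(ops):
    out = {}
    for table, rows, ms in ops:
        e = out.setdefault(table, {'max_rows': 0, 'accesses': 0, 'max_ms': 0.0})
        e['max_rows'] = max(e['max_rows'], rows)   # not the selection result!
        e['accesses'] += 1                         # not physical accesses!
        e['max_ms'] = max(e['max_ms'], ms)
    return out

print(tables_used(operators)['BSEG'])
# {'max_rows': 875864, 'accesses': 3, 'max_ms': 12.0}
```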
In this section, the selection operators of the column search were explained, and how they are combined by the semi-join reducer of the join engine. The cost-based optimizer works as usual: it starts with the table with the smallest expected result and applies the filters according to their expected selectivity (JEEvalPrecond). The semi-join is not so familiar from other databases: the external filters are applied first in JEStep2, and then the remaining ON-conditions are checked on all the rows.
6. The Timeline – Get an Overview on Bottlenecks
The synchronization property makes the timeline a very useful tool for finding processing bottlenecks. The timeline shows colored bars for the times that are connected to plan operators (pop). You will often see gaps between these bars where only the smaller grey lines are shown. These gaps constitute the mentioned overhead of the plan-taking process, which does not contribute to the statement processing time. Start with the timeline of the fully collapsed plan and open the operators carefully to avoid confusion with the sub-operators. Find the longest colored bars, which are the long-running operators:
- How many have substantial contributions, i.e. bars longer than 10% of the total time and more than 50ms? These are the bottlenecks of the processing.
- How many of these bottlenecks are sequential? Sequential contributions might have the same root cause and if you are lucky, they can be improved with the same fix.
- How many bottlenecks are parallel to each other? Here you must be careful not to confuse parallel operators with sub-operators. Parallel issues are usually independent of each other and must all be improved together to get a net effect.
Usually, there are only a few top bottlenecks, which must be evaluated. The example has two sequential long bars.
Fig 6.a: The Timeline tool showing the processed operators by bars.
The corresponding operation details can be found easily with the synchronization property:
Fig 6.b: With the synchronization between the tools, you will easily find the two slowest steps of the example, JEStep2 with 70,5ms and JEStep4 with 54,7ms.
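The bottleneck rule of thumb above (bars longer than 10% of the total time and more than 50ms) can be written as a small filter; the operator names and times below are loosely based on the example but invented:

```python
# Sketch of the bottleneck rule above: an operator is a bottleneck if its
# bar is longer than 10% of the total time AND longer than 50 ms.
# Operator names and times are invented, loosely based on the example.
timeline = [  # (operator, start_ms, duration_ms)
    ('JEEvalPrecond',          0.0, 23.0),
    ('JEStep2',               23.0, 70.5),
    ('JEStep4',               95.0, 54.7),
    ('JERequestedAttributes', 150.0,  4.0),
]
total = sum(d for _, _, d in timeline)

bottlenecks = [(op, d) for op, _, d in timeline
               if d > 0.10 * total and d > 50.0]
print(bottlenecks)  # the two sequential long bars of the example
```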
What can be done? There is no way for you to improve the processing of 353.251 records. But if there were fewer, then it would of course run much faster. You can try it by switching the example statement to a year with much fewer records.
This section introduces the useful Timeline tool of PlanViz. It helps you to find processing problems and to navigate in complex plans via the synchronization property.
7. The Plan – See All Details
It is also important to have a look at plans with a limit or an aggregation to see when these two operations are evaluated. It turns out that in most cases they are processed after the time-consuming joins. This cannot be changed by the end-user and has to be kept in mind.
The following figure shows the example with a LIMIT, first with ORDER BY and then without.
Fig 7.a: High-level view of the plan with key fields. It can be seen that the LIMIT is applied at the end of the processing; here it comes in the operation JEStep4 together with the ORDER BY.
Fig 7.b: Here, the LIMIT comes even one step later. Most of the processing time (99%) is spent before the LIMIT. From that you should learn that a LIMIT cannot be used to optimize the selection performance; only the time for the data transfer to ABAP or even to the frontend is of course reduced by limiting the result set size.
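The point can be illustrated with a toy sketch (invented names and numbers, not HANA internals): the join below the LIMIT still builds every row, and only the rows leaving the statement shrink:

```python
# Toy sketch of the observation above: when the LIMIT sits on top of the
# plan, the join below it still builds every row. We count the "join work"
# while only `limit` rows leave the statement. All values are invented.
join_work = {'rows_built': 0}

def join_results(n):
    """Yields n joined rows, counting how many the join actually builds."""
    for i in range(n):
        join_work['rows_built'] += 1
        yield i

result = list(join_results(353))  # the full join result is materialized first
limited = result[:100]            # the LIMIT is applied only afterwards
print(join_work['rows_built'], len(limited))  # 353 100: all join work was done
```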
The behavior is similar in the case of aggregation, as can be seen in the next figure.
Fig 7.c: High-level view of the plan of the example with the COUNT(*)-aggregation. The aggregation comes in JEPreAggregate, which is, like the LIMIT, very late, after 99% of the processing time.
The rather simple example statements have plans which contain not much more than the selection and the join. It is important to note that LIMITs and aggregations are usually processed after the joins and selections. The last figure shows that the LIMIT is applied as one of the last steps in the processing, i.e. after most of the time is spent.
This section has shown that the plans often do not contain much more than selections and joins. These operators are the main time consumers in CDS processing. LIMITs and aggregations usually do not improve the performance as they come later.