What exactly is cube optimization?

With TM1 you'll have more time for analysis and other valuable activities without having to give up Excel. Cubewise will show you how.

There are already many excellent sources available on the web explaining how to re-order dimensions or “optimize” a TM1 cube. Basically, this method can be summarized ordering dimensions “small to large, then sparse to dense.” However, I’m yet to find any reference explaining what is being optimized. This paper is aimed at addressing exactly this question: what are we optimizing and why?

The “cube optimizer” available in the traditional Architect client under the “re-order dimensions” context menu allows the background order or index order of dimensions in a cube to be rearranged. This does not affect the original build order (or presentation order) of the dimensions. This is important as it means from both a technical perspective (APIs, DB, DBRW, CellGet, CellPut) and a UI perspective, nothing is impacted by the dimension reordering. However, performance can be greatly affected, hence the “optimization” tag.

What are we optimizing?

In the cube optimizer dialog in addition to the original order, current order and new order of dimensions we see 2 key pieces of information: i) memory used by the cube, ii) percent change in memory (new vs current order). A negative percent change indicates a reduction in memory consumption. Traditionally this memory reduction is what we have focused on. That is, a properly optimized cube consumes the lowest possible amount of RAM.

But there’s more to the optimization story. In TM1 there are multiple (and potentially different) “optimal dimension orders” for a cube:

1.   optimal order for storage on disk
2.   optimal order for size in memory
3.   optimal order for view retrieval
4.   optimal order for write-back

By focusing only on cube size in memory we are only looking at one of these 4 factors. Are we even optimizing the correct thing?

There is no one best dimension order!

Which re-ordering you choose depends on which factor it is most important to optimize on for the individual application, or individual cube. Disk space is a commodity and can be discounted, so let’s just focus on the last 3 factors. Back in the “old days” the most important, or maybe the only consideration for 32-bit TM1, was optimizing for size in memory. However, today we don’t have a 2GB limitation on database size and memory has become much, much cheaper. Consequently, models have become significantly larger.

Unlike the early days of 64-bit technology you no longer need specialized (and expensive) hardware to run a TM1 application, but rather quite standard garden variety Windows or Linux servers. That is, unless your application is particularly large (in the several hundred GB to TB range). So unless your model falls in the “very large” category (in which case yes optimizing cube size in memory will still be a significant factor), then optimizing for memory consumption is actually far from the most important factor. In fact, I would argue that performance optimization (i.e. minimizing view retrieval and write-back time) is much more important.

Optimizing for other factors

If cube size isn’t necessarily what you should be optimizing how do you optimize for view retrieval (query performance) or write-back performance?

For the following examples let’s take a hypothetical Retail Sales cube which has the following dimensions:

Version (3 Leaves; Actual, Budget, Forecast)
Month (12 Leaves)
Store (50 Leaves)
Product (1000 Leaves)
Measure (2 Leaves; Qty, Value)

Let’s now look at optimizing for view retrieval and write-back. The scenario presented will be a vast over-simplification, and with all simplifications not 100% accurate. (For those who are interested we will have a later post for a much more technical deep dive and explain things in more detail.)

View retrieval optimization

To optimize for query performance (view retrieval) you need to work out which dimensions are most likely used in titles of views or in the WHERE clause of an MDX statement (e.g. Version). For best retrieval, such dimensions should be first or close to the top of dimension order. Especially if they are leaf elements.

For example, in our Version dimension with Actual, Budget & Forecast elements, if the title has Version=Actual then number of cells to query is immediately reduced to a maximum of 33% of the total cube volume (assuming Actual, Budget & Forecast all have an equal number of populated leaf cells).

If Month is ordered first and the title has Month=January the number of cells to query is no more than 8.33% of the cube.

If the Product dimension is first with an individual leaf product selected, then the number of cells to query is 0.1%of the cube. (Of course in the real-world Product is a much less likely candidate for frequent title dimension than either Version or Month but you get the point).

To keep it simple on why this ordering matters for query performance, less cells to traverse and summate, equals less CPU cycles, equals faster query time.

Write-back optimization

The same principal of minimizing the number of cells or “cube volume” also applies to write-back optimization. Here let’s consider updating a large number of individual cells in a single transaction (e.g. via a TurboIntegrator load process). Typically inputs from source systems which we load into TM1 will be partitioned along one or more dimensions, like version, month, store or product in our hypothetical example.

As TurboIntegrator processes the input file a collection of changed cells is built which need to be merged back into the cube’s base model when changes are committed. Say that all data is for Version=Actual and we receive a separate file for each store. If Version is indexed 1st then each file will need to commit a change to 33% of the cube’s volume. However, if Store were to be indexed 1st then the size of each commit would be only 1/50th or 2% of the total cube size making the commit volume for each transaction much smaller and therefore quicker.

So the golden rule for write-back optimization is match the partitioning dimension to the dimension indexed 1stin the cube(If this seems obtuse and you want to understand more then stay tuned for the follow up technical deep dive post).

Setting the original dimension order

The 2 examples above make it clear that we can have a different optimal dimension ordering for the same cube for view retrieval vs. write-back, … and let’s not forget that both these orders may be different to the optimal order to minimize memory consumption. Take-out: “optimal” dimension order is situational and depends on what we are optimizing for.

Faced with this fact the only logical conclusion is that the original build order or “presentation order” should be consistent across all cubes to aid familiarity and ease of use for users. At Cubewise we use this schema:

Time (single or multiple dimensions)
Business dimensions

The schema you use may be different, but it should be universal and consistent within a TM1 model as this makes navigating and querying cubes easier for end users and developers alike. Don’t forget TM1 is a user-centric application and the design philosophy should reflect this and place the needs of users first.

A final word

So now you know what you are optimizing when you re-order dimensions! The next time that you optimize (for memory consumption) don’t forget to think about the potential impact on view retrieval and write-back and even better, test for it. To circle back to the point of placing users first, the #1 concern of every user is performance and this should also be our primary concern when designing and maintaining any system!