Tuesday 19 April 2011

Why do our overnight batches sometimes fail - Or what the f*** is desktop heap

We run Qlikview server on a Windows 64-bit server with 48GB of memory and 4 processors.  While I'd like something bigger, this is a reasonable size.  We also run a lot of overnight batches to reload our reports, often having 5 or 6 separate qv.exe running at the same time - but our server should cope.  By batch I mean .bat files that call qv.exe /r ..... etc.  The batches are called by scheduled task or from our SAP BW system via an RFC.

And we were having problems.  Quite often we'd check in the morning (I even got a spare laptop to check in early) and find that some had failed - not a script issue, there were just a number of qv.exe processes sitting there.  Most with little or no memory or CPU time consumed - it looked like the process had tried to start but failed to kick off.  We'd go through and kill the processes with taskmanager, or procexp.exe*
When we re-triggered the load, worked fine so wasn't a script issue.

And we could test run similiar volumes of batch reloads during the day without killing the system.  We struggled to find the cause - a lot of our processes are triggered from SAP BW, so I even tried removing the direct link from the BW calls.  All to no avail.

What I eventually stumbled across was a problem with the desktop heap on the server.  I'd never heard of it, but it's a memory area that holds info on each user on the server's desktop.  The size is quite small, by default about 3KB but that is usually enough..  However this can impose a limit on the number of concurrent processes that an individual user can activate.  And our user was calling too many qv.exe processes at once.

The actual fix is to expand the heap size so that have room for more qv.exe's to run at concurrently. Note: this is a windows registry setting that controls this, nothing to do with Qlikview itself.  The registry setting is at
 HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows


and suggested setting is to change the shared section to 1024,20768,2048 on 64-bit.


Windows registry isn't my strong point, so for more details I'd suggest these links:
In QV forums the only item I found.
http://community.qlikview.com/forums/t/36319.aspx
Unfortunately I only found this after putting Desktop heap into the search.  But I didn't know that term to start with.
For more details on heap itself:
http://blogs.msdn.com/b/ntdebugging/archive/2007/01/04/desktop-heap-overview.aspx
was really useful - even if I still don't fully understand it.

After making this change, our nightly batches started working again.  We also installed the heap monitoring tool and saw that we were regularly hitting 80% after we doubled size, so explained why we used to have failures.

* ProcExp.exe is a free tool we downloaded to try and help with this.  Although didn't directly help with this problem, it is a very useful tool - it's what task manager should be.  I'd recommend installing it.

Monday 11 April 2011

TOTAL and AGGR in chart expression

I've created a customer / product profitiability Qlikview report.  Most of the charts are built around fairly simple queries, such as list by Customer / Product and Month and show Sum(Sales), Sum(Margin) etc.

It also includes a more complex Analysis chart (straight table).  The chart dimensions include lists Customers, Products, Dates and Order numbers - so at the transaction level.  Some of the expressions are looking at First Price and Last Price and comparing each transaction's price to that.

So one of the Expressions has to calculate the First Price for a Customer / Product and show it against all the related transactions.  Because the date range is dynamic, can't calculate this in the script.

Instead have to do it through an expression, which include uses TOTAL qualifier and AGGR expressions.  After much trial and error got a working version, but took me even longer to understand out why it works.

In this example the fields I'm using are:
- CustProd.  Concatentation of the Customer and Product (actually use an Autonumber to convert this concatenation to integer so that is more efficient to calculate)
- TransDate.  Will ignore Order Number, and assume only one CustProd per date for convenience.
- Price.  Precalculated in script at transaction level. 

Expression is:
=Sum(TOTAL <CustProd>
           Aggr(
                    If(TransDate = Min(TOTAL <CustProd> TransDate),Price),
                    CustProd,TransDate)
          )

How does this work?

Some notes on what TOTAL and AGGR do first - as I understand them.
TOTAL means the expression (Sum, Min) etc. is applied across all the records available to the chart, rather than just those matching the dimensions on this row.  If you include a field in the <...>, then it will create a subtotal for each unique combination of those fields.

AGGR on the other hand creates a mini-chart based on the expr and dimension.  The mini-chart will have one expression (the one defined within the Aggr function) and one or more dimensions based on the fields listed.

Example data may be
CustProd    TransDate     Price
123             1/1/11          100
123             2/1/11          110
123             3/1/11          110

Working from the inside,
Min(TOTAL <CustProd> TransDate).  Fairly easy.  This generates a list of CustProd records and the lowest TransDate for each.  If this was an expression it would return the first transdate for each Customer / Product.  Useful, but we want the price, not the date.

Next step, is to get the Price for the first date.
If(TransDate = Min(TOTAL <CustProd> TransDate),Price)
This will return the Price is the TransDate is the first date for this CustProd record.  Problem is that in the chart, we list all dates, so this only shows the Price on the first record.  We want to see the first price on all records.

To do this, need to apply the first Sum(TOTAL <CustProd> so that it all records for a CustProd can access the first price.  But can't do this without putting in an AGGR function b/w the Sum and the Min.

The AGGR(If(TransDate = Min(TOTAL <CustProd> TransDate),Price),CustProd,TransDate) creates a mini-chart that looks like this:
CustProd    TransDate     Price
123             1/1/11          100

Only the first date is returned as all other dates fail the If(TransDate = Min...) test and so return Null (which is ignored).
What's important is that every record for this CustProd will create the same chart - if it has access to all records of that CustProd - which it does as the first Sum function has TOTAL <CustProd> in it.

The final Sum(TOTAL <CustProd> .... ) simply sums the Price from that MiniChart and works out the first price is always 100.