Tuesday 19 April 2011

Why do our overnight batches sometimes fail - Or what the f*** is desktop heap

We run Qlikview server on a Windows 64-bit server with 48GB of memory and 4 processors.  While I'd like something bigger, this is a reasonable size.  We also run a lot of overnight batches to reload our reports, often with 5 or 6 separate qv.exe processes running at the same time - but our server should cope.  By batch I mean .bat files that call qv.exe /r ..... etc.  The batches are called by a scheduled task or from our SAP BW system via an RFC.

And we were having problems.  Quite often we'd check in the morning (I even got a spare laptop to check in early) and find that some had failed - not a script issue, there were just a number of qv.exe processes sitting there.  Most had consumed little or no memory or CPU time - it looked like the process had tried to start but failed to kick off.  We'd go through and kill the processes with Task Manager, or procexp.exe*.
When we re-triggered the load it worked fine, so it wasn't a script issue.

And we could test run similar volumes of batch reloads during the day without killing the system.  We struggled to find the cause - a lot of our processes are triggered from SAP BW, so I even tried removing the direct link from the BW calls.  All to no avail.

What I eventually stumbled across was a problem with the desktop heap on the server.  I'd never heard of it, but it's a memory area that Windows sets aside for each desktop to hold user interface objects such as windows and menus.  The size is quite small, by default well under 1MB for the non-interactive desktops that background jobs run on, but that is usually enough.  However this can impose a limit on the number of concurrent processes that an individual user can start.  And our user was calling too many qv.exe processes at once.

The actual fix is to expand the heap size so that there is room for more qv.exe processes to run concurrently. Note: this is controlled by a Windows registry setting and has nothing to do with Qlikview itself.  The registry setting is at
 HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows


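and the suggested setting is to change the SharedSection part of that value to 1024,20768,2048 on 64-bit.  The three numbers are sizes in KB; as I understand it the third one is the heap for non-interactive desktops, which is presumably the one our batches were exhausting.  The server needs a reboot before the change takes effect.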
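
Since the registry value is one long string with SharedSection buried in the middle, it is easy to damage the rest of it when editing by hand. Below is a minimal Python sketch of the string edit only, not a tested deployment script. The helper names (get_shared_section, set_shared_section) are made up for illustration, and the sample string is a shortened, assumed example of what the value looks like; the real one on your server will be longer and may have different numbers.

```python
import re

# Matches the three comma-separated sizes (in KB) inside the value string.
_SHARED_SECTION = re.compile(r"SharedSection=(\d+),(\d+),(\d+)")


def get_shared_section(windows_value):
    """Return the three SharedSection sizes (KB) as a tuple of ints."""
    match = _SHARED_SECTION.search(windows_value)
    if match is None:
        raise ValueError("no three-part SharedSection= entry found")
    return tuple(int(n) for n in match.groups())


def set_shared_section(windows_value, sizes):
    """Return a copy of the string with SharedSection replaced by sizes.

    Everything else in the string is left exactly as it was.
    """
    if len(sizes) != 3:
        raise ValueError("expected exactly three sizes")
    new_entry = "SharedSection=" + ",".join(str(int(n)) for n in sizes)
    new_value, count = _SHARED_SECTION.subn(new_entry, windows_value, count=1)
    if count != 1:
        raise ValueError("no three-part SharedSection= entry found")
    return new_value


if __name__ == "__main__":
    # Shortened, assumed sample; not copied from a real server.
    sample = (
        r"%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows "
        "SharedSection=1024,20480,768 Windows=On SubSystemType=Windows"
    )
    print(get_shared_section(sample))
    print(set_shared_section(sample, (1024, 20768, 2048)))
```

On the server itself the string could be read and written back with Python's winreg module (QueryValueEx and SetValueEx on the SubSystems key, keeping the value's existing type), run as an administrator. Export the key first so you can restore it if something goes wrong.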


Windows registry isn't my strong point, so for more details I'd suggest these links:
The only item I found in the QV forums:
http://community.qlikview.com/forums/t/36319.aspx
Unfortunately I only found this after putting "desktop heap" into the search, and I didn't know that term to start with.
For more details on the heap itself, this was really useful - even if I still don't fully understand it:
http://blogs.msdn.com/b/ntdebugging/archive/2007/01/04/desktop-heap-overview.aspx

After making this change, our nightly batches started working again.  We also installed the heap monitoring tool and saw that we were regularly hitting 80% usage even after we had doubled the size.  That explains why we used to have failures: 80% of the doubled heap is more than the original heap could hold.

* ProcExp.exe (Process Explorer) is a free tool we downloaded to try and help with this.  Although it didn't directly help with this problem, it is a very useful tool - it's what Task Manager should be.  I'd recommend installing it.
