Sunday, 6 December 2015

Issues with Order By, Joins and Previous Function

Key Point 1: If you apply a Join to a table you have previously Sorted using ORDER BY, the table no longer retains the order by, even if for example it's a left join which doesn't add any extra rows, just an extra field at the end.

It makes sense in hindsight, but did cause me confusion and a lot of time to resolve.

Key Point 2: Previous function looks at the previous record that was passed to the Load statement from the source - even if the previous function failed a Where clause and was not processed.



For one of my applications we have customer forecasts which look something like this:

So Customer, Date and Quantity.  In the Qlikview application the forecast needs to be spread across the days between each forecast.  So the 1st Nov for Customer abc should be spread evenly across the 2nd, 3rd, 4th, 5th and 6th November (as 1st and 7th) are weekends.

I've had a QVW script which does this by first sorting the table by Customer and Forecast Date descending (last forecast to first).

Then I have a for loop which runs 0 to 30.  For each record it checks if the Forecast Date + Counter is less than the Previous(Forecast Date).  If it is a "Forecast Spread Date" is set to be Forecast Date + Counter, otherwise Null.  The Nulls are later deleted.  There is special rule for first (ie. last forecast date) record per customer which spreads over 7 days.

So for customer abc, the first date to process is 6/12/2015.  This gets spread over a week.  2nd record is 29/11/2015.  It checks that 29/11/2015 + 0 = 29/11/2015 < 6/12/2015 so keeps that date.  29/11/2015 + 1 also till 29/11/2015 + 7 is not < 6/12/2015 so doesn't keep that date.

This all worked fine, although probably a better way to do it.

Recently a user asked for something extra.  They wanted to count how many forecasts exist in the month for the customer, and use that in some formulas.

To solve between the Order By load and the for loop, I added a [Forecast Month] field to the table and an extra step:

FC_In_Month:
LOAD
  Customer,
  Forecast_Month
  Count(Distinct Month(Forecast Date)) as FC_In_Month
RESIDENT Forecast
GROUP BY Customer, Forecast Month;

LEFT JOIN (Forecast)
  Customer,
  Forecast_Month
  FC_In_Month
RESIDENT FC_In_Month;

This adds an extra field (no new rows though) FC_In_Month.

But after doing this the for loop and date logic didn't work.  I eventually worked out that the Left Join back to the Forecast table had corrupted the sort order that was put in.  To resolve I just resorted again after the join and all okay.  But something I didn't think of at the time when added the extra step.


The other problem came about because there is actually an extra bit of info in the table.

Forecasts can be Firm or NonFirm and can have same date.  The For Loop described above is only interested in NonFirm records - the Firm records are handled elsewhere.

The Load statement that runs within the Loop has a clause Where Type = "NonFirm".

As mentioned above the test is that the Forecast Date + Counter < Previous(Forecast Date).  Problem I had was with the 15th.  Working backwards by date, the first record it came across was the Firm.  The Where clause excluded this record.  It then moved to the Non-Firm record.  The Previous(Forecast Date) was the 15th - the excluded Firm record and so the test failed for the NonFirm record as well as 15/11/2015 + 0 is not < 15/11/2015.

Solution was to create a separate pre-filtered table so that what I fed into the Load statement in the For Loop only has NonFirm.  This worked, but it's not a great solution.

Both of these make sense when I think about them - even obvious.  But it was hard to see why it went wrong and hard to find.  The Previous issue had been in the code and although not common in the data was actually presenting incorrect data to end users.  Was only when testing my before and after that I noticed.


FINAL NOTE: "order by is a clause used for sorting the records of a resident table before they are processed by the load statement".  Never chased down that rule that will the input table to the load statement before processing.  Previously I've often sorted the table earlier or created an interim table_SORTED version.  Too lazy to check, but it is okay to rely on it in the load statement (where clause notwithstanding).