Final and Reserve price data: supply correctly-formed data files, plus data omitted from the revised data files.

Nicky McLean made this Official Information request to Electricity Authority

The request was partially successful.

From: Nicky McLean

Dear Electricity Authority,

Via its website https://www.emi.ea.govt.nz/Wholesale/Dat..., each month the Electricity Authority publishes collections of data files containing half-hourly data on various aspects of the workings of the electricity system, and in particular, data on “Final prices” and “Final Reserve prices” – though with variations on the file names from time to time. Those data files (and others) had been published via a DVD issued twice a year, with also an associated website offering monthly updates. The last files provided in this way were called fp2013m09.csv and rp2013m09.csv.
The new arrangement involved ftp://ftp.emi.ea.govt.nz (which has since been abandoned) and initially, data for October 2013 and onwards were provided, though under different folder names and with varying file names that might be replaced later, such as file 201404_Final_pricing.csv vanishing in favour of 201404_Final_prices.csv as just one example. Automated systems neither create nor follow such flounderings, but no matter.
The key development was the later re-supply of all the earlier data files in this new scheme, all the way back to October 1996 for the final prices (199610_Final_prices.csv), yet oddly, only to November 1996 for the reserve prices (199611_Reserve_prices.csv) – this is shown via your website https://www.emi.ea.govt.nz/Wholesale/Dat... with its two sub-directories Final_Prices and Reserve_prices, and is not to be confused with your website https://www.emi.ea.govt.nz/Wholesale/Dat... that contains other matter of no interest here.
Thus arises the first request: for the revised data file for the reserve prices of October 1996. This file is omitted from your offering, when previously it had been available as file rp1996m10.csv. Why has file 199610_Reserve_prices.csv and the data it contains been suppressed?
And, just out of curiosity, why are there no reserve price data files for April 1997 to March 2004? As with Pokemon characters, “Gotta catch them all!”

There was a mass resupply of these data files in October 2018 that repaired a number of basic errors, such as truncated files - e.g. file 199912_Final_prices.csv was of length 2,216 characters (not 10,292,760), being truncated in line 80; file 200312_Final_prices.csv was of length 9,928,277 (not 10,133,442) as it omitted many time slots, e.g. half-hour 1 on 1/12/2003 for all names; and others, but further details of now-replaced files would be supererogatory.
When a collection of data files is replaced by a new set of data files, whether with different file names or not, an immediate question arises: are there changes to the data offered by these revised files? Indeed there are.
First, the positive. Amongst the millions of matching values, two new ones appear:
24/3/2004, hh 7: 7.03 BDE0111
25/3/2004, hh15:53.14 BDE0111
In the original data files, there was no mention of a value for that name at those times. Demonstrating lacunae leads to problems in epistemology, because any given collection of items also omits an arbitrary number of other items (for instance, a schedule of winning lotto numbers identified by draw for the year to come); thus a putative list of omitted items would be infinitely large. Further, any such list must be incomplete via Georg Cantor’s “diagonal argument” (1891), which shows how to construct a new item that is not in the list. Should that be added to the list, then another new item can be generated in the same way, and this process can proceed indefinitely.
Happily, in this context we need not consider transfinite collections and can deal with rather less elevated concepts. Going by the date, the expectation would be that somewhere in the original file fp2004m03.csv would appear data for name BDE0111 corresponding to half-hour 7 on 24/3/2004 and half-hour 15 on 25/3/2004 – as well as data for other times and names. It turns out that data are scattered around in irregular clumps, but in records 270,820-270,821 there appear
BDE0111,24/03/2004,8,F,2.58,25/03/2004 10:56:52
BDE0111,24/03/2004,9,F,2.58,25/03/2004 10:56:52
and in records 270,934-270,939
BDE0111,24/03/2004,1,F,13.03,25/03/2004 10:56:52
BDE0111,24/03/2004,2,F,15.06,25/03/2004 10:56:52
BDE0111,24/03/2004,3,F,10.76,25/03/2004 10:56:52
BDE0111,24/03/2004,4,F,10.38,25/03/2004 10:56:52
BDE0111,24/03/2004,5,F,7.25,25/03/2004 10:56:52
BDE0111,24/03/2004,6,F,6.88,25/03/2004 10:56:52
so there is no entry for half-hour 7, while record 282,405 starts a sequence with
BDE0111,25/03/2004,16,F,87.26,26/03/2004 16:30:53
and record 282,515 ends another with
BDE0111,25/03/2004,14,F,29.58,26/03/2004 16:30:53
thus omitting any value for half-hour 15. Presumably your organisation has not discarded the data files of the DVD collections, so you can verify this situation yourselves, should you wish.
Conversely, the revised file 200403_Final_prices.csv offers
record 265,017: 2004-03-24,7,BDE0111,7.03
record 278,401: 2004-03-25,15,BDE0111,53.14

Evidently, those two values had been omitted from the original file, yet they appear in the (re-)revised file. This is a puzzle, as one imagines that some data storage system has been commanded something like “dump all final price data for March 2004” rather than “dump all final price data for March 2004 except for BDE0111 on hh7 24/3/2004 and hh15 on 25/3/2004”. How can those additional values appear, a decade after their time? And why did they not appear the first time?
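The check behind this finding amounts to comparing keys between the two layouts. A minimal Python sketch, assuming the original records are laid out as Name,Date,Half-hour,Code,Value,Timestamp with dd/mm/yyyy dates and the revised records as Date,Half-hour,Name,Value with yyyy-mm-dd dates, as in the samples quoted above. The function names are illustrative only, and any header lines are assumed to have been removed beforehand:

```python
from datetime import datetime


def original_keys(lines):
    """Collect (name, date, half-hour) keys from original-layout records."""
    keys = set()
    for line in lines:
        name, day, hh = line.strip().split(",")[:3]
        keys.add((name, datetime.strptime(day, "%d/%m/%Y").date(), int(hh)))
    return keys


def revised_records(lines):
    """Map (name, date, half-hour) -> value for revised-layout records."""
    records = {}
    for line in lines:
        day, hh, name, value = line.strip().split(",")[:4]
        key = (name, datetime.strptime(day, "%Y-%m-%d").date(), int(hh))
        records[key] = float(value)
    return records


def newly_appearing(original_lines, revised_lines):
    """Return revised records whose key has no counterpart in the original."""
    have = original_keys(original_lines)
    revised = revised_records(revised_lines)
    return {key: value for key, value in revised.items() if key not in have}
```

Fed the original records for half-hours 6 and 8 on 24/03/2004 and the revised record for half-hour 7, such a comparison would report only the half-hour 7 entry as newly appearing.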
Also troubling, and a much larger problem, is the change in data format evident between the content of the two data files. The original layout of those data is in the form of
Name,Date,Half-hour number, F-code, Datum, Timestamp.
This last interpretation was eventually confirmed by the appearance of data files in 2004 onwards with a heading line that stated “PRICE_RUN_TIME” (and in later files, perhaps “Price_Run_Time”, and in the Reserve Price data files “Run_Time” for a similar appendage), so taking these data at face value, price values are calculated well after the time to which they apply.
Yet your revised data files omit all mention of such timestamps, thus the second request: supply these Final and Reserve price data with the timestamps included, as before, not suppressed.
A question also arises as to the provenance of the data in the resupplied files, as for example what “price run time” ought to be assigned to the two newly-appearing values mentioned above. The original data came from the NZ Stock Exchange and presumably derived directly from the values that were calculated at the time and were used to drive the money exchanges between the various businesses involved. A contemporaneous record, then. Where have the data in the revised files come from? The same original records of the past transactions? If so, how does new data appear?
A similar issue is raised by the Reserve Prices data files. Instead of presenting data for hundreds of different names, just two names are offered. Unfortunately, they were BEN2201 and HAY2201, these being exactly the names also appearing in the Final Prices data files for different data so one must keep track of the source file name to keep them distinct, and variations in those file names were not helpful. Anyway, each name had two values supplied for each time slot plus a timestamp appendage (usually) that vanished after September 2013 when a new format was introduced, using the names SI and NI, until the data for October 2015 when an additional two data fields appeared so that there were four for each slot, not two. Then, all the files were replaced (bearing a date of October 2018), all the way back to that for November 1996 (still not October 1996 as with the Final Prices data files) and the additional two data fields now were found back to part-way through the 21’st of July 2009.
Naturally, additional data series are additional grist for the mill, but if it is worth the effort to resupply these files with the additional data back from October 2015 to July 2009, why not do so all the way back to the start, since the earlier files are being resupplied. Alas again, with the timestamps that were originally supplied suppressed.
Oh, and data for the 31’st October 2010 are omitted from the Reserve Price data, yet are present for the Final Price data. Has someone forgotten “Thirty days hath September...” ?
But there are more serious problems with the revised Reserve Price data files. Many are corrupt. Thanks to the daylight savings changeover days, there are two days a year that do not have twenty-four hours: one has twenty-three and the other has twenty-five. Correspondingly then, there are two days that do not have forty-eight half-hourly values: one has forty-six values and the other has fifty values. Any plan based around a constant twenty-four hours in a day will be disrupted, and dealing with this constitutes a problem whose difficulty frequently surpasses the competence of data suppliers. So it is here.
To take an example, consider the year 2010. The daylight saving rule for that year places the stretch day on Sunday the 4’th of April and the shrink day on Sunday the 26’th of September. Now consider file 201004_Reserve_prices.csv – it is easy enough to see that the half-hour numbers run from 1 to 48 only, not to 50, while in the original version of that file (i.e. prior to it being re-supplied with different names and omitting the timestamps) the half-hours run from 1 to 50 as is proper.
Similarly, consider file 201009_Reserve_prices.csv for the shrink day. The half-hour numbers run from 1 to 46, as is proper, but as well, half-hours 5 and 6 are omitted, which is not proper. Thus, that day offers data for twenty-two hours – it has been shrunk twice. And again, the originally-supplied version (that contains timestamps) also has the half-hour numbers running from 1 to 46, but with no omissions.
This mis-numbering means that data are misaligned. For example, some of the original data: (Alas, this webpage interface does not allow the specification of a fixed-spacing fount such as Courier, and multiple spaces are converted to single spaces. Thus, underline characters have been inserted to reduce the damage to the layout)
26/ 9/2010 0·01 0·01 0·01 0·01 A VOID! 0·01 0·00 0·01 0·01 0·00 0·00
Sunday__ 0·00 0·01 0·01 0·01 0·01 0·09 0·40 0·47 0·49 0·45 0·40 0·02
-2 h.hrs!!_ 0·02 0·02 0·02 0·02 0·01 0·01 0·01 0·02 0·01 0·01 0·01 0·01
________ 0·02 0·01 0·29 0·33 0·50 0·91 0·41 0·42 0·01 0·01 0·01 0·01
compared to the revised data:
26/ 9/2010 0·01 0·01 0·01 0·01 A VOID! __?_ __?_ 0·01 0·00 0·01 0·01
Sunday__ 0·00 0·00 0·00 0·01 0·01 0·01 0·01 0·09 0·40 0·47 0·49 0·45
-2 h.hrs!!_ 0·40 0·02 0·02 0·02 0·02 0·02 0·01 0·01 0·01 0·02 0·01 0·01
________ 0·01 0·01 0·02 0·01 0·29 0·33 0·50 0·91 0·41 0·42 0·01 0·01
Here, twelve values are shown to a line, so the standard forty-eight values require four lines. For the shrink day, to maintain alignment the two slots for the non-existent times are filled with “A VOID” (quite so) and here the ? where a number should appear signifies that no datum has been supplied so there is no number to show so one isn’t. As distinct from showing a zero.
For the stretch day there are fifty values to be shown, and alignment is maintained by a fifth line that is suitably offset to show the overlap:
4/ 4/2010 7·00 5·00 4·00 4·00 4·00 30·00 A JOLT!
Back 2!__ __ __ __ __ __ __ 16·19 7·00 4·00 5·00 5·00 5·00 4·00 3·00
Sunday__ 3·00 3·50 0·02 0·02 0·02 0·02 0·02 0·02 0·02 0·02 0·02 0·02
+2 h.hrs!!_ 0·02 0·02 0·02 0·03 0·03 0·03 3·00 0·03 0·02 0·02 0·01 0·01
________ 0·01 0·01 0·01 0·01 0·01 0·01 0·01 0·02 0·02 0·02 3·50 4·00
Whereas the revised data file offers
4/ 4/2010 7·00 5·00 4·00 4·00 30·00 7·00 A JOLT!
Back 2!__ __ __ __ __ __ __ 4·00 5·00 5·00 5·00 4·00 3·00 3·00 3·50
Sunday__ 0·02 0·02 0·02 0·02 0·02 0·02 0·02 0·02 0·02 0·02 0·02 0·02
+2 h.hrs!!_ 0·02 0·03 0·03 0·03 3·00 0·03 0·02 0·02 0·01 0·01 0·01 0·01
________ 0·01 0·01 0·01 0·01 0·01 0·02 0·02 0·02 3·50 4·00 __?_ __?
So again, values have been incorrectly placed and two omitted.
A list of the corrupt files would further expand this communication. Checking the data for the daylight savings changeover days can be done easily enough via a command something like
for # in ~Reserve~ do list # when HoursInDay(day) ¬= 24
The specific errors in the Reserve Price data can also be found, with something like
for # in ~.Reserve~ do dump # when IsBad(#)
provided you have access to a suitable system. I emphasise that the omitted data are omitted from the data files that you have re-supplied, and are not necessarily absent from whatever data storage system had been accessed to produce those files: searching that will not find omissions from them. Indeed, because the originally-supplied data files do contain the values that have been omitted from the re-supplied files, the data storage system presumably still contains them and so could supply them afresh given a suitable set of commands to do so. Considering that your organisation has emitted regulations requiring that data suppliers are to supply data that are correctly arranged, it should be so commanded.
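For readers without such a system, the changeover-day check can be sketched in Python. This is only an illustration under stated assumptions: the daylight saving rule is hard-coded as understood here (first Sunday of October to third Sunday of March up to early 2007, then last Sunday of September to first Sunday of April), which agrees with the 2010 dates given above but should be verified before being relied upon. The names half_hours_in_day and period_faults are hypothetical.

```python
from datetime import date, timedelta


def nth_sunday(year, month, n):
    """Return the n-th Sunday (1-based) of the given month."""
    first = date(year, month, 1)
    offset = (6 - first.weekday()) % 7  # weekday(): Monday=0 ... Sunday=6
    return first + timedelta(days=offset + 7 * (n - 1))


def last_sunday(year, month):
    """Return the last Sunday of the given month."""
    next_month = date(year + (month == 12), month % 12 + 1, 1)
    last = next_month - timedelta(days=1)
    return last - timedelta(days=(last.weekday() + 1) % 7)


def half_hours_in_day(d):
    """Expected number of half-hour slots: 50, 46, or the usual 48."""
    # Assumed rule: the autumn "stretch" day moved to April from 2008,
    # the spring "shrink" day moved to September from 2007.
    stretch = nth_sunday(d.year, 4, 1) if d.year >= 2008 else nth_sunday(d.year, 3, 3)
    shrink = last_sunday(d.year, 9) if d.year >= 2007 else nth_sunday(d.year, 10, 1)
    if d == stretch:
        return 50
    if d == shrink:
        return 46
    return 48


def period_faults(d, periods):
    """Return (missing, unexpected) half-hour numbers for one day's data."""
    expected = set(range(1, half_hours_in_day(d) + 1))
    seen = set(periods)
    return sorted(expected - seen), sorted(seen - expected)
```

Applied to each day of a data file, period_faults would flag a stretch day numbered only 1 to 48 as missing 49 and 50, and a shrink day lacking half-hours 5 and 6 as missing exactly those two.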

As for the Final Prices data files (which are much larger than the Reserve Prices files), they also are corrupt in parts, even if good in other parts, although omitting the timestamps.
Since these files contain historical data, how can it be that the re-supplied files contain data series not previously supplied? A specific example is the series named HWA1001 which offers data for 21/7/2009 to 19/8/2009, which data are equal to those supplied for both HWA1101 and HWA1102 over that date span. At a guess, HWA1001 is a mistype for HWA1101, but, how can it have acquired data? If data were missing for HWA1101 for that date span, a mistype could be supposed in the incoming data, but it would be better if it wasn’t.
Similarly, the files mention some twenty-nine names with two-letter name codes, such as AN2201 and TR0331. It turns out that somehow, the first letter, an M, had been omitted: those names should be MAN2201 and MTR0331 respectively as all names have a three-letter code at their start. They have data for some five months, and the data for the corresponding three-letter names have a hole five months wide. Although the system I use has long had provision for collating deviant names into one so that for this case it slots those twenty-nine five-month pieces into their corresponding normal series, it would have been nice if the revised files had corrected the problem. Persons working only from the supplied data files will be stuck.
In general, there are three possibilities. First, the deviant name is associated with data that slots neatly into a hole and clearly belongs with its neighbouring data. Secondly, the deviant name’s data does not fill a hole but instead exactly matches the other data, or, the differences seem small enough to sweep under the rug of “rounding error” or the like. But thirdly, they are different, even if similar.
On the other hand, the revised files do suppress an annoyance in the originally-supplied files. From March to December 2003 there appeared some 944,484 records with a code “V” each of which was followed by another record (for the same name, date, and half-hour) with a code “F” but a different value. For instance,
record 11187: ABY0111,2/3/2003,1,V,57.64,3/3/2003 7:46:09 AM
record 22371: ABY0111,2/3/2003,1,F,57.93,4/3/2003 9:45:18 AM
Happily, all such revisions bear a later timestamp: the revisions in the data file are chronologically ordered by the “price run time”, and this applied even when an additional F-coded record appeared, though this is rare: nine occasions have been noted. The V and F data turn out to be similar, with daily correlation coefficients varying between 0·56 and 0·99976 or so.
In the revised data file, record 11186 has 2003-03-02,1,ABY0111,57.93 which is at least the same as the F value. These revised files do not supply a timestamp, nor a code letter (either F or V) and the stuttering does not occur. But even aside from the absence of the mysterious V values, this is not all to the good.
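The relation between the original V and F records and the single value in the revised file can be illustrated with a short Python sketch. It assumes the original layout Name,Date,Half-hour,Code,Value,Timestamp and relies on the observation above that records appear in order of price run time, so that the last F-coded record for a slot is the latest. The function name final_values is hypothetical.

```python
def final_values(lines):
    """Keep, per (name, date, half-hour), the last F-coded value in file order."""
    latest = {}
    for line in lines:
        name, day, hh, code, value = line.strip().split(",")[:5]
        if code == "F":
            # A later F record for the same slot overwrites an earlier one.
            latest[(name, day, int(hh))] = float(value)
    return latest
```

For the two ABY0111 records quoted above this yields 57.93, matching the revised file, while the V value is simply dropped.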
When confronted by a large collection of numbers one could just gloat over the existence of data files and move on to other matters, or, make some attempt to characterise the data by collecting various statistics such as minimum, median, average, maximum, standard deviation, skewness, kurtosis, etc. as an initial analysis. Although an “average price” is rather dubious, an average is still part of a description of a distribution. But for these data, any descriptive parameters are deformed by the frequent presence of bizarre values such as 100000, -100000, and -9999 when the more ordinary value is around 60, as in the above examples. Alas, because these prices are determined well after the event (as shown by the Price_Run_Time data) the obvious ploy of switching on load when the price is negative will be precluded, nor will generators be discouraged thereby.
Looking at those usages, it appears that they indicate situations where a “final price” could not be concocted by the process that produces them, for instance before a location is in use, or after it ceases being used, and so they stand for “no value here” instead of being absent. Similarly, further inspection shows that the Final Price values range not just over reasonable positive values, but also into similarly-sized negative values, and so it is possible that zero is a proper Final Price value: I would like to buy gold on the spot market at that price... Yet there can also be long sequences of zero values at the start and end of a series, also presumably indicating “no value here” instead of being absent from their data file. This is particularly vexing as there is no discernible difference in value between a proper zero value and a “no value” zero value. Attempts at assessing the distribution of values are wrecked. As a further example, the data series for Te Kaha has many zero values in the interior of its date span: which are zero and which are a different type of zero? Analysis might show that they tend to occur around the time of high tide, say: if so, plans could be laid...
Knowing what these code values signify would be helpful; further, are there any other code values lurking in the data? Perhaps with varying usage over time, as with -100000 and 0.
Using numbers as code values as well as numbers is not a good idea, especially when their types are not easily distinguished. Suppose the convention of representing Male or Female as 1 or 0 is followed, and then 0 was also used to represent any of Don’t know/Won’t say/Not recorded/De-sexed/Trans-sexed/Undecided/Hermaphrodite, etc. The resulting data would be next to useless.
A more systematic approach would be to employ an auxiliary code, as in the F and V usage above, so that the different types of value (or, non-value) could be identified, a scheme more flexible than not supplying any value at all, or supplying a “null” value or a ? or even NaN (“Not-a-Number” as has become a modern fashion) for all odd situations. This is not a new notion, and is in frequent use in the other data files your organisation publishes, for example in name sequences such as
TKH0111,HEDL,GN,TPNZ,KWh,X,I
TKH0111,HEDL,GN,TPNZ,KWh,X,F
which both provide data for the same series. Perhaps F stands for Final, or Firm, or Fixed, while I stands for Interim, or Interpolated, or Implicit – it would be good to know. Perhaps they would stand for different data series, as with the V and F codes. Whichever, there would be clarity. And for the zero values in the Final Prices, an F code would signify a proper Final Price, while a Z code (say) would signify a place-holder signifying “no value here” and it could be treated accordingly, and statistics on the distribution of actual Final Price values would not be deformed.
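The effect of such a code can be sketched in Python. This is an illustration only: the set of suspected place-holder numbers is taken from the values mentioned above, their meaning is unconfirmed, and the names tag and mean_of_proper_prices are hypothetical. Note that a zero cannot be classified by its value alone, which is exactly why an explicit code in the data would help.

```python
from statistics import mean

# Assumption: these numbers mark "no value here" rather than prices.
SUSPECTED_PLACEHOLDERS = {100000.0, -100000.0, -9999.0}


def tag(value):
    """Return 'Z' for a suspected place-holder, otherwise 'F'."""
    return "Z" if value in SUSPECTED_PLACEHOLDERS else "F"


def mean_of_proper_prices(values):
    """Average only the values tagged 'F'; None if there are none."""
    proper = [v for v in values if tag(v) == "F"]
    return mean(proper) if proper else None
```

With the place-holders tagged and set aside, an average over values near 60 is no longer dragged towards -9999 or 100000.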
Note that for the two name sequences, the F (or I) code adjoins the rest of the name sequence rather than being placed somewhere apart. Thus it would be a convenience if the Final Price data (and the Reserve Price data) were to have their code letters immediately after their name sequence as well, as in
ABY0111,V,2/3/2003,1,57.64,3/3/2003 7:46:09 AM
Which is to say: Name&Code, Date, Half-Hour Number, Data, Timestamp.
The Final Price data files have a single datum, while the Reserve Price data files have two, and later, four data per line, aside from the other fields.
And it helps to have slashes in dates rather than hyphens, because hyphens are also used to signify negative numbers and so mixups are easier. Entering numerical data as rational values (such as 1/3 for one third) is not at all common, so if slashes (especially two of them) are found in an undescribed data field, then it almost certainly contains a date. Being able to recognise dates is very helpful when data files may arrive with their columns jumbled, as has happened often.

So, in short, I’m arguing for the original data format as had been originally supplied, with dates in the dd/mm/yyyy form, timestamps on the end, and to include the F code (and other codes where appropriate) but that shifted to follow the name field as a part of a compound name.

--------------------------------------------In Summary---------------------------------------------------
The Electricity Authority operates a website https://www.emi.ea.govt.nz that offers access to these data files, and it has the statement “A fundamental requirement of competitive and efficient electricity markets is access to reliable data and performance metrics.” - in bold bluish text.
This brings to mind Simon Hoggart’s Law of the Ridiculous Reverse “If the opposite of a statement is plainly absurd, it was not worth making in the first place.”
The above has shown that it is a statement worth making. To attain success would be a matter of basic competence in data administration and information technology, manifested in a resupply of these data files in a form that is complete, comprehensive, coherent, and correct, thereby replacing a collection that is not.
This should not be difficult nor require much effort. Your organisation has already re-supplied these data files, and done so more than once, so another re-supply is possible. This time correctly? Hopefully, not omitting October’s Reserve Price file for 1996, etc. And not suppressing the timestamp information. Even presenting data correctly on the daylight savings changeover days should be easy, because you have already dealt correctly with this maddening annoyance, as in 2014 for example, though not for earlier files that were resupplied later on. Providing dates in the dd/mm/yyyy form is also possible, because file 201912_Reserve_prices.csv turned up in that style (which broke a system expecting to convert from the hyphen style), though it was later replaced by a file in the hyphen style.
Introducing a Z code (or whatever suits) along with the F code might be slightly more difficult, but it would enable the bizarre code numbers to be kept separate from the proper price numbers, and then the statistics will be good.
Humm. Is it worth saying that “Good statistics are good to have.”?
There have been mass re-supplies of these data files in October 2015 and October 2018. Perhaps in October 2021?

Yours faithfully,

Nicky McLean


From: OIA
Electricity Authority




Good afternoon Nicky,

 

Thank you for your request received on 13 October 2021, under the Official
Information Act 1982.

 

We are considering your request and will respond to you as soon as
possible.

 

Please contact us if you have any queries.

 

Kind regards,

Vicki 

 

Vicki Stent

       Project Coordinator                     

       DDI:      +64 4 [DDI redacted]

 

       Electricity Authority - Te Mana Hiko   

       Level 7, Harbour Tower, 2 Hunter Street

       PO Box 10041

       Wellington 6143

       New Zealand

       www.ea.govt.nz

 


 

 

 


From: OIA
Electricity Authority



Attachment Letter to Nicky McLean.pdf
245K Download View as HTML


Dear Nicky,

 

Thank you for your request of 13 October 2021, under the Official
Information Act 1982.

 

Please find attached the Authority’s response.  If you have any questions
regarding our response, please don’t hesitate to contact me.

 

Kind regards,

    Tessa Balinger

       Ministerial Advisor                   

       DDI:     +64 [DDI redacted]

       Electricity Authority - Te Mana Hiko   

       www.ea.govt.nz

 


 

 


From: Nicky McLean

Dear Ms. Balinger,
Thank you for your response of the ninth of November, forwarding the response of Ms. Gillies, advising me of the provision of data in response to my OIA request. I am amused to see that your website’s page https://www.emi.ea.govt.nz/Wholesale/Dat... which I had slighted as being of no relevance to the question of the data files in https://www.emi.ea.govt.nz/Wholesale/Dat... is now the location of the new folders containing the requested data files. The difference in names lies in the underscore between Final and pricing, and hundreds of files are involved.
To recapitulate: the original file collection vanished, to be replaced by a second collection but alas, some files were truncated and had irregular names (“pricing” rather than “price”), etc. and so they were ignored. Then they in turn were replaced, by a third collection that I shall refer to as the “old” data, itself with problems that I had denounced. Now comes there a fourth collection, which I shall refer to as the “new” collection. They are dated for the third of November, just a few days too late to have arrived in an October as had the other replacements in 2015 and 2018, but no matter.
The first thing to do with a collection of new files is to compare them to the old files, and alas, this simple procedure fails at the first step, because the file names differ: the old files use the name part “Final_prices.csv” while the new files use “FinalEnergyPrices.csv”, thus none match.
By contrast, the name part “FinalReservePrices.csv” is used in both the old and the new collections, and so it is immediately plain that the new collection contains a file 199610_FinalReservePrices.csv while the old collection does not - although the original collection did. Success! A missing file has been provided.
Further, it is quickly apparent that the new collection includes files 199704_FinalReservePrices.csv to 200403_FinalReservePrices.csv which are not to be found in the old collection. Success! Missing files have been provided.
On the other hand, the new collection ends with file 202012_FinalReservePrices.csv, while the old collection (currently) ends with file 202109_FinalReservePrices.csv.

The next stage in assessing a collection of old and new files for the same data is to compare the corresponding pairs of files to see what differences (if any) lurk within. Alas, they all differ, in tedious detail that swamps any substantive differences such as changed, inserted, or deleted values. The first pair to match is for 199611 and the header line of the old version reads Trading_date,Trading_period,Island,FIR ($/MWh),SIR ($/MWh) while the header line of the new version reads TradingDate,TradingPeriod,Island,FIRDollarsPerMegawattHour,SIRDollarsPerMegawattHour and changing the names of the columns has consequences for later processing. Why not stick with the established column names? They were used in revisions one and two (old), but not three (new).
A single line differing is not much of a problem in a comparison, but now comes the real difficulty. The first few lines of data (records 2-5) in the old file read
1996-11-01,1,SI,1.03,1.03
1996-11-01,1,NI,0.00,4.07
1996-11-01,2,SI,1.01,1.01
1996-11-01,2,NI,0.00,4.08
whereas in the new file they read
1996-11-01,1,NI,0.00,4.07
1996-11-01,1,SI,1.03,1.03
1996-11-01,2,NI,0.00,4.08
1996-11-01,2,SI,1.01,1.01
In other words, the sequence is SI,NI,SI,... in one, and NI,SI,NI,... in the other. This swamps the comparison in trivial differences. It doesn’t much matter which ordering is used, so long as the same one is used for both...
And there is a technical difference as well: the old files use the character sequence CRLF to end lines (as is standard on Windows systems), while the new files use LF only. In this case the comparison routine can be instructed to ignore line endings so that doesn’t much matter, it is the NI/SI sequencing that blocks simple comparisons.
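A comparison that is indifferent to the header wording, the NI/SI ordering, and the line endings can be sketched in Python with the standard csv module. It assumes the column positions shown above (date, period, island, then the price fields) and keys each row on its first three fields; keyed_rows and compare are illustrative names, not part of any supplied tool.

```python
import csv
import io


def keyed_rows(text):
    """Parse CSV text into {(date, period, island): (prices...)}."""
    # newline="" leaves CRLF or LF untouched so the csv module handles both.
    reader = csv.reader(io.StringIO(text, newline=""))
    next(reader)  # skip the header line, whatever its column names are
    return {
        (row[0], int(row[1]), row[2]): tuple(float(x) for x in row[3:])
        for row in reader
        if row
    }


def compare(old_text, new_text):
    """Return (keys only in old, keys only in new, keys whose values differ)."""
    old, new = keyed_rows(old_text), keyed_rows(new_text)
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return only_old, only_new, changed
```

Because rows are matched by key rather than by position, the four sample records quoted above compare as identical in either ordering. A row with two extra price fields in one file would, in this sketch, be reported as changed.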
However, despite the NI/SI issue, a spot check of the data file pair for October 2010 shows that the new file has data for the 31’st of October, that the old data file lacks. Success! Omitted data have been provided.

In general terms, two files cannot be identical if they have different sizes and there is no need to inspect their content to know this. Two files of differing sizes can have equal content if say the line endings differ. Their content can be equivalent, if say some lines have more trailing spaces than others. But there are many ways in which files can differ and yet be equivalent, and the only way forward is to inspect their content.
In this case, all the old data files can be read into a special work file (a “big bag”, like) and then, all the new files can likewise be read. The special point here is that the input process also checks incoming data against the existing data in the work file (for the given name and date): if there is no already-stored record, it need merely be added to the bag, and nothing to be done if they are the same. Otherwise, a report can be made on the differences as the new version is assimilated...
So then, read the old data, and then the new data. No reports of clashes emerge, because the new data files use different names: “FIRDollarsPerMegawattHour” instead of “FIR ($/MWh)” for example. Editing the header lines of two hundred and ninety-one data files so that they all match would be tedious and error-prone as well. Are these column headers the result of whatever whim came to mind at the time? Sticking with the established usage would be better all round. But as no-one seems able to maintain a constant usage of names for long, not just for data file names but also for the names of data series, the read process has long since employed an “alias” table, and this can be extended so that the corresponding names will be regarded as equivalent. Onwards.
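The “big bag” with an alias table can be sketched in Python as follows. This is not the system described above, only a minimal illustration of the idea: the two alias entries come from the header lines quoted earlier, while the function name assimilate and the layout of the bag are assumptions.

```python
# Alias table: new column names mapped onto the established ones.
ALIASES = {
    "FIRDollarsPerMegawattHour": "FIR ($/MWh)",
    "SIRDollarsPerMegawattHour": "SIR ($/MWh)",
}


def assimilate(bag, header, rows, source):
    """Merge rows into the bag; return a list of clashes with stored values."""
    names = [ALIASES.get(h.strip(), h.strip()) for h in header[3:]]
    clashes = []
    for row in rows:
        day, period, island = row[0], int(row[1]), row[2]
        for name, text in zip(names, row[3:]):
            key = (island, name, day, period)
            value = float(text)
            if key not in bag:
                bag[key] = value  # nothing stored yet: just add it
            elif bag[key] != value:
                clashes.append((key, bag[key], value, source))
                bag[key] = value  # the newer version is assimilated
    return clashes
```

Reading every old file and then every new file through such a routine leaves only genuine value changes in the clash list, whatever the column naming or row order of each file.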
Many files in the old collection bungled the daylight saving changeover days, the first being 199703_FinalReservePrices.csv. Then the old collection lacked data files for the subsequent months until the appearance of 200404_FinalReservePrices.csv, and the stretch day in March for that year was therefore missed, its putative bungles moot. But later files are also incorrect:
200410_FinalReservePrices.csv
200503_FinalReservePrices.csv 200510_FinalReservePrices.csv
200603_FinalReservePrices.csv 200610_FinalReservePrices.csv
200703_FinalReservePrices.csv 200709_FinalReservePrices.csv
200804_FinalReservePrices.csv 200809_FinalReservePrices.csv
200904_FinalReservePrices.csv 200909_FinalReservePrices.csv
201004_FinalReservePrices.csv 201009_FinalReservePrices.csv
201104_FinalReservePrices.csv 201109_FinalReservePrices.csv
201204_FinalReservePrices.csv 201209_FinalReservePrices.csv
201304_FinalReservePrices.csv 201309_FinalReservePrices.csv
The corresponding files from the new collection all handle the daylight saving changeover days correctly. Success! The errors in the data for those days are gone. Better still, the same process but starting from the original data file collection found no changes to the data for the daylight savings changeover days, so the new data are not just correctly presented, they are the same as the original. Though for these data, the test is not necessarily definitive because it is not unusual for a day’s data to have many similar values. Thus, if the value for half-hour nine were wrongly being placed into the slot belonging to half-hour seven (say), or vice-versa, if the values were equal then no difference would be noticed. This is the case with the South Island data for 20/3/2005: all values are 0·11. However, a value being placed into an empty slot would be noticed, as was the case with the new data filling in holes in the old data that existed because of the miscounts on the stretch and shrink days.
And further, the new collection provides files that had been missing from the old collection, and, no new errors in the daylight saving changeover days were introduced.
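A check for bungled changeover days could be sketched as below, in Python. The assumptions are mine: an ordinary day is taken to have half-hours 1 to 48, a shrink day 1 to 46 and a stretch day 1 to 50; the function name `odd_days` is hypothetical; and the rows are made up for the demonstration (a well-formed stretch day, then a day with a hole).

```python
from collections import defaultdict


def odd_days(rows, allowed=(46, 48, 50)):
    """rows: iterable of (date, period, series). Return {(date, series): problem}."""
    seen = defaultdict(set)
    for date, period, series in rows:
        seen[(date, series)].add(period)
    problems = {}
    for key, periods in seen.items():
        last = max(periods)
        missing = sorted(set(range(1, last + 1)) - periods)
        if missing:
            problems[key] = f"missing periods {missing}"   # a hole within the day
        elif last not in allowed:
            problems[key] = f"{last} periods"              # a miscounted day
    return problems


# Made-up rows: fifty half-hours on a stretch day, then a day lacking 5 and 6.
rows = [("2005-03-20", p, "SI") for p in range(1, 51)]
rows += [("2005-03-21", p, "SI") for p in range(1, 49) if p not in (5, 6)]

problems = odd_days(rows)
print(problems)
```

As noted above, a check of this sort sees only presence and absence; values placed in the wrong slot go unnoticed when they happen to be equal.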

But on the other hand, although as mentioned the new collection offers data for 31/10/2010 which the old file does not (though the original file does), the new collection’s data file 201711_FinalReservePrices.csv lacks any data for 18/11/2017 and 30/11/2017, yet the old collection’s data file has them. This is disconcerting. Presumably your system was commanded with something along the lines of “produce monthly data files” and not “produce monthly data files, omitting 18/11/2017 and 30/11/2017” - so how can this happen? If once a tiger is seen, is not the jungle then full of tigers?

Also disconcerting is that there were miscellaneous changes to the data on some 85 days. The largest seems to be as follows, shown by the relevant lines from the data files in the order “original”, “old”, and “new”:
HAY2201 ,13/02/2009,24,346.6,500,16/02/2009 14:11,F
2009-02-13,24,NI,346.60,500.00
2009-02-13,24,NI,2938.14,3065.32
That is, both the original and old data files agree with values of 346·6 and 500, but the new file offers 2938·14 and 3065·32. This is rather odd, as these data are blessed with the appellation “Final” - and yet there are changes, that have been made some time after 2018 (when the old data were re-issued) to data for 2009. How can this happen to historical data? What is the provenance of the new values?
This is obviously nothing to do with the data for 26/3/2011, where the “final” values were indeed revised in 2013. In the original data, many of the revised values for that day were given to five decimal figures, as for example in
HAY2201,26/03/2011,8,0.02,0.75494,1/4/2013 12:00,F
HAY2201,26/03/2011,9,0.02,1.89561,1/4/2013 12:00,F
But in the old data, they are reduced to two decimal digits (four values per line):
2011-03-26,8,NI,0.02,0.75,241.833,320.275
2011-03-26,9,NI,0.02,1.90,246.000,321.170
And in the new data (only two values per line),
2011-03-26,8,NI,0.02,0.75
2011-03-26,9,NI,0.02,1.90
Evidently, presenting the values to just two decimal digits does not necessarily represent the original data correctly. One wonders whether the values actually employed in the money-transfer accounting were constrained to two decimal digits; if not, these data files do not represent them correctly.
So the old data differ from the original in bungling the stretch and shrink days (as were listed above), and in lacking the extra precision offered for the adjusted day, 26/3/2011, which is also not matched in the new data.
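The loss from the two-digit presentation can be shown with the two revised values quoted above. This Python sketch assumes ordinary half-up rounding, which is a guess at how the files were produced; the function name `to_cents` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(text):
    """Round a decimal string to two decimal digits (half-up is an assumption)."""
    return Decimal(text).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for original in ("0.75494", "1.89561"):
    rounded = to_cents(original)
    print(original, "->", rounded, "difference", Decimal(original) - rounded)
```

Decimal arithmetic is used so that the differences are exact, with no binary floating-point noise added to the comparison.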
Another example, listing only the old and the new data:
2009-02-13,24,NI,346.60,500.00
2009-02-13,24,NI,2938.14,3065.32
And now, with multiple values changed in a day, mostly by a few cents, but also...
2009-05-17,37,NI,0.01,4.00
2009-05-17,37,NI,0.01,82091.52
And with larger changes for many in the day, such as
2009-01-16,26,NI,89.00,69.46
2009-01-16,26,NI,14.54,4.99
Enough. 2009 seems to have been a special year, worthy of frequent revisiting.

Aside from the new collection of files ending with 2020 (while the old extends up to the end of September 2021), there is another matter, the appearance of two additional columns of data, with the headings FIR (MW),SIR (MW) – that is, there are four values per line instead of just two with the headings FIR ($/MWh),SIR ($/MWh). As previously mentioned, they first appeared with the data for October 2015, then when the data files were resupplied in 2018, thus forming the “old” collection, the additional columns were now found back to part way through the 21’st of July 2009.
The new data file collection contains no mention of these additional data, even with deviant column names. There are only two values per line throughout. Since these two new data series first appeared in the re-supplied files of the old collection, and that collection omits data for the 31’st of October 2010, then those two series lack data for the 31’st October 2010, because the new collection omits those two series entirely. Similarly, some of the old files supplying those additional data series contain bungled daylight saving changeover days. Those errors can be corrected by reading data from the original data files, but that will only correct the ordinary data for those days, not the additional series.
And there are no timestamps at all. Aside from being of general interest (thus my original request), there is a specific usage. These timestamps would be particularly useful for affirming that a value has been altered by a proper process, as the new value would have a later timestamp.

Now for the final prices data. Many more data series are involved, so there is a lot of opportunity for deviationism – and confusion. As mentioned before, the first step is to compare the two collections of data files, the old and the new. This fails at once because the old data files use the name part “Final_prices.csv” while the new files use “FinalEnergyPrices.csv”. The easiest procedure is to copy all the new files into a new folder (trying not to lose track of which folders contain files of what provenance), and therein, perform a mass rename so as to employ the shorter names of the old collection. Whereupon, it is seen that the file names match. Except for the old files called 201403_Final_Pricing.csv and 201404_Final_Pricing.csv, but, they do have counterparts.
None of the file pairings have the same size, again because of CRLF and LF usage differences, and because the column names are different anyway. And just for fun, the old file 201402_Final_Prices.csv used dates in the dd/mm/yyyy style, while its counterpart in the new collection uses yyyy-mm-dd. And again, the new data files always provide two decimal digits, even if there are trailing zero digits, while the old files do not, thus 33·80 versus 33·8. On the other hand, every line of the new files is shorter by one character due to the LF usage instead of CRLF: those data files are all slightly shorter as a result as only some zero digits are omitted.
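These presentation differences (line endings, date styles, trailing zeros) vanish if each line is reduced to a canonical form before comparison. The Python sketch below is one way to do it; the function name `normalise` is hypothetical and the two sample lines are made up, apart from the 33·8 versus 33·80 pair.

```python
from datetime import datetime
from decimal import Decimal


def normalise(line):
    """Reduce 'date,period,node,price' to a comparable tuple."""
    date_text, period, node, price = line.rstrip("\r\n").split(",")
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):       # the two date styles met so far
        try:
            day = datetime.strptime(date_text, fmt).date()
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unrecognised date {date_text!r}")
    # Decimal("33.8") == Decimal("33.80"), so trailing zeros do not matter.
    return day.isoformat(), int(period), node.strip(), Decimal(price)


old_line = "01/02/2014,1,ABY0111,33.8\r\n"     # made-up line in the old style
new_line = "2014-02-01,1,ABY0111,33.80\n"      # made-up line in the new style
print(normalise(old_line) == normalise(new_line))
```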
So, once again, read the content of these data files into a workfile.
And in the new collection, the names AN2201, AT1101, CH0111, DN0141, DN0331, DN1101, DN2201, ER0331, ER1101, GM0331, HO0331, LG0111, LG0331, NG0331, NG1101, NI0111, OT0111, PE0331, PI0331, RA0111, RR0111, ST0331, TI0111, TI2201, TM0111, TM0331, TN0331, TO0331, TR0331 do not appear. Success! The truncated names do not appear in the new data.
But this is not all to the good.
In the old data those truncated names supplied data for 1/12/1999 to 30/4/2000, and although they do not appear in the new collection’s data files, the corresponding names MAN2201, MAT1101, MCH0111, MDN0141, MDN0331, MDN1101, MDN2201, MER0331, MER1101, MGM0331, MHO0331, MLG0111, MLG0331, MNG0331, MNG1101, MNI0111, MOT0111, MPE0331, MPI0331, MRA0111, MRR0111, MST0331, MTI0111, MTI2201, MTM0111, MTM0331, MTN0331, MTO0331, MTR0331 do (as they should), and, they all have a hole of 152 days that would have been filled by the data from the truncated names were their data to have been properly assimilated. Whoever produced the new collection of files failed either to correct the truncated names as stored, or to include them in the list for output. There are doubtless many names in the vasty store, but if you do not call them, they will not come.
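Pairing the truncated names with their proper forms, and checking the size of the hole, takes little code. In this Python sketch the function name `truncated_pairs` is hypothetical and the name sets are a small made-up subset of the names listed above; the only rule applied is that a truncated name has lost its leading "M".

```python
from datetime import date


def truncated_pairs(old_names, new_names):
    """Names found only in the old set whose form with a leading 'M' is in the new set."""
    return {n: "M" + n for n in old_names - new_names if "M" + n in new_names}


# A small subset of the names mentioned above, for illustration only.
old_names = {"AN2201", "MAN2201", "TI2201", "MTI2201", "HAY2201"}
new_names = {"MAN2201", "MTI2201", "HAY2201"}

pairs = truncated_pairs(old_names, new_names)
print(pairs)

# Days from 1/12/1999 to 30/4/2000 inclusive (2000 being a leap year).
hole = (date(2000, 4, 30) - date(1999, 12, 1)).days + 1
print(hole)
```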
These data are nominally a record of the prices used in various transactions involving the actual exchange of money, so one wonders what happened for those involving MAN2201 etc. for those months since Manapouri is not an obscure power station. Possibly, the storage system delivers zero values when data are omitted, and if so, I’d like to buy my groceries at such a price when the checkout scanner fails to find a match. But, the seller would be unhappy with that and my cheerful suggestions of “Free?” have always been ignored. In this situation, surely at least one of the buyer or the seller would object? Followed by the correction of the storage system? Humm...
Anyway, the old collection presents data for 528 different names, while the new collection only offers 306. The twenty-nine truncated names are insufficient to make up the discrepancy: setting them aside leaves 528 – 29 = 499 names in the old collection, so the new collection must be omitting data for names previously supplied: 499 – 306 = 193 omitted names.
The missing names are: ABY1101, ALB2201, APS0661, ASB2201, ASY0661, BAL1101, BDE1101, BEN0111, BEN2701, BEN2702, BLN1101, BPC1101, BPE1101, BRB2201, BRK2201, BRY2201, CBG1101, CBG1102, CLH0661, CML1101, CML1102, CML2201, CML2202, CPK1101, CPK1102, CPK1103, CST1101, CUL2202, CUL2203, DAR0501, DOB1101, DVK1101, DVK1102, EDG1101, EDG1102, EDG2201, EDN1101, FHL1101, FKN1101, FKN1102, GFD1101, GFD1102, GIS1101, GLN2201, GNY1101, GOR1101, GOR1102, GYT1101, GYT1102, HAM1101, HAY2701, HAY2702, HEN1101, HEP1101, HIN1101, HPI2201, HPI2202, HUI1101, HWA1001, IGH1101, INV1101, KAI0661, KEN1101, KEN1102, KIK1101, KIN1101, KKA0661, KMO1101, KMO1102, KMO1103, KPU1101, KPU1102, KTA1101, KWA1101, KWA1102, LIV2201, LTN2201, LTN2202, MCH1101, MGM1101, MHO1101, MHO1102, MHT1102, MHT1103, MLG1101, MLG1102, MNI1101, MOT0661, MOT0662, MPE0501, MST1101, MTM1101, MTN1101, MTN1102, MTO1101, MTO1102, MTR1101, NMA2201, NPK1101, NSY2201, OAM1101, OAM1102, OHW2201, OKE1101, OKI0331, OKN1101, ONG1101, OPI2201, OPI2202, OPK1101, OPK1102, OTA1103, OTA1104, OTA1105, OTA1106, OTA1107, OTA1108, OTA1109, OTA110A, OTI0661, OWH1101, PAK1101, PAL1101, PEN0222, PEN2201, PIE1101, PNI1101, PNI1102, PPT2201, PRM1101, PRM1102, RDF1101, RDF1102, ROT1102, RTO1101, RTR1101, SDN2201, SFD1101, SPN0662, STK1101, STU1101, SVL2201, SVL2202, TAK2201, TAK2202, TGA1101, TIM1101, TIM2201, TIM2202, TKA1101, TKH0501, TKR1101, TMH2201, TMI1101, TMK1101, TMK1102, TMN2201, TNG2201, TRK1101, TRK1102, TWH2201, TWT2201, TWZ2201, UHT1101, UTK0661, WAI1101, WDV1102, WEL1101, WEL1102, WES1101, WES1102, WES1103, WGN1101, WHU1101, WIL1101, WIL2201, WIR1101, WIR1102, WKO1101, WMG1101, WMG1102, WPR2202, WPR2203, WPT1101, WPT1102, WPW1101, WPW1102, WRA1101, WRA1102, WTK1101, WTK1102, WTU2201, WTU2202.
The old file collection offers data only for 21/7/2009 to 19/8/2009 for all of those names, and it is tempting to say “good riddance to old rubbish” and enjoy their absence, as distinct from bemoaning the absence of data that would have been supplied only by the truncated names. Except, why were those series supplied in the first place? What has changed for 2009 since those data series were resupplied in 2018 that the new collection issued in 2021 omits them?
The date reappears: data for HTI1101 start 21/7/2009 in the old data but 25/2/2019 in the new data, both continuing. WAI0501: 21/7/2009 vs. 2/7/2018. WVY1101: 21/7/2009 vs. 12/11/2020.
And the new collection is omitting more, in other ways. Discrepancies abound.
The old file 200302_Final_prices.csv is 8,847KB in size, slightly smaller than its neighbours (it being for February), but the new file 200302_FinalEnergyPrices.csv is 8,410KB in size, and this is not just because of the CRLF vs LF difference, it lacks any data for half-hour one on 1/2/2003. One file starts
TradingDate,TradingPeriod,PointOfConnection,DollarsPerMegawattHour
2003-02-01,2,ABY0111,51.18
And the other
Trading_date,Trading_period,Node,Price
2003-02-01,1,ABY0111,68.95
So all 233 names then current miss out, and the same again for the first half-hour of every other day of the month. This causes a spray of discrepancies in the comparison of statistics. Names whose data end before February 2003 (e.g. AHA0111) lack those omissions, as do those that start after, such as ARI1102.
The original data file and the old (re-issued in 2018), contain data for that time, yet the new data file (issued in 2021) for 2003 does not. How can this happen?
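A systematic omission of this sort shows up at once if the keys present in one file but not the other are tallied by half-hour number. The Python sketch below assumes each file has already been reduced to a set of (date, half-hour, name) keys; the function name `missing_by_period` is hypothetical and the keys are made up, apart from the first lines quoted above.

```python
from collections import Counter


def missing_by_period(old_keys, new_keys):
    """Count the keys present in old but absent from new, by half-hour number."""
    return Counter(period for (_date, period, _name) in old_keys - new_keys)


# Made-up keys: the old file has half-hours 1 to 3, the new lacks half-hour 1.
old_keys = {(d, p, "ABY0111") for d in ("2003-02-01", "2003-02-02") for p in (1, 2, 3)}
new_keys = {(d, p, "ABY0111") for d in ("2003-02-01", "2003-02-02") for p in (2, 3)}

missing = missing_by_period(old_keys, new_keys)
print(missing)
```

A tally concentrated on a single half-hour number points to a fault in the extraction process, not to scattered data loss.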

On the other hand, some old data have swum out of the historical record and into the new collection that evaded the old collection. The old data for name BOB1101 starts with 19/5/2000, while in the new data it starts with 1/9/1999 yet vanishes on 1/1/2000, not to re-appear until 19/5/2000. Similarly, for WDV1101 the old data start with 16/7/2004, but with the new data that name starts with September 1999, ends with January 2000, then resumes with 16/7/2004. And for WPR0661, the old data starts with 12/1/2007 but the new data starts with September 1999, ends with December 1999, then resumes with 12/1/2007. Likewise for HOR0661 starting 21/8/2001 in the old data but 1/9/1999 in the new. And KAW2201 starts 3/10/2002 in the old data but 1/9/1999 in the new. MDN0142: 17/3/2004 vs. 1/9/1999. MPI0661: 11/11/2003 vs 1/9/1999, KAW2201, ...
Perhaps these newly discovered data ought to be added to the bag? Ah, but only if they are good data: what caused the omission for the old collection but not the new? Historical data ought not fade in and out of view depending on what year you look. And in the new data these names do fade back out of view, no longer appearing in data files for later months shortly after their earlier starts. This is not proper behaviour.
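Fading in and out of view is easy to exhibit once the dates on which a name has data are collapsed into runs of consecutive days. A Python sketch follows, with hypothetical function names `spans` and `day_range` and made-up dates loosely modelled on the pattern described for BOB1101.

```python
from datetime import date, timedelta


def spans(days):
    """Collapse a set of dates into sorted (first, last) runs of consecutive days."""
    runs = []
    for d in sorted(days):
        if runs and d - runs[-1][1] == timedelta(days=1):
            runs[-1][1] = d            # extend the current run
        else:
            runs.append([d, d])        # start a new run after a gap
    return [tuple(r) for r in runs]


def day_range(first, last):
    """All dates from first to last inclusive."""
    return {first + timedelta(days=i) for i in range((last - first).days + 1)}


# Made-up coverage: an early stretch, a gap, then a resumption.
days = day_range(date(1999, 9, 1), date(1999, 12, 31)) | day_range(date(2000, 5, 19), date(2000, 5, 21))
for first, last in spans(days):
    print(first, "to", last)
```

Comparing the runs from the old collection with those from the new, name by name, would list every such appearance and disappearance.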
There are other ghosts haunting the final price data for the day of revision, 26/3/2011. The original data file had data for that date censored lest there be “confusion”. Only in 2013 were the revised values supplied to fill the hole. In July 2016 there came a file declared to contain the original data for the missing date (that had been censored), except it included data for DOB1101 and CUL0661, which names had not been active in 2011. In the original data DOB1101 had lasted only for 21/7/2009-19/8/2009, and the same span in the old data. In the new data, there are no appearances of DOB1101 at all. For CUL0661 the original data cover 21/9/2009-20/8/2009 and resume with April 2012. The old data are the same as are the new data files. This agreement is good, but, how is it that a re-supply of historical data can concoct values for names not then in use?

These are examples just picked at random from the horde, noting gross discrepancies such as the presence or absence of data between the two collections. A more detailed comparison would take a bag full of the old data and then assimilate the contents of a bag full of new data, noting changes in the individual values of the various series between the old and new collections.
They are legion.
There are some 5,837 days in which some data series from the new collection has values different from those in the old collection. The changes often involve many values within the day, and the differences in Final Price values range from about $100 down to around $4. Not just additive differences but perhaps by a factor of five or so. And they’re all in December 2019...
File 201912_FinalEnergyPrices.csv is 9,815KB and has 363,985 lines while
File 201912_Final_prices.csv is 10,152KB and has 363,967 lines.
An immediate difference is provided by the data for JRD1101; in the old file, its first value is for half-hour 19 of 13/12/2019, while in the new file, its first value is for half-hour 1 of 13/12/2019. All supplied values are zero. So that’s eighteen extra values, one per line. 363,967 + 18 = 363,985.
File comparisons were impeded not just by the actual differences in values and the presence or absence of trailing zero digits in the fractional parts, but also by the different ordering of blocks of data. For the first appearances of JRD1101, file 201912_Final_prices.csv has in record 144784: 2019-12-13,19,JRD1101,0 while file 201912_FinalEnergyPrices.csv has in record 140636: 2019-12-13,1,JRD1101,0.00 and record 145046 has 2019-12-13,19,JRD1101,0.00. Remembering of course which file came from the new and which from the old collection.
Further details on many more discrepancies could be supplied, but bewilderment is already upon me. Enough.

-------------------------Conclusion-------------------------
On being informed of the new supply of data I would have liked to respond “Burp!” - in some cultures this signifies appreciation to the host for a good meal.
Instead, logorrhoea.
And some final squirts. When historical data are re-supplied in a different way (different file names, changed layout, changed names, etc.) the newly-supplied historical data should match the previously-supplied historical data. Except of course for where the old data are corrupt – missing files, even misformatted files (bungled daylight saving changeover days) etc. and so the new data files can repair the old collection and it be improved thereby. Evidently, no such checks were made for the new collection, or for the old. Other than perhaps “Lots of data files have been produced. It is good.”
The new file collection is also corrupt – missing data, missing times, even omitted days, aside from changed values. Where you offered one file collection, that contained mistakes, you now offer two file collections containing mistakes. Different mistakes. The old collection can’t be discarded without data loss, though the new collection could supply the missing data files and replace the erroneous files – if the new files were not in error themselves.
And the new data files also omit data previously supplied, the additional two columns headed by FIR (MW),SIR (MW) – one pair for the North Island and a second pair for the South Island - and you took the trouble to replace the data files in the old collection back to July 2009 in order to provide those data. Presumably, the effort to do so was then thought worthwhile. Why is it so no longer? Or are there yet other data files somewhere in which they may be found? Similarly, the old file collection has not been augmented with this year’s October’s data as would be done in files called 202110_Final_prices.csv and 202110_Reserve_prices.csv via entries in https://www.emi.ea.govt.nz/Wholesale/Dat... (adjusted once a month), instead, daily files are appearing via https://www.emi.ea.govt.nz/Wholesale/Dat... which means that the additional series FIR (MW),SIR (MW) are no longer being supplied.

And not forgetting the objective of my original request, that the Final and Reserve Price data be supplied along with the timestamp information (sometimes named as “Price_Run_Time”), as had been the case with the original data. These timestamps would be useful in tracking changes to the data (as with the revision of data for 26/3/2011 done in 2013), rather than just staring at a puzzle. But the new data files you have indicated do not include timestamps, and their absence does not constitute supplying them.
In this context I hesitate to mention your web page https://www.emi.ea.govt.nz/Wholesale/Dat... under which is an even greater horde of data files in multiple sub-directories. These files continue to employ the format of the originally-supplied data files, for instance ABY0111,23/11/2021,9,84.94,23/11/2021 00:10:12,L which is to say: name, date, half-hour number, datum, timestamp, L-code. Evidently, there is a use for a code letter other than F. I’d still prefer that to be a part of the name sequence, but so many data files now have it at the end. Except for example with the original data, where for 2001 onwards, the F-code was moved from the end to before the datum – still not following the name field. Oh well. And the date is in the form dd/mm/yyyy.
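For reference, the six-field layout just described (name, date, half-hour number, datum, timestamp, code letter) can be parsed as in the Python sketch below. The names `Reading` and `parse_original` are hypothetical, and the tolerance of timestamps with and without seconds is an assumption based on the sample lines quoted in this request.

```python
from datetime import date, datetime
from typing import NamedTuple


class Reading(NamedTuple):
    name: str
    day: date
    period: int
    value: float
    stamp: datetime
    code: str


def parse_stamp(text):
    """Timestamps have been seen both with and without seconds."""
    for fmt in ("%d/%m/%Y %H:%M:%S", "%d/%m/%Y %H:%M"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp {text!r}")


def parse_original(line):
    """Parse 'name,dd/mm/yyyy,half-hour,datum,timestamp,code'."""
    name, day, period, value, stamp, code = line.strip().split(",")
    return Reading(name.strip(), datetime.strptime(day, "%d/%m/%Y").date(),
                   int(period), float(value), parse_stamp(stamp), code.strip())


reading = parse_original("ABY0111,23/11/2021,9,84.94,23/11/2021 00:10:12,L")
print(reading)
```

With the timestamp held as a proper datetime, deciding which of two values for the same name, date and half-hour is the later one becomes a simple comparison.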
For consistency and ease of comparison with the Forecast Prices, the Final (and Reserve) Price data should have the same format, which means with timestamps and the F-code, as had originally been the case. Why introduce a deviant format in the new data? Multiple different formats for related data are a load on everyone’s patience and even computers slog.
But consistency could be restored by a different route: I hope that you will not reformat the Forecast Price data in the new style as well, thus discarding the timestamps and the code letter. Which is why I hesitated to mention that collection.

So, I am still asking for the supply of Final and Reserve price data that include the timestamps, and which would be in a format compatible with that of the massive Forecast Prices collection, as per the format of the original data supply. And not omitting the FIR (MW),SIR (MW) pairs. Monthly compendia would be good too, updated monthly, not just at the end of the year.
Why introduce inconsistency? Why remove information?

Yours sincerely,
Nicky McLean


From: OIA
Electricity Authority

Kia ora Mr McLean,

Thank you for your request of 28 November 2021, under the Official Information Act 1982, for the following information:

- "the supply of Final and Reserve price data that include the timestamps, and which would be in a format compatible with that of the massive Forecast Prices collection, as per the format of the original data supply. And not omitting the FIR (MW),SIR (MW) pairs. Monthly compendia would be good too, updated monthly, not just at the end of the year."

You can expect a response by Friday 24 December 2021 at the very latest, being 20 working days after the day your request was received.

If we are unable to respond to your request by Friday 24 December 2021, we will notify you of an extension of that timeframe.

If you have any queries, please feel free to contact me by emailing [email address].

Kind regards,

Tessa Ballinger
Ministerial Advisor
DDI: +64 [DDI redacted]
Electricity Authority - Te Mana Hiko
www.ea.govt.nz


From: OIA
Electricity Authority

Dear Mr McLean,

I refer to your official information request dated 28 November 2021 for information regarding EMI datasets. Please see below for the full request.

For your request to be considered valid the information sought must be “specified with due particularity" under S12(2) of the Official Information Act (OIA). The Authority has been unable to identify what information is being requested and therefore your request will need to be clarified or amended to enable us to respond.

We suggest you do this by providing us with:
• a clear and concise question or,
• a specific request for official information.

Please note, any clarification or amendment of a request is considered to be a new request for the purpose of calculating the maximum statutory timeframe for response—see section 15(1AA) of the OIA.

If you have any queries, please feel free to contact me by emailing [email address].

Yours sincerely,

Tessa Ballinger
Ministerial Advisor
DDI: +64 [DDI redacted]
Electricity Authority - Te Mana Hiko
www.ea.govt.nz


From: Nicky McLean

Dear Ms. Ballinger,
Thank you for your response of the sixth of December (Saint Nicholas’s day; my birthday!) and apologies for misspelling your name by omitting an “l” - sans-serif typefaces are not so clear and tired eyes can falter. Anyway, I make mistakes too. Sorry.
The statement you outlined is clear, which is why I am puzzled by your following response of the seventh in which you call for a “clear and concise question or, a specific request for official information”. This is what I had attempted at the start, though alas, I admit, not so concisely due to the complexity of the data in place and their absences. Then came the “new” data collection with its own problems, but notably, not providing the requested timestamps as I had at first thought.
Providing explicit enumerations of the hundreds of data files involved (or missing) along with what was present (or missing or wrong) and what was requested would be even worse. Is this what you mean by “...has been unable to identify what information is being requested and therefore your request will need to be clarified or amended to enable us to respond.” - or should I regard your re-statement of the sixth as sufficiently clear and specific?
I am hoping that your organisation’s data administrator and his or her staff can see their way clear to converting back to the original format or similar, that is indeed compatible with the ongoing usage of the Forecast Prices collection, resulting in a coherent presentation of both past and ongoing data, and in particular, all having the accursed timestamps.
This matter having dragged on, there is no pressing need to spoil the pre-Christmas celebrations. Still, if changes are to be made, they might as well be done soon so that there would be less mess to sweep under the rug. Both the old and the new file collections are corrupt, for instance.

Yours sincerely,
Nicky McLean


From: OIA
Electricity Authority

Dear Mr McLean,

Thank you for your email.

The Electricity Authority's Data and Information Management team has an ongoing commitment to open data and transparency. We do not consider these emails to be usefully processed as requests under the Official Information Act 1982, given that they are about the specific formats of data sets. We have appreciated the opportunity to correct some errors in response to earlier comments but consider the current state of the Electricity Market Information website (EMI) to be fit for purpose for most people using it.

We would be happy to hear from you at [Electricity Authority request email] if you continue to notice any issues with the data we publish. However, we ask that you please limit your comments to a concise description of the problem you observe.

Thank you for your time and I hope you enjoy the Christmas and New Year period.

Kind regards,

Tessa Ballinger
Ministerial Advisor
DDI: +64 4 [DDI redacted]
Electricity Authority - Te Mana Hiko
www.ea.govt.nz


From: Nicky McLean

Dear Ms. Ballinger,
There is negative progress. I am not much interested in reporting data errors and format problems even though they exist and shouldn't. I note a recent development, the vanishing of the "old" data file collection as supplied via https://www.emi.ea.govt.nz/Wholesale/Dat... (the underline between Final and pricing), so a whole pile of problems have vanished. But this also means that the data for the FIR (MW),SIR (MW) pairs have vanished as well since they are not being supplied in the "new" collection. This is a separate matter, if a related detail, as with the errors.
My request for information concerns "the supply of Final and Reserve price data that include the timestamps", as appeared to have been recognised in your reply of the sixth of December. This continues to be the objective.
The data of the "Forecast Prices" collection does include timestamps but those files start in 2018, while the Final and Reserve price data files ceased presenting timestamps in late 2013 (and the new files replacing them from 1996 have no timestamps at all), thus, any consideration of when forecast prices become declared final via comparing their timestamps is precluded.
Supplying the timestamps (the Price_Run_Time, or similar name) associated with the Final and Reserve price data rather than suppressing them would enable such a comparison and other investigations.

I am still requesting this information: should I start over with a fresh request? (and not re-mentioning errors, etc.)

Yours sincerely,
Nicky McLean


From: OIA
Electricity Authority

Kia ora Mr McLean,

Thank you for your email of 19 December 2021 clarifying your request.

We have considered your request for "the supply of Final and Reserve price data that include the timestamps" a request under the Official Information Act 1982.

You can expect a response by 4 February, being 20 working days after the day of your email clarifying your request for information.

If we are unable to respond to your request by then, we will notify you of an extension of that timeframe.

If you have any queries, please feel free to contact me by emailing [email address]

Kind regards,

Tessa Ballinger
Ministerial Advisor
DDI: +64 4 [DDI redacted]
Electricity Authority - Te Mana Hiko
www.ea.govt.nz


From: OIA
Electricity Authority


Attachment: Letter to Nicky McLean.pdf (236K)


Kia ora Mr McLean,

Thank you for your request of 19 December 2021, under the Official Information Act 1982.

Please find attached the Authority's response. If you have any questions regarding our response, please don't hesitate to contact me.

Kind regards,

Tessa Ballinger
Ministerial Advisor
DDI: +64 4 [DDI redacted]
Electricity Authority - Te Mana Hiko
www.ea.govt.nz


From: Nicky McLean

Dear Ms. Ballinger,
Thank you for the reply by Ms. Sarah Gilles, which is a bit difficult to follow because the second paragraph ("By 28 February...") is a rambling sentence with sliding subjects. The first part is clear enough, about the intended format to come, and then comes a long (...) part. It too starts clearly enough then continues by saying ... and without the FIR and SIR data ... and is available elsewhere on the EMI website. The paragraph ends "This will be available on the EMI website." However the proffered link https://emi.ea.govt.nz/Wholesale/Dataset... is u/s, with "error 404". Now, if this is for the intended data to come, fair enough, it is not yet after the end of February. But if the "This" refers also to the FIR and SIR data, said to be available elsewhere on the EMI website, as is possible for sliding subjects in a long sentence, it is unhelpful.

In other words, where in the EMI website are the FIR and SIR data to be found, presumably for the stated July 2009 to January 2022? Probably not at the stated web address even after February.

I take it that the Final and Reserve prices to come (and with timestamps) are to be really really Final and may well be delayed rather than be issued and then later on be changed, and that if one wants to know about any prices prior to such Ultimate Finalities (not yet manifested), so as to have at least a full set of values for a month, one will have to peruse the many values offered in the Forecast Prices for the relevant data. Timestamps will thus be helpful in following the Price's Progress.

Yours sincerely,

Nicky McLean


From: OIA
Electricity Authority

Kia ora Mr McLean,

This webpage (https://emi.ea.govt.nz/Wholesale/Dataset...) is where the final prices will be found by the end of February.

The FIR and SIR values are not part of the pricing files we receive from the pricing manager. You can explore and download FIR and SIR here: https://emi.ea.govt.nz/Wholesale/Reports...
Or here: https://emi.ea.govt.nz/Wholesale/Dataset....

If this does not give you the data that you need you can make another request for official information by emailing [email address].

For general enquiries please email [Electricity Authority request email].

Kind regards,

Tessa Ballinger
Ministerial Advisor
DDI: +64 4 [DDI redacted]
Electricity Authority - Te Mana Hiko
www.ea.govt.nz


From: Nicky McLean

Dear Ms. Ballinger,
Thank you for your reply, in which you identify the latest arrangements whereby one might find the FIR and SIR data. Two web locations are specified after "You can explore and download FIR and SIR here:" The first web location gives https://emi.ea.govt.nz/Wholesale/Reports...
which evokes a pleasant display with coloured blobs and a squiggly black line labeled FIR; some messing about with the pokable options on offer converts to squiggles for SIR. But the downloadable options are only for the image of the graph in various formats, not for the data themselves, so this would be the "explore" usage you mention.

For the downloadable data aspect then, on to the second proffered link, https://emi.ea.govt.nz/Wholesale/Dataset.... This evokes a rather ominous yearly directory list, but its earliest year of 2009 does indeed correspond to the previously-supplied earliest year of the FIR and SIR data: onwards.
The last data file with FIR and SIR data that had been supplied is rp2021m09b.csv, which came from your file 202109_Reserve_prices.csv, which came from a web page collection that your organisation has since purged, as has been described. Still, why not look for September 2021 in the new scheme. Obviously, one pokes directory 2021, and this evokes https://emi.ea.govt.nz/Wholesale/Dataset... which shows a horde of files whose names are quite difficult to read. ("MSS" traditionally means "manuscript" - not here!) File MSS_301112021091100097_0X.ZIP looks likely: onwards.
This file can then be decompressed easily enough, and some ninety files appear. Humm. The first is called MSS_301112021091100097_0X.0_01-OCT-2021_00_00_0.SPDSOLVED and it contains a vast amount of stuff (which I won't attempt to copy here; your system should be able to show you it), none of which looks remotely like the content of file rp2021m09b.csv (you can see why my file names avoid long sequences of digits):
Trading_date,Trading_period,Island,FIR ($ MWh),SIR ($ MWh),FIR (MW),SIR (MW)
2021-09-01,1,NI,.09,2.01,73.036,166.895
2021-09-01,1,SI,.07,1.94,75.001,90.801
etc...

If these downloadable data files do indeed contain the requested FIR(MW) and SIR(MW) data, how are they to be found? Is steganography involved? Files containing just those data would be much smaller and also easier to parse. The previous scheme of appending them to the data in the Reserve Price files has however been denounced as miscegenation and those files have been purged.
So, where to look for FIR(MW) and SIR(MW)?

Yours sincerely,

Nicky McLean

From: OIA
Electricity Authority


Attachment: Letter to Nicky McLean.pdf (236K)


Kia ora Mr McLean,

As indicated in the Authority's letter to you of 1 February 2022, we intend to publish the final energy price and final price reserve files exactly as received from the pricing manager.

Some of these are now available here: https://emi.ea.govt.nz/Wholesale/Dataset....
Within this folder (which is live already) will be six subfolders (four are there already):
../FinalEnergyPrices
../FinalReservePrices
../InterimEnergyPrices
../InterimReservePrices
../ProvisionalEnergyPrices
../ProvisionalReservePrices

This should be finished over the next few days with a few notes and explanations added.

Kind regards,

Tessa Ballinger
Ministerial Advisor
DDI: +64 4 [DDI redacted]
Electricity Authority - Te Mana Hiko
www.ea.govt.nz

From: Nicky McLean

Dear Ms. Ballinger,
Guided by your response I made a spot check on file finalRes20220216140317.csv.gz and saw that it indeed is in the style of the original Reserve Price data files (and including timestamps), but lacking a header line and so it cannot be seen whether these data are in fact for the FIR and SIR data series, stated as being available in your previous reply, and whose absence I had mentioned.

I imagine that as the end of February approaches, so also will appear the data file collections for the Final and Reserve price series, as you have outlined, and as portions are even now appearing partway through February. What then of the associated FIR and SIR data series? As previously stated, the web pages you quoted as offering them do not actually present those data series. Presumably they are somewhere else. But where?

Yours sincerely,
Nicky McLean

From: OIA
Electricity Authority

Kia ora Mr McLean,

I refer to your two recent emails dated 9 and 20 February 2022 in which you made queries regarding FIR and SIR data. As indicated in previous correspondence, the FIR and SIR values are not part of the pricing files we receive from the pricing manager.

You can explore and download FIR and SIR data here: https://emi.ea.govt.nz/Wholesale/Reports...

Or here: https://emi.ea.govt.nz/Wholesale/Dataset....

The Authority is dedicated to continuously improving the way we collect and publish data. Our improvements are determined by priorities at the time, noting the need to service a variety of stakeholders interested in the data and insights. So far, in the 2021/2022 financial year we have processed four of your requests under the Official Information Act, as well as answering and addressing your ongoing queries and correspondence. The Authority does not have extensive resources available to respond to repeat individual specific data requests. Where possible, we have made some changes in response to your requests but that has diverted key staff away from other work. We will continue to make improvements for the benefit of all interested parties.

Yours sincerely,

Tessa Ballinger
Ministerial Advisor
DDI: +64 4 [DDI redacted]
Electricity Authority - Te Mana Hiko
www.ea.govt.nz

From: Nicky McLean

Dear Ms. Ballinger,
Thank you for your reply confirming the imminent arrival of Final and Reserve Price data files, as had been described. The remaining difficulty is still the absence of FIR and SIR data files. As mentioned in my reply of the 9th February, the web site addresses you have quoted do not in fact lead to data files containing FIR and SIR data in any recognisable form. Parsing the .svg format data that is used to generate the displayed graphs, still less scaling off the values from the graph, is not likely to be a productive endeavour. The second web address continues to be for vast data files full of all manner of stuff, none of which appears to be FIR or SIR data as had been supplied.
Since the latest plan for supplying Final and Reserve data no longer includes the FIR and SIR data as before, where now are the FIR and SIR data to be found? Your system must have them somewhere in order to show them on the graphs.

Yours sincerely,
Nicky McLean

From: OIA
Electricity Authority


Attachment: Reserve price trends.PNG.png (85K)


Dear Mr McLean,

Please follow this link for the "Reserve price trends" report: https://emi.ea.govt.nz/Wholesale/Reports...
And follow this link for the "Cleared reserves offer stacks": https://emi.ea.govt.nz/Wholesale/Reports...

If you click on the "data" tab you will be able to view and download the data. Please see the screenshot attached.

I hope this is helpful.

Kind regards,

Tessa Ballinger
Ministerial Advisor
Electricity Authority - Te Mana Hiko
www.ea.govt.nz

From: Nicky McLean

Dear Ms. Ballinger,
Thank you for your reply, with attached screenshot, derived from the "data" options offered by the web pages your reply nominates. Indeed there is mention of FIR and SIR, for the North and South Islands, but, these names are applied in a slightly different way. Note in particular on the screenshot are the headings "FIR price ($/MWh)" and "SIR price ($/MWh)" which are indeed the Reserve Price data. Now consider the first few lines from file 202109_Reserve_prices.csv, as had been supplied.
Trading_date,Trading_period,Island,FIR ($ MWh),SIR ($ MWh),FIR (MW),SIR (MW)
2021-09-01,1,NI,.09,2.01,73.036,166.895
2021-09-01,1,SI,.07,1.94,75.001,90.801
2021-09-01,2,NI,.09,.52,75.721,170.812
The texts FIR and SIR appear twice each, once with ($ MWh) and a second time with (MW); the former are the price values, and the latter are the megawatt values. These megawatt values are no longer being supplied in the latest revision to the format of the reserve price data files, and it is these data that I have been seeking as the tail end of this request.
Thus, referring to your nominated web page https://emi.ea.govt.nz/Wholesale/Reports..., the heading is Reserve price trends and the y-axis annotation is $/MWh with a choice of FIR and SIR; these clearly are prices, not megawatts and presumably are the same price values as are supplied in the corresponding data files.
Referring to your second nominated web page, https://emi.ea.govt.nz/Wholesale/Reports..., this shows a more complex display with coloured bars corresponding to $/MWh (annotation to the right) plus a wiggly black line "Cleared FIR" whose scale appears to be that of the y-axis, annotated in MW. Again a choice between FIR and SIR.
After some messing about, I see I can request data for the first of September 2021, and obtain the results as a data file, starting as follows:
Cleared reserves and offer stacks

From www.emi.ea.govt.nz provided by the Electricity Authority (New Zealand)
Run at: 20220306140237

Parameters
Date,01 Sep 2021
Region type,Island
Region,North Island
Market product,FIR
Band 1 max,10
Band 2 max,20
Band 3 max,50
Band 4 max,100
Band 5 max,1000

Period start,Period end,Trading period,Region ID,Region,Series,Reserves (MW)
01/09/2021 00:00:00,01/09/2021 00:30:00,1,NI,North Island,Cleared FIR,73.03600
01/09/2021 00:00:00,01/09/2021 00:30:00,1,NI,North Island,100-1000 $/MWh,119.00300
01/09/2021 00:00:00,01/09/2021 00:30:00,1,NI,North Island,50-100 $/MWh,37.10000
01/09/2021 00:00:00,01/09/2021 00:30:00,1,NI,North Island,20-50 $/MWh,101.40000
01/09/2021 00:00:00,01/09/2021 00:30:00,1,NI,North Island,10-20 $/MWh,31.86400
01/09/2021 00:00:00,01/09/2021 00:30:00,1,NI,North Island,0-10 $/MWh,166.79900
01/09/2021 00:30:00,01/09/2021 01:00:00,2,NI,North Island,Cleared FIR,75.72100
01/09/2021 00:30:00,01/09/2021 01:00:00,2,NI,North Island,100-1000 $/MWh,88.00300

And yes, 73.036 appears, followed by 75.721 as in the earlier data file format for the FIR(MW) data, and I suppose there could be a data file with the SIR(MW) values too as well as those for the South Island.
Aside from the rather large heading, the FIR(MW) and SIR(MW), it seems, are to be found by ignoring all the lines other than those with "Cleared" on them. This is certainly within the scope of data processing, but seems rather wasteful. And it seems that only single days can be selected at a go for this class of activity. (The "Reserve price trends" scheme mentions a date range, but, is for price data only, and those price data are already available via a disc file)
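A minimal sketch of that filtering in Python, assuming the download keeps the layout quoted above (a preamble, then a header line beginning "Period start"). The function name cleared_rows is made up for illustration, and the sample text is an abbreviated copy of the lines quoted above:

```python
import csv
import io

def cleared_rows(text):
    """Yield (trading_period, region_id, series, megawatts) for 'Cleared' rows only."""
    lines = text.splitlines()
    # Skip the preamble: the table proper starts at the "Period start" header line.
    start = next(i for i, line in enumerate(lines) if line.startswith("Period start"))
    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])))
    for row in reader:
        if row["Series"].startswith("Cleared"):
            yield (int(row["Trading period"]), row["Region ID"],
                   row["Series"], float(row["Reserves (MW)"]))

sample = """Cleared reserves and offer stacks

Parameters
Date,01 Sep 2021
Market product,FIR

Period start,Period end,Trading period,Region ID,Region,Series,Reserves (MW)
01/09/2021 00:00:00,01/09/2021 00:30:00,1,NI,North Island,Cleared FIR,73.03600
01/09/2021 00:00:00,01/09/2021 00:30:00,1,NI,North Island,100-1000 $/MWh,119.00300
01/09/2021 00:30:00,01/09/2021 01:00:00,2,NI,North Island,Cleared FIR,75.72100
"""

result = list(cleared_rows(sample))
print(result)
```

This only trims one downloaded file; it does nothing about having to request each day, island and product separately.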

Can I hope therefore that it would be possible to find a scheme for acquiring the FIR(MW) and SIR(MW) data for NI and SI in some compact form (not along with large amounts of other stuff), and which could be acquired in reasonably large blobs, such as monthly. Otherwise it will mean a four-fold slog: separate data requests for each of FIR(MW) NI, SIR(MW) NI, FIR(MW) SI, SIR(MW) SI (or other ordering), one set for each day, for month after month. And much of the resulting information would have to be discarded. This is offputting.

Yours sincerely,
Nicky McLean

From: Nicky McLean

Dear Ms. Ballinger,
This is to acknowledge the supply of Final and Reserve Price data files that include timestamps as per my request. Some difficulty ensued.
Starting with the Reserve Price data, as these files are smaller... Some 1,683 files are fingered via https://emi.ea.govt.nz/Wholesale/Dataset... all of them compressed via the .gz scheme, and ranging in size from 180 bytes to 155,464 bytes. As an aside, computer systems usually allocate storage space in blocks of some minimum size, such as 4KB or even 32KB; thus, splitting data into many small files can waste a lot of space. This is a chance for a data administrator to show competence.
Fortunately, 7-zip can be requested to expand them all in one go, not one-at-a-time, and the resulting files range from 647 bytes to 1,228,883 bytes in size, all with eye-straining names. Evidently, their content is not grouped by day, still less by month. If anything, it may be by the time of calculation. Whichever, this irregular scheme is an invitation for omissions.
And so it proves. Data for Monday 9/8/2021 are omitted for half-hours 37-42. There is no point in naming the data files that do not have those data, since all of them don’t. There are no such omissions in your file 202108_Reserve_prices.csv (except that you have purged it as it contained FIR (MW),SIR (MW) data) nor in your later-supplied file 202108_FinalReservePrices.csv.
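A check of this kind can be sketched as follows, assuming the field order name, date, half-hour at the start of each record and 48 half-hours per day (days on which daylight saving starts or ends would need a different count). The function name missing_periods and the sample records, with placeholder zero prices, are made up for illustration:

```python
from collections import defaultdict

def missing_periods(records, expected=range(1, 49)):
    """Map (name, date) -> list of expected half-hours that have no record."""
    seen = defaultdict(set)
    for line in records:
        fields = line.split(",")
        name, date, period = fields[0].strip(), fields[1].strip(), int(fields[2])
        seen[(name, date)].add(period)
    gaps = {}
    for key, periods in seen.items():
        absent = [p for p in expected if p not in periods]
        if absent:
            gaps[key] = absent
    return gaps

# Made-up records for one name and one day, with half-hours 37-42 left out.
sample = [f"HAY2201,09/08/2021,{p},0,0,,F" for p in range(1, 49) if not 37 <= p <= 42]
gaps = missing_periods(sample)
print(gaps)
```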
Conversely, you may recall that the Reserve Price data file for October 2010 omitted data for Sunday the thirty-first, but then was re-supplied with data for that date present. However, the re-supplied data files do not contain the data series FIR (MW), SIR (MW) along with FIR ($/MWh), SIR ($/MWh) as before, and nor do these latest data files. So, there remains a one-day hole in the FIR (MW),SIR (MW) data. The web page you have indicated as offering those data series, accessed via https://emi.ea.govt.nz/Wholesale/Reports... that might supply the missing data for 31/10/2010 does not show on its graph any data with the legend “Cleared FIR” as is shown for later dates.

The claim is that these files are as supplied by the NZ Stock Exchange. Well, to start at the start, the new file finalRes19960101000001.csv begins
BEN2201,01/10/1996,1,1,1.97,"",F
BEN2201,01/10/1996,2,0,0,"",F
BEN2201,01/10/1996,3,0,0,"",F
but the original file rp1996m10.csv (actually supplied as Reserve_1996_10.csv) begins
BEN2201,01/10/1996,1,1,1.97,15/11/1996 01:54:50,F
BEN2201,01/10/1996,2,0,0,15/11/1996 01:54:50,F
BEN2201,01/10/1996,3,0,0,15/11/1996 01:54:50,F
And as you can see, the file originally supplied by the NZ Stock Exchange has timestamps supplied, while your re-supplied file has them present as null fields in an unusually elaborate form. Somewhat later on, after some initial fumbles, the NZ Stock Exchange began offering files with a coherent header line, starting with data for April 2005 as follows:
Grid_Exit_Point,Trading_Date,Trading_Period,Price_6s,Price_60s,Run_Time,Run_Type
which was helpful, though less so when the heading was changed. It appears that your new data files employ the same field order as the original files, but it would be good to have a header line confirming that.
It turns out that your re-supplied data files continue with “null” timestamps until they become genuine timestamps with data for Tuesday 22/8/2017. Since your organisation ceased supplying the data files as received from the NZ Stock Exchange with the end of September 2013 (i.e. including timestamps), there still remains a four-year hole in the timestamp supply.
And now a table.
File name                    Data date
finalRes20170824000001.csv   23/ 8/2017
finalRes20170825000001.csv   24/ 8/2017
finalRes20171221000001.csv   20/12/2017
finalRes20171222000001.csv   21/12/2017
finalRes20171223000001.csv   22/12/2017
finalRes20171225000001.csv   24/12/2017
finalRes20171226000001.csv   25/12/2017
finalRes20171227000001.csv   26/12/2017
finalRes20171228000001.csv   27/12/2017
finalRes20190424000001.csv   23/ 4/2019
finalRes20190425000001.csv   24/ 4/2019
finalRes20190514000001.csv   12/ 5/2019
finalRes20190514000002.csv   13/ 5/2019
finalRes20190719000001.csv   18/ 7/2019
finalRes20190720000001.csv   19/ 7/2019
finalRes20200818000001.csv   17/ 8/2020
finalRes20201214000001.csv   13/12/2020
These seventeen files all contained data for one day only (and that not the date as might be inferred from the file name) and all lacked the “F” code (the “Run_Type”) at the end of each line that is present in all the other files. Irregularity causes stumbles and is not helpful. It also provokes questions as to provenance.
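Both irregularities can be detected mechanically. The sketch below assumes file names of the form finalRes followed by fourteen digits (the first eight read as a date), dates in the second field as dd/mm/yyyy, and the "F" code as the last field when present. The function name check_file and the two sample records are made up for illustration:

```python
import re
from datetime import datetime

def check_file(file_name, lines):
    """Return (date in file name, set of data dates, count of lines lacking the F code)."""
    m = re.match(r"finalRes(\d{8})\d{6}\.csv$", file_name)
    name_date = datetime.strptime(m.group(1), "%Y%m%d").date()
    data_dates = set()
    missing = 0
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        data_dates.add(datetime.strptime(fields[1], "%d/%m/%Y").date())
        if fields[-1] != "F":
            missing += 1
    return name_date, data_dates, missing

# Made-up records in the style of the files, without the trailing F code.
sample = [
    "BEN2201,23/08/2017,1,0,0,24/08/2017 00:00:01",
    "BEN2201,23/08/2017,2,0,0,24/08/2017 00:00:01",
]
name_date, data_dates, missing = check_file("finalRes20170824000001.csv", sample)
print(name_date, data_dates, missing)
```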

Having assimilated the new data, the obvious next step is to compare them with the old data. Obviously, the timestamps cannot be compared, because they do not overlap.
Happily, most values are the same, though there are hundreds of discrepancies. The first appear in January 2009 and the 198 differences (of 5,952 values for the month) range from 1¢ to $88. The most extreme is offered for Haywards 60s via the new data having
HAY2201,17/05/2009,36,2.98,10.29,"",F
HAY2201,17/05/2009,37,.01,82091.52,"",F
HAY2201,17/05/2009,38,19.77,4.02,"",F
while the old data offers (with non-null timestamps)
HAY2201 ,17/05/2009,36,.21,3.76,19/05/2009 10:30,F
HAY2201 ,17/05/2009,37,.01,4,19/05/2009 10:30,F
HAY2201 ,17/05/2009,38,5.14,4.02,19/05/2009 10:30,F
That is, the new data has a leap to 82091·52 unsupported by nearby values, a difference of 82087·52 from the old data, which stays steady. The next largest differences are around 2500 for both 6s and 60s on 13/2/2009. There are some more differences in the hundreds and the tens, but most are single-digit. A full schedule would be overwhelming: there are 2,900 differences between the four sets (pairs of values for both Haywards and Benmore) of 444,184 values over 1/10/1996 to 31/1/2022, their average being under a cent (thanks to all the zero differences) except for the Haywards 60s being 19·21¢ thanks to the single large difference. The averages of the 2,900 non-zero differences in the four groups range from 62¢ to $115, again due to the single large difference.
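The comparison itself is simple once both sets are keyed on name, date and half-hour. A minimal sketch using the six records quoted above, assuming the field order name, date, half-hour, 6s price, 60s price; the function names load and differences are made up for illustration, and decimal arithmetic is used so that values such as .01 compare exactly:

```python
from decimal import Decimal

def load(lines):
    """Index records by (name, date, half-hour) -> (price_6s, price_60s)."""
    table = {}
    for line in lines:
        f = [x.strip() for x in line.split(",")]
        table[(f[0], f[1], int(f[2]))] = (Decimal(f[3]), Decimal(f[4]))
    return table

def differences(old, new):
    """Yield (key, series, new - old) where both sets have the key and the values differ."""
    for key in sorted(old.keys() & new.keys()):
        for series, o, n in zip(("6s", "60s"), old[key], new[key]):
            if o != n:
                yield key, series, n - o

new = load([
    'HAY2201,17/05/2009,36,2.98,10.29,"",F',
    'HAY2201,17/05/2009,37,.01,82091.52,"",F',
    'HAY2201,17/05/2009,38,19.77,4.02,"",F',
])
old = load([
    "HAY2201 ,17/05/2009,36,.21,3.76,19/05/2009 10:30,F",
    "HAY2201 ,17/05/2009,37,.01,4,19/05/2009 10:30,F",
    "HAY2201 ,17/05/2009,38,5.14,4.02,19/05/2009 10:30,F",
])
diffs = list(differences(old, new))
for d in diffs:
    print(d)
```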
Other differences are due to data being supplied with different formats, for instance,
BEN2201,16/06/2019, 3, .50,.13,F (first version, no timestamps since September 2013)
2019-06-16,3,SI,.50,.13,96.552,118.503 (with FIR(MW) and SIR(MW) data, thus four values per line)
BEN2201,16/06/2019, 3,0.5,0.12557, 17/06/2019 07:38:36,F (the latest, with timestamps)
So, was the actual value 0·13 or, 0·12557? This provides room for gamesmanship.
More substantial changes are scattered about in 2009 and in December 2019, and there are changes for Haywards on Saturday 26/3/2011.
Here is a rough histogram of the non-zero differences from -10 to +10, with a logarithmic count scale: unfortunately, alignment via multiple spaces is damaged by being converted to single spaces, and a fixed-spacing typeface would help as well, but these are not on offer.
1/1/1996-5/3/2022 Haywards 60s difference.
<-10·00 24 (-74·49 to -10·80)
-10·00 2 2|###
-9·50 1 3|
-9·00 1 4|
-8·50 1 5|
-8·00 1 6|
-7·50 2 8|###
-7·00 4 12|######
-6·50 1 13|
-6·00 1 14|
-5·50 4 18|######
-5·00 3 21|#####
-4·50 3 24|#####
-4·00 8 32|#########
-3·50 65 97|#########+########
-3·00 13 110|#########+#
-2·50 11 121|#########+
-2·00 67 188|#########+########
-1·50 44 232|#########+######
-1·00 39 271|#########+######
-0·50 155 426|#########+#########+##
0·00 141 567|#########+#########+#
0·50 27 594|#########+####
1·00 12 606|#########+#
1·50 18 624|#########+###
2·00 12 636|#########+#
2·50 7 643|########
3·00 25 668|#########+####
3·50 8 676|#########
4·00 4 680|######
4·50 0 680|
5·00 2 682|###
5·50 6 688|########
6·00 0 688|
6·50 2 690|###
7·00 2 692|###
7·50 0 692|
8·00 1 693|
8·50 0 693|
9·00 0 693|
9·50 3 696|#####
>10·00 20 (10·20 to 82087·52)
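The bar lengths printed above are consistent with round(10 × log10(count)) characters, with a "+" at every tenth position. That rule is inferred from the printed bars rather than stated anywhere, so the following is a sketch of one way to reproduce them, with a made-up function name:

```python
import math

def bar(count):
    """Bar of round(10*log10(count)) characters; every tenth character is '+'."""
    if count < 1:
        return ""
    length = round(10 * math.log10(count))
    return "".join("+" if i % 10 == 0 else "#" for i in range(1, length + 1))

# A few of the counts from the histogram above.
for count in (1, 2, 8, 65, 155):
    print(f"{count:4d}|{bar(count)}")
```

On this scale a count of one gives an empty bar, which is why the single-count bins above show nothing after the "|".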

These differences are too big to ignore. How can they happen? The exemplified old data in 2009 was as supplied by the NZ Stock Exchange, and the new data are also supplied by the NZ Stock Exchange? Yet these historical data are different. A timestamp would assist in determining the sequence of events and which value should be taken as being more “final”, but alas. This layout could also accommodate an explanatory remark after the F, but alas, there is none.

Now for the larger Final Prices data collection. Via https://www.emi.ea.govt.nz/Wholesale/Dat... came some 1,539 files of size from 62KB to 222,143KB when expanded. These were made into one file at 5·33GB then converted to 306 monthly files, which with the 000, and “” suppressed, occupied 4·9GB with 109,579,117 records, 91,416,917 with spurious “000” (84%) and 10,368,156 with “,"",”.
Yet again, endless petty differences in format swamp any attempt at a simple file comparison so a full assimilation is necessary, and oddities appear.
For each day of February 2003 data are first presented lacking a value for half-hour one, then much later, the day’s data are re-supplied with hh1 included, and then, yet again. For HAY2201 as an exemplar,
Record: Content.
306629: HAY2201,01/02/2003,1,75.26,2003-02-02 07:50,F
308121: HAY2201,01/02/2003,1,75.26,2003-02-02 07:50,F
For half-hour two, there are three presentations:
85541: HAY2201,01/02/2003,2,54.00,2003-02-02 07:50,F
306630: HAY2201,01/02/2003,2,54,2003-02-02 07:50,F
308354: HAY2201,01/02/2003,2,54,2003-02-02 07:50,F
And so on for the different series names, for every day of February 2003.
Notice that the timestamps are the same, as are the values. Why these stutters?
In the combined file, the three are placed as follows:
27347339: HAY2201,01/02/2003,2,54.00000,2003-02-02 07:50,F
30304428: HAY2201,01/02/2003,2,54,2003-02-02 07:50,F
30306152: HAY2201,01/02/2003,2,54,2003-02-02 07:50,F
The combined file has not had the three trailing zeroes squeezed out. It was combined directly from the supplied data files, and the three records can be found in them, as follows:
rec. 2 of final20030201280002.csv (has data only for HAY2201 for February 2003),
rec. 382 of final20030201280013.csv
rec. 1147969 of final20030101000001.csv (whence the 54.00000)
Amusingly enough, the data file for February is larger instead of smaller than its neighbours: 16,214KB, 29,118KB, 16,478KB respectively for January, February, March 2003.
The obvious scheme would be to remove the duplicate records, but this would require a special procedure. A simple ploy is to sort the data file and then remove duplicate records as they will now be adjacent. Except, note that one of the three values is 54.00 (originally, 54.00000), while the other two are 54: while equal as numbers, these are not identical texts. Removing the literal duplicates reduced the record count from 621,124 to 345,368. Converting “.00,” to “,” reduced it to 340,847 and just for completeness, converting “.d0,” to “.d,” (about 3,000 occurrences for each digit), reduced it to 313,168. Then it transpired that there were some sixteen names with values of “,0.01,” versus “,.01,” for half-hour 21 of the 22nd: BAL0331, BDE0111, BWK1101, EDN0331, GOR0331, HWB0331, HWB2201, INV0331, INV2201, MAN2201, NMA0331, PAL0331, ROX1101, ROX2201, SDN0331, TWI2201. Changing the equal numbers to have equal texts (“,0.01,” to “,.01,”) and removing the resulting duplicates left 313,152 records, and the file is now no longer embolismic but 14,658KB, smaller than its neighbours as should be the case for Februaries.
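The successive text substitutions described above can be replaced by one step that parses the price and writes it back in a single canonical form. This is a sketch of that alternative, not the procedure actually used; the function names are made up, and the canonical form here keeps a leading zero ("0.01") rather than dropping it (".01"):

```python
from decimal import Decimal

def canonical(line):
    """Rewrite the price (fourth field) so that equal numbers have equal text."""
    fields = [f.strip() for f in line.split(",")]
    # normalize() drops trailing zeros; the "f" format avoids exponent notation.
    fields[3] = format(Decimal(fields[3]).normalize(), "f")
    return ",".join(fields)

def dedupe(lines):
    """Drop records that are duplicates once canonicalised, keeping first-seen order."""
    seen = set()
    out = []
    for line in lines:
        key = canonical(line)
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out

records = [
    "HAY2201,01/02/2003,1,75.26,2003-02-02 07:50,F",
    "HAY2201,01/02/2003,1,75.26,2003-02-02 07:50,F",
    "HAY2201,01/02/2003,2,54.00,2003-02-02 07:50,F",
    "HAY2201,01/02/2003,2,54,2003-02-02 07:50,F",
    "HAY2201,01/02/2003,2,54,2003-02-02 07:50,F",
]
unique = dedupe(records)
print(unique)
```

Using a set rather than sorting keeps the original record order, at the cost of holding every distinct record in memory.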
While all this has its entertaining aspects, a coherently administered data system should not be capable of performing this sort of floundering for something so simple as producing data files. Sticking to an established format would help too.
Contrariwise, instead of superfluous data, there are also absences. The timestamps vanished for 1/1/1999 to 31/12/2000 for the new data files declared to be direct from the NZ Stock Exchange, yet, the original data files from the NZ Stock Exchange have them. For example, the first line of file fp1999m01.csv (originally Final_1999_01.csv) reads
ABY0111,01/01/1999,1,53.82,10/02/1999 12:16:56,F
while the corresponding new record for the same data is
ABY0111,01/01/1999, ,53.82,,F
On the other hand, the original data files no longer were supplied with timestamps after that for September 2013, while the new data files continue to supply them. Thus, some of the new data files lack timestamps that had originally been supplied, but later files supply timestamps past the end of the original supply. It would be nice if the new data files could simply replace the old data files, but that would mean losing timestamps as well as gaining timestamps. A special merge process will be needed instead.
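One possible shape for such a merge, as a sketch only: each set is held as a table keyed on name, date and half-hour, and the merged record takes a timestamp from whichever side has one. The policy of preferring the new side where both have a value is an assumption, as is the premise of one record per key; slots with several revisions would need the timestamp in the key as well. The function name merge is made up:

```python
def merge(old, new):
    """Combine two {key: (price, timestamp)} tables, keeping a timestamp from either side."""
    merged = {}
    for key in old.keys() | new.keys():
        o_price, o_stamp = old.get(key, (None, ""))
        n_price, n_stamp = new.get(key, (None, ""))
        price = n_price if n_price is not None else o_price
        stamp = n_stamp or o_stamp  # empty text counts as "no timestamp"
        merged[key] = (price, stamp)
    return merged

# Old data has the timestamp for 1999; new data has it for 2019.
old = {
    ("ABY0111", "01/01/1999", 1): ("53.82", "10/02/1999 12:16:56"),
    ("ALB0331", "03/12/2019", 1): ("69.98", ""),
}
new = {
    ("ABY0111", "01/01/1999", 1): ("53.82", ""),
    ("ALB0331", "03/12/2019", 1): ("69.98", "06/12/2019 10:27:06"),
}
merged = merge(old, new)
for key in sorted(merged):
    print(key, merged[key])
```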
Two spurious records appear, supplying the only data for those names for that day:
HWB0332,24/04/2004,36,0.00,25/05/2004 12:04:13,F
ROB1101,29/04/2004, 1,0.00,30/04/2004 15:31:41,F
But these appear in the originally-supplied data also, and would be better candidates for omission. The value supplied, zero, presumably is not the result of the price concoction process but instead represents “No result” and such zero values are common sequences. This is unfortunate because their presence will disrupt any attempt at analysing the distribution of price values, which is surely a matter of interest.
Conversely, some 192 names supplied in the old data fail to appear in the re-supplied data: their date span is 21/7/2009 to 19/8/2009, just thirty days each. The names are ABY1101, ALB2201, APS0661, ASB2201, ASY0661, BAL1101, BDE1101, BEN0111, BEN2701, BEN2702, BLN1101, BPC1101, BPE1101, BRB2201, BRK2201, BRY2201, CBG1101, CBG1102, CLH0661, CML1101, CML1102, CML2201, CML2202, CPK1101, CPK1102, CPK1103, CST1101, CUL2202, CUL2203, DAR0501, DOB1101, DVK1101, DVK1102, EDG1101, EDG1102, EDG2201, EDN1101, FHL1101, FKN1101, FKN1102, GFD1101, GFD1102, GIS1101, GLN2201, GNY1101, GOR1101, GOR1102, GYT1101, GYT1102, HAM1101, HAY2701, HAY2702, HEN1101, HEP1101, HIN1101, HPI2201, HPI2202, HUI1101, HWA1001, IGH1101, INV1101, KAI0661, KEN1101, KEN1102, KIK1101, KIN1101, KKA0661, KMO1101, KMO1102, KMO1103, KPU1101, KPU1102, KTA1101, KWA1101, KWA1102, LIV2201, LTN2202, MCH1101, MGM1101, MHO1101, MHO1102, MHT1102, MHT1103, MLG1101, MLG1102, MNI1101, MOT0661, MOT0662, MPE0501, MST1101, MTM1101, MTN1101, MTN1102, MTO1101, MTO1102, MTR1101, NMA2201, NPK1101, NSY2201, OAM1101, OAM1102, OHW2201, OKE1101, OKI0331, OKN1101, ONG1101, OPI2201, OPI2202, OPK1101, OPK1102, OTA1103, OTA1104, OTA1105, OTA1106, OTA1107, OTA1108, OTA1109, OTA110A, OTI0661, OWH1101, PAK1101, PAL1101, PEN0222, PEN2201, PIE1101, PNI1101, PNI1102, PPT2201, PRM1101, PRM1102, RDF1101, RDF1102, ROT1102, RTO1101, RTR1101, SDN2201, SFD1101, SPN0662, STK1101, STU1101, SVL2201, SVL2202, TAK2201, TAK2202, TGA1101, TIM1101, TIM2201, TIM2202, TKA1101, TKH0501, TKR1101, TMH2201, TMI1101, TMK1101, TMK1102, TMN2201, TNG2201, TRK1101, TRK1102, TWH2201, TWT2201, TWZ2201, UHT1101, UTK0661, WAI1101, WDV1102, WEL1101, WEL1102, WES1101, WES1102, WES1103, WGN1101, WHU1101, WIL1101, WIL2201, WIR1101, WIR1102, WKO1101, WMG1101, WMG1102, WPR2202, WPR2203, WPT1101, WPT1102, WPW1101, WPW1102, WRA1101, WRA1102, WTK1101, WTK1102, WTU2201, WTU2202.
Comparisons now are only slightly hindered by names HTI1101, KIK2201, LTN2201, WAI0501, WVY1101 having started after September 2013, but otherwise, where data are available over 1/10/1996 to 30/9/2013, all values are equal. This is a relief.
The timestamps also match (where both values exist) ... except for a difference of 319 days. This is of course for the notorious day of adjustment, Saturday 26/3/2011. The originally-supplied data file Final Prices – 032011.csv was suppressed and replaced by a file from which data for the 26’th had been removed, lest its presence prove “confusing”. In early 2013 a file FP_Price_20110326.csv arrived with the missing day’s values, as revised; by then, timestamps were not being provided so 1/4/2013 sufficed. Then, via a later Official Information Act request, there came data with the original values (that had been censored) and a timestamp of 27/3/2011 seemed in order. An oddity was that those data included values for names CUL0661 and DOB1101, that had not been active in 2011. One wonders how they were concocted, and when – but, no timestamps.
Now the new data for March 2011 includes values for the 26’th, and offers a rather earlier timestamp of 17/05/2012 10:29:31 (for ABY0111) – perhaps the values were calculated then, but not approved for publication until much later. The file does not offer data for CUL0661 and DOB1101, as it shouldn’t. Alas, there is no sign of the censored original data for the 26’th, which would be distinguished by having earlier timestamps.
After September 2013, there are no old timestamps to compare, but the values can be compared where both the old and the new offerings align, and, they’re all equal ... except in December 2019. For instance, the old data file offers
ALB0331,03/12/2019, 1,69.98,F
But the new data file offers two values:
ALB0331,03/12/2019, 1,69.98,06/12/2019 10:27:06,F
ALB0331,03/12/2019, 1,15.72,06/09/2021 13:30:00,F
And then escalates to three!
ALB0331,07/12/2019, 1,69.64,09/12/2019 14:00:43,F
ALB0331,07/12/2019, 1,17.07,06/09/2021 13:30:00,F
ALB0331,07/12/2019, 1,17.07,22/09/2021 13:30:00,F
And again, the old data file, produced contemporaneously and before the revision date, has the first value. Changes extend from the third to the twenty-seventh of December.
At last, this is proper behaviour. Revised values (even if not actually changed) have their own timestamp, and the sequence can be followed. Having some explanatory remark following the F code would be even better, as one’s faith in the finality of the “Final” price data falters. However, since one cause would likely effect many changes, annotating every affected value with the same note would be tedious and would consume storage space. Perhaps instead the changes could be documented in a file stored nearby, called say Corrigenda.txt?
Why this is happening is unclear. The price concoction scheme may well be generating many provisional values, but, why are they leaking out into a file declared to deserve the appellation “final”? A revision two years later is one thing, but why a second revision a fortnight later? For another cause? Or is this some sort of floundering? And these revisions are not just for one day.
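With timestamps present, the sequence of values for a time slot can be recovered by grouping on name, date and half-hour and sorting by timestamp. A minimal sketch using the five new-format records quoted above, assuming six comma-separated fields and timestamps in dd/mm/yyyy hh:mm:ss form; the function name revisions is made up:

```python
from collections import defaultdict
from datetime import datetime

def revisions(lines):
    """Map (name, date, half-hour) -> list of (timestamp, price), oldest first."""
    slots = defaultdict(list)
    for line in lines:
        name, date, period, price, stamp, _run_type = [f.strip() for f in line.split(",")]
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        slots[(name, date, int(period))].append((when, price))
    return {key: sorted(values) for key, values in slots.items()}

records = [
    "ALB0331,03/12/2019, 1,69.98,06/12/2019 10:27:06,F",
    "ALB0331,03/12/2019, 1,15.72,06/09/2021 13:30:00,F",
    "ALB0331,07/12/2019, 1,69.64,09/12/2019 14:00:43,F",
    "ALB0331,07/12/2019, 1,17.07,06/09/2021 13:30:00,F",
    "ALB0331,07/12/2019, 1,17.07,22/09/2021 13:30:00,F",
]
revs = revisions(records)
for key, history in revs.items():
    print(key, [(when.isoformat(), price) for when, price in history])
```

Taking the last entry of each list gives the most recent value; whether that is the one to prefer is the question the timestamps alone cannot answer.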

In Summary:
The new Final Reserve Price data files do supply timestamps, but only starting 22/8/2017, so there is a four-year gap from the end of the original supply which ceased with September 2013. Although the NZ Stock Exchange’s data files supplied timestamps up to September 2013, your data files from the NZ Stock Exchange do not. There are also many small discrepancies in the values, and some are large. It is not clear which values should be preferred. The previously-supplied FIR (MW), SIR (MW) data remain lost.
The new Final Energy Price data files do supply timestamps, yet oddly, omit them for 1999 and 2000 even though they are present in the originally-supplied data. The new data files do not include the 192 extra names over 21/7/2009 to 19/8/2009 in the old data. Where the old and new data align, the timestamps match, as do the values, except for the notorious revision day of Saturday 26’th March 2011, and alas, the original values (with their original timestamps) that were censored do not appear. Data for February 2003 is over-supplied, but the surplus records can be extirpated as the values are equal even though their texts are not.
Many days in December 2019 have time slots with two, or even three values supplied, but with different timestamps. This is at least a demonstration of the value of timestamps, though appending explanatory remarks or offering a file with a log of changes would be even better.
Collating the new information with the old so as to gain information and not lose information will be a tedious process. Not just a simple matter of discarding the old data files in favour of the new.

Concentrating on the original request for data with timestamps, the re-issued new data files do provide additional timestamps but some timestamps have been lost as well and others remain missing.

So, partial progress.

Yours sincerely,
Nicky McLean

From: OIA
Electricity Authority

Kia ora Mr McLean,

I refer to your emails of 7 March 2022 in which you queried the way the Electricity Authority (Authority) publishes data on the EMI website, in addition to the numerous requests we have received from you since 2013 and the conclusion of your employment with the Authority.

The Authority is committed to transparency and publishes as many data sets as possible in various forms for ease of access. In doing so the Authority caters to many different parties through data sets that are of widespread interest and in formats that can be easily accessed by most.

Where possible, we have made some changes in response to your requests, but the Authority does not have extensive resources available to respond to repeat individual data requests. Furthermore, this diverts staff from working on key tasks and continuous improvement for all our stakeholders.
The information you previously requested under the Official Information Act 1982 (the Act) has been provided to you and there is no obligation under the Act on the Authority to create information or format the information in a way preferable to you.

The Authority is dedicated to continuously improving the way we collect and publish data. Our improvements are determined by priorities at the time, noting the need to service a variety of stakeholders interested in the information we produce. We will continue to make improvements based on sector-wide benefit.

Yours sincerely,

Phil Bishop
Manager - Data and Information Management
Electricity Authority - Te Mana Hiko
www.ea.govt.nz
