This is an HTML version of an attachment to the Official Information request 'Aid Programme documents'.

 
 
 
 
 
 
 
 
 
 
 
 
Annual Assessment of Results (AAR) 
 
Report of the AAR for the New Zealand Aid Programme in the Ministry of 
Foreign Affairs and Trade for AMAs and ACAs completed in 2017/18 
 
 
 
 
 
Final Report 
 
 
 
 
 
 
 
 
Prepared for //
Insights, Monitoring & Evaluation
Pacific and Development Group
New Zealand Aid Programme
Ministry of Foreign Affairs and Trade

Date // 3 July 2019

IOD PARC is the trading name of International
Organisation Development Ltd //
Omega Court
362 Cemetery Road
Sheffield
S11 8FT
United Kingdom
Tel: +44 (0) 114 267 3620
www.iodparc.com


 
Contents 
Acronyms ............................................................................................................................................................ 3 
Executive Summary ............................................................................................................................................ 4 
1. Introduction .................................................................................................................................................... 9 
2. Methodology ................................................................................................................................................. 10 
Limitations of the AAR .............................................................................................................................. 13 
3. Key findings .................................................................................................................................................. 14 
3.1 Robustness of effectiveness ratings ..................................................................................................... 14 
3.2 Robustness of other DAC criteria ratings (ACAs only) ...................................................................... 21 
3.3 Assessment of qualitative analyses in AMAs and ACAs ..................................................................... 22 
4.  Summary of Findings ................................................................................................................................ 29 
5.  Conclusions ................................................................................................................................................ 29 
6.  Recommendations ..................................................................................................................................... 30 
Appendix 1: Methodology ................................................................................................................................. 32 
Appendix 2: Terms of Reference ...................................................................................................................... 41 
Appendix 3: Distribution of sample ................................................................................................................. 43 
Appendix 4: Influence of scholarship activities on AMA robustness.............................................................. 46 
Appendix 5: Assessment Templates for AMAs and ACAs .............................................................................. 47 
Appendix 6: Interviewing script ....................................................................................................................... 49 


 
Acronyms  
ACA 
Activity Completion Assessment 
 
 
 
AMA 
Activity Monitoring Assessment 
AQP 
Activity Quality Policy 
ARF 
Activity Results Framework 
CCIs 
Cross Cutting Issues 
DAC 
Development Assistance Committee 
DP&R 
Development Planning and Results (team) 
M&E 
Monitoring and Evaluation 
MFAT 
Ministry of Foreign Affairs and Trade 
MO 
Multilateral Organisation 
N/A 
Not Assessable 
ODA 
Official Development Assistance
RMT 
Results Measurement Table 
SAM 
Scholarships and Alumni Management System 
UN 
United Nations 
VfM 
Value for Money 

 
 


 
Executive Summary  
Background  
New Zealand’s Aid Programme is delivered through investments (known as Activities) administered 
by the Ministry of Foreign Affairs and Trade (MFAT). The performance of individual Activities is 
reported through Activity Monitoring Assessments (AMAs) and Activity Completion Assessments 
(ACAs). Within AMAs and ACAs, Activity Managers rate performance on a five-point scale against set 
criteria. Upon completion of an Activity, ACAs record ratings of effectiveness, relevance, impact, 
efficiency, and sustainability. AMAs record ratings of effectiveness of on-going Activities. An 
important element of AMAs and ACAs is that they identify issues that may affect the implementation 
and results of activities, as well as recommendations and lessons to improve activities.  
For many Activities, AMAs and ACAs are the only formal MFAT assessment of their progress and 
performance. It is therefore important that MFAT has confidence in their robustness. The Annual 
Assessment of Results (AAR) assesses the robustness of ratings in AMAs and ACAs. The AAR also 
assesses whether AMAs and ACAs are helpful in improving Activities. 
According to Activity Managers who were interviewed for this AAR, AMA and ACA templates have 
become more user-friendly over the last five years through revisions based on, amongst other inputs, 
the outcomes of AARs. AMAs and ACAs are becoming increasingly embedded within MFAT’s performance 
management system. Completion rates for AMAs and ACAs have increased substantially, from 59% in 
the 2013/14 AAR to 88% in 2017/18, the highest completion rate to date.  
This is the fourth AAR,1 providing an independent quality assurance of a randomly selected 
representative sample of 66 AMAs and 36 ACAs, drawn from a total of 141 AMAs and 55 ACAs that 
were completed between 1 July 2017 and 30 June 2018. Of those selected, 63 AMAs and 28 ACAs 
were eventually reviewed.2 AMAs and ACAs with non-robust effectiveness ratings were progressed to 
an interview with the relevant Activity Manager, following which reviewers finalised their assessment.  
The proportionally low number of interviews conducted with Activity Managers to review non-robust 
effectiveness ratings is a major limitation of this AAR.3 To enhance comparability with previous 
AARs, final effectiveness ratings were adjusted to allow for the low interviewing rate in the current 
AAR.4  
Summary of Findings 
Robustness of Effectiveness Ratings 
To have general confidence in the robustness of Activity Managers’ effectiveness ratings across all 
AMAs and ACAs, MFAT would expect at least 75% of the assessed ratings to be robust. 
MFAT can be reasonably confident in the robustness of AMA effectiveness ratings, especially for 
short- and medium-term outcome ratings. Based on the adjusted post-interview ratings, it is 
encouraging that the robustness of short-term outcome ratings in AMAs meets the 75% confidence 
threshold, while that of medium-term outcome ratings is just under, at 74%. The robustness of output 
ratings still trails at 64%.    
                                                           
1 Three previous AARs were conducted in 2015, 2016 and 2018 for AMAs and ACAs completed between 1 July 2013 – 30 June 2014; 
between 1 July 2014 – 30 June 2015; and between 1 July 2016 – 30 June 2017, respectively. 
2  The main reason for the differences between selected and reviewed totals is that Activity-related documents could not be provided in 
time to be included in the review. 
3 The interview rate for the current AAR (33%) is much lower compared to previous AARs (86% in the inaugural AAR; then 92% and 84% in 
the two subsequent AARs).  
4  The adjustment for each result area is based on the average percentage change in post-interview robustness of effectiveness ratings 
across the previous three AARs and applying it to the pre-interview rating in the current AAR. Both the initial post-interview robustness 
ratings and the adjusted post-interview ratings have limitations. The initial post-interview ratings are not informed by a proportional 
number of interviews compared to previous years and therefore are likely to be understated. The adjusted post-interview ratings are 
indicative rather than definitive, but are likely to be closer to what the actual results may have been. 
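The adjustment described above can be sketched as follows. Note that applying the average change multiplicatively, and the historical figures used, are illustrative assumptions: the report does not specify whether the change is applied as a relative factor or in percentage points.

```python
def adjusted_post_interview(pre_current, previous_pairs):
    """Adjust a pre-interview robustness figure using the mean
    pre-to-post change observed in earlier AARs.

    pre_current: current AAR's pre-interview robustness (a proportion)
    previous_pairs: [(pre, post), ...] from the previous three AARs
    """
    # Relative change in robustness attributable to interviewing,
    # averaged across the earlier AARs (a modelling assumption).
    changes = [(post - pre) / pre for pre, post in previous_pairs]
    mean_change = sum(changes) / len(changes)
    return pre_current * (1 + mean_change)

# Hypothetical historical figures, not values taken from the report:
history = [(0.55, 0.72), (0.60, 0.78), (0.58, 0.74)]
print(round(adjusted_post_interview(0.50, history), 3))  # 0.647
```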

 
 


 
The robustness of output and medium-term outcome ratings for AMAs in the current AAR is lower 
compared to previous AARs, but the robustness of short-term outcome ratings is higher. However, 
there have been no statistically significant changes in the robustness of AMA effectiveness ratings in 
the current AAR compared to both the inaugural AAR and the 2016-17 AAR.5  
Based on adjusted post-interview ratings, the robustness of short- and medium-term outcome ratings 
in ACAs, at 77%, exceeds MFAT’s confidence threshold, but the robustness of output ratings trails at 
72%. Compared to the inaugural AAR, the decrease in robustness of ACA output ratings is likely to be 
statistically significant.6 When an Activity ends, it is especially important to know whether it has 
achieved its results or not. It is therefore encouraging that MFAT can be reasonably confident about 
the robustness of short- and medium-term outcome ratings in ACAs.  
In both AMAs and ACAs, effectiveness ratings of 1 and 2 (inadequate ratings) are more robust 
compared to ratings of 4 and 5 (good and very good ratings). This suggests that Activity Managers 
may be inclined to over-rate the effectiveness of Activities, which is consistent with all previous AARs. 
As in previous AARs, robustness of effectiveness ratings for both AMAs and ACAs increased 
substantially after interviewing. The main reason is that Activity Managers may not document all the 
evidence that informs ratings in the AMA or ACA. Contextual information and understanding are 
major factors that influence Activity Managers’ ratings, but this is often not explained in AMAs and 
ACAs. Activity Managers may assess an Activity’s progress and performance with due consideration 
of challenges in the implementing context, but do not always document this in AMAs and ACAs. 
Therefore, many AMAs and ACAs still do not capture the evidence that supports effectiveness ratings 
in sufficient detail to provide stand-alone records of an Activity’s effectiveness. 
Quality of Results Management Tables and influence on Effectiveness Ratings 
Based on references to Results Management Tables (RMTs) as sources of evidence for completing 
AMAs and ACAs, Activity Managers are increasingly relying on RMTs to monitor and report 
Activities’ progress. Ratings were often substantiated by references to movements in RMT indicators, 
or progress in relation to baselines and targets. An increasing number of Activity Managers are also 
proposing appropriate actions to address identified shortcomings of RMTs.  
However, RMT shortcomings continue to impede the quality of AMAs and ACAs. In the current AAR, 
15% of AMAs (n=9) were based on RMTs assessed as adequate, compared to 68% (n=40) where 
RMTs were found to have shortcomings. Many RMTs are not updated regularly and may not reflect 
the evolving reality of dynamic Activities. Other major shortcomings of RMTs include the absence of 
baselines, targets and data to monitor and report progress. It appears that completing AMAs and 
ACAs for complex Activities, for example multi-county and multi-donor Activities, as well as 
Activities funded through Budget support, could be challenging because RMTs for these Activities 
are not straightforward. Per MFAT guidelines, AMAs for funding to Multilateral Organisations draw 
on the Strategic Plans and annual reports of the organisations themselves. Their annual reporting is 
therefore not based on an MFAT RMT and does not provide a clear-cut fit with AMA assessment 
criteria. 
Robustness of other DAC Criteria ratings in ACAs 
MFAT guidelines for the assessment of the DAC criteria of relevance, efficiency, impact and 
sustainability were revised in 2017 and incorporated in AMA and ACA templates. However, the 
reviewers based their assessment of the robustness of ratings on outdated guidelines in Annex 1 of 
the 2015 Aid Quality Policy (AQP). With this limitation in mind, the robustness of DAC criteria 
ratings in ACAs has predominantly improved compared to previous AARs but remains below the 
75% confidence threshold. More specifically: 
                                                           
5 Assessment of statistical significance is limited by availability of detailed data from the inaugural AAR and the adjustment approach 
applied in 2017/18. 
6 Assessment of statistical significance is limited by availability of detailed data from the inaugural AAR and the adjustment approach 
applied in 2017/18. 

 
 


 
- the robustness of relevance, efficiency and sustainability ratings has improved, following 
almost consistent decreases over the previous three AARs.  
- the robustness of impact ratings is higher compared to the first two AARs but has decreased 
slightly compared to the previous AAR. A key challenge in assessing an Activity’s impact at 
completion appears to be the absence of baseline data. 
Bearing in mind that ratings of these criteria were not discussed during interviews, and 
despite encouraging increases in the robustness of sustainability and impact ratings, confidence in 
these ratings would still be somewhat lacking. 
Cross-Cutting Issues 
While the assessment of Cross-Cutting Issues (CCIs) remains a challenging aspect of AMAs and 
ACAs, it is encouraging that the proportion of ACAs containing ‘adequate’ or better analyses of CCIs 
has increased substantially over time. However, the proportion of CCI analyses in AMAs that 
received ‘good’ or ‘very good’ ratings remains small and has decreased since the previous AAR. This 
suggests that analyses provided by Activity Managers may still lack evidence, and/or depth and 
insight, to warrant high ratings.  
Actions to Address Issues (AMA) and Lessons Learned (ACAs) 
In AMAs, the overall quality of proposed actions to enhance Activities’ performance has increased 
over successive review periods.  Compared to the inaugural AAR, the proportion of AMAs that 
identified lessons assessed as inadequate decreased by 13%. At the same time, the proportion of 
lessons assessed as ‘good’ or ‘very good’ has also decreased slightly (8% from baseline), suggesting 
that actions to strengthen on-going Activities are pooling in the ‘adequate’ category. 
In ACAs, lessons to inform future Activities remain of a relatively high quality – 82% of ACAs 
identified meaningful, useful lessons to strengthen future Activities. This is 19% more than in the 
inaugural AAR, but 14% less than in the 2016-17 AAR. It is very encouraging that the proportion of ACAs 
that did not identify any lessons, or identified generic, unsubstantiated lessons, has decreased by 
51% compared to the inaugural AAR.  
Drawing on AMAs and ACAs to improve Activities 
The extent to which AMAs and ACAs can be used to improve Activities considers whether they 
articulate a plausible, evidence-based story of an Activity’s progress and performance, and then 
identify key issues and helpful recommendations/lessons to strengthen the Activity (or Activities in 
general) going forward. The assessment is based on different criteria to the effectiveness ratings and 
therefore results may diverge.   
Despite some missed opportunities to identify actions or lessons that could improve Activities, MFAT 
can be cautiously confident that both AMAs and ACAs can be drawn on to improve Activities. 
Qualitative elements in the majority of AMAs and ACAs provide insightful assessments across a range 
of issues, and they identify meaningful actions and lessons to inform activity improvement.  
- The proportion of AMAs and ACAs that cannot be drawn on to improve Activities has 
decreased steadily, but so has the proportion that are considered highly informative. This 
has resulted in a net increase in the proportion that are considered ‘adequate’ in documenting 
information that can be drawn on to improve Activities. Therefore, most AMAs and ACAs 
contain helpful information that can be drawn on to improve Activities.  
- A relatively large proportion of AMAs (20%, or six AMAs) for Activities with large whole-of-
life costs ($5 million and more) did not provide helpful information to improve the 
associated Activities. This can be ascribed partly to the fact that most complex Activities, as 
well as funding of Multilateral Organisations, have large budgets but there is limited scope 
for proposing ways within MFAT’s control to strengthen them.  

 
 


 
Conclusions  
1.  Despite improvements in qualitative aspects of reporting, AMAs and ACAs still do not provide 
sufficiently comprehensive or stand-alone records of Activities’ progress. As in previous AARs, 
Activity Managers draw on evidence from a range of sources to assess the effectiveness of 
Activities but tend not to document all this evidence in AMAs and ACAs. If the evidence is not 
comprehensively documented, the loss of institutional knowledge leaves substantial gaps, 
especially where staff turnover is high. When these gaps build up year-on-year, new Activity 
Managers might find it challenging to complete insightful AMAs and ACAs, thereby jeopardising 
the robustness of AMAs and ACAs in the longer term. 
2.  Despite remaining challenges around the robustness of effectiveness ratings, AMAs and ACAs 
generally include helpful information to improve Activities. Providing helpful information to 
improve complex Activities, which often have high whole-of-life costs, is challenging since ways 
to improve these Activities may not be within MFAT’s full control. 
3.  So far, AARs have not revealed major statistically significant results related to AMA and ACA 
improvement over time. However, due to contextual factors such as the capacity and capability of 
Activity Managers, incentives, etc. in an environment of relatively high staff turnover, MFAT 
does not expect to see statistically significant linear improvements over time. 
Recommendations 
1.  AMAs and ACAs should remain as essential building blocks of the Aid Programme’s performance 
management system. Activity Managers use AMAs and ACAs to reflect and assess the progress, 
performance and challenges of Activities and they serve as important repositories of institutional 
memory and continuity during Activity implementation. Increasingly insightful and usable 
lessons and actions to address issues, if harnessed through a robust knowledge management 
system, could also prove valuable in strengthening Activities. 
2.  Ongoing training and technical support would be important to ensure that gains made in the 
robustness and usefulness of AMAs and ACAs are maintained and enhanced. Gradual 
improvements are becoming evident, but it would be important to address known challenges and 
strengthen capacity to maintain this positive momentum and to prevent the gains made from 
being lost.  
Continue to provide training and guidance for Activity Managers in Wellington and at Post 
(including locally-engaged staff) to ensure that they understand why and how to document the 
evidence base for AMAs and ACAs fully, yet concisely, to increase the proportion of AMAs and 
ACAs that provide stand-alone records of Activities’ progress and performance. This would be 
instrumental in lifting the robustness and usefulness of AMAs and ACAs (and therefore their value 
as essential building blocks of the Aid Programme’s performance management system) to a 
higher level.  
Training and support in the following priority areas could be considered:  
- Documenting consolidated evidence from several sources to justify effectiveness and 
DAC criteria ratings. 
- RMT quality and wider socialisation of RMTs as foundations of Activity design, 
monitoring and reporting, as well as dynamic tools for Activity improvement.  
- MERL expert assistance to support regular reviewing and updating of RMTs to ensure 
that they remain relevant and up-to-date. 
- Strengthening RMTs and monitoring of complicated and complex Activities, for example 
multi-donor and multi-country Activities, as well as Activities funded through budget 
support. 
- Improving consistency and coherence in AMAs and ACAs, including identifying issues 
that affect the progress and performance of Activities, and following this through into 
meaningful, evidence-based actions to improve on-going activities (in AMAs), or lessons 
relevant to comparable types of Activities when they complete ACAs. 
3.  Given the size of MFAT’s funding to Multilateral Organisations and the unique arrangements 
around their monitoring, it could be beneficial to tailor guidance for the AMAs of these Activities.   
4.  Provide support to Activity Managers to identify and perceptively address appropriate Activity 
cross-cutting markers:  
- Where a cross-cutting marker is identified as relevant, it should be dealt with 
consistently and perceptively throughout the design, monitoring and reporting of the 
activity, including in AMAs and ACAs. 
- Avoid including cross-cutting markers that are not relevant to an Activity in its 
AMA/ACA. 
5.  AARs should continue to be conducted on a periodic basis to monitor the effect of known 
enablers and constraints to the robustness and usefulness of AMAs and ACAs, as well as to 
identify emerging challenges and actions for their continuous improvement. A larger database 
will also enable meaningful trend analyses of the robustness and usefulness of AMAs/ACAs 
across different sectors, programmes and budget levels.  
 
 

 
 


 
1. Introduction 
Background 
New Zealand’s Official Development Assistance (ODA) is delivered through Activities administered 
by the Ministry of Foreign Affairs and Trade (MFAT). AMAs and ACAs are internal assessments of the 
Activities’ performance and provide forward-looking actions and lessons to improve Activities. They 
form the building blocks of the Aid Programme’s Performance System.  
- AMAs are completed annually for Activities expending over $250,000 per annum, or 
smaller Activities with a high-risk profile. They rate and describe the effectiveness of on-going 
Activities, and provide a descriptive assessment of performance against relevance, 
efficiency, sustainability, cross-cutting issues, risk, Activity management and actions to 
address identified issues. 
- On completion of Activities with a total expenditure over $500,000, ACAs are completed, 
ideally within one month of receiving the final Activity Completion Report from the 
implementing partner. ACAs rate and provide a narrative assessment of an Activity’s 
performance in relation to five Development Assistance Committee (DAC) criteria, namely 
relevance, effectiveness, efficiency, impact and sustainability; they also comment and 
provide analyses on cross-cutting issues, risk and Activity management, and identify lessons 
to improve future Activities. 
Completion rates for AMAs and ACAs have increased since 2013/14. The completion rate for 2014/15 
AMAs and ACAs was 79%, which was an increase of almost 20% from 2013/14. The completion rate 
for 2016/17 AMAs and ACAs was 76%. AMAs and ACAs from 2017/18 had the highest completion rate 
ever, namely 88%.  
During interviewing, eight (of 11) Activity Managers noted that revisions of AMA and ACA templates 
over the last five years have made their completion easier, more useful and less time consuming. 
Three Activity Managers stated that they had completed training and knew where to access support 
if they required assistance with the completion of AMAs/ACAs. Two Activity Managers – one based 
at post and one in New Zealand – noted that some staff at Post were unaware/unable to access the 
same level of training/support that is offered to Wellington-based staff. Both stated that this could 
be improved by increased promotion of available support at Post and including the information in 
orientation training for locally-engaged staff, in particular.  
Purpose of the AAR 
For many Activities, AMAs and ACAs completed by Activity Managers are the only formal MFAT 
assessment of their progress and performance. Aggregated ratings data from AMAs and ACAs provide 
a snapshot of the performance of New Zealand’s ODA. It is important to have confidence in the 
robustness of these ratings. AARs assess the robustness of Activity Managers’ ratings of Activities’ 
effectiveness. They also assess the overall usefulness of AMAs and ACAs, as well as the analysis of 
cross-cutting issues, actions proposed to address issues (AMAs), and lessons learnt (ACAs) 
presented by Activity Managers.   
The purpose of the AAR is to: 
- assess the level of confidence that MFAT can have in the robustness of AMAs and ACAs 
- inform the Insights, Monitoring & Evaluation team’s efforts to strengthen the Aid Programme’s 
Performance System 
- provide input to the Aid Programme Strategic Results Framework.7 
                                                           
7  Level 3 of  the New Zealand Aid Programme Strategic Results Framework incorporates the following indicators pertaining to the 
quality of AMAs and ACAs: 
2.3 : Percentage of AMAs and ACAs rated 3 or higher on a scale of 1-5 reviewed against quality standards  
2.4 : Percentage of AMAs and ACAs rated 3 or higher on a scale of 1-5 reviewed against quality standards for Cross-cutting issues  

 
 


 
This is the fourth Annual Assessment of Results (AAR) that provides an independent quality 
assurance of AMAs and ACAs. It was conducted for AMAs and ACAs completed between July 2017 
and July 2018.8 It describes findings from a sample of 66 AMAs and 36 ACAs, drawn from a total of 
141 AMAs and 55 ACAs. Of those selected, 63 AMAs and 28 ACAs were eventually reviewed.9 
Effectiveness ratings in 34 AMAs and 18 ACAs were found to be inconsistent with documented 
evidence and 35 Activity Managers10  were identified for interviewing to clarify these ratings. 
Interviews were conducted with 11 Activity Managers, covering eleven AMAs and six ACAs. This 
constitutes an overall interview rate of 33%, which is much lower compared to previous AARs, and 
which is a major limitation of the current AAR. 
Structure of the Report 
The report starts with a brief overview of the AAR’s purpose (Section 1) and methodology, including 
limitations (Section 2). Section 3.1 deals with the robustness of effectiveness ratings in AMAs and 
ACAs, while Section 3.2 deals with the robustness of other rated criteria, namely relevance, efficiency, 
sustainability and impact (ACAs only). Challenges to the robustness of effectiveness ratings are also 
discussed, including the role of Results Management Tables (RMTs). Section 3.3 reflects on the 
assessed quality of non-rated criteria, namely Actions to address Issues Identified (AMAs), Lessons 
Learned (ACAs) and the overall usefulness of AMAs and ACAs. The analysis of Cross-Cutting Issues is 
also discussed. The report ends with a summary of findings, conclusions and recommendations in 
Sections 4, 5 and 6 respectively. 
2. Methodology  
The methodology for the AAR was developed in consultation with MFAT and has been refined over 
four successive AARs. A summary is provided here, with more detail in Appendix 1. 
Sampling 
The AAR is based on a statistically representative sample of AMAs and ACAs that met the following 
criteria: 
- it was submitted between July 2017 and July 2018 
- it was completed within 12 months after the assessment period ended. 
A total population of 141 AMAs and 55 ACAs met these criteria. A simple random sample of 66 AMAs 
and 36 ACAs was drawn (95% confidence level, 10% confidence interval). Ultimately, the AMA sample 
was reduced to 63 (that is, 45% of AMAs) due to unavailability of information. To ensure consistency 
with previous AARs, AMAs for Scholarship Activities were excluded from all comparative 
analyses, meaning that comparative analyses are based on an AMA sample of 59.11 The ACA 
sample was reduced to 28 (51% of ACAs) for the same reason. The distribution of the samples 
according to sectors, programme categories and Whole-of-Life Budget Programme Approvals is 
illustrated in Appendix 3. 
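As a rough illustration of how such a sample size can be derived, the sketch below applies Cochran’s formula with a finite-population correction at a 95% confidence level and a 10% confidence interval. The choice of formula is an assumption (the report does not state the method used): it reproduces the ACA sample of 36 exactly, while for the AMA population of 141 it yields 58, so the reported AMA sample of 66 presumably reflects some deliberate over-sampling.

```python
import math

def sample_size(population, z=1.96, p=0.5, e=0.10):
    """Cochran's sample-size formula with finite-population correction.

    population: size of the sampling frame
    z: z-score for the confidence level (1.96 for ~95%)
    p: assumed proportion (0.5 is the most conservative choice)
    e: confidence interval half-width (margin of error)
    """
    n0 = (z ** 2) * p * (1 - p)  # infinite-population term
    n = (population * n0) / ((e ** 2) * (population - 1) + n0)
    return math.ceil(n)

print(sample_size(55))   # 36, matching the 36 ACAs sampled from 55
print(sample_size(141))  # 58; the 66 AMAs sampled suggest over-sampling
```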
There were no major changes to the quality reporting system during the review period. The use of 
revised AMA and ACA templates, which were introduced in June 2017, is well-established. Most 
AMAs (91% of AMAs) and ACAs (75% of ACAs) were completed in the revised templates. Due to 
the small number of AMAs and ACAs that were completed in the old templates, an analysis of the 
effect of template revisions on the robustness of AMAs and ACAs would not be meaningful.12 
                                                           
8  The previous three AARs were the inaugural AAR, conducted in 2015 for AMAs and ACAs completed between July 2013 and July 2014; 
the AAR conducted in 2016 for AMAs and ACAs completed between July 2014 and July 2015, and the AAR conducted in 2018 for AMAs 
and ACAs completed between July 2016 and July 2017. 
9 The main reason for the differences between selected and reviewed totals is that Activity-related documents could not be provided in time 
to be included in the review. 
10 Some Activity Managers were responsible for more than one AMA or ACA. 
11  The Scholarships Programme completed AMAs for the first time in 2014/15. Four Scholarship Activities were included in the 2016 
AAR and eight were included in the previous AAR. The four scholarship activities included in the current AAR sample account for 
3% of the AMA sample. 
12  Five AMAs and seven ACAs were completed using the old templates. 
 
 


 
As in previous AARs, AMAs for Scholarship Activities are excluded from the main analysis because 
the robustness of scholarship AMAs has typically been significantly lower and has skewed overall 
AMA robustness results. In previous AARs, AMAs were prepared for individual Scholarship 
Activities. In the current AAR period, the approach to AMAs for Scholarship Activities has been 
revised to a few integrated AMAs focused at a more programmatic level. Therefore, there was 
an over-arching AMA prepared for the Scholarship programme, as well as integrated AMAs for all 
Scholarship Activities at a country level. There was consistency of assessments and issues across the 
four integrated Scholarship AMAs that were reviewed. Therefore, compared to previous AARs, the 
influence of scholarship AMAs on the robustness of effectiveness ratings in the current AAR would 
have been less evident, but for consistency scholarship Activities are excluded. 
Assessments  
Standard assessment templates for AMAs and ACAs were developed for the inaugural AAR and 
slightly amended over previous AARs without compromising comparability with the inaugural AAR. 
For the current AAR, two revisions were made to templates:  
- The rating scale for assessing RMTs in the AMA template was re-organised so that ratings 
reflect a logical increase in quality from a rating of 1 to a rating of 5.  
- In previous AARs, an overall assessment of AMAs and ACAs focused on quality, based mainly 
on the robustness of evidence that informed the completion of the AMA/ACA. In the current 
AAR, the overall assessment considered the robustness of evidence, report coherence and the 
extent to which this evidence was used to propose helpful actions or lessons to improve 
Activities. It considers coherence between identified issues and proposed actions/lessons to 
address these issues.  
A desk review of the sampled AMAs and ACAs, as well as accompanying partner progress reports, 
completion reports and the reports of independent evaluations, was conducted to assess the robustness 
of effectiveness ratings and, in ACAs, also the ratings for other DAC criteria. The assessment of 
robustness is based on the extent to which Activity Managers’ ratings were substantiated by the 
evidence and analysis presented in the AMA/ACA, as well as in supporting documentation provided to 
reviewers, in accordance with guidance in the Activity Quality Policy (AQP) and in the new AMA and 
ACA templates. To have confidence in the robustness of Activity Managers’ effectiveness ratings 
across all AMAs and ACAs, MFAT would expect at least 75% of the assessed ratings to be robust. 
Non-rated elements of AMAs and ACAs were subject to more qualitative assessments, based on 
tailored rating scales. These assessments are not subject to the 75% confidence level. 
Ratings scales are in Appendix 1, while copies of the assessment templates are in Appendix 5. 
Interviews  
Effectiveness ratings in 34 AMAs (52%) and 18 ACAs (50%) were initially assessed as non-robust. 
Ideally, Activity Managers responsible for all these AMAs and ACAs should have been interviewed 
telephonically to clarify the evidence base for the effectiveness ratings. However, only eleven Activity 
Managers, covering one-third of the identified AMAs and ACAs, were interviewed.13 Following 
interviews, a final assessment of the robustness of ratings was made.   
Interviewing resulted in an increase in the robustness of all effectiveness ratings for both AMAs and 
ACAs (see Table 1). Compared to before interviewing, the percentage of outputs assessed as robust 
after interviewing increased by 7% for AMAs and 7% for ACAs. After interviewing, the robustness of 
both short-term and medium-term ratings in AMAs increased by 5%, while the robustness of 
outcome ratings in ACAs increased by 7%. 
Where non-robust ratings were revised to robust after interviewing, Activity Managers could provide 
additional information or explanations to justify the ratings they had given. This evidence, for 
example judgements formed from project visits and direct interaction with implementing partners 
and/or partner governments, is not systematically documented in AMAs or ACAs. Contextual 
                                                           
13  Of the 11 interviews that were conducted, four involved discussion of more than one AMA and/or ACA. 
 
Limitations of the AAR 
1.  The AAR is based on selected information on Activity performance. This includes AMAs, ACAs, 
the partner report(s) corresponding to the completion dates of AMAs/ACAs, as well as reports of 
independent evaluations, where available. Interviews with Activity Managers supplement the 
information base for some assessments. Other information that may affect ratings is not 
reviewed.  
2.  Comparisons of actual post-interview effectiveness ratings between the current and previous 
AARs should be made with caution.  
Both the initial post-interview robustness ratings and the adjusted post-interview ratings have 
limitations. The initial post-interview ratings are not informed by a proportional number of 
interviews compared to previous years and therefore likely present an under-estimation of the 
robustness of effectiveness ratings. The adjusted post-interview ratings are indicative rather than 
definitive, but are likely to be closer to what the actual results may have been had a proportion of 
interviews similar to that of previous AARs been conducted. 
3.  The assessment of DAC criteria (including effectiveness) was based on Quality Criteria 
Considerations outlined in Appendix one of MFAT’s 2015 Aid Quality Policy. However, Activity 
Managers likely based their ratings on the AMA/ACA Guidelines revised in 2017. These include 
simplified requirements for Effectiveness, Relevance, Impact, Sustainability and Efficiency, and 
Cross-Cutting issues being addressed in the Effectiveness section rather than throughout all 
DAC criteria. This could affect the comparability of assessed ratings in the current AAR with 
those of previous AARs.   
4.  The sample size is a representative sample determined at 95% confidence level and a confidence 
interval of 10. Decreasing the confidence interval will increase the sample size and reduce the 
margin of error. This will enable more precise estimates of AAR findings.14  However, for the 
2017-18 AAR, confidence intervals have only been calculated for the initial post-interview 
ratings and not the adjusted post-interview ratings, due to potential limitations in the approach. 
5.  Keeping interviews to 30 minutes each means that they focus on non-robust effectiveness 
ratings, as well as four process-related questions. Other non-robust ratings (e.g. for other DAC 
criteria in ACAs), or the analyses of CCIs, Lessons and Actions can generally not be discussed 
within the available time.  
6.  The AAR was carried out by two assessors, and pairs of assessors have also changed in 
subsequent AARs. Inter-assessor consistency is therefore a potential limitation. This was 
mitigated through (1) In-depth orientation and induction of assessors by an experienced Team 
Leader, who has been involved in all four AARs to date; (2) Assessor ‘calibration’ following the 
assessment of two AMAs and one ACA; (3) Cross-checks and investigation of inconsistent 
findings compared to previous AARs; and (4) Independent quality assurance of selected AMAs 
and ACAs completed by both assessors. 
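The sampling design described in limitation 4 follows standard survey-sampling arithmetic. As a hedged illustration only (the function name and defaults below are assumptions for this sketch, not MFAT's actual tool), the required sample size at a 95% confidence level with a confidence interval of 10 can be computed as:

```python
import math

def required_sample_size(population, confidence_interval=10.0, z=1.96, p=0.5):
    """Illustrative sample-size calculation for estimating a proportion
    at a 95% confidence level (z = 1.96) within +/- `confidence_interval`
    percentage points, applying the standard finite population correction.
    A sketch of the approach the report describes, not MFAT's calculation."""
    e = confidence_interval / 100.0
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)  # sample size for an infinite population
    return math.ceil(n0 / (1 + (n0 - 1) / population))  # finite population correction
```

As limitation 4 notes, decreasing the confidence interval increases the required sample: for a population of 100, a confidence interval of 10 needs a sample of 50, while a confidence interval of 5 needs 80.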
 
                                                           
14 Taking initial output ratings in Annex 1 as an example, this means that we can say with 95% certainty that between 41% and 60% of 
AMAs will have robust output ratings, no matter how many different samples of the same size we draw from the total number of 
AMAs. Another way of saying it is that 51% of AMAs have robust output ratings, with a 19% margin of error.  If the sample size is 
increased, the confidence interval (and margin of error) will become smaller. 
 
 
3. Key findings 
3.1 Robustness of effectiveness ratings  
For AMAs, Activity Managers rate effectiveness of Activities separately for outputs, short-term 
outcomes and medium-term outcomes. For ACAs, the effectiveness of Activities is rated for outputs, 
while a combined rating is given for short- and medium-term outcomes.  
The current AAR included an integrated AMA for multiple scholarship Activities, comprising 
scholarships for New Zealand based tertiary, Commonwealth, Regional Development 
Scholarships and In-Country English Language Training. It incorporated some, but not all, of the 
AMAs for the other three selected scholarship Activities, namely scholarships for Commonwealth, 
Regional Development, Pacific Development and Short-Term Training in Niue, Timor-Leste and 
Tuvalu, respectively. Compared to the previous AAR, the influence of scholarship AMAs on the 
robustness of effectiveness ratings in the current AAR has been similar, although less prominent. 
When AMAs for scholarship Activities were excluded from analyses, the robustness of output and 
short-term outcome ratings increased, while that of medium-term outcomes decreased, but not as 
markedly as in the previous AAR.  
The quantitative analysis of results is based on the final, adjusted, post-interview assessment of the 
robustness of  ratings in AMAs and ACAs, excluding AMAs for Scholarship Activities. To be 
confident in the robustness of effectiveness ratings, MFAT would expect at least 75% of these 
ratings to be robust.  
Robustness of effectiveness ratings: AMAs 
The robustness of output ratings in the current AAR remains below the 75% confidence threshold and 
is also lower compared to previous AARs (Figure 1). The robustness of medium-term outcome ratings 
would be just below the 75% threshold – higher than the baseline and similar to the previous AAR.  
The robustness of short-term outcome ratings would be higher compared to the previous three AARs 
and, for the first time, it would meet the 75% threshold.  
Assessment of likely statistical significance between the adjusted 2017-18 results and both the 
inaugural AAR and the 2016-17 results indicated no statistically significant changes. 15  
This suggests that MFAT can be reasonably confident in the robustness of AMA effectiveness ratings. 
                                                           
15 Assessment of statistical significance is limited by availability of detailed data from the inaugural AAR and the adjustment of ratings 
applied in 2017-18. 
 
There are some differences in the shortcomings of RMTs found in the current AAR compared to the 
previous AAR. In the previous AAR, shortcomings of RMTs were mainly related to Outputs and 
Outcomes, e.g. outputs and outcomes were not clearly identified or distinguished; there were too 
many, or they were too complex (34 out of 69 RMTs, that is almost half, had such shortcomings). 
This was followed by challenges around Indicators and Targets (24 out of 69 RMTs, that is 35%), 
with challenges around data collection and reporting against RMT also being common (22 out of 69 
RMTs, that is 32%). In the current AAR, most shortcomings were also related to Indicators and 
Targets (32 out of 59, that is 54%) and Outputs and Outcomes (31 out of 59, that is 53%). However, 
compared to the previous AAR, substantially more challenges were associated with out-of-date 
RMTs and RMTs that were no longer fit-for-purpose, as well as shortcomings related to data 
collection and reporting. There has also been an increase in the number of RMTs where baseline 
data were absent. 
In the current AAR, the most common shortcomings relate to outdated RMTs, or RMTs that were no 
longer fit-for-purpose (19 out of 59, that is 33%), followed by poorly identified outputs and/or 
outcomes (15 out of 59 RMTs, that is 25%, had shortcomings in this regard). In the previous AAR, 
the most common specific RMT shortcomings were poorly identified outputs and outcomes, and 
poorly distinguished outputs and outcomes (17 out of 69, or 25%, and 13 out of 69, or 19%, 
respectively). Additional shortcomings noted in the current AAR that were not found in the previous 
AAR relate to targets being over-ambitious, lack of overall clarity and inadequate 
presentation/layout of the RMT, as well as inadequate accuracy of data/monitoring. 
It appears that the development and updating of RMTs to ensure their relevance and use as dynamic 
frameworks for monitoring and strengthening Activities on an on-going basis are becoming 
increasingly challenging. Partner reporting against RMT indicators and targets is also falling 
short by a substantial margin. However, it is encouraging that an increasing number of Activity 
Managers are proposing appropriate actions to address the identified shortcomings of RMTs – of the 
40 AMAs that had RMTs with identified shortcomings, 67% included appropriate recommendations 
to revise and/or strengthen the Activity’s RMT.  This could indicate that more Activity Managers are 
realising the value of robust RMTs and are keen to ensure they are fit-for-purpose to monitor 
Activities. More robust RMTs would be more useful as Activity monitoring tools and would likely 
encourage more coherent partner reporting against expected results.  
Training and technical assistance could be instrumental in enhancing the robustness of RMTs. 
Special attention could be given to the development and socialisation of RMTs as foundations of 
Activity design, as well as dynamic tools for Activity monitoring, reporting and improvement. 
Progress reporting for more complex Activities, or for Activities that do not require RMTs per MFAT 
policy, might require special attention. 
Robustness of effectiveness ratings: ACAs  
Based on adjusted post-interview effectiveness ratings, the robustness of output ratings is 72% and 
that of short- and medium-term outcomes is 77% (see Figure 3). This brings the robustness of output 
ratings in line with the 2016-17 AAR (yet still lower compared to the baseline), while the robustness of 
short- and medium-term results now exceeds the 75% confidence threshold (but is also still lower 
compared to the baseline).  
Assessment of likely statistical significance between the adjusted 2017-18 results and both the 
inaugural AAR and the 2016-17 results indicates a potentially statistically significant decrease in the 
robustness of output ratings between the inaugural AAR and the current AAR.17  
When an Activity ends, it is especially important to know whether it has achieved its results or not.  It 
is therefore encouraging that MFAT can be reasonably confident about the robustness of short- and 
medium-term outcome ratings in ACAs.  
 
 
                                                           
17 Assessment of statistical significance is limited by availability of detailed data from the inaugural AAR and the adjustment approach 
applied in 2017-18. 
 
have a robust RMT, which enabled ratings to be substantiated by reporting against clearly 
identified results and indicators. The same applies to nine out of eleven ACAs (82%). 
  Compared to previous AARs, robust ratings in the current AAR appear to demonstrate greater 
confidence in evidence on the part of Activity Managers. Ratings were often substantiated by 
references to movements in indicators, or progress in relation to baselines and targets.  
In 24 AMAs (excluding scholarship AMAs) and five ACAs, N/A effectiveness ratings were 
assessed as robust – this means that the reviewers agreed that an Activity’s effectiveness could 
not be rated. Reasons for this included: 
  Shortcomings of RMTs. In 17 AMAs (including AMAs for Activities involving Multilateral 
Organisations and those funded through Budget Support, as well as multi-country and multi-
donor Activities) and two ACAs (including one Humanitarian Response Activity), the absence 
or shortcomings of RMTs made it impossible to assess effectiveness. 
  In five of 24 AMAs (21%), it was too early to assess progress towards outcomes. 
  In seven of 24 AMAs (29%), there was no data/evidence available to assess effectiveness, 
including three AMAs where baseline data were not yet available. Effectiveness in four ACAs 
could not be assessed due to lack of data/evidence. 
3.2 Robustness of other DAC criteria ratings (ACAs only)  
The assessed robustness ratings for the DAC criteria of relevance, efficiency, impact and 
sustainability in ACAs are based on desk reviews, as interviews focused on effectiveness ratings. Had the DAC ratings 
been discussed further at interview, there may have been some increase in their robustness in line 
with increases seen in previous AARs, where the ratings of these criteria were discussed during 
interviews.  
The comparative robustness of ratings for the DAC criteria across the four AARs is shown in Figure 
4. Compared to the baseline (inaugural AAR), the robustness of sustainability and impact ratings 
has increased. While that of efficiency has decreased by only one per cent, the robustness of 
relevance ratings has decreased by 16%. No rating has yet reached the MFAT confidence level of 
75%. Bearing in mind that the ratings of these criteria were not discussed during interviews, and 
despite encouraging increases in robustness of sustainability and impact ratings, MFAT cannot use 
the ratings of other DAC criteria in ACAs with confidence to inform decision-making. 
The main reason for non-robust relevance ratings is that they do not address all considerations 
outlined in the 2015 AQP, against which they were assessed. However, MFAT staff may have, 
appropriately, based their responses on the AMA and ACA guideline 'Activity Quality Ratings for 
Completion Assessment', revised in 2017, which is not consistent with the 2015 AQP. Key changes 
include: simplified requirements for Relevance, Impact, Sustainability and Efficiency, and Cross-
Cutting issues being addressed in the Effectiveness section rather than throughout all DAC criteria. 
Therefore, aspects such as the relevance of modality and the mainstreaming of cross-cutting issues 
across all criteria are of lesser importance.  
A likely result of this would be an increase in the number (and proportion) of robust relevance, 
efficiency, impact and sustainability ratings – and the robustness of sustainability ratings would 
likely exceed the 75% confidence threshold. Reviewers based their assessment of the robustness of 
these ratings on a number of considerations that are no longer included in the revised guidance. Had 
assessments been based on the revised guidance, more ratings may have been assessed as robust. 
 
 
 
 
 
 
 
 
Drawing on AMAs and ACAs to improve Activities 
The extent to which AMAs and ACAs can be drawn on to improve Activities depends on whether they 
are based on evidence from more than one source (regardless of the robustness of effectiveness 
ratings); whether they tell a coherent, evidence-based story of progress; and whether they 
consistently identify meaningful actions (AMAs) or lessons (ACAs) to inform the improvement and 
future planning of Activities (rating scales are presented in Appendix 1). 
Figure 8 suggests that there could be reason to be more confident in drawing on AMAs and ACAs to 
improve Activities: 
  The percentage of AMAs that received an overall rating of 1 or 2 (meaning they do not contain 
consistent, evidence-based information to improve the associated Activity) decreased from 43% in 
the inaugural AAR to 19% in the current AAR; that is a decrease of 24%. 
  The percentage of AMAs that received overall ratings of 4 and 5 (meaning that they contain highly 
consistent, evidence-based information to improve the associated Activity) shows small decreases 
across successive AARs. In the current AAR, 22% of AMAs received a rating of 4 or 5, which is a 
decrease of 2% compared to the inaugural AAR. This coincided with an increase of 20% in the 
proportion of AMAs that contain fairly consistent, evidence-based information to improve the 
associated Activities – up from 33% in the inaugural AAR to 53% in the current AAR. This indicates 
that the proportion of AMAs that cannot be drawn on to improve Activities is decreasing in favour 
of those that can be drawn on to improve Activities, although there is room for improvement. 
  Over time, the proportion of ACAs that do not contain consistent, evidence-based information 
to improve Activities has reduced substantially (from 39% in the inaugural AAR to 11% in the 
current AAR – therefore a decrease of 28%), but the proportion that provides highly consistent, 
evidence-based information to improve Activities has also decreased somewhat (from 37% to 
29% – therefore a decrease of 8%). The proportion that provides fairly consistent, evidence-based 
information to improve Activities has increased by 37% – up from 24% in the inaugural AAR to 
61% in the current AAR. This indicates that the vast majority of ACAs contain some information 
that could meaningfully be used to improve Activities in the future. 

Box 2: 2017/18 AMAs on strengthening Scholarship Activities 
When assessed separately, all four scholarship AMAs in the current AAR refer to actions that could 
strengthen monitoring, reporting and AMAs for scholarship Activities in the future. This includes 
the development of a revised RMT for scholarship activities, which was being undertaken as part of 
a strategic evaluation of the scholarship programme. The evaluation was underway at the time the 
AMAs were being drafted. The three country scholarship AMAs also refer to a tracer study that was 
underway at the time the AMAs were being drafted, as well as the alumni strategy and a new 
Scholarships and Alumni Management System (SAM), which are expected to strengthen monitoring 
and reporting against medium-term outcomes. It would be important to monitor the effect of these 
initiatives on the robustness and quality of scholarship AMAs in the future. 
 
 
 
 
 
 
 
 

 
4.  Summary of Findings  
1.  Anecdotally, interviews with Activity Managers conducted as part of the current AAR revealed 
that they have a positive attitude about AMAs and ACAs, as well as a good understanding and 
appreciation of their value. Fewer Activity Managers regard the completion of AMAs and ACAs as 
a compliance requirement and there was a clear sense that more Activity Managers view the 
completion of AMAs and ACAs as important opportunities to reflect analytically on the progress, 
performance and challenges of Activities and their on-going improvement. This is substantiated 
by relatively high AMA and ACA completion rates. 
2.  Using the adjusted post-interview robustness ratings, in AMAs, the robustness of output and 
medium-term outcomes ratings is lower compared to previous AARs, but the robustness of short-
term outcome ratings is higher, reaching the 75% confidence threshold. In ACAs, the robustness 
of short- and medium-term outcomes is just above the 75% confidence threshold.  
3.  References to RMTs as sources of evidence for completing AMAs and ACAs are increasing, 
indicating that Activity Managers are increasingly relying on RMTs to keep track of Activities’ 
progress. The quality of RMTs appears to be improving. However, shortcomings of RMTs 
continue to hamper monitoring and reporting of results. Many RMTs are not updated and are no 
longer adequate to monitor progress of evolving Activities. Other major shortcomings of RMTs 
include the absence of baselines, targets and data to monitor and report progress.  
AMAs of funding for Multilateral Organisations appear to be especially challenging because the 
results frameworks and progress reporting of these Organisations do not always correspond to 
MFAT’s requirements. It appears that developing AMAs and ACAs for complex Activities, for 
example multi-country and multi-donor Activities, as well as Activities funded through Budget 
support could also be challenging because RMTs for these Activities are not straight-forward. 
It is encouraging that an increasing number of Activity Managers are proposing appropriate 
actions to address the identified shortcomings of RMTs.  
4.  The identification and analysis of Cross-Cutting Issues remain a challenge for Activity Managers. 
Although the quality of analysis of CCIs in both AMAs and ACAs is showing improvement, the 
proportion of these analyses that received ‘good’ or ‘very good’ ratings remains small. There was 
also an increase in the proportion of AMAs and ACAs where CCI analyses received N/A ratings, 
suggesting that there either were no cross-cutting priorities identified in the AMA or ACA (i.e. CCI 
is classified as ‘not measured’ or ‘not targeted’), or there was very limited or no further elaboration 
on CCIs in the document. 
5.  Observed improvements in the quality of more analytical aspects of AMAs and ACAs are 
encouraging. Compared to the baseline, the proportion of ACAs that identified well-
substantiated, useful lessons increased by 19%, while proportionately more AMAs are shifting 
from ‘inadequate’ to ‘adequate’ quality as far as the identification of actions to strengthen 
Activities is concerned.  
6.  There is reason to be cautiously confident in the overall usefulness of AMAs and ACAs. The 
majority of AMAs and ACAs are of adequate usefulness. The proportions of AMAs and ACAs that 
are of inadequate usefulness have decreased substantially compared to the baseline – by 24% in 
the case of AMAs and by 20% in the case of ACAs. These appear to have shifted mainly from 
‘inadequate’ to ‘adequate’ usefulness. However, a small proportion of AMAs and ACAs appear to 
be slipping back from ‘good’ to ‘adequate’ usefulness.   
5. Conclusions 
1.  Despite improvements in qualitative aspects of reporting, AMAs and ACAs still do not provide 
sufficiently accurate, complete or stand-alone records of the activity. As in previous AARs, 
Activity Managers draw on evidence from a range of sources to assess the effectiveness of Activities 
but tend not to document all this evidence in AMAs and ACAs. If evidence is not comprehensively 
documented, the loss of institutional knowledge leaves substantial gaps, especially where staff 
turnover is high. When these gaps build up year-on-year, new Activity Managers might find it 
challenging to complete insightful AMAs and ACAs, thereby jeopardising the robustness of AMAs 
and ACAs in the longer term. 
2.  Despite remaining challenges around the robustness of effectiveness ratings, AMAs and ACAs 
generally include some consistent, evidence-based information that could be drawn on to improve 
Activities. Providing useful information to improve complex Activities, which often have high 
whole-of-life costs, is challenging since ways that could improve these Activities may not be 
within MFAT’s full control. 
3.  So far, AARs have not revealed major statistically significant results related to AMA and ACA 
improvement over time. Some trends are beginning to emerge, while valuable lessons have 
contributed to a much-refined methodology. While MFAT does not expect linear improvement 
due to contextual factors such as organisational capacity and incentives, over a longer time period 
statistically significant changes may become evident.   
6. Recommendations 
1.  AMAs and ACAs should remain as essential building blocks of the Aid Programme’s performance 
management system. Activity Managers use AMAs and ACAs to reflect and assess the progress, 
performance and challenges of Activities and they serve as important repositories of institutional 
memory and continuity during Activity implementation. Increasingly insightful and usable 
lessons and actions to address issues, if harnessed through a robust knowledge management 
system, could also prove valuable in strengthening Activities. 
2.  Ongoing training and technical support would be important to ensure that gains made in the 
robustness and usefulness of AMAs and ACAs are maintained and enhanced. Gradual 
improvements are becoming evident, but it would be important to address known challenges and 
strengthen capacity to maintain this positive momentum and to prevent the gains made from 
being lost.  
Continue to provide training and guidance for Activity Managers in Wellington and at Post 
(including locally-engaged staff) to ensure that they understand why and how to document the 
evidence base for AMAs and ACAs fully, yet concisely, to increase the proportion of AMAs and 
ACAs that provide stand-alone records of Activities’ progress and performance. This would be 
instrumental to lift the robustness and usefulness of AMAs and ACAs (and therefore their value 
as essential building blocks of the Aid Programme's performance management system) to a 
higher level. 
Training and support in the following priority areas could be considered:  
  Documenting consolidated evidence from several sources to justify effectiveness and 
DAC criteria ratings. 
  RMT quality and wider socialisation of RMTs as foundations of Activity design, 
monitoring and reporting, as well as dynamic tools for Activity improvement.  
  MERL expert assistance to support regular reviewing and updating of RMTs to ensure 
that they remain relevant and up-to-date. 
  Strengthening RMTs and monitoring of complicated and complex Activities, for example 
multi-donor and multi-country Activities, as well as Activities funded through budget 
support. 
  Improving consistency and coherence in AMAs and ACAs, including identifying issues 
that affect the progress and performance of Activities, and following these through into 
meaningful, evidence-based actions to improve on-going activities (in AMAs), or lessons 
relevant to comparable types of Activities when they complete ACAs. 
 
3.  Given the size of MFAT’s funding to Multilateral Organisations and the unique arrangements 
around their monitoring, it could be beneficial to tailor guidance for the AMAs of these Activities.   
4.  Provide support to Activity Managers to identify and perceptively address appropriate Activity 
cross-cutting markers:  
  Where a cross-cutting marker is identified as relevant, it should be dealt with 
consistently and perceptively throughout the design, monitoring and reporting of the 
activity, including in AMAs and ACAs. 
  Avoid including cross-cutting markers that are not relevant to an Activity in its 
AMA/ACA. 
5.  AARs should continue to be conducted on a periodic basis to monitor the effect of known 
enablers and constraints to the robustness and usefulness of AMAs and ACAs, as well as to 
identify emerging challenges and actions for their continuous improvement. A larger database 
will also enable meaningful trend analyses of the robustness and usefulness of AMAs/ACAs 
across different sectors, programmes and budget levels.  
 
 





 
basis of which the final rating was decided. Each reason was then coded to a common 
theme/reason and a frequency analysis of theses/reasons was conducted. 
 
Information on Activity Managers' processes when completing AMAs/ACAs, their 
opinion of the AMA/ACA templates and guidance, their view of the robustness of 
evidence available to inform the completion of AMAs and ACAs, as well as which 
aspects of AMAs/ACAs they found especially challenging (and why) was documented. A 
content analysis of the documented information was conducted to identify 
commonalities and emerging themes.  
Sample Revision for AAR Quantitative Analysis 
Quantitative Analysis 
Comparative assessments of results between the four AARs were completed. The robustness of 
effectiveness ratings at output, short-term outcome and medium-term outcome level for AMAs and ACAs, 
as well as relevance, efficiency, impact and sustainability for ACAs, was compared and presented 
in charts. For the remaining criteria (Cross-cutting issues (AMAs and ACAs), Overall quality 
ratings (AMAs and ACAs), Actions to address issues (AMAs only), Activity Results Framework 
(AMAs only), and Lessons learned (ACAs only)), the assessed ratings were compared by 
percentage and presented in charts for inadequate (rating of 1 or 2), adequate (rating of 3) and 
good (rating of 4 or 5). 
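The banding rule described above can be sketched as follows. This is an illustrative snippet, not code from the AAR; the function name is invented:

```python
def band_rating(rating):
    """Collapse a 1-5 assessed rating into the AAR's three reporting bands:
    inadequate (1 or 2), adequate (3), and good (4 or 5)."""
    if rating in (1, 2):
        return "inadequate"
    if rating == 3:
        return "adequate"
    if rating in (4, 5):
        return "good"
    raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
```

Applying this to each assessed rating, and then counting the share of ratings in each band, yields the percentages presented in the charts.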
N/A Ratings 
N/A ratings were removed from the observed proportion to enable the calculation of robustness.  
Scholarships Programme  
Scholarships Activities were introduced to the AAR process in 2016 (AMAs for 2014-15) and have a 
notable influence on the overall robustness of AMA effectiveness ratings. For purposes of 
comparability across all four AARs, assessments of the robustness of AMAs in this report 
consistently exclude Scholarships Activities unless noted otherwise. 
Statistical Analysis of results 
The Wilson score method was used to construct initial confidence intervals26: 

(2np + z² ± z√(z² + 4npq)) / (2(n + z²)) 

In this calculation n is the sample size, p is the observed proportion, q = 1 − p and z is the quantile of the 
standard normal distribution, which depends on the desired level of confidence. For the 95% confidence 
intervals used here, z is 1.96. Using the same formulas as in previous AARs27, 
these intervals were then adjusted using the ‘finite population correction’ method referenced by Burstein28 to 
construct confidence intervals with more accurate coverage probability given the small population 
sizes. This approach enables the nominal coverage of 95% to be maintained while narrowing the 
confidence intervals.  
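As a sketch of this calculation (not the AAR's actual code), the Wilson interval with a finite population correction can be computed as below. The function name is invented, and the exact form of the FPC factor, √((N − n)/(N − 1)) applied to the sampling variance, is an assumption based on the cited references; Burstein's adjustment may differ in detail:

```python
import math

def wilson_interval(p, n, N=None, z=1.96):
    """Two-sided Wilson score interval for an observed proportion p from a
    sample of size n. If a finite population size N is supplied, the
    sampling-variance term is shrunk by the finite population correction
    (N - n) / (N - 1) -- an assumed form of the Burstein (1975) adjustment."""
    fpc = math.sqrt((N - n) / (N - 1)) if N else 1.0
    denom = 1.0 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n * fpc ** 2 + z ** 2 / (4 * n ** 2))
    return centre - half, centre + half

# e.g. 40 robust ratings out of a sample of 66, drawn from a population of 196
lo, hi = wilson_interval(40 / 66, 66, N=196)
```

Because the FPC factor is below 1 whenever n is a substantial fraction of N, the corrected interval is narrower than the plain Wilson interval at the same nominal 95% coverage, as the report notes.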
To determine the statistical significance of differences in findings between the inaugural AAR and 
the 2017-18 AAR, and between the 2016-17 and 2017-18 AARs, the Z test based on normal approximation in two finite 
populations was used29. It was possible to conduct statistical analyses when a rating was applied by 
                                                           
26 Newcombe, R.G. (1998). Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat. Med. 17, 857-872. 
27 Wilson’s method alone was used for the inaugural 2015 AAR report (2013-14 results). The 2016 AAR report (2014-15 results) used the current 
combination of Wilson overlaid with the FPC method referenced in Burstein; the 2013-14 results were revised accordingly for comparison. See the 2016 
AAR Report for further detail. 
28 Burstein, H. (1975). Finite Population Correction for Binomial Confidence Limits. Journal of the American Statistical Association, Vol. 70, No. 
349 (Mar., 1975), pp. 67-69; Rosenblum, M. and van der Laan, M.J. (2009). Confidence Intervals for the Population Mean Tailored to Small Sample 
Sizes, with Applications to Survey Sampling. Int. J. Biostat. 5(1):4. 
29 Krishnamoorthy, K. and Thomson, J. (2002). Hypothesis Testing About Proportions of Two Finite Populations. The American Statistician, Vol. 56, No. 3 
(August 2002), pp. 215-222. 
 
Appendix 3: Distribution of sample 
The following figures profile some of the key descriptors for the AMA and ACA samples: 
Number of AMAs by Sector 
Agriculture/Forestry/Fishing: 15 
Government & Civil Society: 7 
Education: 5 
Economic Governance: 5 
Agriculture: 5 
Transport & Storage: 4 
Health: 3 
Water Supply & Sanitation: 2 
Unspecified / Multi: 2 
Reconstruction relief and rehabilitation: 2 
Other multisector: 2 
Law and Justice: 2 
Humanitarian Response: 2 
Trade / Tourism: 1 
Resilience: 1 
Population Policies/Programmes &…: 1 
Emergency Assistance: 1 
Communications: 1 
Business & Other Services: 1 
Banking & Financial Services: 1 
 
 
Number of ACAs by Sector 
Health: 4 
Resilience: 3 
Agriculture/Forestry/Fisheries: 1 
Agriculture: 4 
Other Multisectoral: 2 
Governance: 2 
Education: 2 
Banking & Financial Services: 2 
Water Supply & Sanitation: 1 
Trade and Labour Mobility: 1 
Renewable Energy: 1 
Humanitarian Response: 1 
Environment: 1 
Economic Governance: 1 
Business & Other Services: 1 
 
 
 
 
 


 
Number of AMAs by Programme 
Solomon Islands: 6 
ASEAN Regional: 6 
Multilateral Agencies: 5 
Kiribati: 5 
Scholarships: 4 
Vanuatu: 3 
Partnerships and Funds: 3 
Pacific Human Development: 3 
Pacific Economic: 3 
Niue: 3 
Indonesia: 3 
Fiji: 3 
Papua New Guinea: 2 
Pacific Regional Agencies: 2 
Latin America and the Caribbean: 2 
Humanitarian & Disaster Management: 2 
Tonga: 1 
Tokelau: 1 
Timor Leste: 1 
Pacific Transformation Fund: 1 
Pacific Energy & Infrastructure: 1 
Myanmar: 1 
Economic Development - ED: 1 
Africa: 1 
Number of ACAs by Programme 
Partnerships and Funds (contestable): 6 
Latin America & the Caribbean: 3 
Papua New Guinea: 2 
Other Asia: 2 
Niue: 2 
Multilateral Agencies: 2 
Fiji: 2 
Tonga: 1 
Vanuatu: 1 
Solomon Islands: 1 
Samoa: 1 
Pacific Human Development: 1 
Pacific Economic Development: 1 
Humanitarian & Disaster Management: 1 
Economic Development - ED: 1 
Africa Regional: 1 
 
 
 
 

 
 


 
Number of AMAs and ACAs according to Whole-of-Life Budget Programme Approval* 

Whole-of-Life Budget          AMAs   ACAs 
$5 million and over             31      6 
$3 million - $4.99 million       8      5 
$1 million - $2.99 million      16      6 
$500,001 - $999,999              3      7 
< $500,000                       3 

*The Whole-of-Life Budget Programme Approvals for three AMAs and one ACA were not available. 
 
 
 
Appendix 6: Interviewing script  
 
INTRODUCTORY SCRIPT - MFAT Annual Assessment of Results (AAR) 2018 
Thank you for agreeing to speak with me today.   
As you will have been advised by the Development Strategy and Effectiveness team, I have been 
involved in a review of a sample of AMAs and ACAs to determine the robustness of self-assessed 
performance ratings in these reports.  This year my colleague and I are reviewing 66 AMAs and 36 
ACAs (102 in total) that were randomly selected from a total of 196 AMAs and ACAs submitted 
between July 2017 and July 2018.  
We would specifically like to discuss the ratings given against the effectiveness criteria – these are 
the ratings provided for achievement of outputs, short-term outcomes and medium-term outcomes. 
We will aim to keep the interview to 30 minutes (longer if more than one AMA or ACA will be 
discussed). Time allowing, we may also briefly discuss the assessments and ratings provided for other 
performance criteria, as well as your experience of the tools, guidance and support available to 
complete AMAs and ACAs. 
The primary documents reviewed for our analysis are the AMAs/ACAs themselves, as well as the 
corresponding partner reports. We assessed whether or not the effectiveness ratings given appear 
‘robust’ on the basis of the evidence and analysis presented in the AMA/ACA and the partner report. 
Where further information and clarification are required, we have the option to speak with Report 
Authors (Activity Managers) and/or Deputy Directors in order to gain further insight into the 
evidence and considerations underlying the given ratings.   
I asked to speak with you today because I need a deeper understanding of how you arrived at some 
of the ratings in the following report/s: (list here). 
Before we begin the discussion, I would like to assure you that the discussion will be:  
  Strictly confidential. Individual AMA/ACA reports will not be identifiable in the Annual 
Assessment of Results (AAR) report. All findings will be presented as percentages of the 
sample, and aggregated for smaller programmes. Any comments from interviewees that are 
included in the report will be anonymous. Assessment and interview records will be stored 
securely by the Development Strategy and Effectiveness team and will not be used for any 
purpose other than to inform the AAR. 
  The AAR process will not result in a change to the original ratings in the MFAT system. The 
assessors are not tasked to recommend any changes to the ratings in individual AMAs/ACAs. 
Our main aim is to understand the process and quality of evidence that informed the ratings. 
At the end of the interview, feedback on the independent assessment can be provided to Program 
Managers/Activity Managers by the interviewers, if required/requested.  This feedback will be 
provided in general terms, that is, indicating the strengths/weaknesses of the AMA/ACA against a 
suite of agreed criteria against which it was assessed. 
Now, to confirm, we are discussing (Activity number and name – if more than one, take them one at 
a time) 
Questions on criteria: 
  To be inserted by interviewer based on desk assessment.  
  Focus on effectiveness ratings that were assessed as non-robust. Time allowing, then discuss 
ratings for other criteria that were also assessed as non-robust.  
 

 
  Clarify the evidence, and the interpretation of that evidence, to understand whether the original rating is 
robust or not.  
  In a separate row directly below the original assessment in the assessment template, indicate 
what the assessed rating is after interviewing and explain why, if it has changed from the 
rating given at the time of the desk assessment (e.g. “The Activity Manager conducted a site 
visit and provided information from this visit to justify the original rating. This information 
was not documented in the AMA/ACA. Based on this information, the reviewer agrees with 
the Activity Manager’s original rating – it is robust”). 
  If the Activity Manager cannot justify his/her original rating based on all available 
information, then you may want to share your interpretation of the evidence, which led 
you to assess the original rating as non-robust. This is not about defending your assessment. 
It is about allowing the Activity Manager to contextualise and explain that information, 
which may lead you to reconsider your assessment of the original rating, or not. This is not to 
be debated with the Activity Manager. Simply record your final rating and the justification for 
it, and move on to the next item. 
Questions on Process for completing AMAs/ACAs: 
  Can you describe the process you went through to develop the AMA/ACA? 
  Do you find the guidance and templates helpful? Any suggestions to improve them? 
  Did you receive any training or support from the Insights, Monitoring & Evaluation team on 
how to complete AMAs/ACAs? Did you find it helpful? Any suggestions to improve it? 
  In your view, do you have sufficient robust evidence/data to inform the AMA/ACA process? 
(You could prompt about the timeliness and quality of partner reports; opportunities for site 
visits, etc.) 
  Are there any particular aspects of AMAs and ACAs that you find especially challenging? 
What are they? Why do you find them challenging? (Prompt for specific aspects, e.g. 
mainstreaming of cross-cutting issues, VFM) 
 