Observation Assessment

I spend a lot of time talking to clients about their observation assessments. It blows me away that this single topic, which is so fundamental to the core services we provide in vocational education and training, is still not well understood. It is by far one of the most common non-compliances that I see on a regular basis, both in the work we do and in the audit reports generated by the national regulator and funding authorities. To provide some insight into this, the following are some of the common non-compliances related to observation assessments:

The assessment does not address all performance criteria (validity)
The assessment does not address the requirements of the performance evidence (validity)
The criteria used in support of the observation do not describe observable behaviours (validity and reliability)
There is insufficient benchmarking to support the reliability of the observation assessment (reliability)
There is insufficient evidence recorded by the assessor to provide valid evidence (validity and sufficiency)
The candidate was not assessed on the required number of occasions (validity and sufficiency)
The observation criteria do not align to the task requirements (validity)

Observation assessment is one of those classic functions within the RTO where there is a constant competitive dynamic between quality and efficiency. It is efficient to purchase commercially designed assessment tools, to skip pre-assessment validation and customisation, and to allow these assessment tools to be administered by assessors without oversight. Of course, this generally results in assessment tools being used which were not compliant to begin with and assessment being recorded by tick and flick. I have just described the single most prominent systemic risk to the quality and reputation of VET. On the quality side of the dynamic, it requires an investment of time to validate and improve the observation assessment design, to educate trainers about its use and to establish assessment quality controls to ensure that no assessment gets through to outcome reporting unless it is right. Choosing to take the quality pathway can be hard work and expensive, no question. Where an RTO chooses to position itself in this trade-off between quality and efficiency says a lot about the RTO owner and the culture of the organisation. Some RTOs are stuck in the efficiency model and are trying to claw their way back to quality. These can be the most rewarding clients to work with as we help them to improve their assessment system. Where does your RTO sit in this trade-off between assessment quality and assessment efficiency?

In this article, I am going to focus on some specific topics and issues which are central to understanding the role and function of observation assessment. These include:

The role of criterion referenced assessment
Recognising the need for observation assessment
Benchmarking in support of reliability
Recording observation evidence
Inconsistent regulation

Criterion referenced assessment

It is not possible to understand the requirements of observation assessment without understanding criterion referenced assessment and the role it plays in the observation. It was in the very late 1980s that Australia adopted our current competency-based training and assessment model. The actual date is difficult to pin down as it was a gradual adoption over time. The work of the National Training Board in 1988-89 and the development of “occupational standards” was instrumental in the evolution of competency standards and the establishment of performance criteria as the basis for assessment. A commonly used definition of criterion referenced assessment is provided by Brown (1988), who defined it as:

“An evaluative description of the qualities which are to be assessed (e.g. an account of what pupils know and can do) without reference to the performance of others.”
Brown, S. (1988) ‘Criterion referenced assessment: what role for research?’ in Black, H.D. and Dockerell, W.D., New developments in educational assessment, British Journal of Educational Psychology, Monograph series no. 3, pp. 1-14.

The key thing to know about criterion referenced assessment is that it focusses on the assessment of the candidate’s individual performance against specified criteria. In our competency-based assessment model, these criteria are first and foremost the performance criteria from the relevant unit of competency, in addition to other criteria that may be included based on the assessment context. As an example, we may include additional criteria relevant to a specific workplace requirement or equipment safety requirement. The criteria must be expressed in a way that makes them observable based on the required behaviour of the candidate. For some units of competency, the performance criteria are already expressed in a way that makes them quite observable. I’m going to use the unit of competency AHCNSY206 – Care for nursery plants as the basis for the examples in this article. It’s worth noting that this unit of competency is primarily included within certificate II qualifications and therefore the tasks identified within the unit of competency are typically performed under supervision with limited autonomy. As an example, take performance criterion 2.6: “Stake and tie plants using the required materials and according to instructions”. This criterion implies that the person being assessed has been provided with very clear work instructions and is being assessed on their skills in staking and tying plants using the required materials. Under a criterion referenced assessment model we are simply asking the question: did the candidate perform this task correctly (as instructed)? The answer is either Yes or No. If the candidate can demonstrate satisfactory performance of all criteria specified in the unit of competency, then they can be assessed as competent.
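For RTOs that capture observation outcomes electronically, the decision rule described above could be sketched roughly as follows. This is an illustration only, under stated assumptions: the class and function names are hypothetical, and the criterion numbers other than 2.6 are placeholders rather than the actual content of AHCNSY206.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriterionResult:
    """One Yes/No judgement against a single observable criterion."""
    criterion_id: str
    satisfactory: bool


def is_competent(results, required_ids):
    """Competent only if every required criterion was observed as satisfactory.

    A criterion that was never observed counts as not satisfactory, and the
    outcome never depends on how any other candidate performed.
    """
    observed = {r.criterion_id: r.satisfactory for r in results}
    return all(observed.get(cid, False) for cid in required_ids)


# Placeholder criterion IDs (hypothetical); only 2.6 is quoted in this article.
required = ["2.5", "2.6", "2.7"]
results = [
    CriterionResult("2.5", True),
    CriterionResult("2.6", True),   # staked and tied plants as instructed
    CriterionResult("2.7", False),  # one unsatisfactory criterion
]
print(is_competent(results, required))
```

The point of the sketch is that there is no scoring or ranking involved. Each criterion is a Yes or No judgement, and a single No (or a criterion that was never observed) means the candidate is not yet competent.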

I just want to make a key point here: it is the performance criteria that are the primary basis for the observation. We need to make sure that all of the performance criteria are being assessed through an observation of the candidate’s work. It is quite common for me to receive an audit report from a client which identifies that the RTO is non-compliant because not all performance criteria have been assessed, and the audit report will usually list these. Of course, it is not possible to observe a candidate’s performance against a particular performance criterion if the assessment task did not require the candidate to perform the task described in that criterion in the first place. So, the assessment task itself needs to be valid so that it creates the opportunity for these performance criteria to be observed. This may mean that you need multiple observation tools which together cover all performance criteria, because the unit of competency may need to be assessed over a series of performance assessment tasks. In the case of our unit of competency for caring for nursery plants, we might design one assessment task around elements of competency 1 and 2 (preparing for and maintaining nursery plants) and a separate assessment task around element of competency 3 (Complete nursery plant maintenance operations). But the absolute key point to take away from this section is that all of the performance criteria must be integrated into the observation tools in support of these tasks, customised for the context of the assessment and complemented with any additional workplace or enterprise requirements. These performance criteria are the basis for the criterion referenced assessment of the candidate’s performance. Make sure they are all covered in the assessment task and the observation tool.
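A coverage check of this kind is easy to automate as part of pre-assessment validation. The following is a minimal sketch, assuming you keep a simple mapping of each observation tool to the performance criteria it addresses. The function name, the task labels and the criterion numbers are all hypothetical placeholders, not the real criteria list for AHCNSY206.

```python
def uncovered_criteria(unit_criteria, observation_tools):
    """Return the performance criteria that no observation tool addresses."""
    covered = set()
    for criteria in observation_tools.values():
        covered.update(criteria)
    # Preserve the order in which the unit lists its criteria.
    return [c for c in unit_criteria if c not in covered]


# Hypothetical criteria list and mapping, for illustration only.
unit_criteria = ["1.1", "1.2", "2.1", "2.6", "3.1", "3.2"]
observation_tools = {
    "Task 1 (elements 1 and 2)": ["1.1", "1.2", "2.1"],
    "Task 2 (element 3)": ["3.1", "3.2"],
}

gaps = uncovered_criteria(unit_criteria, observation_tools)
print(gaps)  # criteria that would be listed as not assessed
```

In this made-up mapping, criterion 2.6 has been left out of both tools, which is exactly the kind of gap an audit report would list.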

All units of competency require observation assessment

I often make the point when presenting information or talking to clients that every unit of competency requires observation assessment. I sometimes receive commercial assessment resources from a client where the assessment is basically theoretical. I am sure you have experienced this yourself. The assessment asks lots of knowledge questions about how the candidate would do this and how the candidate would do that, and these questions often require a written response which is simply marked against a marking guide as either correct or incorrect. This type of assessment is not valid as a performance assessment because it only assesses the candidate’s knowledge and understanding of the task and does not assess their ability to perform the task. It does not matter if you introduce case studies that present realistic workplace situations requiring the candidate to analyse and provide written responses based on the case study. This does not make the assessment task compliant; it simply leads to the collection of more knowledge evidence and does not assess the candidate’s ability to perform the task. This is a very common non-compliance and a really key point. If you want to assess a candidate in a unit of competency, then you need to get the candidate to actually perform the task. That may be a practical task or it may be a cognitive analytical task, but either way they actually need to do it and their performance needs to be observed. If you do not get the candidate to actually demonstrate the task, then your assessment will be non-compliant because it is not valid as a principle of assessment, and the assessment evidence that you collect (such as the written response activity) is also not valid as a rule of evidence. It is that simple.

I just want to circle back to my opening sentence in this section. Every unit of competency requires observation assessment. The majority of units of competency fall into one of three types. We have demonstrable or practical units of competency such as AHCNSY206 – Care for nursery plants. We have cognitive or analytical units of competency such as AHCPCM508 – Develop an integrated pest management program, and we have units which are a combination of both such as AHCWRK403 – Supervise work routines and staff performance. The valid evidence to collect and observe for caring for nursery plants is obviously the candidate performing the task in a plant nursery. The valid evidence to collect and observe for developing an integrated pest management program is the pest management program that the candidate has developed in response to the assessment task. The point I want to make here is that all of these units are equally observable. The key difference is that the first unit of competency would be observed in the workplace or a simulated workplace by directly observing the candidate’s performance caring for nursery plants, whereas for the second unit of competency I would be collecting a completed pest management program that the candidate has developed. In the cognitive unit, I would be using the performance criteria as the basis to assess the candidate’s work and determine if they have met the requirements of the criteria. This is often a difficult concept to grasp so I will provide an example. If I am assessing the candidate’s submitted integrated pest management program, I would be reviewing the documented program against performance criterion 2.2, which requires me to assess if the candidate has “Developed strategies to ensure minimal or no risk of resistance developing in the range of weeds, pests or diseases identified”.
So, I would review the work submitted by the candidate using my subject matter expertise, critiquing the work to determine if the candidate has developed the appropriate strategies as required. I have an observation tool that specifies this requirement, and I can make a decision about the adequacy of the candidate’s strategies and then record the assessment outcome. The only difference between this model of observation assessment and the more practical task of caring for nursery plants is that I am assessing a document (as the student’s work). It is a document prepared by the candidate based on the task requirements and I am assessing the candidate’s work based on the criteria. The key takeaway here is that all units of competency require observation assessment, including those where we are assessing the candidate’s practical performance and those where we are assessing a product (such as a document) produced by the candidate based on the task requirements. Under our criterion referenced assessment model, the candidate’s work must be assessed against the requirements of the performance criteria. If we could fix that one issue alone in the VET sector today, we would have significantly less non-compliance.

Benchmarking in support of reliability

No doubt everyone is familiar with the concept of providing model answers or benchmark responses in support of written or verbal knowledge assessment. This seems to be a very widely accepted concept and a requirement that most RTOs are complying with these days. We know that if we are going to have a written knowledge assessment, we need to have the benchmark answers so that the assessor can compare the candidate’s responses to these benchmark answers and can mark the candidate’s work using a consistent standard. This relates to the principle of assessment of reliability. The principle of reliability requires that the RTO provide sufficient tools and guidance to the assessor to ensure that the same assessment standard is being applied not only between different candidates but, more importantly, between different assessors. In other words, all of the assessors need to be “on the same page” and applying the same standard. Now, here comes the key point. This requirement to provide benchmark guidance is equally applicable to observation assessment. A very common non-compliance is where an auditor will find the assessment non-compliant because there is insufficient guidance available to the assessor to explain the expected observable behaviours that the candidate is required to demonstrate. So, let’s look at an example using our unit on caring for nursery plants. Performance criterion 2.6 requires the following: Stake and tie plants using the required materials and according to instructions. We might express this as an observation criterion as follows: “Staked and tied plants using the correct technique and material in accordance with work instructions”. Now, if I want to ensure that every candidate is assessed to the same standard, I need to specify the requirements relating to this task as a guide to the assessor. This is commonly referred to as an observation benchmark.
The observation benchmark for this one observation criterion might look something like the following (thanks WikiHow):

Prepare material according to work instructions.
Select a bamboo or wooden stake for a small or medium-sized tree, or a metal stake for a larger tree.
Insert the stake on the windward side of the tree about 15–20 cm deep about 5 cm from the base of the primary branch.
Locate the primary branch that supports the most growth.
Tie the primary branch to the stake two thirds of the way up the stem using a material with a flat, broad surface, such as elastic or a wire inside of a rubber hose. Tie the branch to several locations on the stake for firm support.

Now, I know some of you are thinking, “Far out, do we need to do this for every observation criterion?” and my answer is yes. A little trick that I use is to insert the observation tool itself into the assessor instructions document and populate the area usually used for written observations with your guidance to the assessor as the observation benchmark. Try to provide as much detail as possible and try to keep the guidance technical and related to the task requirements. It is a great activity to get trainers and assessors involved in brainstorming these benchmarks together as a form of assessment moderation. Beware, this can often lead to quite heated discussions and funny moments. Providing benchmarking in support of observation assessment is a requirement of the principle of assessment of reliability. If you do not have detailed benchmarking in support of your observation assessment, then your assessment is likely to be found non-compliant for not having sufficient reliability.

Recording observation evidence

We have all heard the term “tick and flick”. This is the practice where an assessor ticks all the boxes to indicate the candidate completed everything correctly, makes no comment or observation and signs and dates the document (maybe) for submission. The problem with this practice is that it results in no recorded observation evidence to justify the assessment decision. From a compliance point of view, the assessment evidence is not sufficient or valid (rules of evidence). Ticking boxes is not recording valid assessment evidence. I sometimes have clients tell me that they only record a comment if the candidate did not perform the task correctly, to capture the areas for improvement. I totally get this and have no problem with the recording of evidence for a non-satisfactory outcome. But this does not mean that where the candidate’s performance was satisfactory, we can simply finalise the assessment without recording any assessor evidence. Recording assessor evidence is equally a requirement regardless of the assessment outcome. It does not matter if you have an internal policy that supports only recording evidence of non-satisfactory performance. Just because you have a policy does not make it ok. The key point is, we need evidence recorded by the assessor about their judgement of the candidate’s performance all the time, not only on certain occasions. We need some recorded observations to verify that the candidate’s performance was actually assessed by the assessor and met the requirements of the unit of competency. If you are struggling to believe me, I would refer you to the national regulator’s general direction on the Retention requirements for completed student assessment items. The following is an extract:

Completed student assessment items include the actual piece(s) of work completed by a student or evidence of that work, including evidence collected for an RPL process. An assessor’s completed marking guide, criteria, and observation checklist for each student may be sufficient where it is not possible to retain the student’s actual work. However, the retained evidence must have enough detail to demonstrate the assessor’s judgement of the student’s performance against the standard required.
ASQA General Direction on the Retention requirements for completed student assessment items 2013

If you are not aware, a general direction is issued under the authority of the National Vocational Education and Training Regulator Act 2011 and it is a condition of registration for the RTO to comply with directions issued by the national regulator. Seriously, if this is not enough motivation then simply consider that not recording any observation evidence is going to lead to assessment evidence which is not sufficient or valid, which makes it non-compliant under clause 1.8 of the Standards for RTOs 2015. I would also point out that most (if not all) State funding provider guidelines have very specific requirements to retain records of assessment that can verify the completion of assessment. Now, I know this is difficult and I know that it is particularly difficult to get trainers to comply with this requirement. I can only suggest that you implement a program of professional development to explain the requirement, maybe implement some work instructions and, parallel to this, implement a system of assessment quality control to ensure that assessments are not accepted without sufficient observation evidence. It may take some time to change the culture and practice but it’s definitely something that you cannot ignore. Clients will often ask me how much evidence needs to be recorded. Firstly, I recommend that you merge all of the cells on the right-hand side of the observation tool to create an open space for comments to be recorded generally. It is quite unrealistic to expect an assessor to record a tiny comment against every single criterion. Secondly, I recommend that the assessor record a few basic comments about the candidate’s performance based on their technical observations. Encourage your assessors to focus their observations on the technical aspects of the candidate’s performance. We are not expecting the assessor to write War and Peace, but we do want them to provide some reasonable comments to justify their decision.
Do not accept the old argument from the assessor that says “I don’t know what to write” or my favourite, “What am I expected to write down in addition to the criteria that are already on the page?”. Seriously, when a trainer says this to me, I simply question their professionalism and technical competence. My usual response is to ask, “So, from a subject matter expert perspective, you have no value to bring to the assessment other than ticking the box?” We seriously need to educate and reform these assessors or weed them out of the sector because they are putting your RTO at risk. I would absolutely recommend a program of professional development, some guidelines and examples for assessors to use as a benchmark, and a system of assessment quality control. We recently released a webinar on assessment quality control if you would like more guidance on this.
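If your completed observation records are captured in a student management system or similar, part of this quality control can be a simple automated gate before outcome reporting. The following is a rough sketch only. The field names, the function name and the minimum word count are all hypothetical assumptions, and a word count is a crude proxy that cannot replace a person reviewing whether the comments actually justify the decision.

```python
def quality_control_issues(record, min_words=8):
    """List the reasons a completed observation record should be sent back.

    `record` is a dict with hypothetical keys: assessor_comments,
    assessor_signature and date_assessed. An empty list means no issue
    was detected by these basic checks.
    """
    issues = []
    comments = (record.get("assessor_comments") or "").strip()
    if not comments:
        issues.append("no assessor comments recorded")
    elif len(comments.split()) < min_words:  # assumed threshold, not a rule
        issues.append("assessor comments too brief to justify the decision")
    if not record.get("assessor_signature"):
        issues.append("assessor signature missing")
    if not record.get("date_assessed"):
        issues.append("assessment date missing")
    return issues


# A tick and flick record: boxes ticked, nothing written, not dated.
tick_and_flick = {
    "assessor_comments": "",
    "assessor_signature": "J. Smith",
    "date_assessed": None,
}
print(quality_control_issues(tick_and_flick))
```

A record that fails any of these checks would be returned to the assessor rather than accepted for outcome reporting, regardless of whether the recorded outcome was satisfactory or not.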

Inconsistent regulation

A systemic threat to quality and compliant observation assessment is inconsistent regulation. Inconsistent regulation sends mixed messages to the regulated community. As an example, at least two or three times a month, I receive an audit report from a prospective client who has been found non-compliant for some of the issues identified in this article. These are audit reports generated by the national regulator and also increasingly by State funding authorities. This makes sense to me, and it reinforces the issues I have raised in this article and how important they are. Occasionally, I will encounter an RTO where I see significant evidence of absent observation assessment, not assessing all performance criteria or no recording of observation evidence. What frustrates me is when the RTO questions my feedback because these issues “were not raised during our recent ASQA audit”. The problem here is what I call “lightweight auditors”. There are many ASQA auditors who are highly experienced and are experts in assessment. I worked with many of these auditors at NARA, VETAB and ASQA, but the lightweight auditors present a significant risk to the sector because, through their lack of experience and knowledge, they give a false impression that poor assessment practices are acceptable. A lightweight auditor will give the RTO the benefit of the doubt where insufficient or no observation evidence is being recorded. A lightweight auditor will not identify a non-compliance where the RTO has assessed practical requirements theoretically. A lightweight auditor does not understand the nuance of benchmarking practical observation. A lightweight auditor can be so impressed and distracted by beautifully formatted documents that they fail to do a detailed review of the assessment validity. I am not joking.
If you get one of these lightweight auditors during an audit then consider it a happy day, but do not make the false assumption that the standard they expect will be applied by more experienced auditors or that you should accept this standard yourself.

Sometimes, I can provide as much feedback as possible and it makes little difference to the RTO’s behaviour, but when the same RTO gets an outcome of a performance assessment from a funding authority suspending their funding contract pending resolution of non-compliance with observation assessment, this tends to jolt them back into reality. The funding authorities in most States have very experienced auditors. ASQA has a constantly changing panel of auditors and increasingly some of these auditors have very little experience in VET and lack a deep understanding of assessment design. I do not see any evidence that ASQA is taking any significant steps to correct this. I think that ASQA assumes that because the auditor has a TAE, they should understand the requirements of assessment, but the reality is they sometimes don’t. Beware of the lightweight auditor and do not make the mistake of benchmarking your compliance on their lack of experience. I get that everyone needs to start somewhere and these auditors may gain experience over time, but ASQA should get in front of this and consider introducing an enhanced auditor induction and development program that mitigates the risk of inconsistent regulation. ASQA needs to take a systemic approach to countering this systemic threat.

Good training,

Joe Newbery

Published: 30th June 2021
