For the skill components, a CEFR level is assigned according to the score obtained in each skill.
Key points to note in this score allocation process are as follows:
1. Each skill component has a designated set of score ranges which determine CEFR level
allocation. The precise scores required to achieve a specified CEFR level differ for each skill,
e.g. the range of scores that will achieve a B1 is different for Writing compared to Listening.
This means the numerical scores are not directly comparable between skill areas.
See the Appendix for CEFR cut scores for the main Aptis variants.
2. For a given skill component, if a candidate’s score falls into the borderline area immediately
below a CEFR cut-score boundary, the CEFR allocation for that skill is refined based on their
performance in the Core component. A strong Core performance will push them up to
the next CEFR level. It is therefore possible for two candidates to be awarded the same
numerical score but different CEFR allocations for the same component if they have scored
differently in the Core component, as illustrated in the sketch after this list.
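The following Python sketch illustrates the borderline-refinement logic described in point 2. All numerical values in it (the cut scores, the width of the borderline band, and the "strong Core" threshold) are hypothetical placeholders chosen for illustration, not published Aptis values.

    # Illustrative sketch of borderline refinement; all thresholds are hypothetical.
    CUT_SCORES = {"B1": 30, "B2": 40}   # hypothetical cut scores for one skill
    BORDERLINE_WIDTH = 2                # hypothetical borderline band below each cut
    CORE_THRESHOLD = 35                 # hypothetical "strong Core" threshold

    def allocate_cefr(skill_score, core_score):
        """Assign a CEFR level, promoting borderline scores on a strong Core result."""
        level = "A2"  # assumed default level below the lowest cut score
        for cefr, cut in sorted(CUT_SCORES.items(), key=lambda kv: kv[1]):
            if skill_score >= cut:
                level = cefr
            elif skill_score >= cut - BORDERLINE_WIDTH and core_score >= CORE_THRESHOLD:
                level = cefr  # borderline score pushed up by strong Core performance
        return level

    # Two candidates with the same skill score can receive different levels:
    print(allocate_cefr(39, core_score=50))  # B2 (promoted from the borderline)
    print(allocate_cefr(39, core_score=20))  # B1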
3.2 Overall CEFR level
Candidates completing a four-skills test are additionally allocated an overall numerical score, plus an
overall CEFR level. The overall numerical score is the sum of the scores obtained in each skill. It is
derived independently of the overall CEFR level, which is calculated as a rounded average of the
CEFR levels achieved in each skill. The two results are therefore not directly comparable. This is
described in more detail in Section 8.
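A minimal sketch of this calculation is given below. The ordinal coding of the CEFR scale (A0 to C) and the tie-breaking behaviour of the rounding step are assumptions for illustration, as are the example scores and levels.

    CEFR_SCALE = ["A0", "A1", "A2", "B1", "B2", "C"]  # assumed ordinal coding

    def overall_result(skill_scores, skill_levels):
        overall_score = sum(skill_scores.values())  # sum of the four skill scores
        mean_index = sum(CEFR_SCALE.index(l) for l in skill_levels.values()) / len(skill_levels)
        overall_cefr = CEFR_SCALE[round(mean_index)]  # rounded average CEFR level
        return overall_score, overall_cefr

    # Invented example: the numerical total and the CEFR level are derived separately
    scores = {"Reading": 38, "Listening": 42, "Speaking": 35, "Writing": 40}
    levels = {"Reading": "B1", "Listening": "B2", "Speaking": "B1", "Writing": "B2"}
    print(overall_result(scores, levels))  # (155, 'B2')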
4. APTIS AND STANDARDS
All Aptis tasks are written according to the published specifications (O’Sullivan and Dunlea, 2015) and
are subject to a rigorous pre-testing procedure. When creating a new version, only new tasks which
function in line with the existing standards will be included. The two key testing principles of reliability
and validity are at the forefront of Aptis test design and the development of new versions. With respect
to scoring validity, the theoretical fit of the way that performance is being assessed, the accuracy of
scoring decisions made by raters, and alignment with the external performance standards outlined in
the CEFR are all crucial elements. The latter is discussed in more detail in Section 5 below.
A test is reliable if it can be depended on to provide a consistent measurement of candidate ability
levels, and a test is valid if it is measuring the ability it is designed to assess. The reliability of the
Aptis receptive skills components (Reading, Listening, and Core) is calculated at regular intervals
using the Alpha statistic, and is invariably found to exceed the recommended threshold. Figures reported
from the 2016–17 Annual Operating Report are shown in Table 1. Statistical assessment of test
reliability gives an indication of what is known as the “internal consistency” of a test, in other words,
whether all the items are working together to assess the same underlying ability. The most common
means of reporting reliability is Cronbach’s Alpha statistic. It is reported on a scale from 0 to 1, with
estimates closer to 1 representing a higher degree of reliability. A figure greater than 0.7 is generally
considered to indicate that a set of test items is functioning as required.
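As a short illustration of the calculation referred to above, the sketch below computes Cronbach’s Alpha from a candidates-by-items score matrix using the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The toy response data are invented for illustration only.

    import numpy as np

    def cronbach_alpha(scores):
        """Cronbach's Alpha for a candidates-by-items score matrix."""
        k = scores.shape[1]                           # number of items
        item_vars = scores.var(axis=0, ddof=1).sum()  # sum of per-item variances
        total_var = scores.sum(axis=1).var(ddof=1)    # variance of candidates' total scores
        return (k / (k - 1)) * (1 - item_vars / total_var)

    # Rows are candidates, columns are items (1 = correct, 0 = incorrect)
    data = np.array([[1, 1, 1, 0],
                     [1, 1, 0, 0],
                     [1, 0, 0, 0],
                     [1, 1, 1, 1],
                     [0, 0, 0, 0]])
    print(cronbach_alpha(data))  # 0.8 here; values above 0.7 suggest adequate consistency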