Rating scales in neurology
Clinical trials of interventions in neurology generally require outcomes that can be measured and that enable results to be compared between trials. They also require a means of comparing the severity of disease among trial participants.
Some features can be readily measured with little likelihood of disagreement; an obvious example being mortality. Other clinically important features, however, cannot be so easily measured.
The outcomes that matter most to patients – such as quality of life, overall disability, and the broader impact of their disease – are often particularly hard to assess consistently.
For example, trials of new agents in the treatment of multiple sclerosis have shown that the agents are effective in reducing both the accumulation of brain lesions, as assessed by magnetic resonance imaging, and the rate of relapse (both relatively easy-to-measure outcomes), yet the effects on the progression of physical disability (a more important clinical outcome to patients) have been more controversial.
Single-item scales, which assess a single disease feature or outcome, are attractive to clinicians because they are easy to interpret, with each score on the scale having a single, specific meaning. They are also often simple and quick to administer and easy to score.
However, they have only a limited ability to detect differences between patients and to detect change in a single patient over time. Moreover, as such scales are based on a single disease feature or outcome, they are prone to observer error, and inter-rater reliability is often low.
Also, what they measure is only one aspect of a disease, so the scores they produce are often a poor reflection of the more complex outcomes that are typically of most interest to patients and clinicians.
Multiple-item scales are developed with the aim of measuring these more complex, multidimensional outcomes, such as quality of life or degree of disability. In addition, multiple-item scales have the advantage that a combination of various items tends to cancel out the random error that is associated with each individual item. Reliability therefore tends to be higher than for single-item scales.
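The error-cancelling effect of combining items can be illustrated with a short simulation (a hypothetical sketch, not drawn from any actual scale): each simulated patient's observed score is the mean of several item scores, each equal to the patient's true score plus independent random error, and the spread of observed scores around the true value shrinks as items are added.

```python
import random
import statistics

def simulate_scale(true_score, n_items, item_sd, n_patients=2000, seed=1):
    """Simulate observed scores on a scale with n_items items: each item
    equals the patient's true score plus independent random error."""
    rng = random.Random(seed)
    return [
        statistics.mean(true_score + rng.gauss(0, item_sd)
                        for _ in range(n_items))
        for _ in range(n_patients)
    ]

# Spread of observed scores around a true score of 50 (item error SD = 10):
one_item = statistics.stdev(simulate_scale(50, n_items=1, item_sd=10))
ten_items = statistics.stdev(simulate_scale(50, n_items=10, item_sd=10))
# Averaging 10 items shrinks the random error by roughly sqrt(10), i.e. ~3.2x.
```

This is the same mechanism that makes the reliability of a well-constructed multiple-item scale higher than that of its individual items.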
Nevertheless, many clinicians find multiple-item scales problematic, as they are more difficult to interpret clinically than single-item scales: the total score is a sum of the scores for different items, and any one total can be produced by many different combinations of item scores.
Development of multiple-item scales
The development of multiple-item clinical scales is a time-consuming process that involves essentially four stages.
1. Definition of the construct (i.e., what is to be measured) and any sub-constructs (i.e., any subdivisions of what is being measured).
2. Generation of a set of individual, measurable disease features or outcomes that cover all the important issues.
3. Assessment of these features or outcomes in a sample of patients, with the resulting data used to develop a scale that is a reliable and valid representation of the construct and any sub-constructs.
4. Examination of the scale in independent samples of patients.
Clinically useful and scientifically sound
The choice of which scale to use in a particular circumstance must be made carefully. A scale must be both clinically useful and scientifically sound.
To be clinically useful, a scale must be practical to incorporate into clinical practice, and be appropriate for the sample of patients being studied.
To be scientifically sound, a scale must be
- reliable - i.e., that the scores can be reproduced when the scale is applied by the same rater on the same patient on different occasions (intra-rater reliability), by a different rater on the same patient (inter-rater reliability), or – in the case of self-rating scales – by the same patient at different times (test–retest reliability)
- valid - i.e., that it measures the concept that it is intended to measure
- responsive - both to change in the feature or outcome that is being measured in a single patient over time and to differences in the measurement across a group of patients
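As an illustration of one of these properties, inter-rater reliability for a categorical scale is commonly quantified with Cohen's kappa, which corrects the raw agreement between two raters for the agreement expected by chance alone. A minimal Python sketch (the rating data below are invented for illustration):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    assigning categorical scores to the same series of patients."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of patients on whom the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c]
                   for c in set(freq_a) | set(freq_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Two raters scoring the same eight patients on a 0-2 scale (invented data):
kappa = cohens_kappa([0, 1, 2, 1, 0, 2, 1, 1],
                     [0, 1, 2, 1, 1, 2, 1, 0])
# Raw agreement is 0.75, but chance-corrected kappa is 0.6.
```

Values near 1 indicate near-perfect agreement; values near 0 indicate agreement no better than chance.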
Formal assessment of the reliability, validity, and responsiveness of a clinical scale is likely to be beyond the interest and scope of many clinicians, but there are some important points to bear in mind.
- Reliability, validity, and responsiveness are independent properties and each must therefore have been assessed separately (although often by the same researchers and reported in the same papers).
- They are sample-dependent properties, and so must have been assessed in different sample groups – this is especially important for generic scales that are designed to be used in more than one disorder.
- Studying the distribution of the scores in a sample is a simple way of deciding whether the scale will be useful in that sample: patients already at the minimum (floor) or maximum (ceiling) score cannot register further change, whatever the effect of an intervention or whatever changes are wrought by time.
- The use of a scale in clinical practice or in a study or trial will, generally, provide enough information for reasonable statements to be made about the reliability of that scale.
- The features or outcomes assessed in a scale ought, generally, to bear a reasonably close relationship to the outcome that is being measured.
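The distribution check described above can be made concrete by computing the proportion of the sample sitting at the scale's floor or ceiling (a hypothetical helper, sketched in Python with invented scores):

```python
def floor_ceiling(scores, minimum, maximum):
    """Return the fractions of patients at the scale's floor and ceiling.
    Large fractions warn that the scale cannot register further change
    in those patients, whichever direction the scale runs."""
    n = len(scores)
    return (sum(s == minimum for s in scores) / n,
            sum(s == maximum for s in scores) / n)

# Invented scores on a hypothetical 0-10 disability scale:
at_floor, at_ceiling = floor_ceiling([0, 0, 1, 5, 10, 10, 10, 4], 0, 10)
# at_floor = 0.25, at_ceiling = 0.375: over a third of the sample is at
# the ceiling, so change in those patients would go undetected.
```

There is no universal threshold, but substantial clustering at either extreme suggests the scale is poorly matched to that sample.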
Aim of the intervention
However, no matter how scientifically sound a scale is, it may not be the best choice if it is not designed to measure the right feature.
For example, a trial that is to examine the specific effect of a drug on motor symptoms in Parkinson’s disease is not likely to be best served by a scale that is designed to measure psychological well-being. The further removed the outcome measured by the scale is from the aim of the intervention, the more likely the results are to be unhelpful or misleading.
The use of clinical rating scales is an area with which all neurologists should be acquainted. This section of the website presents some of the better known rating scales used in neurology. There is a short description of each scale, along with the primary reference in the literature for each scale.
Published on CNSforum 20 Jan 2005