Rating evidence

Our ratings explained

What do our ratings mean and how should you use them?

Any attempt to simplify multiple studies into a single rating score will be imperfect. Yet busy professionals need help to identify the most promising ways of working. We provide ratings for interventions in the Evidence Store in relation to:

  • Overall effectiveness: looking at the consistency of effect across different research studies
  • Strength of evidence: looking at how confident we can be about a finding, based on how the research was designed and carried out.

Each intervention in the Evidence Store is rated on a five-point scale for both overall effectiveness and for strength of evidence. Here’s how our research partners at Cardiff University created these ratings.

Overall effectiveness

For each intervention in the Evidence Store we define the effect in relation to how consistently the intervention was found to be effective across studies.

The best way of assessing the overall effectiveness of an intervention is through a meta-analysis. A well conducted meta-analysis pools data from various studies to produce a robust measure of overall effect. Where it was not possible to carry out a meta-analysis (e.g. because different studies measured different outcomes, or measured them in different ways) we have looked at the effects found in individual studies and whether they combined to give a consistent message about effect, or whether the overall picture was mixed.

Studies might also vary in the number of people involved – so how do we assess one big study that found one result if three smaller studies found something different? We’ve taken a simple approach of requiring a majority of both studies and the participants involved to show a particular outcome when pooled together.

Our approach to overall effect is therefore a five-point scale that communicates what we know about how effective (or not) an intervention is. The table below shows the scale, with icons and an explanation of the rating.

IconNameWhat this means
Negative effectEvidence tends to show negative effect. The balance of evidence (including the pooled effect size where available) suggests that the intervention has a negative effect, meaning the intervention made things worse. This takes into consideration the number of studies showing positive and negative effects, and also how many people took part in those studies.
No effectNo effect. The balance of evidence (including the pooled effect size where available) suggests that the intervention has no effect overall.
Mixed  effectEvidence tends to show mixed effect.  Studies show a mixture of effects and the criteria for negative or tends to positive effect is not met.
Tends to positive effectEvidence tends to show positive effect. The balance of evidence (including the pooled effect size where available) suggests that the intervention has a positive effect. This takes into consideration the number of studies showing positive and negative effects, and also how many people took part in those studies.
Consistently positive effectEvidence shows consistently positive effect. The evidence (including the pooled effect size where available) consistently suggests that the intervention has a positive effect. This takes into consideration the number of studies showing positive and negative effects, and also how many people took part in those studies.

Strength of evidence

The evidence store makes judgements about the strength of evidence included in systematic reviews. Our overall framework is provided by the EMMIE system developed by the UCL Jill Dando Institute, for use by the What Works Centre for Crime Reduction in their toolkit. EMMIE uses a specific way of evaluating the quality of existing reviews called the EMMIE-Q. The Crime Reduction Toolkit is working with a well-developed and substantial global literature from criminology. When we started to apply the EMMIE-Q, almost no studies in children’s social care met the criteria to obtain a meaningful score. There were also some issues with several poor studies being in a review that was done well. We have therefore adapted the EMMIE-Q to provide a four-point rating for strength of evidence.

Our first two ratings are an attempt to differentiate between reviews that contain no good quality evidence and those where some good quality evidence is present. Here we make a judgement about the number of acceptable quality studies within a review that we are summarising. To be acceptable, a study has to meet key quality criteria. We have adapted these from the core ones developed and used by the Early Intervention Foundation (EIF). We make a judgement about whether there are no acceptable quality studies (which gets a 0), whether there are 1 or 2 (which gets a 1) or whether there are 3 or more. Where there are 3 or more this meets the threshold for us to apply the EMMIE-Q. The requirements of EMMIE-Q are combined to allow us to differentiate between lower and higher scores.

This process allows us to rate the strength of evidence in an existing review on a five-point scale:

IconStrength of evidenceWhat this means
Very low strength evidenceNo acceptable quality studies included in the review.
Low strength evidenceOne or two acceptable quality studies included in the review.
Moderate strength evidence Three or more acceptable quality studies included in the review (high quality review therefore possible). 
AND 
Review method meets between 0-3 EMMIE-Q quality requirements (see below)., This indicates that strong confidence cannot be placed in the findings of the review.
High strength evidence Three or more acceptable quality studies included in the review (high quality review therefore possible). 
AND
Review method meets between 4-6 EMMIE-Q quality requirements, including those relating to the search strategy, statistical conclusions and validity of outcome constructs (all themes marked*, see below).  This indicates a high quality review in which strong confidence can be placed.
Very high strength evidenceMultiple reviews which meet the criteria under ‘high strength of evidence’.. 

Defining an acceptable quality study

The following definition of an acceptable quality study is consistent with key elements of the definition used by the Early Intervention Foundation (EIF). An acceptable quality study must have the following characteristics:

  1. The sample is sufficiently large to test for the desired impact. A minimum of 20 participants are subject to measures at both time points within each study group (e.g. a minimum of 20 participants in the treatment group AND comparison group).
  2. The study must use valid measures. Participants might be asked to complete measures at various points, and these measures should reliable, standardised and validated independently of the study. Administrative data and observational measures might be used to measure programme impact.
  3. Comparability of groups is addressed in selection and/ or analysis. This might be achieved through randomisation, or by selecting a comparator group based on matching criteria, or through analysis by using statistical techniques such as propensity score matching.
  4. An ‘intent-to-treat’ design is used, meaning that all participants recruited to the intervention participate in the pre/post measurement, regardless of whether or how much of the intervention they receive, even if they drop out of the intervention (this does not include dropping out of the study – which is then regarded as missing data).
  5. The study should report on overall and differential attrition (or clearly present sample size information such that this can be readily calculated).

EMMIE-Q requirements

The EMMIE-Q identifies 6 requirements, each relating to a different aspect of study quality. These inform the assessment of the methodology of studies that are used to measure effect. They are as follows:

#Requirements
1.A transparent and well-designed search strategy*
2.High statistical conclusion validity (at least four of the following are necessary for a study to be considered sufficient)* (a) Calculation of appropriate effect sizes (b) The analysis of heterogeneity (c) Use of a random effects model where appropriate (d) Attention to the issue of dependency (e) Appropriate weighting of individual effect sizes in the calculation of mean effect sizes
3.Sufficient assessment of the risk of bias (at least two necessary for sufficient consideration)* (a) Assessment of potential publication bias (b) Consideration of inter-rater reliability (c) Consideration of the influence of statistical outliers
4.Attention to the validity of the constructs, with only comparable outcomes combined and/or exploration of the implications of combining outcome constructs*
5.Assessment of the influence of study design (e.g. separate overall effect sizes for experimental and quasi-experimental design)
6.Assessment of the influence of unanticipated outcomes or spin-offs on the size of the effect (e.g. quantification of displacement or diffusion of benefit)

Requirements 1-4 (highlighted by *) are considered particularly important, and are required for any review to achieve a rating of 3, which is the highest rating in the scale.

We then use the number of EMMIE-Q requirements present to inform a judgement on strength of evidence, differentiating between a 2 and a 3 in our strength of evidence scale as outlined above. This is different from the way the EMMIE-Q scores are used by the What Works Centre for Crime Reduction because, as discussed above, there is more high quality evidence in that field.

The What Works Centre’s outcomes framework

Research in the Evidence Store has to focus on outcomes that fit in the What Works Centre’s outcomes framework. In this framework there are three sets of primary outcomes which are:

  • The rights of children, parents, carers and families
  • Children’s and young people’s outcomes
  • Parent, carer and family outcomes

There are also process outcomes that relate to organisational factors around Children’s Social Care. These include:

  • Cost-effectiveness of services
  • Workforce outcomes
  • Skills, knowledge and experience of social workers and other social care professionals

Read a full description of the What Works Centre’s outcomes framework here.