Towards early identification of mental health problems in children’s social care

The purpose of this study is to advance the development of early identification tools for young people’s mental health (MH) problems in social care settings. 

Report documents

Evaluation protocol (PDF, 327KB)
Full report (PDF, 2MB)



Almost all young people in contact with social care are likely to experience some kind of mental health problem, yet only a small proportion are thought to have a formal diagnosis, and even fewer receive treatment. Because datasets are poorly integrated across the services that support young people, identifying those with mental health problems in social care settings remains a significant challenge. Failure to identify risk factors and mental-health-associated problems early can delay treatment; if accurate early identification tools can be developed, social care services could deliver more timely support to vulnerable young people. Data linkage offers one valuable way to explore this. Given the multi-factorial nature of mental ill health, this study suggests that building accurate models to identify mental health problems will require access to large, representative datasets that span multiple domains and reflect a broad range of biological, psychological and social factors.

This project, conducted by researchers from the University of Cambridge, involved creating a linked database of education, health, and social care data to measure childhood mental health problems and their associated risk factors.  


This study aimed to create a linked database of health, social care and education data, allowing more comprehensive and holistic measurement of MH problems and their associated risk factors. It also aimed to develop prototype models for the early identification of young people’s mental health problems in social care settings.

The study addressed the following research questions: 

  1. What is the best method of measuring young people’s mental health problems, and the risk factors for them, in linked administrative datasets?
  2. What are the relationships between risk factors and mental health problems?
  3. What are the best methods for building predictive risk models and early identification tools for young people’s mental health problems for use in social care settings?
  4. Can findings and methods be replicated across databases (i.e. translated to Cam-CHILDCADRE)?

The authors are also conducting ongoing work to investigate the following research questions: 

  1. What is the prevalence and distribution of mental-health-associated problems and their risk factors? How do patterns of mental-health-associated problems vary between social care, health and educational settings? How do they vary across Wales, UK?
  2. What is the unrecognised mental health need in social care settings?


To measure mental health problems and associated risk factors, the study used a retrospective cohort design with cross-sectional analysis. The cohort was defined as anyone aged 0-17 years in the period between 1 January 2013 and 31 March 2020. The final cohort consisted of 1.1 million young people in Wales, of whom 46,704 had social care data; this sub-sample was used to prototype the early identification models. Although the overall cohort comprises 1.1 million young people, sample sizes differ substantially between datasets.
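The cohort definition can be expressed as a simple eligibility check. The sketch below is illustrative only: it assumes that "aged 0-17 years in the period" means the person was under 18 on at least one day of the study window, and the function names (`eighteenth_birthday`, `in_cohort`) are hypothetical rather than taken from the study.

```python
from datetime import date

# Study window as stated in the text.
WINDOW_START = date(2013, 1, 1)
WINDOW_END = date(2020, 3, 31)


def eighteenth_birthday(birth: date) -> date:
    """Date the person turns 18 (29 February births mapped to 1 March)."""
    try:
        return birth.replace(year=birth.year + 18)
    except ValueError:
        # Born on 29 February and the 18th year is not a leap year.
        return date(birth.year + 18, 3, 1)


def in_cohort(birth: date) -> bool:
    """True if the person was aged 0-17 on at least one day of the window.

    Assumption: eligibility means the interval [birth, 18th birthday)
    overlaps [WINDOW_START, WINDOW_END].
    """
    turns_18 = eighteenth_birthday(birth)  # first day no longer aged 0-17
    return birth <= WINDOW_END and turns_18 > WINDOW_START
```

Under this reading, someone born on 1 January 1995 is excluded (they turned 18 on the first day of the window), while someone born a day later is included. A different operational definition, such as age at a fixed census date, would change who qualifies.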

The team created a linked multi-agency database (containing health, social care and education datasets), which was used to measure the prevalence of mental health problems within the cohort. This database was then used to explore various machine learning methods for identifying mental health problems in children in social care settings. Machine learning methods benefit from large volumes of data: they learn from existing data to discover hidden patterns, which are then used to predict outcomes for new observations.

Research was carried out within the Secure Anonymised Information Linkage (SAIL) Databank. The project also helped to build a linked administrative database in Cambridgeshire and Peterborough using the Adolescent Mental Health Data Platform (ADP)/SAIL database. The database was used to refine methods for:

  • Operationalising the measurement of mental health problems and risk factors within multi-agency data
  • Mapping the prevalence and distribution of mental health problems and associated risk factors in multi-agency data
  • Estimating unidentified mental health needs within social care
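As a rough illustration of what linking and operationalising measurement involve, the sketch below groups records from several agencies under a shared pseudonymised identifier and then applies one possible operational definition of a mental health indicator. Everything here is an assumption: the field names (`person_id`, `mh_diagnosis`, `referral`, `exclusion`), the toy records and the rule in `has_mh_indicator` are hypothetical and do not describe the SAIL data or the study's definitions.

```python
from collections import defaultdict

# Hypothetical extracts; each record carries a pseudonymised person_id.
health = [
    {"person_id": "p1", "mh_diagnosis": True},
    {"person_id": "p2", "mh_diagnosis": False},
]
social_care = [{"person_id": "p1", "referral": "neglect"}]
education = [
    {"person_id": "p2", "exclusion": True},
    {"person_id": "p3", "exclusion": False},
]


def link(sources: dict[str, list[dict]]) -> dict[str, dict]:
    """Group records from every source under their shared person_id."""
    linked = defaultdict(lambda: {"sources": set(), "records": []})
    for source_name, records in sources.items():
        for rec in records:
            entry = linked[rec["person_id"]]
            entry["sources"].add(source_name)
            entry["records"].append((source_name, rec))
    return dict(linked)


def has_mh_indicator(entry: dict) -> bool:
    """Hypothetical operational definition: any health record with a MH diagnosis."""
    return any(src == "health" and rec.get("mh_diagnosis") for src, rec in entry["records"])


people = link({"health": health, "social_care": social_care, "education": education})
```

Here `people["p1"]["sources"]` is `{"health", "social_care"}`, so that person's social care referral and health diagnosis can be analysed together, which neither dataset allows on its own. Real linkage also has to deal with matching quality and missing identifiers, which this sketch ignores.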

Key findings

The research found that:

  • Within the Welsh GP Dataset (WLGP), 15% of the study cohort had at least one mental or psychological health condition of interest, with mood disorders the most common.
  • Risk factors fell on a wide spectrum of measurability, from ‘directly measurable’ to ‘derivable’ to ‘measurable by proxy’. Important factors associated with childhood mental health problems were spread across different data sources, rather than confined to any single database.
  • Of the statistical and neural network models trialled, the Graph Neural Network was the most promising method for identifying young people with a mental health diagnosis. However, greater accuracy and further validation are required before clinical implementation can be considered.
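To give a sense of what a graph neural network does, the sketch below implements one generic message-passing layer in plain Python: each node's features are averaged with its neighbours' features, passed through a linear map and a ReLU. This is not the study's model; the text does not describe how the graph was built, which features were used or the architecture, so the toy graph, features and weights below are hypothetical.

```python
def gnn_layer(
    features: list[list[float]],
    adjacency: dict[int, list[int]],
    weights: list[list[float]],
) -> list[list[float]]:
    """One mean-aggregation message-passing layer.

    features:  one feature vector per node
    adjacency: node index -> list of neighbour indices
    weights:   out_dim x in_dim matrix applied after aggregation
    """
    out = []
    for i, x in enumerate(features):
        # Collect the node's own features plus those of its neighbours.
        group = [x] + [features[j] for j in adjacency.get(i, [])]
        # Element-wise mean over the group.
        mean = [sum(vals) / len(group) for vals in zip(*group)]
        # Linear transform followed by ReLU.
        h = [max(0.0, sum(w * m for w, m in zip(row, mean))) for row in weights]
        out.append(h)
    return out


# Toy graph: three nodes in a chain 0 - 1 - 2, identity weights.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]
embeddings = gnn_layer(features, adjacency, identity)
```

With identity weights, node 0's new embedding is `[0.5, 0.5]`, the mean of itself and node 1. Stacking such layers lets information from connected records flow into each node's representation, and a final classifier would map embeddings to a predicted probability of a diagnosis.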

This study demonstrates that multi-agency data from social care, health and education settings can be linked and used to measure the prevalence of different mental health conditions and their associated risk factors. The linked database allowed a range of biological, psychological and social risk factors to be measured, some of which may not be measurable in single-agency data. These factors interact with each other and contribute to the development of different mental health conditions. A linked database therefore provides a solid foundation for improving early identification tools.
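Prevalence of "at least one condition of interest" differs from summing per-condition prevalences, because a young person with several conditions is counted once. The minimal sketch below shows both calculations; the condition labels and toy records are hypothetical and are not the study's data or code lists.

```python
from collections import Counter


def prevalence(people: dict[str, set[str]], conditions: set[str]) -> tuple[float, dict[str, float]]:
    """Return (share with >= 1 condition of interest, per-condition shares).

    people maps a person_id to the set of condition labels recorded for them.
    """
    n = len(people)
    if n == 0:
        raise ValueError("empty cohort")
    # Each person counts once, however many conditions they have.
    any_count = sum(1 for conds in people.values() if conds & conditions)
    # Per-condition counts, restricted to the conditions of interest.
    counts = Counter(c for conds in people.values() for c in conds & conditions)
    return any_count / n, {c: counts[c] / n for c in sorted(conditions)}


# Hypothetical toy cohort of four people.
example = {"a": {"mood"}, "b": set(), "c": {"mood", "anxiety"}, "d": {"asthma"}}
overall, per_condition = prevalence(example, {"mood", "anxiety"})
```

Here `overall` is 0.5 (two of four people), while the per-condition shares are 0.5 for mood and 0.25 for anxiety; they sum to 0.75 because person "c" has both.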


  • Linking multi-agency data offers a promising way of developing early identification tools, because early warning signs that may be missed in single-agency data can be combined, giving a stronger signal for detecting emerging problems.
  • Early identification means that young people and their families can be offered more timely and proportionate support, instead of waiting in distress for problems to worsen until they meet service thresholds.
  • As more robust early identification tools are developed, staff in settings such as social care can use them to aid decision-making, helping to identify young people who may have additional needs and supporting smoother care pathways for young people and their families.
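The "stronger signal" argument can be made concrete with a simple Bayesian sketch. It assumes, hypothetically, that each agency's warning sign is conditionally independent and summarised by a likelihood ratio, so the signs' log-odds add. The study does not say it combines signals this way, and the prior and likelihood ratios used here are invented for illustration.

```python
import math


def combined_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Posterior probability after combining independent warning signs.

    Assumes the signs are conditionally independent (naive Bayes), so their
    likelihood ratios multiply, i.e. their log-odds add.
    """
    log_odds = math.log(prior / (1 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical: 10% prior risk; each agency's weak sign doubles the odds.
one_agency = combined_probability(0.1, [2.0])
three_agencies = combined_probability(0.1, [2.0, 2.0, 2.0])
```

A single weak sign raises the estimated risk from 10% to about 18%, but three such signs from different agencies raise it to about 47%. In practice warning signs are often correlated, in which case this independence assumption overstates the combined evidence.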

Further work

  • The team has secured funding and approval from SAIL to extend data access, which will allow them to continue the database linkage work begun in this project.
  • The prototype early identification model will be refined to incorporate risk factors from the wide range of linked datasets.
  • Within the Child and Adolescent Data resource (CADRE), the team will replicate the analyses carried out in SAIL to see whether similar patterns are found in Cambridgeshire and Peterborough.

The team has recognised the need to include data from diverse populations and regions in order to build accurate early identification tools for childhood mental health problems. With the help of the Alan Turing Institute and HDR UK, the team has therefore begun developing data sharing agreements and governance structures for a network of Trusted Research Environments (TREs) with federated analytics. The network, based on populations from Cambridgeshire, Peterborough, Birmingham and Essex, has been built through active engagement with advisory panels of young people, parents and carers, to ensure that the project uses people’s data in a way that is acceptable, desirable and understandable.
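Federated analytics keeps record-level data inside each TRE and shares only aggregates. The sketch below shows the idea for a pooled prevalence estimate. The `SiteSummary` structure, the small-count suppression threshold `min_cell` and the toy site data are hypothetical assumptions, not the network's actual governance rules or software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate counts a TRE releases; no record-level data leaves the site."""
    n_people: int
    n_with_condition: int


def local_summary(flags: list[bool], min_cell: int = 10) -> SiteSummary:
    """Runs inside a TRE: reduce record-level flags to counts.

    min_cell is a hypothetical disclosure-control threshold: small non-zero
    counts are not released.
    """
    n_with = sum(flags)
    if 0 < n_with < min_cell:
        raise ValueError("count below disclosure threshold; not released")
    return SiteSummary(n_people=len(flags), n_with_condition=n_with)


def pooled_prevalence(summaries: list[SiteSummary]) -> float:
    """Runs at the coordinating site: pool released counts across TREs."""
    total = sum(s.n_people for s in summaries)
    return sum(s.n_with_condition for s in summaries) / total


# Hypothetical toy data for two sites.
site_a = local_summary([True] * 12 + [False] * 88)
site_b = local_summary([True] * 30 + [False] * 70)
pooled = pooled_prevalence([site_a, site_b])
```

Pooling counts (42 of 200, giving 0.21) weights each site by its size, which is generally preferable to averaging site-level prevalences when populations differ in size.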