When recruiting patients is not enough: How synthetic control arms can enhance clinical trial design

May 4, 2023


Clinical trials are a crucial component of present-day clinical research, providing the data that regulatory agencies rely on to make decisions about the safety and efficacy of new drugs and treatments. However, traditional clinical trials are often slow and expensive and may not fully reflect the diversity of patients in the real world. For this reason, the past few years have witnessed a surge in interest in novel trial designs that can tackle these limitations.

We sat down with Professor Krishnarajah Nirantharakumar (KN), MD, of Dexter, a cutting-edge startup out of the University of Birmingham and a Globant partner, to discuss key concepts related to synthetic control arms, an emerging approach to study design that is transforming the field.

Randomized controlled trials

There are two main groups of patients in randomized controlled trials (RCTs): the intervention (or exposure) arm, which receives the new drug or treatment being tested, and the control arm, which receives either a placebo or a standard-of-care treatment. The control group serves as a reference point against which researchers compare the effects of the intervention. Controls help researchers determine whether the observed effects are due to the intervention and, combined with randomization, minimize unintended bias and the effects of unforeseen confounders.

Recruiting a large number of participants for a clinical trial can be time-consuming and resource-intensive. Challenges include finding qualifying patients, recruiting a diverse pool of participants, narrow eligibility criteria (which reduce the number of qualifying candidates), limited public understanding of clinical trials, and mistrust of clinical research, especially among minority groups. The number of participants directly affects a study's statistical power: the probability of detecting a true treatment effect of a given size at a chosen significance level. Higher power reduces the likelihood of a type II error, in which a real treatment effect goes undetected because the sample is too small. An inability to recruit enough patients, often compounded by poor patient retention during a trial, is therefore one factor behind the high failure rate of clinical trials.
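To make the link between sample size and power concrete, here is a minimal Python sketch using the standard normal approximation for comparing two proportions (for example, response rates) between two equal-sized arms. The response rates below are hypothetical and not taken from any real trial.

```python
from statistics import NormalDist

def power_two_proportions(p_control: float, p_treatment: float,
                          n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two proportions.

    Uses the normal approximation with equal arm sizes and ignores the
    (negligible) probability of rejecting in the wrong direction.
    """
    std_normal = NormalDist()
    se = ((p_control * (1 - p_control)
           + p_treatment * (1 - p_treatment)) / n_per_arm) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return std_normal.cdf(abs(p_treatment - p_control) / se - z_crit)

# Hypothetical trial: 20% response on standard of care vs 35% on the new drug.
for n in (50, 100, 150, 200):
    power = power_two_proportions(0.20, 0.35, n)
    print(f"n = {n:3d} per arm: power = {power:.2f}, type II error = {1 - power:.2f}")
```

Under these assumptions, power rises steadily with the number of patients per arm, so a trial that cannot fill its control arm has a much higher chance of missing a real effect.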

KN: “In certain circumstances, [recruiting enough participants] can be quite problematic. And this is where the so-called external or synthetic controls come into play. The most common application is in oncology, and the second most common is in rare diseases. In oncology, when new treatments are studied, there is a lot of interest, and most cancer patients consent to participate in such studies in the hope that they will receive the treatment. No one wants to be put in the placebo group. So in such circumstances, we have two issues. One is that we might not have enough cancer patients to randomize them into two groups, so we don’t have the [statistical] power. The second is that it might not be ethical to withhold the treatment. In those cases, we may try to find external controls [or synthetic controls]. We can find a set of cancer patients in some other place who are not receiving this treatment but are getting a similar kind of management otherwise.”

Synthetic control arms

Synthetic control arms (SCAs), a type of external control arm, are an innovative approach in which researchers use existing data to construct a virtual, or synthetic, control group rather than recruiting new patients for it. Building an SCA involves using patient data from pre-existing datasets, such as electronic health records (EHRs), that have been de-identified, that is, stripped of any personally identifiable information (PII). These synthetic controls are designed to mimic the patients who would otherwise have been recruited into the trial's control arm.
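As a rough illustration of what stripping identifiers can involve, the Python sketch below drops direct identifiers from a single EHR-style record, coarsens date of birth to year of birth, and replaces the patient ID with a keyed hash so records can still be linked within the dataset. The field names and the identifier list are hypothetical; real de-identification follows whatever legal standard applies rather than a fixed field list. Note that a keyed hash is pseudonymization, not anonymization: under regimes such as the EU GDPR, pseudonymized data are still treated as personal data.

```python
import hashlib
import hmac

# Hypothetical direct identifiers; a real project takes this list from the
# de-identification standard that applies to it.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email", "national_id"}

def pseudonymize_id(patient_id: str, secret_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of the patient ID, truncated for readability.

    The same key always maps the same patient to the same pseudonym, so
    records stay linkable, but the ID cannot be recovered without the key.
    """
    return hmac.new(secret_key, patient_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()[:16]

def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of an EHR-style record with direct identifiers removed."""
    dropped = DIRECT_IDENTIFIERS | {"patient_id", "date_of_birth"}
    clean = {k: v for k, v in record.items() if k not in dropped}
    clean["pseudo_id"] = pseudonymize_id(record["patient_id"], secret_key)
    # Coarsen an ISO date of birth ("YYYY-MM-DD") to year of birth only.
    clean["birth_year"] = int(record["date_of_birth"][:4])
    return clean

# Hypothetical record for illustration only.
record = {"patient_id": "P-001", "name": "Jane Doe", "phone": "555-0100",
          "date_of_birth": "1961-07-14", "sex": "F", "diagnosis_code": "C50"}
print(deidentify(record, secret_key=b"keep-this-key-separate"))
```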

KN: “If randomization has worked well in an RCT, the characteristics of the people you find in the intervention arm will be similar to those in the control arm. You will find that the ages are very similar, the sex distribution is similar, and even their biomarkers and parameters may be similar. With external controls, we try to mimic this control population. We try to find [in pre-existing datasets] the type of people very similar to those receiving the intervention. And because we’re trying to mimic the characteristics of real people with pre-existing data, we call this type of control group synthetic controls. 

“The datasets where we look for this data come from various places. They can come from cancer registries, electronic health records, and previous health surveys for common diseases. [To generate a synthetic arm, we must ensure that the data we use to represent a control population is very similar to the population receiving the intervention]. There are statistical methods to [test the comparability of the two groups]. The most common one we use is propensity score matching, but there are other methods, such as exact matching, inverse probability weighting, etc. There are many ways we ensure the two arms are very similar.”
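For readers who want to see the mechanics, here is a minimal, dependency-free Python sketch of propensity score matching followed by a balance check. It fits a logistic regression of arm membership on age and sex, greedily matches each trial patient to the nearest unused external patient on the logit of the propensity score within a caliper of 0.2 standard deviations (a commonly cited rule of thumb), and compares standardized mean differences (SMDs) before and after matching; an absolute SMD below about 0.1 is often treated as acceptable balance. The simulated cohorts, covariates, and tuning values are all hypothetical, and a real analysis would use a vetted statistical package and a pre-specified protocol.

```python
import math
import random
from statistics import mean, pstdev, stdev

def fit_propensity_logits(X, treated, lr=0.5, epochs=500):
    """Logistic regression by batch gradient descent.

    Returns the linear predictor (logit of P(treated | covariates)) per patient.
    """
    k = len(X[0])
    # Standardize covariates so a single learning rate suits all of them.
    mus = [mean(row[j] for row in X) for j in range(k)]
    sds = [pstdev([row[j] for row in X]) or 1.0 for j in range(k)]
    Z = [[(row[j] - mus[j]) / sds[j] for j in range(k)] for row in X]
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for z, t in zip(Z, treated):
            p = 1 / (1 + math.exp(-(b + sum(wj * zj for wj, zj in zip(w, z)))))
            grad_b += p - t
            for j in range(k):
                grad_w[j] += (p - t) * z[j]
        b -= lr * grad_b / len(Z)
        w = [wj - lr * g / len(Z) for wj, g in zip(w, grad_w)]
    return [b + sum(wj * zj for wj, zj in zip(w, z)) for z in Z]

def greedy_match(logits, treated, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching without replacement within a caliper."""
    caliper = caliper_sd * stdev(logits)
    available = {i for i, t in enumerate(treated) if t == 0}
    pairs = []
    for i in (i for i, t in enumerate(treated) if t == 1):
        best = min(available, key=lambda j: abs(logits[i] - logits[j]), default=None)
        if best is not None and abs(logits[i] - logits[best]) <= caliper:
            available.remove(best)
            pairs.append((i, best))
    return pairs

def smd(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled = math.sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2)
    return (mean(a) - mean(b)) / pooled if pooled else 0.0

# Simulated cohorts: trial patients are older and more often male than the pool.
random.seed(0)
trial = [[random.gauss(65, 8), float(random.random() < 0.6)] for _ in range(60)]
pool = [[random.gauss(58, 10), float(random.random() < 0.45)] for _ in range(300)]
X = trial + pool
treated = [1] * len(trial) + [0] * len(pool)

pairs = greedy_match(fit_propensity_logits(X, treated), treated)
balance = {}
for j, name in enumerate(["age", "male"]):
    before = smd([row[j] for row in trial], [row[j] for row in pool])
    after = smd([X[i][j] for i, _ in pairs], [X[c][j] for _, c in pairs])
    balance[name] = (before, after)
    print(f"{name}: SMD before = {before:+.2f}, after = {after:+.2f}")
print(f"matched {len(pairs)} of {len(trial)} trial patients")
```

Greedy matching is the simplest option; optimal matching, matching with replacement, and inverse probability weighting trade off bias, variance, and the number of trial patients retained differently.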

Generating an SCA is a multi-step process that requires careful consideration of several factors. First, the patient population for the intervention arm needs to be defined, including its demographic and health characteristics. Next, researchers search EHRs or other real-world data (RWD) sources for patients with similar characteristics to use as controls. These patients should ideally come from the same period and geographic location as those in the exposure arm and should be receiving the most current available treatment. Because the standard of care can vary significantly by geographic location and can change over time, ensuring that SCA patients received the most recent recommended treatment is essential for a fair comparison with the intervention arm. The selection of controls also depends on the outcome being measured and the drug’s proposed mechanism of action.
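The selection criteria in this paragraph can be expressed as simple filters over candidate records. The Python sketch below keeps only external patients from the trial's regions, with an index date inside the trial's enrolment window, who received a regimen on an agreed list of current standard-of-care options. The record fields, regions, dates, and regimen labels are hypothetical placeholders; in practice, each would be defined in the study protocol with clinical input.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Candidate:
    """A hypothetical external-control candidate drawn from an RWD source."""
    patient_id: str
    region: str
    index_date: date   # e.g. date of diagnosis or start of the treatment line
    regimen: str

def select_controls(pool, regions, window_start, window_end, standard_of_care):
    """Keep candidates matching the trial's geography, period, and current SOC."""
    return [
        c for c in pool
        if c.region in regions
        and window_start <= c.index_date <= window_end
        and c.regimen in standard_of_care
    ]

pool = [
    Candidate("A", "UK", date(2021, 3, 1), "SOC-2021"),
    Candidate("B", "UK", date(2014, 6, 9), "SOC-2012"),   # too early, outdated regimen
    Candidate("C", "US", date(2021, 8, 15), "SOC-2021"),  # outside the trial's regions
    Candidate("D", "UK", date(2022, 1, 20), "SOC-2021"),
]
chosen = select_controls(pool, {"UK"}, date(2020, 1, 1), date(2022, 12, 31), {"SOC-2021"})
print([c.patient_id for c in chosen])  # ['A', 'D']
```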

KN: “Obviously, we decide early whether we will use a synthetic control. Next, we need to know what the outcome is and whether this outcome is available in the synthetic controls. If it is available, we need to know how the outcome is measured to determine whether the measurement process is accurate and applicable to our trial. We will then identify patients with a similar distribution of characteristics [to the intervention arm]. We also need to think about the eligibility criteria very early on. For example, we might want to exclude people with liver disease. Liver disease may not be well coded in electronic health records, so the [synthetic] control arm may include patients with undetected early-onset liver disease, whereas in the treatment arm we may screen for and exclude patients with early-onset liver disease. Several other things have to be considered. You need to think carefully about what a good comparator is and about any trade-off between what we want to measure [in the intervention arm], what is measured [in the synthetic control arm], and how to make those comparisons.”
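One way to make these early feasibility questions concrete is to check, before committing to a data source, how completely the outcome and eligibility variables are recorded in it. The Python sketch below reports the share of records with a non-missing value for each required field and flags fields that fall below a threshold. The field names and threshold values are hypothetical. As the liver disease example shows, completeness is only a first check: a diagnosis field can be fully populated and still under-record a condition.

```python
def completeness(records, fields):
    """Fraction of records with a non-missing (not None) value for each field."""
    n = len(records)
    return {f: sum(r.get(f) is not None for r in records) / n for f in fields}

def feasibility_report(records, required_fields, min_completeness=0.9):
    """Map each required field to (completeness, meets_threshold)."""
    return {
        f: (share, share >= min_completeness)
        for f, share in completeness(records, required_fields).items()
    }

# Hypothetical extract: the outcome is mostly recorded, a liver function test is not.
extract = [
    {"outcome_date": "2022-05-01", "age": 67, "alt_u_per_l": 31},
    {"outcome_date": "2022-07-19", "age": 59, "alt_u_per_l": None},
    {"outcome_date": None, "age": 71, "alt_u_per_l": None},
    {"outcome_date": "2023-01-03", "age": 64, "alt_u_per_l": None},
]
report = feasibility_report(extract, ["outcome_date", "age", "alt_u_per_l"], 0.7)
for field, (share, ok) in report.items():
    print(f"{field}: {share:.0%} complete -> {'OK' if ok else 'insufficient'}")
```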

The critical role of EHR data in ethical and regulatory considerations 

The generation of SCAs relies on the availability of high-quality RWD, which is essential for accurately modeling control groups. EHR systems are an important source of RWD, so their continued development is critical: as EHR data become more comprehensive, the quality of SCAs will improve, leading to more reliable trial results. Additionally, machine learning (ML) algorithms can help identify patterns and relationships within EHR data, which can further improve the accuracy and effectiveness of SCAs.

Using EHR data also raises ethical considerations. If the data are not anonymized, the informed consent process for using them in research must be carefully managed. Patients must be given clear information about how their data will be used and the risks associated with that use. In addition, using EHR data for synthetic controls must follow the relevant regulations and guidelines so that research is conducted ethically and with due regard for patient safety and welfare. These regulations may vary from country to country.

KN: “If data are anonymized, they can be used as long as they are not identifiable and there is a mechanism to ensure patient confidentiality. The key is that [patient data use] has to be communicated. Information should be clearly available to those whose data we are making good use of, and people should have the right to withdraw their electronic health record data from use in research. If we are not using anonymized data, we need to get patient consent for those data to be used as synthetic controls.”

FDA guidance on externally controlled trials highlights two key considerations to support regulatory review in drug development programs. First, sponsors should communicate with the relevant FDA review division early in the program to determine whether an externally controlled trial is suitable, providing detailed information about the study design, the proposed data sources for the external control arm, the planned statistical analyses, and their plans to meet the FDA’s data submission expectations. Second, sponsors should be able to submit patient-level data for both the treatment and external control arms; if they do not own the data for the external control arm, they should have agreements in place with the data owners to provide patient-level data to the FDA.


The use of SCAs in clinical trials has numerous benefits, such as increasing the precision of some studies, reducing the cost of clinical research, and shortening the time to market for new drugs. SCAs can help address the challenge of recruiting enough patients for rare disease or cancer studies. They also offer an appealing alternative when withholding the experimental treatment by assigning patients to a control group would be unethical.

However, it is also essential to acknowledge the limitations of SCAs. They rely on assumptions and modeling, which may introduce bias or uncertainty into the results. Limited data availability or representativeness, imperfect matching, and the inability to account for unmeasured confounding factors may all affect the results of clinical trials that leverage SCAs.
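A common way to quantify the unmeasured-confounding concern is a sensitivity analysis. One widely used option, not specific to SCAs, is the E-value introduced by VanderWeele and Ding (2017): the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both the treatment and the outcome, beyond the measured covariates, to fully explain away an observed risk ratio. For a risk ratio RR of at least 1 it equals RR + √(RR × (RR − 1)); protective estimates are inverted first. A minimal Python sketch with illustrative input values:

```python
import math

def e_value(rr: float) -> float:
    """E-value for a risk ratio point estimate (VanderWeele & Ding, 2017)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # treat protective effects symmetrically
    return rr + math.sqrt(rr * (rr - 1))

for rr in (0.5, 1.0, 1.5, 2.0):
    print(f"RR = {rr}: E-value = {e_value(rr):.2f}")
```

A large E-value means only a strong unmeasured confounder could account for the result, while a value close to 1 means even weak confounding could. The same calculation can be applied to the confidence limit closest to 1 to assess the interval rather than the point estimate.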

The use of SCAs has the potential to transform clinical trial design. As technology evolves, EHRs become more prevalent, and the availability and quality of real-world data improve, synthetic control arms will become more feasible and reliable, and the potential benefits of this approach will grow.

Artificial intelligence and ML algorithms may improve the ability to identify appropriate synthetic controls, optimize the selection of covariates, and even generate new datasets. By reducing the cost and time required for clinical trials, synthetic control arms could speed the development of treatments for rare diseases and cancers, improving outcomes and quality of life for people around the globe. Innovative approaches like these will continue to shape the future of clinical trial design, with a potentially remarkable impact on public health.

The Healthcare & Life Sciences Studio aims to reinvent the life sciences industry ecosystem through tangible technology-driven solutions. Globant aims to bridge the gap and help life sciences and healthcare organizations achieve their mission of delivering innovation and services faster and more efficiently to enhance patient value and improve outcomes.