First published in the November 2025 issue of DIA’s Global Forum, this article by Victoria Gamermanusa (RWD Insights) tackles one of rare disease research’s toughest dilemmas: how to run credible efficacy trials when patient numbers make traditional statistics unworkable. She charts how AI, Bayesian methods, transfer learning, and synthetic data are breaking the “statistical impossibility” barrier, why regulators are increasingly open to these approaches, and what it will take for the rare disease community to turn cutting-edge analytics into routine practice.
Traditional clinical trial designs and analysis methodologies need hundreds of patients to reach statistical significance with sufficient power. But what happens when your entire global patient population is small, say 200 people, and you can only recruit 50 for a trial?
This mathematical impossibility can block therapeutic development for thousands of rare diseases. Conventional power calculations, designed for diseases with large populations, fail when patient numbers drop below critical thresholds.
The result? People with ultrarare conditions face a cruel challenge: too few of them exist to prove that a treatment works, even when it does.
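To make the arithmetic behind that barrier concrete, here is a minimal sketch of a conventional power calculation. It assumes a two-arm trial comparing means, a two-sided test, and the usual normal approximation; the function name and the effect sizes are illustrative choices, not figures from any specific trial.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed per arm for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference between arms (Cohen's d).
    Uses the normal approximation n = 2 * ((z_alpha + z_beta) / d)^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Illustrative effect sizes: small, medium, large (assumed, not from the article)
for d in (0.2, 0.5, 0.8):
    n = patients_per_arm(d)
    print(f"d = {d}: {n} per arm, {2 * n} in total")
```

Under these assumptions, a medium effect (d = 0.5) takes 63 patients per arm, and a trial capped at 50 patients in total is powered only for large effects (around d = 0.8).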
Fortunately, a quiet revolution is underway. Artificial intelligence and advanced statistical methods are rewriting the rules of what’s possible with small sample sizes. Regulators are paying attention.
AI Strategies Changing the Game
Bayesian Statistics: Borrowing Strength from the Past
Traditional frequentist statistics treat each trial in isolation. Bayesian approaches flip this assumption, systematically incorporating prior knowledge from natural history studies, related diseases, and expert clinical experience.
The impact can be dramatic:
- Required sample sizes about 50% smaller than with traditional methods, while maintaining statistical rigor
- Adaptive designs that optimize patient allocation in real time, improving efficiency
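One simple way to picture "borrowing strength" is a discounted (power-prior style) beta-binomial update, sketched below. This is an illustration under stated assumptions rather than the method of any particular trial: the response counts are hypothetical, the `discount` weight is a design choice that would need justification, and the function names are made up for this example.

```python
from math import sqrt


def borrow_from_history(hist_events, hist_n, trial_events, trial_n,
                        discount=0.5, prior_a=1.0, prior_b=1.0):
    """Posterior Beta(a, b) for a response rate.

    Historical data are down-weighted by `discount`:
    0 ignores them entirely, 1 pools them fully with the new trial.
    """
    a = prior_a + discount * hist_events + trial_events
    b = prior_b + discount * (hist_n - hist_events) + (trial_n - trial_events)
    return a, b


def beta_summary(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


# Hypothetical numbers: a natural history study and a small trial control arm
HIST_EVENTS, HIST_N = 12, 60
TRIAL_EVENTS, TRIAL_N = 5, 25

for discount in (0.0, 0.5, 1.0):
    a, b = borrow_from_history(HIST_EVENTS, HIST_N, TRIAL_EVENTS, TRIAL_N, discount)
    mean, sd = beta_summary(a, b)
    print(f"discount = {discount}: posterior mean {mean:.3f}, sd {sd:.3f}")
```

The posterior standard deviation shrinks as more historical information is borrowed, which is where the sample size savings come from. The trade-off is that heavy borrowing can bias the estimate if the historical patients differ from the trial population.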
Transfer Learning: Knowledge from Related Diseases
The rare disease research community is also asking another question: why start from zero when related diseases offer valuable insights?
With transfer learning, models trained on data sets from similar conditions can be adapted to yield insights on rare diseases. A comprehensive Nature Methods review argues that such approaches have demonstrated success across multiple rare disease applications.
The MultiPLIER framework demonstrates this approach: researchers trained models on large public gene expression data sets and successfully transferred them to small rare disease cohorts, extracting meaningful biological signals that would be impossible to detect from disparate rare disease data sets alone. This enables pattern recognition in diseases with limited data by leveraging knowledge from related conditions with larger available data sets.
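The general idea of learning a pattern on a large data set and reusing it on a small cohort can be sketched in a few lines. The example below is a toy stand-in and not the MultiPLIER method itself: it uses a single principal component found by power iteration in place of MultiPLIER's pathway-informed factorization, and every number in it, including the six "genes", their loadings, and the cohort sizes, is simulated and hypothetical.

```python
import random

rng = random.Random(0)
LOADINGS = [1.0, 0.8, 0.6, 0.0, 0.0, 0.0]  # hypothetical pathway driving genes 1-3


def simulate(n_samples, activity_mean):
    """Simulated expression rows (samples x genes) driven by one latent pathway."""
    rows = []
    for _ in range(n_samples):
        activity = rng.gauss(activity_mean, 1.0)
        rows.append([activity * w + rng.gauss(0.0, 0.5) for w in LOADINGS])
    return rows


def top_component(rows, iters=100):
    """First principal axis of the data, found by power iteration on the covariance."""
    n, n_genes = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, means)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(n_genes)]
           for i in range(n_genes)]
    v = [1.0] * n_genes
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(n_genes)) for i in range(n_genes)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v, means


def project(rows, axis, means):
    """Score each sample along an axis learned elsewhere (the transfer step)."""
    return [sum((x - m) * a for x, m, a in zip(r, means, axis)) for r in rows]


# Step 1: learn the pattern on a large "public" data set
source = simulate(500, activity_mean=0.0)
axis, source_means = top_component(source)

# Step 2: reuse it on a cohort far too small to learn the pattern by itself
patients = simulate(6, activity_mean=3.0)
controls = simulate(6, activity_mean=0.0)
patient_scores = project(patients, axis, source_means)
control_scores = project(controls, axis, source_means)

print("mean patient score:", sum(patient_scores) / len(patient_scores))
print("mean control score:", sum(control_scores) / len(control_scores))
```

The small cohort contributes no training here; it is only scored against a pattern learned from the large data set. Real pipelines add steps this sketch skips, such as normalization across platforms and checks that the source pattern is biologically relevant to the rare disease.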
Such analytical breakthroughs extend beyond genomics. Some examples include: