
Synthetic Data

Research topic on synthetic data, recursive training, selection bias, and model collapse.

Synthetic Data is the short research-topic label for Qiao's work on generated data, recursive training, and model collapse. The underlying research cluster is broader than the label: it covers recursive synthetic-data training, data selection, sample selection bias, model collapse, data silos, and Wasserstein geometry.[1]

Introduction

The topic treats synthetic data as both a resource and a risk. Generated samples can reduce data-access costs and support privacy-preserving workflows, but recursive use of selected synthetic data can also narrow the training distribution. This page records that tension in the specific setting of biased local selection and collaborative verification.
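The narrowing effect can be made concrete with a worked toy calculation. The assumptions here are simplifying choices, not the setup of any paper cited on this page: the model is a single Gaussian, each generation's fit recovers the exact moments of the selected data (the infinite-sample limit), and selection is a hypothetical symmetric filter that keeps only samples within k standard deviations of the current mean. Here φ and Φ denote the standard normal density and cumulative distribution function.

```latex
X_t \sim \mathcal{N}(\mu_t, \sigma_t^2), \qquad
\text{keep } X_t \text{ only if } |X_t - \mu_t| \le k\,\sigma_t

\mu_{t+1} = \mu_t, \qquad
\sigma_{t+1}^2 = c(k)\,\sigma_t^2, \qquad
c(k) = 1 - \frac{2k\,\varphi(k)}{2\Phi(k) - 1} < 1

\sigma_t^2 = c(k)^t\,\sigma_0^2 \;\longrightarrow\; 0 \quad (t \to \infty)

k = 2:\quad c(2) \approx 0.774, \qquad \sigma_{10}^2 \approx 0.077\,\sigma_0^2
```

Because c(k) < 1 for every finite k, the variance shrinks geometrically even without any sampling noise. In this toy model, selection alone is enough to erase the tails of the original distribution.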

Role in this wiki

This page keeps the main biography readable by giving the longer technical background a page of its own. On the main page, "Synthetic Data" is enough to signal the topic. Here, the topic is unpacked as a research problem: generated samples can improve coverage or reduce access costs, but recursive use of generated data can amplify bias, erase modes, or distort the target distribution. The wiki therefore treats synthetic data as both an asset and a failure mode.
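A minimal sketch of how selection can amplify bias under recursive training, assuming a one-dimensional Gaussian toy model rather than the setup of any paper cited on this page. Each generation fits a mean and standard deviation to samples drawn from the previous generation's fit, after a hypothetical one-sided filter keeps only the largest `keep_fraction` of the samples. The function name `recursive_fit`, the filter, and all parameter values are illustrative assumptions.

```python
import random
import statistics


def recursive_fit(n_generations, n_samples, keep_fraction, seed=0):
    """Toy recursive training: each generation refits a Gaussian to filtered
    samples drawn from the previous generation's fitted Gaussian."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" distribution N(0, 1)
    history = [(mu, sigma)]
    for _ in range(n_generations):
        samples = sorted(rng.gauss(mu, sigma) for _ in range(n_samples))
        # Hypothetical one-sided selection: keep only the largest values.
        n_keep = max(2, round(keep_fraction * n_samples))
        kept = samples[-n_keep:]
        # "Training" the next generation = refitting mean and standard deviation.
        mu = statistics.fmean(kept)
        sigma = statistics.stdev(kept)
        history.append((mu, sigma))
    return history


unselected = recursive_fit(10, 2000, keep_fraction=1.0)
selected = recursive_fit(10, 2000, keep_fraction=0.8)
print("no selection:", unselected[-1])
print("top 80% kept:", selected[-1])
```

With `keep_fraction=1.0` the fit only wanders through sampling noise. With `keep_fraction=0.8` the mean climbs and the standard deviation shrinks in every generation, because each filter step discards the same lower tail of whatever distribution the previous generation produced.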

Publications

Paper | Venue/status
When Sample Selection Bias Precipitates Model Collapse | ICML 2026, 6-11 July 2026, Seoul

Connection to Qiao's work

"When Sample Selection Bias Precipitates Model Collapse" studies how local selection bias can trigger collapse in siloed recursive training, and then uses collaborative, Wasserstein-style signals to diagnose the problem. This links synthetic-data reliability to AI and networks: the key difficulty is not only generation quality, but also distributed access to evidence about the data distribution.
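One way to picture a collaborative, distribution-level signal is a leave-one-out comparison: each silo measures the 1-Wasserstein distance between its own samples and the pooled samples of the other silos, and the silo farthest from the rest is flagged. This is a hypothetical illustration, not the method of the ICML 2026 paper. The names `wasserstein_1d` and `leave_one_out_scores`, the silo data, and the assumption that silos may pool raw samples are all illustrative.

```python
import bisect
import random


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two 1-D empirical distributions,
    computed as the integral of |F(t) - G(t)| over t."""
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        f = bisect.bisect_right(xs, left) / len(xs)
        g = bisect.bisect_right(ys, left) / len(ys)
        total += abs(f - g) * (right - left)
    return total


def leave_one_out_scores(silos):
    """Hypothetical diagnostic: distance from each silo to the union of the others."""
    scores = {}
    for name, samples in silos.items():
        others = [x for other, s in silos.items() if other != name for x in s]
        scores[name] = wasserstein_1d(samples, others)
    return scores


rng = random.Random(1)
silos = {
    "a": [rng.gauss(0, 1) for _ in range(500)],
    "b": [rng.gauss(0, 1) for _ in range(500)],
    # Silo "c" applies a biased local filter and keeps only values above 0.5.
    "c": [x for x in (rng.gauss(0, 1) for _ in range(1000)) if x > 0.5],
}
scores = leave_one_out_scores(silos)
flagged = max(scores, key=scores.get)
print(scores, flagged)
```

In one dimension, W1 equals the integral of |F - G| over the real line, which is what the loop computes on the merged grid of sample points. The same distance can also be written as an integral of the absolute difference between the two quantile functions, so silos that cannot share raw samples could in principle exchange quantile summaries instead.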


Footnotes

  1. Shumailov et al., "AI models collapse when trained on recursively generated data", Nature 631, 755-759 (2024), is a widely cited reference for the recursive model-collapse framing.