Current deep networks are data-hungry and benefit from training on large-scale datasets, which are often time-consuming to collect and annotate. By contrast, synthetic data can be generated in unlimited quantities, with minimal effort and cost, using generative models such as DALL-E and diffusion models. In this paper, we present DatasetDM, a generic dataset generation model that produces diverse synthetic images together with the corresponding high-quality perception annotations (e.g., segmentation masks and depth maps). Our method builds upon a pre-trained diffusion model and extends text-guided image synthesis to perception data generation. We show that the rich latent code of the diffusion model can be effectively decoded into accurate perception annotations by a decoder module. Training the decoder requires less than 1% of the data to be manually labeled (around 100 images), enabling the generation of an infinitely large annotated dataset. These synthetic data can then be used to train various perception models for downstream tasks.
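To make the decoding idea concrete, the sketch below trains a minimal per-pixel decoder (a 1x1 convolution, i.e., a shared linear map from feature channels to class logits) on a single labeled example. This is only an illustration of the shapes and training signal, not the paper's architecture: in DatasetDM the per-pixel features come from the pre-trained diffusion model's UNet, whereas here random arrays stand in for them, and the names `decode`, `train_step`, and `Wd` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: random arrays illustrate the shapes only; in the
# real method these would be diffusion-model latent features and a manual label.
C, H, W, K = 64, 16, 16, 3                 # feature channels, spatial size, classes
features = rng.standard_normal((C, H, W))  # latent code for one image
mask_gt = rng.integers(0, K, (H, W))       # its pixel-wise segmentation label

# Minimal per-pixel decoder: project each pixel's C-dim feature to K logits.
Wd = rng.standard_normal((K, C)) * 0.01

def decode(feat, Wd):
    # [C,H,W] -> [K,H,W]: a 1x1 convolution expressed as an einsum
    return np.einsum('kc,chw->khw', Wd, feat)

def train_step(feat, labels, Wd, lr=0.02):
    logits = decode(feat, Wd)
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    probs = e / e.sum(axis=0, keepdims=True)             # softmax over classes
    onehot = np.eye(Wd.shape[0])[labels].transpose(2, 0, 1)
    grad = np.einsum('khw,chw->kc', (probs - onehot) / labels.size, feat)
    ii, jj = np.indices(labels.shape)
    loss = -np.log(probs[labels, ii, jj]).mean()         # cross-entropy
    return Wd - lr * grad, loss

losses = []
for _ in range(300):
    Wd, loss = train_step(features, mask_gt, Wd)
    losses.append(loss)

# After training, the decoder turns latent features into a synthetic annotation.
pred_mask = decode(features, Wd).argmax(axis=0)          # [H, W] class map
```

Once such a decoder is trained on a handful of labeled images, it can be applied to the latent features of every newly generated image, which is what allows the annotated dataset to grow without further manual labeling.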
To showcase the power of the proposed approach, we generate datasets with rich, dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation. Notably, the approach achieves (1) state-of-the-art results on semantic segmentation and instance segmentation; (2) significantly stronger robustness to domain shift than training on real data alone, along with state-of-the-art results in the zero-shot segmentation setting; and (3) flexibility for efficient application and novel task composition (e.g., image editing).
The project website is at:
