Hybrid Pipeline Overview

Three-Stage Generation Process (S → D → G)

We first sculpt high-fidelity synthetic data, then use it to train a translator that converts real-world clear frames into adverse conditions (a minimal code sketch of the full flow follows the list):

  • S – Simulation: CARLA, an open-source autonomous driving simulator, renders pixel-perfect clear/adverse pairs with full annotations.
  • D – Diffusion: Stable Diffusion / ALDM boosts realism, guided by segmentation masks.
  • G – GAN Adaptation: DA-UNIT (Domain Adaptation with Unsupervised Image-to-Image Translation Networks) learns on the curated S + D pairs plus a 10% mix of real ACDC-Clear frames (clear-weather images from the Adverse Conditions Dataset).
    Inference: feed any clear ACDC frame into DA-UNIT and it returns a photorealistic fog, rain, or night image with labels preserved.
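
For orientation, here is a minimal runnable sketch of the S → D → G flow. Every stage function is a trivial stand-in (the real pipeline wraps CARLA, Stable Diffusion / ALDM, and DA-UNIT); only the order of the stages and what each one hands to the next is meant to mirror the list above.

    import numpy as np

    H, W = 512, 1024  # illustrative frame size

    def render_carla_pair(scene_id):
        # S (stand-in): CARLA renders a clear/adverse pair with pixel-perfect labels.
        clear = np.zeros((H, W, 3), np.float32)
        adverse = np.zeros((H, W, 3), np.float32)
        labels = {"semantic": np.zeros((H, W), np.uint8),
                  "depth": np.zeros((H, W), np.float32)}
        return clear, adverse, labels

    def refine_with_diffusion(adverse, semantic):
        # D (stand-in): Stable Diffusion / ALDM boosts realism, guided by the
        # segmentation mask, so the labels stay valid for the refined frame.
        return adverse

    def train_da_unit(pairs, real_clear_frames):
        # G (stand-in): DA-UNIT learns clear -> adverse translation from the
        # curated synthetic pairs plus a ~10% mix of real ACDC-Clear frames.
        return lambda clear_frame, condition: clear_frame  # identity placeholder

    pairs = []
    for scene_id in range(4):
        clear, adverse, labels = render_carla_pair(scene_id)
        pairs.append((clear, refine_with_diffusion(adverse, labels["semantic"]), labels))

    translate = train_da_unit(pairs, real_clear_frames=[])
    # Inference: a clear ACDC frame goes in, a fog/rain/night frame comes out,
    # and the original labels carry over unchanged.
    foggy = translate(np.zeros((H, W, 3), np.float32), condition="fog")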

Enhanced DA-UNIT Architecture

Figure: DA-UNIT model architecture diagram showing the enhanced GAN pipeline.

Key Architectural Improvements

  • Support for depth, semantic, and instance data at encoder/decoder stages (see the sketch after this list)
  • Improved object shape preservation through auxiliary inputs
  • Enhanced label alignment with ground-truth data
  • Novel training strategy combining simulated and real images
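
The auxiliary-input idea can be illustrated with a short PyTorch sketch. This is not the DA-UNIT implementation; it only shows one common way to concatenate depth, one-hot semantics, and instance-boundary channels with RGB before the encoder stem. Channel counts and layer sizes are illustrative assumptions.

    import torch
    import torch.nn as nn

    class AuxEncoder(nn.Module):
        """Toy encoder stem that accepts RGB plus auxiliary channels."""

        def __init__(self, n_semantic_classes=19):
            super().__init__()
            # 3 RGB + 1 depth + one-hot semantics + 1 instance-boundary channel
            in_ch = 3 + 1 + n_semantic_classes + 1
            self.stem = nn.Conv2d(in_ch, 64, kernel_size=7, stride=2, padding=3)

        def forward(self, rgb, depth, semantic_onehot, instance_edges):
            x = torch.cat([rgb, depth, semantic_onehot, instance_edges], dim=1)
            return self.stem(x)

    enc = AuxEncoder()
    feat = enc(torch.rand(1, 3, 256, 512),    # RGB
               torch.rand(1, 1, 256, 512),    # depth
               torch.rand(1, 19, 256, 512),   # one-hot semantic map
               torch.rand(1, 1, 256, 512))    # instance boundaries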

Technical Details

Blending Technique

Our novel blending approach addresses key challenges in the generation process (a minimal sketch follows the list):

  • Adaptive merging of diffusion output with original simulated images
  • Mitigation of artifacts (e.g., distorted vehicles)
  • Preservation of photorealistic enhancements (e.g., wet roads, nighttime lighting)
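
A minimal numpy sketch of how such mask-based blending can work, under the assumption that simulated pixels are kept for classes that diffusion tends to distort (e.g. vehicles) while diffusion pixels are kept elsewhere; the class IDs, the feathering radius, and the helper name are illustrative, not the exact procedure used here.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def blend(sim_img, diff_img, semantic, protect_ids=(26, 27, 28)):
        """Blend the diffusion output with the original simulated frame.

        sim_img, diff_img: float arrays in [0, 1], shape (H, W, 3)
        semantic:          integer class map, shape (H, W)
        protect_ids:       classes whose appearance is kept from simulation
                           (e.g. vehicles, to avoid distorted geometry).
        """
        protect = np.isin(semantic, protect_ids).astype(np.float32)
        # Feather the mask so the seam between the two sources is invisible.
        alpha = gaussian_filter(protect, sigma=5)[..., None]
        # Simulated pixels where protected, diffusion pixels (wet roads,
        # nighttime lighting) everywhere else.
        return alpha * sim_img + (1.0 - alpha) * diff_img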

Training Strategy

The enhanced training process combines multiple data sources (a sampling sketch follows the list):

  • Simulation images for perfect pixel-level matching
  • Unlabeled real images to close the simulation-to-real gap
  • Auxiliary inputs (depth, semantic segmentation) for improved guidance
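
As an illustration of the data mix, a per-epoch sampler might look like the sketch below; the ~10% ratio comes from the pipeline description above, while the function and variable names are assumptions.

    import random

    def mixed_epoch(sim_pairs, real_frames, real_fraction=0.10):
        """Return one epoch of training samples: labeled synthetic pairs plus
        roughly `real_fraction` unlabeled real clear-weather frames."""
        n_real = int(real_fraction * len(sim_pairs) / (1.0 - real_fraction))
        epoch = [("sim", pair) for pair in sim_pairs]
        epoch += [("real", frame)
                  for frame in random.sample(real_frames, min(n_real, len(real_frames)))]
        random.shuffle(epoch)
        return epoch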

Performance Results

Performance highlights (ACDC):

  • 78.57% mIoU on ACDC-Adverse (test), obtained with zero adverse-weather images in training.
  • +1.85% mIoU on ACDC (val) overall versus the baseline (REIN pre-trained on Cityscapes, then fine-tuned on ACDC-Clear).
  • Night subset: +4.62% mIoU on ACDC-Night (val) over the same baseline.
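
For reference, mIoU is the mean over classes of the intersection-over-union between predicted and ground-truth masks. The numpy sketch below is the standard definition of the metric, not the evaluation code used for the numbers above.

    import numpy as np

    def mean_iou(pred, gt, num_classes, ignore_index=255):
        """Mean intersection-over-union across classes present in either map."""
        valid, ious = gt != ignore_index, []
        for c in range(num_classes):
            p, g = (pred == c) & valid, (gt == c) & valid
            union = np.logical_or(p, g).sum()
            if union == 0:
                continue  # class absent from both prediction and ground truth
            ious.append(np.logical_and(p, g).sum() / union)
        return float(np.mean(ious))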

Applications

Practical Benefits

  • Cost-effective generation of adverse-condition training data
  • Significant reduction in real-world data collection needs
  • Improved robustness of autonomous perception systems
  • Flexible adaptation to various adverse conditions (night, rain, fog, snow)