CARE-Edit: Condition-Aware Routing of Experts for Contextual Image Editing
Abstract
CARE-Edit introduces a condition-aware routing mechanism that dynamically allocates diffusion model computation to specialized experts for improved contextual image editing tasks.
Unified diffusion editors often rely on a fixed, shared backbone for diverse tasks, and therefore suffer from task interference and poor adaptation to heterogeneous demands (e.g., local vs. global, semantic vs. photometric). In particular, prevalent ControlNet and OmniControl variants combine multiple conditioning signals (e.g., text, mask, reference) via static concatenation or additive adapters, which cannot dynamically prioritize or suppress conflicting modalities. This results in artifacts such as color bleeding across mask boundaries, identity or style drift, and unpredictable behavior under multi-condition inputs. To address this, we propose Condition-Aware Routing of Experts (CARE-Edit), which aligns model computation with specific editing competencies. At its core, a lightweight latent-attention router assigns encoded diffusion tokens to four specialized experts (Text, Mask, Reference, and Base) based on multi-modal conditions and diffusion timesteps. The method proceeds in three steps: (i) a Mask Repaint module first refines coarse user-defined masks for precise spatial guidance; (ii) the router applies sparse top-K selection to dynamically allocate computation to the most relevant experts; (iii) a Latent Mixture module then fuses the expert outputs, coherently integrating semantic, spatial, and stylistic information into the base images. Experiments validate CARE-Edit's strong performance on contextual editing tasks, including erasure, replacement, text-driven edits, and style transfer. Empirical analysis further reveals task-specific behavior of the specialized experts, showing the importance of dynamic, condition-aware processing for mitigating multi-condition conflicts.
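As a rough illustration of the sparse top-K routing and mixing described above, the following is a minimal sketch, not the paper's implementation. It assumes the router has already produced one logit per expert for a token (in CARE-Edit these would depend on the multi-modal conditions and the diffusion timestep). The names `top_k_gates`, `route_and_mix`, and the toy experts are hypothetical, and the weighted sum only stands in for the Latent Mixture module, whose actual form the abstract does not specify.

```python
import math

# Expert order follows the abstract: Text, Mask, Reference, Base.
EXPERTS = ("text", "mask", "reference", "base")

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gates(logits, k=2):
    """Keep the k largest router probabilities, zero the rest, renormalize."""
    probs = softmax(logits)
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept_mass = sum(probs[i] for i in keep)
    gates = [0.0] * len(probs)
    for i in keep:
        gates[i] = probs[i] / kept_mass
    return gates, keep

def route_and_mix(token, logits, experts, k=2):
    """Evaluate only the selected experts on one token and fuse their outputs."""
    gates, keep = top_k_gates(logits, k)
    mixed = [0.0] * len(token)
    for i in keep:  # sparse: unselected experts are never evaluated
        out = experts[EXPERTS[i]](token)
        mixed = [m + gates[i] * o for m, o in zip(mixed, out)]
    return gates, mixed

# Toy stand-in experts (assumption): each one simply rescales the token.
toy_experts = {
    "text": lambda t: [2.0 * x for x in t],
    "mask": lambda t: [0.0 for _ in t],
    "reference": lambda t: [-1.0 * x for x in t],
    "base": lambda t: list(t),
}

# Hypothetical router logits that favor the Text and Base experts equally.
gates, mixed = route_and_mix([1.0, 2.0], [2.0, -1.0, 0.0, 2.0], toy_experts, k=2)
print(gates)  # [0.5, 0.0, 0.0, 0.5]
print(mixed)  # [1.5, 3.0]
```

In this toy run only the Text and Base experts are evaluated, so the Mask and Reference signals cannot leak into the fused token. That is the intuition behind using sparse selection rather than static concatenation when conditions conflict.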
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts (2026)
- RegionRoute: Regional Style Transfer with Diffusion Model (2026)
- SIGMA: Selective-Interleaved Generation with Multi-Attribute Tokens (2026)
- Layer-wise Instance Binding for Regional and Occlusion Control in Text-to-Image Diffusion Transformers (2026)
- PosterOmni: Generalized Artistic Poster Creation via Task Distillation and Unified Reward Feedback (2026)
- PokeFusion Attention: Enhancing Reference-Free Style-Conditioned Generation (2026)
- Sissi: Zero-shot Style-guided Image Synthesis via Semantic-style Integration (2026)