Tuning-Free Image Customization with Image and Text Guidance

Pengzhi Li 1#, Qiang Nie 2,3#, Ying Chen 3, Xi Jiang 4, Kai Wu 3, Yuhuan Lin 3, Yong Liu 3, Jinlong Peng 3, Chengjie Wang 3, Feng Zheng 4*
1Tsinghua University 2Hong Kong University of Science and Technology (GZ)
3Tencent Youtu Lab 4Southern University of Science and Technology
ECCV 2024
Performance overview of the proposed method in image customization: (a) The proposed method generates any subject depicted in the reference image within the designated region to be edited, and also allows modifying the generated subject's attributes based on the user's text description. (b) Our method extends to scenarios with multiple subjects from different reference images and multiple regions to be edited. (c) Driven by text, the proposed method can transform the subject in the reference image into a different domain, such as a cartoon style.

Overview

1. We propose a tuning-free image customization framework, enabling content manipulation in the given region(s) of an image according to user-provided example images and text descriptions.
2. We propose a self-attention blending strategy for content customization, which addresses the unintended changes to non-target areas seen in previous image editing methods and achieves precise editing of the specified subject's attributes (an illustrative sketch follows this list).
3. Our method outperforms previous approaches in human and quantitative evaluations, providing an efficient solution for numerous practical applications such as image synthesis, design, and creative photography.
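
Below is a minimal, illustrative sketch of what a self-attention blending step could look like; it is not the authors' implementation, and all names (blended_self_attention, mask, etc.) are hypothetical. The idea sketched here is that keys and values from a reference-image branch are injected into the generation branch's self-attention, and the blended output is applied only inside the target-region mask so background tokens are left untouched.

import torch

def attention(q, k, v):
    # Standard scaled dot-product attention over flattened spatial tokens.
    scale = q.shape[-1] ** -0.5
    weights = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return weights @ v

def blended_self_attention(q_gen, k_gen, v_gen, k_ref, v_ref, mask):
    # q_gen, k_gen, v_gen: (B, N, C) tokens of the image being generated
    # k_ref, v_ref:        (B, N_ref, C) tokens from the reference-image branch
    # mask:                (B, N, 1) binary mask, 1 inside the target region
    out_gen = attention(q_gen, k_gen, v_gen)              # original (background) path
    out_ref = attention(q_gen,
                        torch.cat([k_gen, k_ref], dim=1), # also attend to reference keys
                        torch.cat([v_gen, v_ref], dim=1)) # and reference values
    # Blend: reference-aware features inside the mask, original features outside.
    return mask * out_ref + (1.0 - mask) * out_gen

In an actual diffusion U-Net such a replacement would be hooked into selected self-attention layers at selected denoising steps; the sketch only shows the blending arithmetic.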

Method

The pipeline of our method. Given an image $I$ to be edited and the target region(s) $R$ that need editing, our goal is to synthesize an image $I_e$ that not only contains the subject in the reference image(s) $I_r$ but also satisfies the text description $T$, in a tuning-free manner. The text $T$ controls the attributes of the customized subject in $R$. This is a challenging task due to the following issues: (1) maintaining consistency in the non-target region between $I$ and $I_e$; (2) ensuring semantic coherence between the generated subject and the reference subject in the target region; (3) accurately controlling the attributes of the generated subject according to the text description without altering other content; and (4) seamlessly integrating the generated subject in $R$ with the non-target content of $I_e$.
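
Challenge (1), keeping the non-target region of $I$ unchanged in $I_e$, is commonly handled in diffusion-based editors by blending latents with the region mask at every denoising step. The sketch below illustrates that generic idea using the forward-diffusion formula $q(z_t \mid z_0)$; the function name blend_background and its arguments are hypothetical and do not reflect the paper's actual interface.

import torch

def blend_background(z_t_edit, z0_orig, mask, alpha_bar_t):
    # z_t_edit:    current latent of the edited image I_e at timestep t
    # z0_orig:     clean latent of the input image I
    # mask:        1 inside the target region R, 0 elsewhere (broadcastable shape)
    # alpha_bar_t: cumulative noise-schedule coefficient at timestep t (scalar tensor)
    noise = torch.randn_like(z0_orig)
    # Noise the original latent to the same timestep: z_t = sqrt(a)*z0 + sqrt(1-a)*eps.
    z_t_orig = alpha_bar_t.sqrt() * z0_orig + (1.0 - alpha_bar_t).sqrt() * noise
    # Keep the edit only inside R; restore the original content everywhere else.
    return mask * z_t_edit + (1.0 - mask) * z_t_orig

In this generic scheme, applying the blend after every denoising step keeps the region outside $R$ consistent with $I$, while the content inside $R$ is synthesized under the image and text guidance.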

Experiments

Qualitative comparison with existing state-of-the-art methods. PBE and AnyDoor are guided only by images, while BLD uses text as the only guidance. To evaluate the efficiency of our method, we also set up a group of two-step baselines: image stitching and harmonization followed by text-guided image editing (DCCF + IP2P, MasaCtrl), and editing first followed by harmonization (IP2P + DCCF). These methods can handle only text or image guidance, and only global or local editing. Our method overcomes their limitations and outperforms them all, achieving text- and image-guided local editing and generation.


Potential applications

Some creative applications. As shown in the first row, given an indoor scene and a collection of materials, our method can edit the interior decorations and furnishings using reference subjects from the material library. Our method can also be applied to cross-domain graphic design, as shown in the second column, where cartoon characters are generated directly into real-world scenes.


More results

We show more visual comparison results. Our method outperforms these baselines and overcomes their limitations, achieving strong generative quality.


Refer to the PDF paper linked above for more details on the qualitative, quantitative, and ablation studies.

Citation

@inproceedings{li2024tuning,
  title={Tuning-Free Image Customization with Image and Text Guidance},
  author={Li, Pengzhi and Nie, Qiang and Chen, Ying and Jiang, Xi and Wu, Kai and Lin, Yuhuan and Liu, Yong and Peng, Jinlong and Wang, Chengjie and Zheng, Feng},
  booktitle={European Conference on Computer Vision},
  year={2024}
}