Recent deep-learning-based retouching methods rely on paired training data, which can limit generalization to new input images unseen during training. Obtaining sufficient paired training images is also challenging, since manual enhancement requires significant effort and is difficult for non-experts who lack the necessary skills. In addition, recent end-to-end approaches produce deterministic results, which may be unsuitable for retouching, an ill-posed problem with multiple reasonable solutions depending on the user's preferences and the appropriate style. It is therefore crucial to design a retouching approach that fits the styles of high-quality retouched images while generalizing easily to complicated input domains and maintaining high performance and stability.
To achieve this objective, the proposed model should be able to generate multiple reasonable results within the high-quality output domain. The results should also be controllable through multi-modal conditions such as text prompts or guide images, whose combination can represent the retouching preferences and content styles desired by users. In summary, this research aims to develop a multi-modal retouching method with good generalization to complex input domains.
- Constructing high-quality retouching datasets covering at least four distinct styles.
- A multi-modal retouching model that establishes mappings between multi-modal conditions and retouching results, while maintaining good generalization to complex input domains and greater stability than state-of-the-art methods.
- The proposed model should demonstrate good performance in generating retouching results in the aforementioned styles, evaluated through a human study and quantitative comparisons.
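To make the conditioning idea concrete, the sketch below shows one hypothetical way text and guide-image conditions could steer a color transform: two stand-in embedding vectors are fused into a single conditioning vector, and a small, randomly initialized linear head maps it to per-channel gain and bias applied to the input image. All names, shapes, and weights here are assumptions for illustration only, not the proposed model.

```python
import numpy as np

def fuse_conditions(text_emb, img_emb, alpha=0.5):
    """Fuse text-prompt and guide-image embeddings into one unit-norm conditioning vector."""
    z = alpha * text_emb + (1.0 - alpha) * img_emb
    return z / (np.linalg.norm(z) + 1e-8)

def apply_retouch(image, cond, W_gain, W_bias):
    """Predict per-channel gain/bias from the conditioning vector and apply them."""
    gain = 1.0 + np.tanh(W_gain @ cond)   # shape (3,), one gain per color channel
    bias = 0.1 * np.tanh(W_bias @ cond)   # shape (3,), small per-channel offset
    return np.clip(image * gain + bias, 0.0, 1.0)

rng = np.random.default_rng(0)
text_emb = rng.normal(size=64)            # stand-in for a text-prompt embedding
img_emb = rng.normal(size=64)             # stand-in for a guide-image embedding
W_gain = rng.normal(scale=0.1, size=(3, 64))  # illustrative, untrained weights
W_bias = rng.normal(scale=0.1, size=(3, 64))

image = rng.uniform(size=(4, 4, 3))       # toy RGB input in [0, 1]
cond = fuse_conditions(text_emb, img_emb)
result = apply_retouch(image, cond, W_gain, W_bias)
print(result.shape)
```

Varying the text or guide-image embedding changes the conditioning vector and hence the color transform, which is the mechanism by which multiple reasonable outputs could be produced for a single input.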
Related Research Topics
- Multimodal image color editing
- Text-guided image colorization
- Representation learning for controlling color transformations