Apple has introduced a new artificial intelligence model named MGIE, or MLLM-Guided Image Editing, which allows users to modify images using simple text instructions. The tool, developed in collaboration with researchers from the University of California, Santa Barbara, promises to transform the landscape of image editing by combining the power of multimodal large language models (MLLMs) with user-friendly interfaces.
“We propose MLLM-Guided Image Editing (MGIE) to enhance instruction-based image editing via learning to produce expressive instructions. Instead of brief but ambiguous guidance, MGIE derives explicit visual-aware intention and leads to reasonable image editing,” according to a conference paper published at ICLR 2024.
“We conduct extensive studies from various editing aspects and demonstrate that our MGIE effectively improves performance while maintaining competitive efficiency. We also believe the MLLM-guided framework can contribute to future vision-and-language research,” the paper added.
MGIE can understand and execute complex editing tasks based on natural language prompts. The model interprets text commands to perform everything from basic photo adjustments like cropping and rotating to more sophisticated edits such as object manipulation and background replacement.
This new model leverages the strengths of MLLMs, which are adept at processing and correlating information across modalities such as text and images. By incorporating an MLLM into the image editing workflow, MGIE expands terse textual prompts into precise, actionable editing instructions. For instance, a request to “make the sky more blue” is interpreted as a directive to increase the saturation of the sky region, grounding the user’s intent in a concrete, executable edit.
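To make this two-stage idea concrete, the snippet below is a minimal, self-contained sketch in Python. It is not Apple’s released code: the class names (InstructionExpander, DiffusionEditor), the hard-coded rewrite rule, and the mgie_style_edit helper are illustrative placeholders standing in for the MLLM and the diffusion-based editor described in the paper.

```python
# Illustrative sketch only: these class and function names are invented for
# this article and are not part of Apple's MGIE codebase.

class InstructionExpander:
    """Stands in for the MLLM that rewrites brief, ambiguous prompts."""

    def expand(self, prompt: str) -> str:
        # MGIE's MLLM conditions on both the prompt and the image itself;
        # this toy version shows only the text-rewriting step.
        rules = {
            "make the sky more blue":
                "increase the saturation of the sky region",
        }
        return rules.get(prompt, prompt)


class DiffusionEditor:
    """Stands in for the diffusion model that applies the edit to pixels."""

    def edit(self, image_path: str, instruction: str) -> str:
        print(f"Editing {image_path}: {instruction}")
        return "edited.png"


def mgie_style_edit(image_path: str, prompt: str) -> str:
    """Brief prompt -> expressive instruction -> edited image."""
    expressive = InstructionExpander().expand(prompt)
    return DiffusionEditor().edit(image_path, expressive)


mgie_style_edit("beach.jpg", "make the sky more blue")
```

The design point the sketch captures is the separation of concerns: the language model resolves ambiguity in the user’s request before any pixels are touched, so the image editor only ever receives an explicit, well-specified instruction.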
Designed to cater to a wide array of editing needs, MGIE is pitched not only at professional designers but also at anyone looking to refine their photos with ease. Whether adjusting the overall look of an image or making detailed modifications to specific elements, the model is said to deliver high-quality results.
According to a report by VentureBeat, MGIE has been released on GitHub as an open-source project, making it available to developers and users who wish to explore its potential. The repository includes the code, data, and pre-trained models, along with a demo notebook that showcases various editing possibilities, the report added. Apple remains tight-lipped about MGIE’s future beyond its research origins, but a web demo hosted on Hugging Face Spaces offers a hands-on way to test the model’s capabilities in real time.
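For readers who want to call the hosted demo programmatically, the snippet below is a hypothetical sketch using the gradio_client library, a common way to query Gradio apps on Hugging Face Spaces. The Space ID and the arguments to predict() are placeholder assumptions, not the demo’s verified API; consult the actual Space page for its real endpoint signature.

```python
# Hypothetical sketch of calling a Gradio demo on Hugging Face Spaces.
# "owner/mgie-demo" and the predict() arguments are placeholders; check the
# actual Space for its real ID and API signature before running this.
from gradio_client import Client

client = Client("owner/mgie-demo")       # placeholder Space ID
result = client.predict(
    "photo.jpg",                         # path to the input image
    "make the sky more blue",            # natural-language edit instruction
)
print(result)                            # path or URL of the edited image
```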