OpenAI has built accurate image generation directly into ChatGPT. The new feature, called 4o Image Generation, is powered by GPT-4o, a multimodal large language model. It understands context, complex instructions, and how objects interact, and it can even render text labels without artifacts. It becomes available to everyone today.
Image source: OpenAI
ChatGPT could already generate images using the DALL-E 3 neural network, but the new feature works considerably better and more accurately. OpenAI spokesperson Taya Christianson clarified that the limits for free users will stay the same as for DALL-E, i.e., three images per day. DALL-E also remains accessible through the ChatGPT interface.
As head of research Gabriel Goh noted, building on GPT-4o lets the system work with any type of data: text, images, audio, and video. The model also gains a key improvement in binding, that is, correctly associating attributes with the objects they belong to. Goh explained that most image models get confused once a prompt involves 5 to 8 objects. For example, a model asked to draw a blue star and a red triangle might produce a red star and a shape that is not a triangle at all. 4o Image Generation handles 15 to 20 objects without such errors.
Users will also notice better text rendering: generated images can now contain legible, typo-free text. In existing image generation tools, text was often distorted, and rendering it cleanly was a serious challenge, since even a small error in a heading or label could make the whole image unusable.
Generated from the prompt “make a very colorful risograph on how to make matcha”
The system also takes an unconventional approach to generation. Images are built sequentially, from left to right and top to bottom, rather than all at once as in DALL-E. According to Goh, this explains why 4o Image Generation handles text and complex scenes better.
OpenAI demonstrated 4o Image Generation on scientific diagrams (such as one of Newton’s prism experiment), comics, and posters. It also showed practical applications, including images with transparent backgrounds for stickers, restaurant menus, and logos. The model completed all of these tasks without any errors in the text.
4o Image Generation can also edit images uploaded by the user based on simple requests, adding or removing elements.
Example of adding elements to a photo using GPT-4o
The new system does take longer to generate images than previous systems, but OpenAI sees that as a worthwhile tradeoff. “While we certainly have room to improve response times, the quality of these images, the capabilities, the knowledge of the world really makes up for the extra seconds of waiting,” the company said.
When asked about safeguards, in a question that cited the controversial Taylor Swift deepfakes created with Microsoft’s model, xAI’s Grok depicting Kamala Harris with a gun, and Google Gemini removing watermarks, the OpenAI team emphasized the robust mechanisms it has in place to prevent abuse.
OpenAI design director Jackie Shannon said the tool prevents watermark removal, blocks the generation of deepfakes involving the human body, and refuses requests to create child sexual abuse material (CSAM). Shannon also explained that all generated images will include standard C2PA metadata identifying them as created by OpenAI.