Alibaba’s Free Image Generation Model is Here!


Is there anything Qwen models can’t do? So far, their text and coding models are topping most of the charts and arenas. That’s why Alibaba’s Qwen team has moved into the “creative” side. They have just released “Qwen-Image”, an image generation model with native text rendering, designed to challenge the supremacy of GPT-4.1, DALL-E 2, and Midjourney. The best part? It’s free and, even better, accessible to everyone! In this blog, we’ll give you all the details about Qwen-Image, including how to access it, its performance, its applications, and more.

Let’s see if Qwen-Image is “Qwen-tastic” or not!

What is Qwen-Image?

Qwen-Image is the latest image generation model from Alibaba’s Qwen team. It is a 20B MMDiT image foundation model. This means it has 20 billion parameters and is built on a multimodal diffusion transformer (MMDiT). Qwen-Image is an open-weight text-to-image model. It currently ranks fifth on the Artificial Analysis Image Arena leaderboard and is the only open-weight model in the top 10!
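
To put “20 billion parameters” in practical terms, here is a back-of-the-envelope sketch of how much memory the raw weights alone would take at common precisions. It assumes 4 bytes per parameter for fp32, 2 for bf16/fp16, and 1 for int8, and it reports decimal gigabytes. It is not an official hardware requirement from the Qwen team. It also ignores the text encoder, the VAE, activations, and framework overhead.

```python
# Assumed storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "bf16") -> float:
    """Approximate size of the raw weights in decimal GB (1 GB = 1e9 bytes).

    Only counts the parameters themselves: no text encoder, VAE,
    activations, or framework overhead.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("fp32", "bf16", "int8"):
        print(f"20B params in {dtype}: ~{weight_memory_gb(20e9, dtype):.0f} GB")
```

Under these assumptions, the 20B diffusion backbone alone comes to about 80 GB in fp32, 40 GB in bf16, and 20 GB in int8.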

Artificial Analysis Image Arena
Source: X

How does the Qwen-Image model work?

The Qwen-Image model handles both image generation and editing. Rather than generating images autoregressively, it conditions a diffusion model on a multimodal language model. It is built from three components:

  • Qwen2.5-VL encodes the semantic meaning of the prompt
  • Image generation happens in a latent space using MMDiT, a diffusion model
  • The final image is produced from this latent space using a VAE decoder

For editing, the model takes a dual encoding approach. The input image is fed both to Qwen2.5-VL, which captures its semantic meaning, and to the VAE encoder, which captures its visual detail.
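
To make the three-stage data flow concrete, here is a toy Python sketch. Every function is a hypothetical stand-in; `encode_prompt`, `predict_clean_latent`, `generate_latent`, and `vae_decode` are illustrative names, not Qwen-Image’s code.

- The “encoder” produces a deterministic vector from the prompt.
- The “denoiser” simply returns that vector.
- The “decoder” maps latent values to pixel intensities.

The sketch only illustrates the order of operations: semantic encoding, then iterative denoising that starts from noise in latent space, then decoding back to pixels.

```python
import math
import random

LATENT_DIM = 8  # toy latent size; real latents are multi-channel spatial tensors


def encode_prompt(prompt: str) -> list[float]:
    """Stage 1 stand-in for Qwen2.5-VL: map the prompt to a fixed-length
    'semantic' vector (here just a deterministic pseudo-random vector)."""
    rng = random.Random(prompt)  # same prompt -> same vector
    return [rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]


def predict_clean_latent(noisy: list[float], cond: list[float], t: float) -> list[float]:
    """Stage 2 stand-in for the MMDiT denoiser: given the noisy latent, the
    prompt conditioning and the noise level t, predict the clean latent.
    A real denoiser is a large transformer; this toy ignores `noisy` and `t`
    and simply returns the conditioning vector."""
    return list(cond)


def generate_latent(cond: list[float], steps: int = 50, seed: int = 0) -> list[float]:
    """Iterative denoising: start from Gaussian noise and move a fraction of
    the way toward the denoiser's prediction at every step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in cond]  # start from pure noise
    for i in range(steps):
        t = 1.0 - i / steps  # noise level decreases from 1 toward 0
        x0_hat = predict_clean_latent(x, cond, t)
        frac = 1.0 / (steps - i)  # reaches 1.0 on the final step
        x = [xi + frac * (ci - xi) for xi, ci in zip(x, x0_hat)]
    return x


def vae_decode(latent: list[float], upscale: int = 4) -> list[int]:
    """Stage 3 stand-in for the VAE decoder: expand each latent value into
    `upscale` pixel intensities in 0..255 (real decoders upsample spatially)."""
    pixels = []
    for v in latent:
        level = round((math.tanh(v) + 1.0) / 2.0 * 255)
        pixels.extend([level] * upscale)
    return pixels


def text_to_image(prompt: str, steps: int = 50, seed: int = 0) -> list[int]:
    cond = encode_prompt(prompt)                 # 1. semantic encoding
    latent = generate_latent(cond, steps, seed)  # 2. diffusion in latent space
    return vae_decode(latent)                    # 3. decode latent to pixels


if __name__ == "__main__":
    print(text_to_image("a shampoo landing page")[:8])
```

The point is the separation of concerns. The language model decides *what* to draw, the diffusion loop works in a compact latent space, and only the final decoder deals with full-resolution pixels.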

You can read the full technical report on the Qwen-Image model here.

Key Features of Qwen-Image

Some of the key highlights that make Qwen-Image stand out are:

  1. Enhanced Text Rendering: Qwen-Image excels at rendering complex text, whether in multi-line layouts, paragraphs, or fine-grained details. It works equally well with alphabetic languages (such as English) and logographic languages (such as Chinese).
  2. Efficient Image Editing: The model offers advanced image editing capabilities. During editing, it preserves both the semantic meaning and the visual appearance of the original image while applying the requested changes.
  3. Ease of Use: The model is easy to use and works well even with simple prompts.

These features, along with the model’s strong performance, have been demonstrated on various benchmarks, making Qwen-Image a formidable image generation model.

How to access Qwen-Image?

To access the Qwen-Image model via Qwen Chat:

  1. Head to https://chat.qwen.ai/
  2. Select any of the non-coding models, such as Qwen3-235B-A22B-2507
  3. Below the text box, in the middle of the screen, select “Image Generation”
  4. Enter your prompt in the text box and get started!

You can also access the model in other ways, such as:

Qwen-Image: Hands-on

Now that we have covered the details of Qwen-Image, let’s test it on three key tasks:

  1. Generating a text-heavy image
  2. Generating an infographic
  3. Editing an image

Let’s go through them one by one:

Task 1: Design a Web Page

Prompt: “Create a visually engaging landing page for a shampoo product. Highlight the shampoo’s unique features (e.g., hydration, repair, or natural ingredients) with a clean and modern design. Include a hero section with the shampoo bottle image, a catchy headline like ‘Transform Your Hair Today,’ and a call-to-action button (‘Shop Now’ or ‘Learn More’). Add sections for benefits, key ingredients, customer testimonials, and a subscription option. Use soft, fresh colors, high-quality visuals, and make sure the layout is mobile-friendly and conversion-focused.”

Output:

Web design with Qwen-Image

The generated image was good; it included most of the text I had asked for. It captured the essence of the prompt well and laid out the whole page appropriately. But there were a few misses. Although the spelling was correct, one word was cut off, and some words I had specified were not included. I liked the color theme the model chose for this task.

Task 2: Create a Flowchart

Prompt: “Design a clear, modern infographic that explains the image generation process of a 20B MMDiT foundation model in 3 steps:

  • Prompt Encoding: Show Qwen2.5-VL encoding the semantic meaning of the user’s prompt.
  • Latent Space Generation: Visualize MMDiT diffusion creating an abstract image in latent space.
  • Final Image Creation: Illustrate a VAE decoder transforming the latent representation into the final high-quality image.

Use icons, arrows, and short labels for each step. The flow should be visually logical and easy to follow, with a tech-inspired color palette.”

Output:

Infographic with Qwen-Image

I didn’t like the output at all. The text was missing in some places and completely vague in others. The icons and the overall image felt disorganized. The flow from step 1 to 2 to 3 was there, but the image is quite unclear.

Task 3: Image Editing

Input image:

Input image

Prompt: “Change the night into a sunny morning, replace the man’s clothes with an orange shirt and white shorts, and replace the cat with a small pet.”

Output:

Image editing with Qwen-Image

This result was simply excellent. Every change I had asked for appeared in the image. The lighting was right, and both the clothes and the animal were changed. One minor issue: while the model turned night into day, it did not remove the moon, though it made it look like a round cloud. A very well-edited image that took just a few seconds to generate!

My Review of Qwen-Image

Overall, I really liked the model’s editing capabilities. Image generation is a different story, especially rendering large amounts of text or designing infographics. That is where Qwen-Image will need a lot of improvement going forward, particularly if it wants to compete with the likes of OpenAI, Google, or X.

Frames

But it has one really cool feature that most of the top models don’t: you can select the frame size you want to work with, right from the text box! If you are a content creator, this can help you create the “right-sized” image for each of your social media platforms.
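
Choosing a frame size boils down to picking an aspect ratio and then a concrete pixel resolution. Here is a sketch of how a tool could map common social-media aspect ratios to dimensions. The roughly 1-megapixel budget, the rounding to multiples of 16 (a typical latent-grid constraint), and the platform labels are all assumptions for illustration. They are not the exact sizes Qwen Chat offers.

```python
import math

# Widely used aspect ratios (width, height); illustrative labels,
# not values taken from Qwen Chat.
PLATFORM_RATIOS = {
    "square post": (1, 1),
    "portrait post": (4, 5),
    "story / short video": (9, 16),
    "video thumbnail": (16, 9),
}


def frame_size(ratio_w: int, ratio_h: int, target_pixels: int = 1024 * 1024,
               multiple: int = 16) -> tuple[int, int]:
    """Return (width, height) with roughly `target_pixels` pixels and the
    given aspect ratio, each side rounded to a multiple of `multiple`."""
    scale = math.sqrt(target_pixels / (ratio_w * ratio_h))
    width = max(multiple, round(ratio_w * scale / multiple) * multiple)
    height = max(multiple, round(ratio_h * scale / multiple) * multiple)
    return width, height


if __name__ == "__main__":
    for name, (w, h) in PLATFORM_RATIOS.items():
        print(f"{name:20s} {w}:{h} -> {frame_size(w, h)}")
```

With the defaults, a 1:1 frame comes out at 1024x1024 and a 16:9 frame at 1360x768. Keeping the pixel budget roughly constant means every format costs about the same to generate.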

Qwen-Image: Performance

Now that we have tested the model, let’s look at the performance results the Qwen team has released comparing Qwen-Image with its counterparts:

1. Image Generation and Editing Benchmarks:

Image rendering with Qwen-Image
  • Qwen-Image leads or is on par with the best models on almost all image generation and editing benchmarks.
  • GPT-4.1 and Seedream 3.0 are close competitors, matching Qwen-Image’s scores on several benchmarks.
  • The FLUX.1 models are strong competition but lag behind Qwen-Image.

2. Text Rendering Benchmarks:

Text rendering with Qwen-Image
  • Qwen-Image leads in Chinese text rendering and is also well ahead in English.
  • GPT-4.1 surpasses or matches Qwen-Image on several benchmarks.
  • Seedream 3.0 is a close competitor but lags behind Qwen-Image on both Chinese and English benchmarks.

Conclusion:

Qwen models currently rule the leaderboards for text and coding tasks. Qwen-Image holds similar promise but isn’t quite there yet. The model follows prompts well but struggles when a prompt packs in a lot of content. Still, it’s a great gift to the open-source community: it competes with the top paid models while being completely open-weight. As more users and developers adopt Qwen-Image, we may soon see it lead the image generation leaderboards too!

My final thought: try the Qwen-Image model. It’s good; we are just surrounded by so many great models that it’s easy to overlook its potential.

You can also check out Finding the Best AI Image Generation Model.

If you want to explore other FREE image generation models, you can refer to the following blog: Top 7 AI Image Generators to Try in 2025.

Anu Madan is an expert in instructional design, content writing, and B2B marketing, with a talent for transforming complex ideas into impactful narratives. With her focus on Generative AI, she crafts insightful, innovative content that educates, inspires, and drives meaningful engagement.
