Let’s take a quick history lesson and look back at the state of AI image generation a year ago. We couldn’t reliably generate faces, DALL-E 2 had just been released a few months prior and had mixed results, Midjourney V4 was starting to make some noise, and Stable Diffusion was leading the way with 2.0.
In just a year, AI art has become nearly perfect except for two significant roadblocks: nuance and text generation.
Fast forward to today: DALL-E 3 arrived a few months back, and earlier this week, Midjourney V6 was finally released. Can these finally be the AI image generators that handle text perfectly? Let’s find out.
Why Midjourney and DALL-E 3?
For a while now, DALL-E 3 has been the only AI image generator that can consistently create images with text. It’s one of its main selling points, along with improved creativity and nuance. It’s even showcased on the announcement page with this image:
Recently, Midjourney unveiled its newest model: V6. And what do you know, they’re also highlighting better nuance, creativity, and, most importantly, minor text drawing as their improvements. I’ve always avoided using text generation when comparing Midjourney against other generators because it would be unfair, but now that we’re getting this feature, it only makes sense to pit it against the best.
Head-to-Head: Midjourney vs. DALL-E for Text Generation
Each comparison will focus on text, but we’ll also analyze each model’s nuance and creativity in applying the text. So, without further ado, here’s a direct comparison of Midjourney and DALL-E 3 using the same prompts:
Simple Text
Text: “This is text.”
In terms of the text itself, Midjourney performed better than DALL-E 3 because of a small mistake the latter made when writing the last part of the text. However, DALL-E’s image is more cohesive, because the teacher in Midjourney’s image is using a pen to write on a chalkboard.
Winner: Midjourney V6
Long Text
Text: “The quick brown fox jumps over a lazy dog, and promptly tripped over the dog’s tail, earning a disgruntled grumble.”
Both tried to add their own flair to a simple prompt (a piece of paper with writing on it), but neither actually produced readable text. This shows that AI image generators can write short words or sentences, but they get worse as you add more words.
Winner: None.
Keyboard
For this one, I didn’t ask either model to write a specific word or sentence; instead, I tasked them with generating an accurate QWERTY keyboard. Obviously, neither is actually correct, but DALL-E couldn’t even arrange the letters properly, while Midjourney somehow got the correct placement for more than half the letters.
Winner: Midjourney V6
Logo
Textual content: “Matcha.”
Both of these images demonstrate a great understanding of my original prompt (a green coffee mug logo) and showcase creativity. There’s nothing wrong with either text, and each even matches the art style its generator chose for the logo.
Winner: Tie round.
Postcards
Text: “Happy Halloween.”
As AI image models evolve, I have to be extremely nitpicky with how I judge their text generation prowess. Case in point: I’d love to make this a tie round, but the minor errors in DALL-E’s output (triple Ls in “Halloween” and inconsistent coloring in “Happy”) prevent me from doing so.
I’ll say this though: I prefer DALL-E’s postcard over Midjourney’s.
Winner: Midjourney V6
Signs
Textual content: “Bacon and Eggs.”
This is a clear win for DALL-E. Midjourney V6 tried its best, but the unnecessary and out-of-place yellow “and” sign stops this round from becoming a tie.
DALL-E also shows great nuance this round by turning “and” into an ampersand and making a separate “Diner” neon sign without me asking. It’s not just readable; it’s also creative, unique, and immersive.
Winner: DALL-E 3
Book Covers
Textual content: “Shapes and Stuff.”
I’ll admit: DALL-E 3 created a much better book cover than V6. However, the book title generated by DALL-E has far too many errors, so I have to give this point to Midjourney, which perfectly rendered “Shapes and Stuff” in a consistent font. V6’s cover design also showcases its improved comprehension by highlighting the text’s key words.
Winner: Midjourney V6.
Comedian Panel
Textual content: “Knock knock!”
Midjourney V6 and DALL-E 3 both made minor errors in writing the text. Since both are still readable and their artwork is amazingly executed, I’m declaring this round another tie.
Winner: Tie round.
Surreal Settings
Textual content: “To infinity”
Just to give a little background: my prompt for this round explicitly states that the text should be composed of stars. Although I mentioned that the focus would be on the text itself, which Midjourney did better this round, DALL-E’s minor mistake won’t prevent me from awarding this point to them because they did, in fact, create the text using stars.
Winner: DALL-E 3
The Final Tally and Observations
| Round | DALL-E 3 | Midjourney V6 |
|---|---|---|
| Simple Text | Almost perfect text; high level of nuance and creativity. | Perfect text; good level of nuance and creativity. |
| Keyboard | Letters aren’t placed in the right order. | Around half of the letters are placed in the correct order. |
| Logo | Perfect text; high level of nuance and creativity. | Perfect text; high level of nuance and creativity. |
| Postcards | Almost perfect text; high level of nuance and creativity. | Perfect text; high level of nuance and good creativity. |
| Signs | Perfect text; incredibly high level of nuance and creativity. | Almost perfect text with a noticeable mistake; good level of nuance and creativity. |
| Book Covers | A good attempt with a few noticeable errors; great level of creativity. | Perfect text; good level of nuance and creativity. |
| Comic Panel | Almost perfect text; incredibly high level of nuance and creativity. | Almost perfect text; incredibly high level of nuance and creativity. |
| Surreal Settings | Almost perfect text; high level of nuance and creativity. | Perfect text, but shows low understanding of the prompt. |
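For anyone who wants to double-check the score, here is a minimal Python sketch that counts the winners declared at the end of each round above. The names `ROUND_WINNERS` and `tally` are purely illustrative, and putting ties and the no-winner round into their own buckets (rather than splitting points) is an assumption, not a scoring rule stated anywhere in the rounds.

```python
from collections import Counter

# Winners as declared at the end of each round above.
# None marks the Long Text round, where neither output was readable.
ROUND_WINNERS = {
    "Simple Text": "Midjourney V6",
    "Long Text": None,
    "Keyboard": "Midjourney V6",
    "Logo": "Tie",
    "Postcards": "Midjourney V6",
    "Signs": "DALL-E 3",
    "Book Covers": "Midjourney V6",
    "Comic Panel": "Tie",
    "Surreal Settings": "DALL-E 3",
}


def tally(winners):
    """Count rounds per outcome; ties and no-winner rounds get their own bucket (assumption)."""
    return Counter(
        winner if winner is not None else "No winner"
        for winner in winners.values()
    )


scores = tally(ROUND_WINNERS)
for outcome, count in scores.most_common():
    print(f"{outcome}: {count}")
```

Run as written, this reports four rounds for Midjourney V6, two for DALL-E 3, two ties, and one round with no winner.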
One thing I’ve noticed in this testing is that DALL-E 3 appears to have a higher error rate compared to Midjourney. On the other hand, Midjourney tends to lack the same level of creativity and nuance when tasked with generating images that specifically ask for text. I believe that V6 is compromising a portion of its creativity when fed prompts that explicitly focus on text generation.
Wrapping Up
This head-to-head was a lot closer than I expected, but Midjourney V6 pulls through with a win. However, like I said earlier, V6’s improved but still limited nuance is preventing it from producing text while making full use of its creativity.
Still, that is to be expected because this isn’t the final version of V6 yet. Midjourney is only going to get better from here as they gradually improve the model behind it. There’s no concrete news on DALL-E 4 yet, but we can expect the same improvements for that model too. For now, though, Midjourney is undoubtedly the one leading the space in text generation.
That’s it for this direct comparison. If you’re looking for more articles about V6 and DALL-E 3, I highly suggest reading this article. Good luck!