How Midjourney Evolved Over Time (Comparing V1 to V6 Outputs)

H-Tech News

January 8, 2024


It's hard to imagine that, two years ago to this day, AI was largely treated as science fiction.

It wasn't until November 2022 that ChatGPT became publicly available. DALL-E was only accessible to a select few. DeepMind and OpenAI seemed like the only two companies heavily investing in deep learning.

One of the earliest mainstream AI products launched early that year: Midjourney. It now has millions of daily users worldwide. With its latest model, we're witnessing how advanced, and how terrifying, AI art can become in the future.

But it hasn't always been that way.

Midjourney had a rough start, to say the least. Now that enough time has passed, we can look back at its improvements over the last 23 months. Here's what Midjourney looked like two years ago, compared to where it is today:

Midjourney's Evolution Through Images

People who were late to the game never experienced Midjourney's rough beginnings. There was a time when people wondered whether AI image generation was really worth pursuing, given the poor results from both DALL-E and Midjourney. Here are some reminders of how far we've come since then:

Portraits – Day

high quality photography of a young Japanese woman smiling, backlighting, natural pale light, film camera, by Rinko Kawauchi, HDR

There's not much difference between V1, V2, and V3. The images these models produce are a complete mess, but they're a product of their time. Back then, the only available AI image models were the first iteration of DALL-E (which critics received better) and a few early attempts at generating realistic images from datasets like the one behind ThisPersonDoesNotExist.

V4 was Midjourney's real turning point. It got rid of the jigsaw-like faces and replaced them with a much closer approximation of what a human face should look like. However, it still had issues with overemphasis. For example, when I specified that I wanted a Japanese woman as my subject, V4's first instinct was to go overboard with monolid eyes (all of the variations' eyes look like the ones depicted above).

V5 is ten times better than V4. My only issue with it, as I've mentioned in previous articles, is that it tends to create flawlessly smooth faces, which are dead giveaways that an image is AI-generated. V6 solved this issue by producing more realistic facial features and an asymmetrical structure.

Portraits – Night

portrait, a beautiful young woman, glamour street medium format photography, feminine, shot on cinealta, night, pastel hues

Everything I've said above applies to this set of images as well. V1 to V3 are characterized by a lack of logical structure, but you can still make out what the model is trying to create. V4 is the realization of those concepts: it creates coherent, more realistic portraits, though they're a bit uncanny.

V5, again, is where things start to get better, but the subject is still too perfect. V6's subject and background details are much more subtle, which improves realism while also adding creativity.

Landscape

landscape, an autumn in the lake during dusk, tranquility

V1 is actually a bit amusing because you can clearly see a Shutterstock logo in the bottom-left corner. It shows where the Midjourney team initially sourced its training data and offers some insight into how they later refined their dataset pre-processing. V2 and V3 are much more coherent here than their counterparts in the portrait sets, but they still can't generate HD images, and the reflections on the water are inconsistent.

V4 is more creative, but it still has some issues with nuance, as seen in the trees submerged in the lake. V5 perfected reflections but still hasn't resolved its realism issues. And then we have V6, which accurately emulates real photography by adding little details such as small waves and natural sky gradients.

Food Photography

a photorealistic cheeseburger, white clean background, commercial photography

If I were to describe V1 to V3's images in one sentence, I'd say they're what aliens must think a cheeseburger looks like. V1 and V2's burgers, in particular, don't even have patties, only onions and a huge block of cheese.

Then V4 creates an almost perfect burger, but the proportions seem a bit off and it has a texture resembling Play-Doh. If I were to nitpick V5's output, I'd say there are a few sesame seeds on the bottom bun when there shouldn't be.

If you're looking for a photorealistic cheeseburger, V6 won't disappoint.

Product Photography

commercial photography, a women's necklace with a sunflower pendant, minimal background, natural light


If there's one thing the earlier versions of Midjourney lack, it's structure. The images above make it clear that these models don't see shape the way we do, and that issue isn't resolved until V4.

In this case, I'm happy with V4, V5, and V6's outputs. They're all good product mockups in their own right, even if each interpreted my prompt differently.

Pixel Art

pixel art scene, the eiffel tower at midnight, city lights, romantic

This might be controversial, but I think V4 has the best pixel art here. The "pixels" are more consistent in size, and the art style reminds me a lot of early 8-bit video games. That said, I still prefer V5 and V6's outputs visually. The only thing weighing them down is the inconsistency of pixel sizes, which is more apparent in V5's output if you zoom in.

Animation

anime movie still, studio ghibli, a woman going to the beach alone

It occurs to me that prompt comprehension isn't a big issue for the earlier versions of Midjourney, at least with simple prompts. Of course, their outputs are still unpolished, but you can see that they've grasped "how" to create what I'm asking for; they just didn't have the tools to make it.

V4 is a huge step up, but it's still low-resolution. As for V5, there's no beach in the world where its waves would physically make sense, and it doesn't resemble Studio Ghibli artwork. V6 manages to capture the hand-drawn realism of Studio Ghibli's anime films while creating a pretty darn good animation still.

Text Generation

night photography, a neon sign outside a restaurant saying "Dinner is served"

Something odd I noticed in this comparison is how close V2 and V3 come to writing "Dinner is served," which suggests that Midjourney shifted its focus away from text generation when it rolled out V4 and V5.

I've already said this in my other V6 articles, but Midjourney is one of the best AI image models when it comes to text, and its output above proves that point further.

Multiple Subjects [High Context]

a rabbit, a porcupine, two cats, and a wizard having a tea party:: 90s animated tv series

None of these images nailed the prompt, but V6 comes closest. It has two rabbits (instead of one), a cat (who also happens to be the wizard), and some kind of cat-porcupine hybrid. Midjourney is still far from DALL-E 3's nuance, but it's getting there.

Some Observations

After going through all these images, I've come to the conclusion that each Midjourney model after V3 focused on a few specific areas with every upgrade. To be more specific:

V4: Prompt cohesion and output structure. It figured out how to put shapes and ideas together into a coherent image.
V5: Once the team had figured out how to create coherent images, they improved the generator's overall creativity.
V6: This is one of their biggest updates so far, with significant improvements in realism, text generation, and prompt understanding.

The Bottom Line

These images clearly show how much Midjourney has improved over the last two years. Not only is it better than most AI image generators, it can also genuinely create art better than many people can.

Midjourney V6's realism, creativity, and pace of improvement are both fascinating and horrifying. For hobbyists and reviewers like us, it's a cool tool for creating artwork. For artists and the world in general, it has the potential to erase jobs and fuel fake news through deepfakes.

But that's at least a few years away. For now, let's just enjoy what Midjourney has to offer. Have fun prompting!