AI model for near-instant image creation on consumer-grade hardware

Sharlin · on Dec 10, 2024

For those wondering, it's an adversarially distilled SDXL finetune, not a new base model.

throwaway314155 · on Dec 10, 2024

Thanks! This article is pretty heavy with PR bullshit.

dcreater · on Dec 11, 2024

Typical university/science journalism written by a lay person without sufficient industry knowledge or scientific expertise

vidarh · on Dec 11, 2024

My favorite test of image models:

Drawing of the inside of a cylinder.

That's usually bad enough. Then try to specify size, and specify things you want to place inside the cylinder relative to the specified size.

(e.g. try to approximate an O'Neill cylinder)

I love generative AI models, but they're really bad at that, and this one is no exception, but the speed makes playing around with prompt variations to try to see if I get somewhere a lot easier (I'm not getting anywhere...)

James_K · on Dec 11, 2024

Careful how much you say that. I'm sure there's more than a few AI engineers willing to use some 3d graphics program to add a hundred thousand views of the inside of randomly generated shapes to the training set.

numpad0 · on Dec 11, 2024

That's okay. People will come up with new edge cases and keep hallucinating reasons why AI training is unethical, until the results are up to their quality standards.

vidarh · on Dec 11, 2024

Hey, if they want to special case on my pet test case, then awesome. I'd like to be able to use them for this.

test6554 · on Dec 11, 2024

"I will make it legal"

jonplackett · on Dec 11, 2024

I have a favourite test for LLMs that is also still surprisingly not passed by many:

You walk up to a glass door. It has 'push' written on it in mirror writing. What should you do and why.

Very few can get it right, even fewer can get it right and explain the right reason. They’ll start going on about how mirror writing is secret writing and push written backwards is code for pull, rather than just that it’s a message for the person on the other side.

No version of Gemini has ever passed.

m741 · on Dec 11, 2024

Here's Gemini's response, for what it's worth:

'You should pull the door.

Since the word "push" is written in mirror writing, it's intended to be read from the other side of the door. Therefore, the intended action for someone on your side of the door is the opposite of "push," which is "pull."'

Seems right to me.

jonplackett · on Dec 11, 2024

Yeah, that does seem right. Which version are you using, and did you modify the prompt? See my comment below for what I got out of the various Gemini versions.

ryankrage77 · on Dec 11, 2024

GPT-4 got it right first try for me, with a slightly modified prompt:

> Here's a simple logic puzzle: You walk up to a glass door. It has 'push' written on it in mirror writing. What should you do and why?

> ChatGPT said:

> If the word "push" is written in mirror writing on the glass door, it means the writing is reversed as if reflected in a mirror. When viewed correctly from the other side of the door, it would read "push" properly.

> This implies that you are meant to pull the door from your side, because the proper "push" instruction is for someone on the other side of the door. Mirror writing is typically used to convey instructions to the opposite side of a glass surface.

jonplackett · on Dec 11, 2024

Yes GPT-4 was the first LLM that could get it! It's one of the reasons I really like GPT-4.

thot_experiment · on Dec 11, 2024

* llama3.3:70b-instruct-q3_K_M *

A clever sign!

Since the word "push" is written in mirror writing, that means it's intended to be read from the other side of the door. In other words, if you were on the other side of the door, the text would appear normally and say "push".

Given this, I should... pull the door open!

The reasoning is that the sign is instructing people on the other side of the door to push it open, which implies that from my side, I need to pull it open.

jonplackett · on Dec 11, 2024

Yeah I've tried Llama 3 too. It was the second one that could get it after GPT-4.

Some of the large Mistral ones can get it too and I think 8xMixtral can too.

Kiro · on Dec 11, 2024

All models I tested it on (4o, 4o mini and Gemini) answered it correctly without any strange reasoning.

jonplackett · on Dec 11, 2024

I am still not getting it with Gemini.

*Gemini Flash: The word "push" is written in reverse, which means it is meant to be read in a mirror. Therefore, you should pull the door open.

The mirror writing is a visual cue indicating the opposite action of what the word suggests.

So it gets the answer, but with the wrong reasoning.

*Gemini Flash 2:

You should pull the door.

The mirror writing is a clue. If the word "push" is written backward as if reflected in a mirror, it means the door is designed to be pulled. The reflection indicates that the intended action is the opposite of what the word appears to say at first glance.

Again, right answer, wrong reasons.

*Gemini Advanced:

You should push the door to open it. Here's why:

Mirror Writing: The word "push" being in reverse mirror writing is a deliberate design choice. It's meant to be read correctly from the opposite side of the door (where someone would be exiting). Standard Practice: Most glass doors are designed to swing outward for safety reasons (in case of emergencies where people need to evacuate quickly). Therefore, even though the writing might seem confusing at first, it's actually a helpful indicator that you need to push the door to enter.

Which is worse than flash! Complete nonsense.
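For anyone who wants to run this against several models, here is a minimal sketch of the two-part grading described above (right answer, then right reason). The `grade` function and its keyword lists are hypothetical and deliberately crude; a response like "don't pull" would fool it, so treat it as a first-pass filter and read the answers yourself. The sample strings are short excerpts of responses quoted in this thread.

```python
def grade(response: str) -> str:
    """Crude keyword grader for the mirror-writing door puzzle (hypothetical)."""
    text = response.lower()
    # Right answer: the model tells you to pull.
    if "pull" not in text:
        return "wrong answer"
    # Right reason: the sign is addressed to whoever stands on the far side.
    if "other side" in text or "opposite side" in text:
        return "right answer, right reason"
    return "right answer, wrong reason"


# Excerpts of responses quoted in this thread.
samples = {
    "llama3.3": "it's intended to be read from the other side of the door. "
                "Given this, I should... pull the door open!",
    "Gemini Flash": 'The word "push" is written in reverse, which means it is '
                    "meant to be read in a mirror. Therefore, you should pull "
                    "the door open.",
    "Gemini Advanced": "You should push the door to open it.",
}

for model, response in samples.items():
    print(f"{model}: {grade(response)}")
```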

mft_ · on Dec 12, 2024

Honestly, if you’ve accurately reproduced your prompt, I had trouble interpreting it initially too. “written on it in mirror writing” is quite ambiguous. Are the letters reflective, but on my side of the door? Is there a mirror with handwriting in it somehow? Oh… does it mean “push is written on the other side of the glass, facing away, so that the writing appears backwards, or mirrored”?

Kiro · on Dec 11, 2024

DALL-E gave me a much better picture than I expected. When googling "inside of a cylinder" I barely got anything, and I had a hard time even imagining a picture in my head ("if I stood inside a cylinder looking at the wall, what would it look like as a flat 2D image?").

vidarh · on Dec 11, 2024

Yeah, the inside of a cylinder google results explain a lot in terms of lacking training data. If you google "O'Neill cylinder" you'll get what I was actually after originally, and the generators do badly there too, even though there are more examples (but still way too few).

I think these kinds of unusual requests will eventually need synthetic data, or possibly some way to give the model an "inner eye" by letting it build a 3D model of described scenes and "look at it", as there are a lot of things like this that you can construct a mental idea of if you just work through it in your mind or draw it, but that most people won't have many conscious memories of unless you try to describe it in terms of something else.

E.g. for the cylinder example, you get better results if you ask for a tunnel, which often can be "almost" a cylinder. But try to then nudge it toward an O'Neill cylinder, and it fails to grasp the scale or that there isn't a single "down", and starts adding openings.

mensetmanusman · on Dec 11, 2024

Also showing a wine cup overflowing is fun.

quikoa · on Dec 10, 2024

Github: https://chendaryen.github.io/NitroFusion.github.io/

Paper: https://arxiv.org/html/2412.02030v2

iLoveOncall · on Dec 10, 2024

> Instant image generation that responds as users type – a first in the field

Stable Diffusion Turbo has been able to do this for more than a year, even on my "mere" RTX 3080.

vidarh · on Dec 11, 2024

Notably, fal.ai used to host a demo here[1] that was very impressive at the time.

[1] https://fastsdxl.ai/

ajdjspaj · on Dec 10, 2024

What does consumer-grade mean in this context? Is this referring to an M1 MacBook or a tower full of GPUs? I couldn't find it in the paper or README.

whynotmaybe · on Dec 11, 2024

One Nvidia A100.

From the paper:

> We train using the AdamW [26] optimizer with a batch size of 5 and gradient accumulation over 20 steps on a single NVIDIA A100 GPU

So it's "consumer-grade" because it's available to anyone, not just businesses.
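A back-of-the-envelope sketch of what those two numbers imply, assuming the usual meaning of gradient accumulation (gradients from several small batches are summed before one optimizer update). Only the 5 and the 20 come from the quoted sentence; the variable names are mine.

```python
# Numbers taken from the quoted sentence in the paper.
micro_batch_size = 5      # samples processed per forward/backward pass
accumulation_steps = 20   # passes accumulated before each optimizer update
num_gpus = 1              # "a single NVIDIA A100 GPU"

# Samples contributing to each AdamW update under standard accumulation.
effective_batch_size = micro_batch_size * accumulation_steps * num_gpus
print(effective_batch_size)  # 100
```

So each weight update would see 100 samples while only 5 need to fit in memory at a time, which is the usual reason for accumulating on a single card.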

spott · on Dec 11, 2024

That is the training GPU… the inference GPU can be much smaller.

whynotmaybe · on Dec 11, 2024

I stand corrected.

Found on Yi-Zhe Song's LinkedIn:

> Runs on a single NVIDIA 4090

https://www.linkedin.com/feed/update/urn:li:activity:7270141...

ajdjspaj · on Dec 11, 2024

Thanks!

ericra · on Dec 11, 2024

I wasn't able to get many decent results after playing with the demo for some time. I guess my question is...what exactly is this for? I was able to get substantially better results about 2 years ago running SD 2 locally on a gaming laptop. Sure, the images took 30 seconds or so each, but the quality was better than I could get in the demo. Not sure what the point of instantly generating a ton of bad quality images is.

What am I missing?

nomel · on Dec 11, 2024

Here's the SD 2.1 demo, released 2 years ago, for comparison: https://huggingface.co/spaces/stabilityai/stable-diffusion

dcreater · on Dec 11, 2024

Nothing. This is useful as a cool feature and for demos. Maybe some application in cheap entertainment.

betenoire · on Dec 10, 2024

Here is the demo https://huggingface.co/spaces/ChenDY/NitroFusion_1step_T2I

I'm unable to get anything that looks as good as the images in the README, what's the trick for good image prompts?

deckar01 · on Dec 10, 2024

I had the same issue, so I pulled in the SDXL refiner. Night and day better even at one step.

https://gist.github.com/deckar01/7a8bbda3554d5e7dd6b31618536...

betenoire · on Dec 10, 2024

thank you!

avereveard · on Dec 10, 2024

I get a pretty close result with seed 0

paper https://i.imgur.com/l90WYrT.png

replication on hf https://i.imgur.com/MqN1Qwc.png

betenoire · on Dec 10, 2024

the imgur link is bad, but I hadn't noticed the prompt tucked away in those reference images and that helps. Thanks

(I had asked for a rock climber dangling from a rope, eating a banana, and they were wildly nonsensical images)

speerer · on Dec 10, 2024

I always just assume it's the magic of selection bias.

wruza · on Dec 11, 2024

The trick is called cherry picking. Mine the seed until you get something demo-worthy.
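A toy sketch of why seed mining flatters a demo. Nothing here touches a real image model: `fake_quality` is a hypothetical stand-in for "generate with this seed and rate the result", and it just returns a deterministic pseudo-random score per seed.

```python
import random


def fake_quality(seed: int) -> float:
    """Hypothetical stand-in for 'render with this seed, then score the image'."""
    return random.Random(seed).random()


def mine_seed(seeds, score=fake_quality):
    """Return (best_seed, best_score) over the candidate seeds."""
    best = max(seeds, key=score)
    return best, score(best)


seeds = range(64)
best_seed, best_score = mine_seed(seeds)
average = sum(fake_quality(s) for s in seeds) / len(seeds)
print(f"best seed {best_seed}: {best_score:.2f} vs. average {average:.2f}")
```

The best of 64 tries will always look at least as good as the average try, which is all a gallery of showcase images needs.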

tgsovlerkhgsel · on Dec 10, 2024

The models seem to have gotten to a point where even something I can run locally will give decent results in a reasonable time. What is currently "the best" (both from an output quality and ease of installation perspective) setup to just play with local a) image generation, b) image editing?

LeoPanthera · on Dec 10, 2024

If you have a Mac, get "Draw Things": https://drawthings.ai/releases/

It supports all major models and has a native Mac UI, and as far as I can tell there's nothing faster for generation.

The "best" models, and a bunch more, are built-in. The state of the art is FLUX.1, "dev" version for quality, "schnell" version for speed.

SDXL is an older, but still good model, and is faster.

yk · on Dec 11, 2024

For runtime, I use ComfyUI [0], which is node based and therefore a bit hard to learn. But you can just look at the examples on their GitHub. Fooocus [1] also seems to be popular and a bit more conventional perhaps, though I didn't try it.

For models, Flux [2] is pretty good and quite straightforward to use. (In general, you will have a runtime and then you have to get the model weights separately.) Which Flux variant depends on your graphics card; Flux.1 schnell should work for most decently modern ones. (And the website civitai.com is a repository for models and other associated tools.)

[0] https://github.com/comfyanonymous/ComfyUI

[1] https://github.com/lllyasviel/Fooocus

[2] https://civitai.com/models/618692?modelVersionId=699279

Multicomp · on Dec 11, 2024

EasyDiffusion is almost completely download and run. I'm too lazy to set up ComfyUI, I just want to do model downloads -> run Easy Diffusion -> input my prompts into the web UI -> start cooking my poor graphics card

cut3 · on Dec 10, 2024

ComfyUI has all the bells and whistles and is node based, which is wonderful. In ComfyUI you can use any of these and more:

Flux has been very popular lately.

Pony is popular especially for adult content.

SDXL is still great as it has lots of folks tweaking it. I chose it to make a comic as it worked well with LoRAs trained on my drawings. (article on using it for a comic here https://www.classicepic.com/p/faq-what-are-the-steps-to-make...)

qclibre22 · on Dec 10, 2024

git clone https://github.com/lllyasviel/stable-diffusion-webui-forge.g...

download models and all vae files for the model, put in right place, run batch file, configure correctly and then gen images using browser.

LZ_Khan · on Dec 10, 2024

Edit: never mind, seems like this recommendation is not the best

A1111 is a good place to start. Very beginner-friendly UI. You can look up some templates on Runpod to get started if you don't have a GPU.

Someone else mentioned a local setup, which might be even easier.

BoredPositron · on Dec 10, 2024

A1111 is EoL.

nprateem · on Dec 10, 2024

The devil's in the details as always. A "cartoon of a cat eating an icecream on a unicycle" doesn't bring back any of the 6-pawed mutant cats riding a unicycle, etc. Still, impressive speed.

NikkiA · on Dec 11, 2024

It gave me plenty of cats with 3 front paws though

wruza · on Dec 11, 2024

Isn’t this year-old news?

It was called LCM/Turbo in SD and it generated absolute crap most of the time, just like this one. Which is likely yet another “ground-breaking” finetune of SD.

musicale · on Dec 12, 2024

> Surrey announces world's first AI model for near-instant image creation on consumer-grade hardware

Kind of like what you can do on an iPhone?

smusamashah · on Dec 11, 2024

Stream Diffusion already exists and gives you images as you type. Worked fine on RTX 3080

gloosx · on Dec 11, 2024

creation... wow, they really love themselves by choosing that vocabulary; to create is divine, and this AI model is merely generating.