I used BLIP for image recognition to produce the prompt, hoping that it would improve the quality somehow. It didn't. Stable Diffusion 3 yielded some interesting results, but most people looked... ruined.
Prompt prepend: "Extremely realistic and detailed photograph"
BLIP Model Loader
Salesforce/blip-image-captioning-base
Salesforce/blip-vqa-base
BLIP Analyze Image: min_length 4, max_length 128, num_beams 5, no_repeat_ngram_size 3, early_stopping false
Load Checkpoint: Stable Diffusion 3 Medium T5XXL
Upscale Latent: nearest-exact, 1920x904, center
KSampler: seed 1337, steps 64, cfg 4, heun++2, sgm_uniform, denoise 0.65
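For anyone who wants to try the captioning step outside ComfyUI, here is a rough sketch using the Hugging Face transformers library. It is not the actual ComfyUI node code. The generation settings and the prompt prefix are copied from the settings above; the function names, the file path argument, and joining prefix and caption with a comma are my assumptions for illustration.

```python
# Sketch only: approximates the BLIP captioning step outside ComfyUI.
# Requires the third-party packages "transformers", "torch" and "Pillow"
# for caption_frame(); build_prompt() has no dependencies.

PROMPT_PREFIX = "Extremely realistic and detailed photograph"

# Values copied from the "BLIP Analyze Image" settings listed above.
GENERATE_KWARGS = {
    "min_length": 4,
    "max_length": 128,
    "num_beams": 5,
    "no_repeat_ngram_size": 3,
    "early_stopping": False,
}


def build_prompt(caption, prefix=PROMPT_PREFIX):
    """Prepend the fixed prefix to a caption.

    Joining with ", " is an assumption; the node may join differently.
    """
    return f"{prefix}, {caption.strip()}"


def caption_frame(path, model_id="Salesforce/blip-image-captioning-base"):
    """Caption one image file with BLIP (hypothetical helper)."""
    # Imported lazily so build_prompt() works without these packages.
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)

    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    # Beam search with the same limits as the node settings.
    output_ids = model.generate(**inputs, **GENERATE_KWARGS)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Usage would look like build_prompt(caption_frame("frame.png")), where frame.png is a placeholder name for one extracted video frame. The resulting string is what would go into the text prompt for the sampler.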
Original file information:
MKV, 28,848,795,512 bytes
Video: FFV1, 70.09 fps, 1920x1080
Audio: pcm_s16le, 48000 Hz, stereo
Software used:
DOSBox
ffmpeg
Python scripts
Pixel Composer
ComfyUI