Installation of Flux Models through ComfyUI

This guide explores prompt engineering for high-quality image generation with Generative AI. It highlights the benefits of Flux by Black Forest Labs, compares it with other AI tools, and offers a step-by-step guide to installing and using ComfyUI for efficient and effective image creation.


Generating high-quality images through prompt engineering has grown into a separate area of specialisation within the AI and ML space, primarily because of its potential to reduce a business's operational overheads. These overheads are often incurred as the money or human resources allocated to, say, creating a piece of marketing content for social media distribution, among other business operations. Yet even when substantial resources are allocated to such a task, the output is not guaranteed to be of the highest quality. In 2024, to secure the expected quality from the outset, business leaders started exploring Generative AI (GenAI) tools to create a foundation or structure for content creation, which skilled personnel then refine for authenticity and relevance to the business. Skilled staff who would otherwise create content from scratch can now focus on ensuring that the generated content is refined enough for distribution.

But when the attainable value is uncertain, business leaders would like to experience the utility of GenAI tools (in this case, image generation capabilities) before breaking the bank. Premium tools, however, come at a premium price, which discourages such businesses from even exploring useful tools, let alone adopting them. Business leaders then look at open-source alternatives, but implementation and support bottlenecks make them reluctant to actually adopt a solution. If a repeatable playbook existed that broke down how to unlock the power of such open-source models, business leaders would be more open to the idea of embracing GenAI tools.

Through this article, we at Codemonk are laying out an open playbook for the effortless installation of ComfyUI to harness the power of Flux models by Black Forest Labs, thereby gaining access to one of the most powerful image generation tools available today.

Let's dive into the world of Flux.

Flux is a powerful generative AI model developed by Black Forest Labs, designed for image generation by combining the strengths of transformer and diffusion models, leading to superior image quality and prompt adherence. Similar to contemporary large language models, Flux uses a transformer to encode text prompts into a numerical representation. The encoded prompts are then used to guide the generation process, where a noise-added image is gradually refined to match the desired content.
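To make "gradually refined" concrete, here is a toy sketch in Python. It is not Flux's sampler: a short list of numbers stands in for an image, and the `target` list stands in for whatever the encoded prompt describes. A real model would predict the update direction at every step; here a hypothetical "oracle" computes it from the known target, following a straight-line path from noise to target, which is the idea behind rectified-flow and flow-matching samplers. The function name `refine_from_noise` is illustrative only.

```python
import math
import random


def refine_from_noise(target, steps=20, seed=0):
    """Toy iterative refinement: walk from random noise to `target` in `steps` Euler steps.

    Returns the final vector and its distance to `target` after every step.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    dt = 1.0 / steps
    distances = []
    for i in range(steps):
        t = i / steps  # 0 = pure noise, 1 = finished "image"
        # Oracle velocity along the straight noise-to-target path. A trained model would
        # have to predict this from the current x, the time t and the encoded prompt.
        velocity = [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]
        x = [xi + dt * v for xi, v in zip(x, velocity)]
        distances.append(math.dist(x, target))
    return x, distances


if __name__ == "__main__":
    final, distances = refine_from_noise([0.2, -0.5, 0.9], steps=5)
    for step, d in enumerate(distances, start=1):
        print(f"step {step}: distance to target {d:.4f}")
```

At step `i` the update closes `1 / (steps - i)` of the remaining gap, so the distance shrinks at every step and reaches (numerically) zero at the last one; with a trained network instead of the oracle, how close the result lands depends on how well the network predicts that direction.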

Prominent applications of Flux

Creative Arts: Generating unique and visually appealing images for various artistic purposes, wherein factors such as granularity and detail of the image can be controlled.

Design: Creating concept art, product designs, and visual assets for marketing and branding. Templatizing design artefacts and generating them repeatably.

Gaming: Developing high-quality game graphics and environments, incorporating different design languages based on the theming engine employed.

Research: Assisting in scientific research by generating visual representations of data.

Here are a few examples that highlight why Flux could be preferred over its contemporaries.

Prompt - Focus on Photorealistic Style of Image Generation

Generate an Image for the following: 
Subject: a white colored, 5-door, Suzuki Jimny 
Action: standing on the roadside next to a coffee plantation with misty mountains in the backdrop. 
More Context: Include a man drinking coffee next to the Suzuki Jimny
Art form: **Photorealistic image**
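The prompt above follows a simple labelled-field structure (Subject, Action, More Context, Art form). When generating many images, a small helper can keep that structure consistent. The `build_prompt` function below is a hypothetical sketch, not part of any tool's API; fields are passed as ordered (label, value) pairs, so the same helper also fits prompts with different labels, and empty values are skipped.

```python
def build_prompt(instruction, fields):
    """Join an instruction line and ordered (label, value) pairs into a multi-line prompt.

    Fields with an empty value are skipped, so optional context can be left out.
    """
    lines = [instruction]
    for label, value in fields:
        if value:
            lines.append(f"{label}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        "Generate an Image for the following:",
        [
            ("Subject", "a white colored, 5-door, Suzuki Jimny"),
            ("Action", "standing on the roadside next to a coffee plantation "
                       "with misty mountains in the backdrop."),
            ("More Context", "Include a man drinking coffee next to the Suzuki Jimny"),
            ("Art form", "Photorealistic image"),
        ],
    ))
```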

GPT output:

[Image: GPT (DALL·E) output for the Suzuki Jimny prompt]

Positives

  1. Of the three images generated, the GPT output is the most visually appealing, thanks to the way the misty mountains are added to the scene.

Negatives

The prompt specified “a man drinking coffee next to the Suzuki Jimny,” but in the generated image the man appears to be seated on thin air, violating the laws of physics.

The prompt specified “A Suzuki Jimny, standing on the roadside next to a coffee plantation,” but the generated image shows what looks like a tea plantation.

Gemini Output:

[Image: Gemini output for the Suzuki Jimny prompt]

Positives

  1. Of the three images generated, Gemini's captures intricate details that the prompt does not mention, such as the detail within the leaves of the coffee plants, hinting at Gemini's own embellishments on the given prompt.

Negatives

The prompt specified “a man drinking coffee next to the Suzuki Jimny,” but the man is missing from the generated image.

The prompt specified “A Suzuki Jimny, standing on the roadside next to a coffee plantation,” but the road appears to be missing, and the Jimny instead looks like it is parked in a trench.

A foreign object (what looks like a hat) sits on the Jimny, although nothing in the prompt calls for it.

Flux Output:

[Image: Flux output generated in ComfyUI for the Suzuki Jimny prompt]

Positives:

Of the three images generated, Flux adheres most closely to realistic proportions and physical laws.

Its renditions of the Suzuki Jimny, the coffee plantation, and the misty mountains are the most true to life of the three.

Negatives: None with respect to the prompt provided.

Let's look at another example of one prompt executed on the same three models.

Prompt - Focus on Illustrative Style of Image Generation

Create an image for the following: 
Subject of the Image: A curious cat exploring a deserted alley
Action: Peering into a glowing box
Time and Day: Midnight under a full moon
Art Form: **Illustration**

GPT Output:

[Image: GPT (DALL·E) output for the cat prompt]

Positives:

  1. Every element requested in the prompt appears in the image

Negatives:

The art style is not an illustration

Although the prompt specifies a deserted alley, the generated image shows a well-lit alley with buildings on both sides

The cat’s eyes look alien

Gemini Output:

[Image: Gemini output for the cat prompt]

Positives:

None (too many negatives)

Negatives:

The cat, the box, and even the moon in the sky are not to scale

The image resembles a painting more than an illustration

Parts of individual elements (the cat’s head and feet) are out of proportion

Flux Output:

[Image: Flux output generated in ComfyUI for the cat prompt]

Positives:

Most importantly, the image retains the illustration style, unlike the painting-style images from the other two models

Stays true to the prompt and captures every element it specifies

Captures scene descriptions, such as the deserted alley, most accurately of the three

Negatives:

  1. The cat is looking away from the box, whereas the prompt specifies the cat peering into it

Inference: The observations above illustrate how accurately each model turns a text prompt into an image. Although GPT and Gemini present compelling cases, Flux most faithfully retains the intent conveyed through the given prompt, making it the clear winner in this comparison in terms of accuracy and detail.

With applicability and demonstrations out of the way, let's look at how anyone can actually use Flux through ComfyUI.

Essential Prerequisites

Operating systems: Windows or Linux

Minimum memory: 18 GB or more of unified memory (CPU + GPU); a dedicated GPU with 24 GB of combined memory is recommended

Software requirements: Python 3; Anaconda and pip; PyTorch

Before proceeding to install Flux, make sure the prerequisites above are met.
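The software prerequisites can be checked from Python itself. The sketch below assumes `pip` and `torch` are the modules to look for (Anaconda is a separate installer and is not checked), and its minimum Python version is a placeholder assumption: check ComfyUI's README for the version it currently recommends. The helper names `python_ok` and `missing_modules` are hypothetical.

```python
import importlib.util
import sys

# Placeholder floor (assumption): adjust to the version ComfyUI's README recommends.
MIN_PYTHON = (3, 10)
# Import names to look for; Anaconda itself is not checked here.
REQUIRED_MODULES = ("pip", "torch")


def python_ok(min_version=MIN_PYTHON, current=None):
    """True if `current` (default: the running interpreter) is at least `min_version`."""
    current = tuple(current) if current is not None else tuple(sys.version_info[:2])
    return current >= tuple(min_version)


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on this interpreter's path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}: {'OK' if python_ok() else 'older than expected'}")
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```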

Installing on Windows/Linux OS configurations

Open a terminal and check that the packages above are installed. Once verified, create a new directory in which ComfyUI will be installed, move into it, and clone the ComfyUI repository:

git clone https://github.com/comfyanonymous/ComfyUI.git

Place your checkpoint files (the image generation models) in models/checkpoints, inside the ComfyUI folder.

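Text encoder files for Flux (grab the variant suited to your PC configuration; ComfyUI's Flux examples load these from models/clip rather than models/checkpoints):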

<https://huggingface.co/comfyanonymous/flux_text_encoders/tree/main> 

Place your VAE file in the models/vae folder inside the ComfyUI directory.
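Once the files are in place, a short script can confirm the folder layout. This sketch checks only `models/checkpoints` and `models/vae`, the two folders named in the steps above, and treats `.safetensors`, `.ckpt`, `.pt`, `.pth` and `.sft` files as model weights; that suffix list and the helper name `inspect_model_dirs` are assumptions. Pass a different `expected_dirs` tuple to check other ComfyUI model folders.

```python
import sys
from pathlib import Path

# The two folders named in this guide; pass other tuples to check more model folders.
DEFAULT_DIRS = ("models/checkpoints", "models/vae")
# Assumed set of weight-file suffixes; adjust to the files you actually downloaded.
MODEL_SUFFIXES = {".safetensors", ".ckpt", ".pt", ".pth", ".sft"}


def inspect_model_dirs(comfy_root, expected_dirs=DEFAULT_DIRS):
    """Map each expected folder to its sorted model files, or None if the folder is missing."""
    root = Path(comfy_root)
    report = {}
    for rel in expected_dirs:
        folder = root / rel
        if not folder.is_dir():
            report[rel] = None
            continue
        report[rel] = sorted(
            p.name for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() in MODEL_SUFFIXES
        )
    return report


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "ComfyUI"
    for rel, files in inspect_model_dirs(root).items():
        if files is None:
            print(f"{rel}: folder not found")
        elif not files:
            print(f"{rel}: no model files yet")
        else:
            print(f"{rel}: {', '.join(files)}")
```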

AMD GPUs (Linux only)

AMD GPU users can install the ROCm build of PyTorch with pip, if it is not already installed. This is the command for the stable release:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0

To install the nightly build with ROCm 6.1 instead:

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.1
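After either command, it is worth confirming that pip actually installed a ROCm build of PyTorch rather than a CPU-only or CUDA wheel. The sketch below relies on `torch.version.hip` and `torch.version.cuda`, which hold a version string for ROCm and CUDA builds respectively and are `None` otherwise; the `classify_torch_build` helper is hypothetical.

```python
def classify_torch_build(cuda_version, hip_version):
    """Label a PyTorch build from torch.version.cuda and torch.version.hip."""
    if hip_version:
        return "rocm"
    if cuda_version:
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed yet.")
    else:
        build = classify_torch_build(torch.version.cuda, getattr(torch.version, "hip", None))
        print(f"torch {torch.__version__}: {build} build")
        # ROCm builds expose AMD GPUs through the torch.cuda API as well.
        print("GPU visible to PyTorch:", torch.cuda.is_available())
```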

NVIDIA

NVIDIA users should install stable PyTorch with this command:

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121

To install the PyTorch nightly build instead, which might include performance improvements:

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
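To check the CUDA build and compare the GPU's memory with the prerequisites, the sketch below reads `torch.cuda.get_device_properties(0).total_memory`. It assumes the 24 GB recommendation refers to GPU memory and measures in decimal gigabytes, so a card sold as 24 GB (which reports slightly under 24 GiB) still passes; both choices and the `vram_report` helper are assumptions.

```python
GB = 10 ** 9  # decimal gigabytes, matching how GPU memory is usually advertised
RECOMMENDED_GB = 24  # assumed to mean GPU memory (see the prerequisites)


def vram_report(total_bytes, recommended_gb=RECOMMENDED_GB):
    """Return (memory in GB rounded to 0.1, whether it meets the recommendation)."""
    gb = total_bytes / GB
    return round(gb, 1), gb >= recommended_gb


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed yet.")
    else:
        if torch.cuda.is_available():
            props = torch.cuda.get_device_properties(0)
            size, ok = vram_report(props.total_memory)
            verdict = "meets" if ok else "is below"
            print(f"{props.name}: {size} GB, which {verdict} the {RECOMMENDED_GB} GB recommendation")
        else:
            print(f"No CUDA GPU visible (torch.version.cuda = {torch.version.cuda})")
```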

Once the steps above are complete, install the dependencies listed in requirements.txt by running the following command from inside the ComfyUI folder:

pip install -r requirements.txt
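To confirm that everything in requirements.txt actually installed, the sketch below reads the file and asks `importlib.metadata` for each distribution's version. Its parser is deliberately simple: it assumes each line is a plain package name with optional version specifiers, extras or markers, and it would mis-read URL or VCS requirements. The helper names are hypothetical; run it from the ComfyUI folder.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text):
    """Extract distribution names from requirements.txt-style text (simple cases only)."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as -r or --extra-index-url.
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_distributions(names):
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.is_file():
        missing = missing_distributions(requirement_names(req.read_text()))
        print("Missing:", ", ".join(missing) if missing else "none")
    else:
        print("Run this from the ComfyUI folder that contains requirements.txt")
```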

Once installed, you are all set to launch ComfyUI and run Flux models on your computer. To start ComfyUI, run the following command from the same folder:

python main.py

ComfyUI will start in the terminal and print a local URL (by default http://127.0.0.1:8188) through which the GUI can be accessed. Opening that URL in a browser should show a page resembling the image below.

[Image: the ComfyUI web interface]
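If the page does not load, a small check can tell whether anything is answering at the address ComfyUI printed. The sketch below uses only Python's standard library and the default address quoted above; change `url` if ComfyUI was started on a different host or port. The function name `comfyui_is_up` is hypothetical.

```python
import urllib.error
import urllib.request

DEFAULT_URL = "http://127.0.0.1:8188"  # default address quoted in this guide


def comfyui_is_up(url=DEFAULT_URL, timeout=3.0):
    """Return True if an HTTP server at `url` answers with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and HTTP error statuses (HTTPError is a URLError).
        return False


if __name__ == "__main__":
    state = "is answering" if comfyui_is_up() else "is not reachable"
    print(f"ComfyUI at {DEFAULT_URL} {state}")
```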

In the upcoming sequel to this article, we will explore the usability of ComfyUI and break down how someone with little to no development experience can generate images like the ones shown earlier.

Stay tuned for more interesting implementations of GenAI.
