ComfyUI custom node for ComfyUI_WolfSpectrum

Python 100%

Repository files (latest commit first)
Filename	Latest commit message	Latest commit date
Balazs Horvath 0bf42ab23e Enhance KV Cache Integration for FLUX.2 Klein Models - Implemented KVCacheModelPatcher for monkey-patching ComfyUI's ModelPatcherDynamic, enabling robust KV cache support. - Improved device detection in Flux2KVCacheModel to handle various model types effectively. - Added comprehensive lifecycle management for patching, including logging for better traceability. - Created a new kv_cache/patchers module to encapsulate patching logic. - Integrated patcher into WolfSpectrumKVCacheUnified for seamless model call routing. This update significantly enhances the KV cache functionality, ensuring efficient model interactions and improved performance.		2026-03-13 01:21:55 +01:00
assets	blep	2026-03-12 00:39:59 +01:00
comfy	Refactor KV Cache Implementation for FLUX.2 Klein Models	2026-03-13 00:47:07 +01:00
docs	Refactor KV Cache Implementation for FLUX.2 Klein Models	2026-03-13 00:47:07 +01:00
kv_cache	Enhance KV Cache Integration for FLUX.2 Klein Models	2026-03-13 01:21:55 +01:00
nodes	Enhance KV Cache Integration for FLUX.2 Klein Models	2026-03-13 01:21:55 +01:00
spectrum	Refactor KV Cache Implementation for FLUX.2 Klein Models	2026-03-13 00:47:07 +01:00
tests	Refactor KV Cache Implementation for FLUX.2 Klein Models	2026-03-13 00:47:07 +01:00
.gitignore	blep	2026-03-12 01:22:11 +01:00
__init__.py	Implement KV Cache Control for FLUX.2 Klein models	2026-03-12 20:52:16 +01:00
CHANGELOG.md	Enhance KV Cache Integration for FLUX.2 Klein Models	2026-03-13 01:21:55 +01:00
README.md	Implement KV Cache Control for FLUX.2 Klein models	2026-03-12 20:52:16 +01:00
TAU_MAPPING_GUIDE.md	blep	2026-03-12 01:22:11 +01:00
ty.toml	the broken stuff never ends	2026-03-11 20:25:00 +01:00

README.md

ComfyUI_WolfSpectrum

Training-free diffusion sampling acceleration via Spectrum — adaptive spectral feature forecasting for ComfyUI. Skips full transformer forwards on selected steps by predicting intermediate features using Chebyshev polynomial regression, blended with a discrete Taylor predictor.

Reference: Han et al., "Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration", CVPR 2026, arXiv:2603.01623.

How It Works
Installation
Usage
Parameters
Step Schedule
Mathematical Specification
Supported Models
KV Cache Control
Architecture
Performance
Troubleshooting
Adding New Models
Dependencies

How It Works

Standard diffusion sampling runs the full transformer denoiser at every step. Spectrum observes that the intermediate hidden states (the pre-output features of the final layer) evolve smoothly across timesteps. Instead of recomputing them from scratch each step, Spectrum fits a Chebyshev polynomial to the history of observed features and extrapolates forward — then runs only the cheap output head on the predicted feature.

flowchart TD
    A[Diffusion Step t] --> B{Warmup or<br/>Actual Step?}
    B -- Yes --> C[Full Transformer Forward]
    C --> D[Hook captures h at final_layer]
    D --> E[Update Chebyshev forecaster]
    E --> F[Return output]
    B -- No --> G[Predict h̃ via Chebyshev + Taylor blend]
    G --> H[Compute vec_orig<br/>timestep + guidance embeddings]
    H --> I[Run output head only<br/>head h̃ vec_orig]
    I --> J[Postprocess to latent shape]
    J --> F

The key insight is that final_layer is cheap — it's a single linear projection. The expensive part is the stack of double and single transformer blocks before it. Spectrum skips those on cached steps.

Installation

cd ComfyUI/custom_nodes
git clone <repo_url> ComfyUI_WolfSpectrum

Restart ComfyUI. The node Apply Spectrum will appear under model/spectrum.

No extra pip install is required if ComfyUI already runs Flux or Chroma.

Usage

1. Load a supported model (Flux, Flux2, Flux2-Klein, or Chroma).
2. Add the Apply Spectrum node and connect your model to it.
3. Connect the output model to your sampler as usual.
4. Set parameters (defaults are a safe starting point).
5. Run the workflow; fewer full transformer passes are executed after warmup.

graph LR
    A[Load Model] --> B[Apply Spectrum]
    B --> C[KSampler / SamplerCustomAdvanced]
    C --> D[VAE Decode]
    D --> E[Image]

Parameters

Parameter	Default	Range	Description
w	0.5	0.0–1.0	Blend weight: 0 = pure Taylor, 1 = pure Chebyshev. Recommended 0.5–1.0.
lam	0.1	1e-5–1.0	Ridge regularisation λ for the Chebyshev least-squares fit.
m	4	1–10	Chebyshev polynomial order M; the fit uses P = M+1 basis functions (T_0 … T_M) and as many coefficients.
window_size	2.0	1.0–10.0	Initial window size `\mathcal{N}` — how many steps between actual forwards.
flex_window	0.75	0.0–2.0	`\alpha`: window size increment per actual step (adaptive scheduling).
warmup_steps	5	0–30	Number of initial steps that always run the full transformer.
taylor_order	1	1–3	Order of the discrete Newton–Taylor predictor used in the blend.

Quick tuning guide:

More speed, lower quality: increase window_size, decrease warmup_steps.
More quality, less speed: increase warmup_steps, decrease window_size.
Noisy/corrupted images: increase warmup_steps to at least M+2 (= m+2), or increase lam.
Short runs: Spectrum is not beneficial below ~8 steps; it is best suited to runs of 14 or more steps.

Step Schedule

The schedule determines which steps run the full transformer vs. the cached head.

Warmup phase

For the first warmup_steps steps, the full transformer always runs. This populates the Chebyshev sliding window with enough observations to fit a stable polynomial. A minimum of M+2 actual observations is required before any cached step is attempted.

Post-warmup: adaptive window

After warmup, the step is an actual forward if:

\text{actual\_forward} \iff \bigl(n_{\text{cached}} + 1\bigr) \bmod \lfloor \mathcal{N} \rfloor = 0

where n_{\text{cached}} is the number of consecutive cached steps since the last actual forward, and \mathcal{N} is the current window size.

After each actual forward, the window grows:

\mathcal{N} \leftarrow \mathcal{N} + \alpha

This means caching intervals lengthen as the run progresses (features change more slowly near the end of denoising).

Example: 14 steps, warmup=3, window=2.0, flex=0.75

gantt
    title Step Schedule (A = Actual Forward, C = Cached)
    dateFormat  X
    axisFormat %s

    section Steps
    A Warmup 0   :milestone, 0, 0
    A Warmup 1   :milestone, 1, 1
    A Warmup 2   :milestone, 2, 2
    C Cached 3   :done, 3, 4
    A Actual 4   :milestone, 4, 4
    C Cached 5   :done, 5, 6
    A Actual 6   :milestone, 6, 6
    C Cached 7   :done, 7, 8
    C Cached 8   :done, 8, 9
    A Actual 9   :milestone, 9, 9
    C Cached 10  :done, 10, 11
    C Cached 11  :done, 11, 12
    C Cached 12  :done, 12, 13
    A Actual 13  :milestone, 13, 13

Result: 7 actual forwards out of 14 steps ≈ 2× speedup on transformer compute.
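
The scheduling rule can be replayed in a few lines of plain Python. This is a minimal sketch, not the node's actual code: the function name `spectrum_schedule` is hypothetical, and it assumes the window only grows after post-warmup actual forwards, which is the behaviour that reproduces the example above.

```python
import math

def spectrum_schedule(num_steps, warmup_steps, window_size, flex_window):
    """Return one bool per step: True = actual forward, False = cached step."""
    schedule = []
    curr_ws = float(window_size)  # current window size N
    n_cached = 0                  # consecutive cached steps since the last actual forward
    for step in range(num_steps):
        if step < warmup_steps:
            actual = True         # warmup: always run the full transformer
        else:
            actual = (n_cached + 1) % math.floor(curr_ws) == 0
        if actual:
            n_cached = 0
            # Assumption: warmup steps do not grow the window.
            if step >= warmup_steps:
                curr_ws += flex_window
        else:
            n_cached += 1
        schedule.append(actual)
    return schedule

sched = spectrum_schedule(num_steps=14, warmup_steps=3, window_size=2.0, flex_window=0.75)
actual_steps = [i for i, is_actual in enumerate(sched) if is_actual]
print(actual_steps)  # [0, 1, 2, 4, 6, 9, 13]
print(sum(sched))    # 7
```

Replaying a configuration this way before a long run shows how many full forwards it will cost.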

Mathematical Specification

Feature Observation

At each actual step i, the forward pre-hook on final_layer captures:

h_i = \text{final\_layer\_input}(\mathbf{x}_{t_i}) \in \mathbb{R}^{B \times L \times C}

This is stored flattened as \bar{h}_i \in \mathbb{R}^{B \times F} where F = L \cdot C.

Chebyshev Basis

Diffusion timesteps t \in [0, 50] are mapped to \tau \in [-1, 1]:

\tau = \frac{2(t - t_{\min})}{t_{\max} - t_{\min}} - 1, \quad t_{\min} = 0,\ t_{\max} = 50

The Chebyshev basis evaluates M+1 polynomials via the recurrence:

T_0(\tau) = 1, \quad T_1(\tau) = \tau, \quad T_m(\tau) = 2\tau T_{m-1}(\tau) - T_{m-2}(\tau)

The design matrix for K observed timesteps is:

\mathbf{X} = \begin{bmatrix} T_0(\tau_1) & T_1(\tau_1) & \cdots & T_M(\tau_1) \\ \vdots & & & \vdots \\ T_0(\tau_K) & T_1(\tau_K) & \cdots & T_M(\tau_K) \end{bmatrix} \in \mathbb{R}^{K \times P}, \quad P = M+1
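
The mapping, recurrence and design matrix can be written out directly. The sketch below uses plain Python lists for readability; the helper names `to_tau`, `chebyshev_row` and `design_matrix` are illustrative and are not taken from the repository.

```python
def to_tau(t, t_min=0.0, t_max=50.0):
    """Map a timestep t in [t_min, t_max] to tau in [-1, 1]."""
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0

def chebyshev_row(tau, M):
    """Return [T_0(tau), ..., T_M(tau)] using the three-term recurrence."""
    row = [1.0]
    if M >= 1:
        row.append(tau)
    for _ in range(2, M + 1):
        row.append(2.0 * tau * row[-1] - row[-2])
    return row

def design_matrix(timesteps, M):
    """Stack one Chebyshev row per observed timestep (K rows, P = M + 1 columns)."""
    return [chebyshev_row(to_tau(t), M) for t in timesteps]

X = design_matrix([0.0, 25.0, 50.0], M=2)
for row in X:
    print(row)
# [1.0, -1.0, 1.0]
# [1.0, 0.0, -1.0]
# [1.0, 1.0, 1.0]
```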

Ridge Regression

Coefficients are fit by ridge regression (regularisation \lambda):

\hat{\mathbf{C}} = \bigl(\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I}_P\bigr)^{-1} \mathbf{X}^\top \mathbf{H}

where \mathbf{H} \in \mathbb{R}^{K \times B \times F} is the history buffer (reshaped to K \times BF for the solve), and \hat{\mathbf{C}} \in \mathbb{R}^{P \times B \times F}.

The solve uses a Cholesky factorisation for numerical stability:

\mathbf{L}\mathbf{L}^\top = \mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I}_P, \quad \hat{\mathbf{C}} = \text{cholesky\_solve}(\mathbf{X}^\top \mathbf{H},\ \mathbf{L})

Chebyshev Prediction

At a target step t^*, the Chebyshev prediction is:

\hat{h}^{\text{cheb}}(t^*) = \mathbf{x}^* \hat{\mathbf{C}}, \quad \mathbf{x}^* = \bigl[T_0(\tau^*),\ T_1(\tau^*),\ \ldots,\ T_M(\tau^*)\bigr] \in \mathbb{R}^{1 \times P}
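
To make the fit-and-predict path concrete, here is a dependency-free sketch for a single scalar feature. Every column of the B × F feature matrix would reuse the same factorisation. All function names are hypothetical; the README states only that the solve uses a Cholesky factorisation and `cholesky_solve`.

```python
import math

def cheb_row(tau, M):
    """[T_0(tau), ..., T_M(tau)] via the Chebyshev recurrence."""
    row = [1.0, tau][: M + 1]
    for _ in range(2, M + 1):
        row.append(2.0 * tau * row[-1] - row[-2])
    return row

def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def cholesky_solve(L, b):
    """Solve (L L^T) c = b by forward, then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (y[i] - sum(L[k][i] * c[k] for k in range(i + 1, n))) / L[i][i]
    return c

def ridge_fit(taus, h, M, lam):
    """Coefficients (X^T X + lam I)^-1 X^T h for one scalar feature."""
    X = [cheb_row(tau, M) for tau in taus]
    P = M + 1
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
          for j in range(P)] for i in range(P)]
    b = [sum(r[i] * hk for r, hk in zip(X, h)) for i in range(P)]
    return cholesky_solve(cholesky(A), b)

def predict(coef, tau_star):
    """Evaluate the fitted Chebyshev series at tau_star."""
    return sum(c * T for c, T in zip(coef, cheb_row(tau_star, len(coef) - 1)))

# Synthetic check: sample a known degree-2 series and recover it.
taus = [-1.0, -0.5, 0.0, 0.5, 1.0]
true_coef = [3.0, -2.0, 0.5]
h = [predict(true_coef, tau) for tau in taus]
coef = ridge_fit(taus, h, M=2, lam=1e-9)
h_star = predict(coef, 0.8)
print(coef, h_star)
```

With a very small λ the recovered coefficients match the synthetic ones closely. Larger λ shrinks them towards zero, which is the trade-off the `lam` parameter controls.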

Discrete Taylor Predictor

The Newton forward-difference predictor of order d uses the last d+1 observations:

First order (d=1):

\hat{h}^{\text{taylor}}(t^*) = h_i + k \cdot \Delta h_i, \quad k = \frac{t^* - t_i}{t_i - t_{i-1}}, \quad \Delta h_i = h_i - h_{i-1}

Second order (d=2):

\hat{h}^{\text{taylor}}(t^*) = h_i + k \Delta h_i + \frac{k(k-1)}{2} \Delta^2 h_i, \quad \Delta^2 h_i = h_i - 2h_{i-1} + h_{i-2}

Third order (d=3):

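\hat{h}^{\text{taylor}}(t^*) = \hat{h}^{(2)}(t^*) + \frac{k(k-1)(k-2)}{6} \Delta^3 h_i, \quad \Delta^3 h_i = h_i - 3h_{i-1} + 3h_{i-2} - h_{i-3}

where \hat{h}^{(2)}(t^*) denotes the second-order prediction.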

Blended Prediction

The final prediction blends Chebyshev and Taylor with weight w \in [0,1]:

\hat{h}(t^*) = (1-w)\,\hat{h}^{\text{taylor}}(t^*) + w\,\hat{h}^{\text{cheb}}(t^*)

This is then unflattened back to (B, L, C) and fed to the output head.
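
As an illustration, the first-order predictor and the blend can be sketched on flattened features held in plain lists. The function names are hypothetical, and the Chebyshev prediction is a stand-in value rather than the output of a real fit.

```python
def taylor_first_order(t_star, t_prev, h_prev, t_last, h_last):
    """h_i + k * (h_i - h_{i-1}) with k = (t* - t_i) / (t_i - t_{i-1})."""
    k = (t_star - t_last) / (t_last - t_prev)
    return [hl + k * (hl - hp) for hl, hp in zip(h_last, h_prev)]

def blend(h_taylor, h_cheb, w):
    """(1 - w) * Taylor + w * Chebyshev; w = 0 is pure Taylor, w = 1 pure Chebyshev."""
    return [(1.0 - w) * a + w * b for a, b in zip(h_taylor, h_cheb)]

# Two observed features at t = 10 and t = 12, predicting t* = 13 (so k = 0.5).
h_taylor = taylor_first_order(13.0, 10.0, [1.0, 4.0], 12.0, [2.0, 3.0])
h_cheb = [3.5, 1.5]  # stand-in for the Chebyshev prediction at t*
h_hat = blend(h_taylor, h_cheb, w=0.5)
print(h_taylor)  # [2.5, 2.5]
print(h_hat)     # [3.0, 2.0]
```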

Cached Forward

Given \hat{h}(t^*), the cached step computes:

\mathbf{v} = \text{time\_embed}(t^*) + \text{guidance\_embed}(g) + \text{vector\_embed}(\mathbf{y})

\text{output} = \text{postprocess}\bigl(\text{final\_layer}(\hat{h},\ \mathbf{v})\bigr)

This replaces the full transformer forward at a fraction of the cost.

Supported Models

Model	Adapter	Hook Target	Notes
Flux.1-dev	`FluxAdapter`	`model.final_layer`	Guidance embedding enabled
Flux.1-schnell	`FluxAdapter`	`model.final_layer`	Guidance embedding skipped (Identity)
Flux2	`FluxAdapter`	`model.final_layer`	Same layout as Flux
Flux2-Klein	`FluxAdapter` (via alias)	`model.final_layer`	Same layout as Flux2
Chroma	`ChromaAdapter`	`model.final_layer`	Distilled guidance; `get_modulations()`

KV Cache Control

The KV Cache Control node accelerates multi-reference image editing for FLUX.2 Klein models by caching the attention keys and values (K/V) of the reference tokens and reusing them on later sampling steps.

Key Features:

Works with both 4B and 9B variants using existing models
Up to 2.66× speedup for workflows with 4 reference images
Automatic cache extraction on step 0, reuse on subsequent steps
Memory efficient: ~2-4 MB per reference token

How It Works:

sequenceDiagram
    participant User
    participant Node as KV Cache Control
    participant Model as Flux2 Model
    participant Cache as KV Cache

    User->>Node: Provide model + reference images
    Node->>Model: forward_kv_extract (Step 0)
    Model->>Cache: Store K/V pairs for ref tokens
    Cache-->>Node: Return prediction
    Node->>User: Return intermediate state

    loop Steps 1 to N
        User->>Node: Continue denoising
        Node->>Model: forward_kv_cached (Steps 1+)
        Model->>Cache: Read cached K/V pairs
        Cache-->>Model: Inject cached K/V
        Model->>Node: Return prediction
        Node->>User: Return intermediate state
    end

Usage:

1. Load a FLUX.2 Klein model (4B or 9B).
2. Load reference images (optional).
3. Add the KV Cache Control node from the model/kv_cache category.
4. Connect its inputs:
   - model: From loaded model
   - reference_images: Optional batch of reference images
   - enable_cache: Set to True
5. Connect to KSampler as usual.

For complete documentation, see KV Cache Control Documentation.

Architecture

classDiagram
    class ApplySpectrum {
        +apply(model, w, lam, m, ...) MODEL
        -clone model
        -create SpectrumRunState
        -install wrappers
    }

    class SpectrumRunState {
        +w, lam, m, window_size, flex_window
        +warmup_steps, taylor_order
        +cnt, num_consecutive_cached_steps
        +curr_ws, forecaster, adapter
        +init_for_run(num_steps)
        +is_actual_step() bool
        +after_step(actual_forward)
        +update_forecaster(t, h)
        +predict(step) Tensor
    }

    class Spectrum {
        +cheb ChebyshevForecaster
        +taylor_order, w
        +predict(t_star) Tensor
        +update(t, h)
        -_local_taylor_discrete(t_star) Tensor
    }

    class ChebyshevForecaster {
        +M, K, lam
        +t_buf, _H_buf, _coef
        +update(t, h)
        +predict(t_star) Tensor
        -_fit_if_needed()
        -_taus(t) Tensor
        -_build_design(taus) Tensor
    }

    class BaseSpectrumAdapter {
        <<abstract>>
        +supports(model) bool
        +hook_target(model) Module
        +compute_vec_orig(...) Tensor
        +run_head(model, h, vec) Tensor
        +postprocess_output(...) Tensor
    }

    class FluxAdapter {
        +supports(model) bool
        +hook_target(model) final_layer
        +compute_vec_orig(...) Tensor
        +run_head(model, h, vec) Tensor
        +postprocess_output(...) Tensor
    }

    class ChromaAdapter {
        +MOD_INDEX_LENGTH 344
        +supports(model) bool
        +compute_vec_orig(...) Tensor
    }

    ApplySpectrum --> SpectrumRunState
    SpectrumRunState --> Spectrum
    SpectrumRunState --> BaseSpectrumAdapter
    Spectrum --> ChebyshevForecaster
    FluxAdapter --|> BaseSpectrumAdapter
    ChromaAdapter --|> BaseSpectrumAdapter

Data flow on an actual step

sequenceDiagram
    participant S as Sampler
    participant W as DiffusionWrapper
    participant M as Transformer
    participant FL as final_layer
    participant FC as ChebyshevForecaster

    S->>W: invoke(x, t, ...)
    W->>W: is_actual_step? → True
    W->>FL: register_forward_pre_hook
    W->>M: executor(x, t, ...)
    M->>FL: forward(h, vec)
    FL-->>W: hook captures h
    M-->>W: out
    W->>FL: hook.remove()
    W->>FC: update(t, h)
    W-->>S: out

Data flow on a cached step

sequenceDiagram
    participant S as Sampler
    participant W as DiffusionWrapper
    participant FC as Spectrum forecaster
    participant A as FluxAdapter
    participant FL as final_layer

    S->>W: invoke(x, t, ...)
    W->>W: is_actual_step? → False
    W->>FC: predict(t)
    FC->>FC: _fit_if_needed (Cholesky)
    FC->>FC: Chebyshev + Taylor blend
    FC-->>W: ĥ (B, L, C)
    W->>A: compute_vec_orig(t, y, guidance)
    A-->>W: vec_orig
    W->>A: run_head(model, ĥ, vec_orig)
    A->>FL: forward(ĥ, vec_orig)
    FL-->>A: head_out
    A->>A: postprocess_output (rearrange)
    A-->>W: out (B, C, H, W)
    W-->>S: out

Performance

Measured speedup depends heavily on warmup_steps, window_size, and total step count. Representative results on Flux.1-dev:

Steps	Warmup	Window	Actual Fwds	Speedup
14	3	2.0	7 / 14	~1.8×
20	5	2.0	9 / 20	~2.0×
28	5	2.0	10 / 28	~2.5×
28	3	3.0	7 / 28	~3.2×

Note: Spectrum is not beneficial for fewer than ~8 steps. The warmup overhead dominates short runs.
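
For comparison, an idealised upper bound on the speedup is total steps divided by actual forwards. The sketch below computes it from the counts in the table, under the simplifying assumption that cached steps cost nothing. Measured values sit below this bound because the output head and the forecaster still run on cached steps.

```python
# (steps, actual_forwards, measured_speedup) copied from the table above
rows = [(14, 7, 1.8), (20, 9, 2.0), (28, 10, 2.5), (28, 7, 3.2)]

# Upper bound if cached steps were free: steps / actual forwards.
ideal_speedups = [round(steps / actual, 2) for steps, actual, _ in rows]

for (steps, actual, measured), ideal in zip(rows, ideal_speedups):
    print(f"{steps} steps, {actual} actual: ideal {ideal:.2f}x, measured ~{measured}x")
# 14 steps, 7 actual: ideal 2.00x, measured ~1.8x
# 20 steps, 9 actual: ideal 2.22x, measured ~2.0x
# 28 steps, 10 actual: ideal 2.80x, measured ~2.5x
# 28 steps, 7 actual: ideal 4.00x, measured ~3.2x
```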

Troubleshooting

`UNSTABLE PREDICTION DETECTED` / NaN every cached step

The Chebyshev fit is numerically degenerate when fewer than P+1 = M+2 actual observations are available. This happens when warmup_steps is too small relative to m.

Fix: Set warmup_steps ≥ m + 2 (default m=4 → minimum warmup_steps=6).
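
A small pre-flight check makes this constraint explicit. The helper below is a hypothetical sketch and is not part of the node.

```python
def warmup_shortfall(m, warmup_steps):
    """Number of extra warmup steps needed to reach m + 2 observations (0 if enough)."""
    return max(0, (m + 2) - warmup_steps)

print(warmup_shortfall(m=4, warmup_steps=6))  # 0
print(warmup_shortfall(m=4, warmup_steps=5))  # 1
print(warmup_shortfall(m=2, warmup_steps=5))  # 0
```

A non-zero result means either raising `warmup_steps` by that amount or lowering `m`.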

Corrupted / blurry images despite no NaN

The Chebyshev extrapolation is drifting from the true features. Try:

Increasing lam (e.g. 0.1 → 0.5) to regularise the polynomial fit.
Decreasing window_size to force more actual forwards.
Setting w=1.0 (pure Chebyshev) or w=0.0 (pure Taylor) to isolate which predictor is misbehaving.

`torch._dynamo hit config.recompile_limit`

The wrapper or predictor code is being recompiled by torch.compile on each new step index. Ensure:

@torch._dynamo.disable is applied to the sampler and apply_model wrappers in step_injector.py.
No Python .item() calls appear in any code path that runs inside a torch.compile region.
t_star is passed as a float tensor, not a raw Python int.

`Selected FluxAdapter for this model` printed every step

The adapter is being re-detected each step instead of being cached. This was fixed in the March 2026 update — ensure SpectrumRunState.adapter is set and reused.

Very slow first step / model loading

This is normal: the first step loads the model onto GPU. Spectrum does not affect load time.

Adding New Models

Implement a subclass of BaseSpectrumAdapter in comfy/adapters/:

from einops import rearrange

from .base import BaseSpectrumAdapter

# my_final_layer, my_timestep_embedding and model.time_proj are placeholders
# for your model's own attributes and helpers.

class MyModelAdapter(BaseSpectrumAdapter):

    @classmethod
    def supports(cls, diffusion_model) -> bool:
        # Return True if this is your model type
        return hasattr(diffusion_model, "my_final_layer")

    def hook_target(self, model):
        # Return the nn.Module whose first input is the feature to cache
        return model.my_final_layer

    def compute_vec_orig(self, model, timestep, y, guidance, device, dtype):
        # Reconstruct the conditioning vector (time + guidance + vector embed)
        # that your final_layer expects as its second argument.
        # This stub uses the timestep embedding only; add the guidance and
        # vector embeddings if your head expects them.
        t_emb = my_timestep_embedding(timestep, model.hidden_size)
        return model.time_proj(t_emb)

    def run_head(self, model, hidden, vec_orig):
        # Run only the output head on the predicted feature
        return model.my_final_layer(hidden, vec_orig)

    def postprocess_output(self, model, head_output, x, img_tokens):
        # Convert head_output to the same shape as full forward output
        # For patch-based models, rearrange from (B, tokens, C) to (B, C, H, W)
        bs, c, h, w = x.shape
        p = model.patch_size
        out = head_output[:, :img_tokens]  # keep image tokens only
        return rearrange(out, "b (h w) (p q c) -> b c (h p) (w q)",
                         h=h//p, w=w//p, p=p, q=p)

Then register it in comfy/adapters/__init__.py:

from .my_model import MyModelAdapter
# FluxAdapter and ChromaAdapter are assumed to be imported already in this module.

def get_adapter_for_model(diffusion_model):
    if MyModelAdapter.supports(diffusion_model):
        return MyModelAdapter()
    if ChromaAdapter.supports(diffusion_model):
        return ChromaAdapter()
    if FluxAdapter.supports(diffusion_model):
        return FluxAdapter()
    return None

See docs/MODELS.md for per-model notes on feature location and head structure.

Dependencies

ComfyUI with Flux/Flux2/Chroma support
PyTorch ≥ 2.0 (bfloat16, torch.linalg.cholesky)
einops (for spatial rearrangement in adapters)

Both torch and einops are standard in ComfyUI environments — no additional installation needed.

License

MIT
