CLI output compressor for AI agents - saves 50-95% of tokens on verbose command output

Choppa — Terminal Output Compression

Choppa compresses verbose CLI output into compact, readable summaries. Pipe any command through Choppa and get the signal without the noise.

Quick Start

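# Library usage: add Choppa as a git dependency of your crate
cargo add --git https://git.sly.so/kade/choppa choppa

# CLI usage: pipe the command's output into choppa-cli and pass the command itself as arguments
git status | cargo run --bin choppa-cli -- git status
docker ps | cargo run --bin choppa-cli -- --stats docker ps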

What Choppa Does

Choppa takes command output and produces compact summaries:

| Command | Raw Output | Choppa Output |
| --- | --- | --- |
| git status | 21 modified files listed | unstaged(21): src/main.rs, src/lib.rs, ... |
| docker ps | Full table with ports, commands, etc. | web-api (node:18) Up 3 days |
| cargo clippy | Full diagnostics with source snippets | clippy::needless_return (3): src/lib.rs:42, ... |
| terraform plan | Full resource diff | Plan: 3 to add, 1 to change, 0 to destroy |
| kubectl get pods | Wide table with many columns | NAME STATUS RESTARTS AGE (key columns only) |

Architecture

Core Pipeline

flowchart TD
    A[Command Output] --> B{Compressor}
    B --> C{Config Check}
    C -->|Disabled| D[Pass Through]
    C -->|Enabled| E{Command Filter}
    E -->|Match Found| F[Command-Specific Filter]
    E -->|No Match| G[Auto-Detect]
    F --> H[CompressionResult]
    G --> H
    H --> I[Compressed Output]

    subgraph Config
        C
        C1[CompressionMode]
        C2[CompressionLevel]
        C3[Command Overrides]
        C4[Disabled Commands]
        C --> C1
        C --> C2
        C --> C3
        C --> C4
    end

    subgraph Filters
        F
        F1[Git]
        F2[Docker]
        F3[NPM/Cargo]
        F4[Kubectl]
        F5[Terraform]
        F6[Linters]
        F7[Cloud]
        F8[Package Managers]
        F --> F1
        F --> F2
        F --> F3
        F --> F4
        F --> F5
        F --> F6
        F --> F7
        F --> F8
    end

    subgraph Utilities
        G
        G1[JSON Compressor]
        G2[Log Pattern]
        G3[Redaction]
        G4[Sanity Guards]
        G --> G1
        G --> G2
        G --> G3
        G --> G4
    end
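The pipeline's decision order (config check, then command-specific filter, then auto-detect) can be sketched in plain Rust. This is a minimal sketch, not Choppa's implementation: `PipelineConfig`, `run_pipeline`, and the two function-pointer parameters are hypothetical stand-ins for `Config`, `Compressor`, and the filter modules. The command filter is assumed to return `None` when no command-specific filter matches.

```rust
/// Hypothetical, reduced stand-in for Choppa's `Config`.
struct PipelineConfig {
    enabled: bool,
    disabled_commands: Vec<String>,
}

/// Mirrors the flowchart: pass through when compression is off or the
/// command is disabled; otherwise try the command filter, then auto-detect.
fn run_pipeline(
    config: &PipelineConfig,
    command: &str,
    output: &str,
    command_filter: fn(&str, &str) -> Option<String>,
    auto_detect: fn(&str) -> Option<String>,
) -> String {
    // Config check: globally disabled or disabled for this command.
    let disabled = config
        .disabled_commands
        .iter()
        .any(|c| c.as_str() == command);
    if !config.enabled || disabled {
        return output.to_string();
    }
    // Command-specific filter first; auto-detect only if it does not match.
    command_filter(command, output)
        .or_else(|| auto_detect(output))
        .unwrap_or_else(|| output.to_string())
}
```

Passing the filters as parameters keeps the sketch self-contained; in the crate itself they are presumably fixed modules rather than arguments.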

Filter Dispatch System

Command-specific filters take priority over auto-detection. Dispatch matches the leading command name (and, for multi-command tools, the subcommand) to pick a handler:

flowchart LR
    A[Input Command] --> B{Command Prefix}
    
    B -->|git| C{Git Subcommand}
    C -->|status| C1[git_status]
    C -->|log| C2[git_log]
    C -->|diff| C3[git_diff]
    C -->|branch| C4[git_branch]
    C -->|clone| C5[git_clone]
    C -->|fetch| C6[git_fetch]
    C -->|pull| C7[git_pull]
    C -->|push| C8[git_push]
    C -->|stat| C9[git_stat]
    
    B -->|docker| D{Docker Subcommand}
    D -->|ps| D1[docker_ps]
    D -->|images| D2[docker_images]
    D -->|build| D3[docker_build]
    D -->|logs| D4[docker_logs]
    D -->|pull| D5[docker_pull]
    D -->|rmi| D6[docker_rmi]
    D -->|diff| D7[docker_diff]
    D -->|history| D8[docker_history]
    D -->|inspect| D9[docker_inspect]
    D -->|network| D10[docker_network]
    D -->|stats| D11[docker_stats]
    D -->|system| D12[docker_system]
    D -->|top| D13[docker_top]
    D -->|volume| D14[docker_volume]
    
    B -->|npm| E{NPM Subcommand}
    E -->|install| E1[npm_install]
    E -->|list| E2[npm_list]
    E -->|test| E3[npm_test]
    
    B -->|cargo| F{Cargo Subcommand}
    F -->|build| F1[cargo_build]
    F -->|clippy| F2[cargo_clippy]
    F -->|test| F3[cargo_test]
    
    B -->|kubectl| G{Kubectl Subcommand}
    G -->|apply| G1[kubectl_apply]
    G -->|describe| G2[kubectl_describe]
    G -->|get| G3[kubectl_get]
    G -->|other| G4[kubectl generic]
    
    B -->|terraform| H{Terraform Subcommand}
    H -->|apply| H1[terraform_apply]
    H -->|init| H2[terraform_init]
    H -->|plan| H3[terraform_plan]
    H -->|other| H4[terraform generic]
    
    B -->|rspec| I1[rspec]
    B -->|rubocop| I2[rubocop]
    B -->|ruff| I3[ruff]
    B -->|mypy| I4[mypy]
    B -->|eslint| I5[eslint]
    B -->|tsc| I6[tsc]
    B -->|golangci-lint| I7[golangci_lint]
    B -->|pip| J{Pip Subcommand}
    J -->|install| J1[pip_install]
    J -->|list| J2[pip_list]
    B -->|yarn| J3[yarn]
    B -->|pnpm| J4[pnpm]
    B -->|bun| J5[bun]
    B -->|poetry| K{Poetry Subcommand}
    K -->|install/add/update| K1[poetry_install]
    K -->|show| K2[poetry_show]
    B -->|composer| K3[composer]
    B -->|brew| K4[brew]
    B -->|apt| K5[apt]
    B -->|aws| L{AWS Subcommand}
    L -->|s3 ls| L1[aws_s3_ls]
    L -->|ec2 describe| L2[aws_ec2_describe]
    L -->|logs| L3[aws_logs]
    L -->|other| L4[aws generic]
    B -->|gcloud| M{GCloud Subcommand}
    M -->|compute instances list| M1[gcloud_instances_list]
    M -->|other| M2[gcloud generic]
    B -->|helm| N{Helm Subcommand}
    N -->|install/upgrade| N1[helm_install]
    N -->|list| N2[helm_list]
    B -->|argocd app| O{ArgoCD Subcommand}
    O -->|list| O1[argocd_app_list]
    O -->|sync| O2[argocd_app_sync]
    O -->|get| O3[argocd_app_get]
    B -->|flux| P{Flux Subcommand}
    P -->|get| P1[flux_get]
    P -->|reconcile| P2[flux_reconcile]
    B -->|az| Q{Azure Subcommand}
    Q -->|vm list| Q1[azure_vm_list]
    Q -->|resource list| Q2[azure_resource_list]
    Q -->|other| Q3[azure generic]
    
    B -->|unknown| R[auto_detect]
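Word-level prefix dispatch can be sketched with a single `match` on the first two words of the command. `dispatch` is a hypothetical function covering only a subset of the routes above; it returns the handler name from the diagram instead of calling a real filter, and anything unmatched falls through to `auto_detect`.

```rust
/// Hypothetical dispatcher: routes on the tool name, then the subcommand.
/// Only a few of the diagram's routes are included.
fn dispatch(command: &str) -> &'static str {
    let mut words = command.split_whitespace();
    let tool = words.next().unwrap_or("");
    let sub = words.next().unwrap_or("");
    match (tool, sub) {
        ("git", "status") => "git_status",
        ("git", "log") => "git_log",
        ("git", "diff") => "git_diff",
        ("docker", "ps") => "docker_ps",
        ("cargo", "clippy") => "cargo_clippy",
        ("kubectl", "get") => "kubectl_get",
        // Unknown kubectl/terraform subcommands use the generic handler.
        ("kubectl", _) => "kubectl generic",
        ("terraform", "plan") => "terraform_plan",
        ("terraform", _) => "terraform generic",
        // Single-purpose tools match on the tool name alone.
        ("eslint", _) => "eslint",
        ("tsc", _) => "tsc",
        _ => "auto_detect",
    }
}
```

Matching on whole words rather than raw string prefixes avoids accidental matches such as `gitk` being routed to the git handlers.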

CompressionResult Output

Each filter produces a structured result with metadata:

classDiagram
    class CompressionResult
        CompressionResult : +String compressed
        CompressionResult : +usize original_size
        CompressionResult : +usize compressed_size
        CompressionResult : +f64 compression_ratio
        CompressionResult : +u64 duration_ms
        CompressionResult : +String filter_used
    
    class Compressor
        Compressor : +Config config
        Compressor : +new(Config) Compressor
        Compressor : +compress(&str, &str, bool) Option~CompressionResult~
    
    class Config
        Config : +bool enabled
        Config : +CompressionMode mode
        Config : +CompressionLevel default_level
        Config : +HashMap~String, CommandCompressionConfig~ command_overrides
        Config : +Vec~String~ disabled_commands
    
    class CommandCompressionConfig
        CommandCompressionConfig : +bool enabled
        CommandCompressionConfig : +CompressionLevel level
        CommandCompressionConfig : +Vec~String~ custom_filters
    
    class CompressionMode
        <<enumeration>> CompressionMode
        CompressionMode : Off
        CompressionMode : Conservative
        CompressionMode : Aggressive
    
    class CompressionLevel
        <<enumeration>> CompressionLevel
        CompressionLevel : None
        CompressionLevel : Low
        CompressionLevel : Medium
        CompressionLevel : High
    
    Compressor --> Config
    Config --> CommandCompressionConfig
    Config --> CompressionMode
    Config --> CompressionLevel
    Compressor ..> CompressionResult : produces
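The result fields can be filled in around a filter call as sketched below. `measure` and this local `CompressionResult` are hypothetical; sizes are byte lengths, and `compression_ratio` is assumed to be `compressed_size / original_size` (lower is better), which the README does not define.

```rust
use std::time::Instant;

/// Hypothetical local copy of the fields shown in the class diagram.
#[allow(dead_code)]
#[derive(Debug)]
struct CompressionResult {
    compressed: String,
    original_size: usize,
    compressed_size: usize,
    compression_ratio: f64,
    duration_ms: u64,
    filter_used: String,
}

/// Runs `filter` on `input` and records sizes, ratio, and elapsed time.
fn measure(filter_name: &str, input: &str, filter: impl Fn(&str) -> String) -> CompressionResult {
    let start = Instant::now();
    let compressed = filter(input);
    let original_size = input.len();
    let compressed_size = compressed.len();
    // Assumed definition; an empty input is treated as "no change".
    let compression_ratio = if original_size == 0 {
        1.0
    } else {
        compressed_size as f64 / original_size as f64
    };
    CompressionResult {
        compressed,
        original_size,
        compressed_size,
        compression_ratio,
        duration_ms: start.elapsed().as_millis() as u64,
        filter_used: filter_name.to_string(),
    }
}
```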

Auto-Detect Fallback

When no command-specific filter matches, auto-detect classifies output and applies the best generic strategy:

flowchart TD
    A[Auto-Detect Input] --> B{Output Type}
    
    B -->|JSON| C[JSON Detection]
    C --> C1{Valid JSON?}
    C1 -->|Yes| C2[JSON Compressor]
    C1 -->|No| D[Plain Text]
    
    B -->|Log| E[Log Detection]
    E --> E1{Log-like?}
    E1 -->|Yes| E2[Log Compressor]
    E1 -->|No| D
    
    B -->|Plain| D[Plain Text]
    
    C2 --> F[Compressed Output]
    E2 --> F
    D --> F
    
    subgraph Detection Logic
        C1
        E1
        E3[Pattern: timestamps + levels]
        E4[Pattern: 20+ lines, 500+ chars]
        C5[Pattern: starts with brace or bracket]
        C6[serde_json parsing]
        E3 --> E1
        E4 --> E1
        C5 --> C1
        C6 --> C1
    end
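A simplified version of the classification above, in a minimal sketch. `classify` and `OutputKind` are hypothetical, and two assumptions are baked in: the JSON branch only checks for a leading brace or bracket (the diagram's `serde_json` validity check is omitted to keep the sketch dependency-free), and "log-like" means the 20-line / 500-character thresholds hold and more than half the lines carry a level keyword or a leading `YYYY-` timestamp.

```rust
#[derive(Debug, PartialEq)]
enum OutputKind {
    Json,
    Log,
    Plain,
}

/// Heuristic classifier for output with no command-specific filter.
fn classify(output: &str) -> OutputKind {
    // JSON candidate: starts with a brace or bracket (no real parse here).
    let trimmed = output.trim_start();
    if trimmed.starts_with('{') || trimmed.starts_with('[') {
        return OutputKind::Json;
    }
    let lines: Vec<&str> = output.lines().collect();
    if lines.len() >= 20 && output.len() >= 500 {
        let levels = ["ERROR", "WARN", "INFO", "DEBUG", "TRACE"];
        let log_like = lines
            .iter()
            .filter(|line| {
                let has_level = levels.iter().any(|lvl| line.contains(*lvl));
                // Crude timestamp check: a 4-digit year followed by '-'.
                let b = line.as_bytes();
                let has_timestamp =
                    b.len() >= 5 && b[..4].iter().all(u8::is_ascii_digit) && b[4] == b'-';
                has_level || has_timestamp
            })
            .count();
        // Majority of lines look like log records.
        if log_like * 2 > lines.len() {
            return OutputKind::Log;
        }
    }
    OutputKind::Plain
}
```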

Filter Categories

Version Control (Git)

Extracts file lists, ref updates, commit summaries, and totals from git output, and drops progress and transfer noise:

  • git_status — Parses staged/unstaged/untracked sections; outputs staged(3): file1, file2, file3
  • git_log — Condenses verbose commit output to hash + message; limits to 20 entries
  • git_diff — Extracts file-level stats (lines added/removed); summarizes with totals
  • git_branch — Shows current branch + locals; summarizes remotes by prefix grouping
  • git_clone — Keeps clone target + totals; drops progress noise
  • git_fetch — Shows ref updates; drops transfer progress
  • git_pull — Shows merge strategy + conflicts; drops unpacking noise
  • git_push — Shows ref updates + errors; drops transfer progress
  • git_stat — Sorts files by change count; shows top 5
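The `git_status` approach can be sketched over git's default long format, producing the `staged(n): ...` shape listed above. `compress_git_status` and its `max_files` parameter are hypothetical; the sketch reads the section headers git prints (`Changes to be committed:`, `Changes not staged for commit:`, `Untracked files:`) and the tab-indented entries beneath them, ignoring the space-indented hint lines.

```rust
/// Hypothetical sketch: one "section(count): a, b, ..." line per section.
fn compress_git_status(output: &str, max_files: usize) -> String {
    // Status prefixes git puts before tracked paths, e.g. "modified:   x".
    const LABELS: [&str; 6] = ["modified", "new file", "deleted", "renamed", "typechange", "copied"];
    let mut sections: Vec<(&str, Vec<&str>)> = Vec::new();
    for line in output.lines() {
        let header = match line.trim() {
            "Changes to be committed:" => Some("staged"),
            "Changes not staged for commit:" => Some("unstaged"),
            "Untracked files:" => Some("untracked"),
            _ => None,
        };
        if let Some(name) = header {
            sections.push((name, Vec::new()));
        } else if line.starts_with('\t') {
            let entry = line.trim();
            // Tracked entries carry a status prefix; untracked ones are bare paths.
            let path = match entry.split_once(':') {
                Some((status, rest)) if LABELS.contains(&status) => rest.trim(),
                _ => entry,
            };
            if let Some((_, files)) = sections.last_mut() {
                files.push(path);
            }
        }
    }
    sections
        .iter()
        .map(|(name, files)| {
            let shown: Vec<&str> = files.iter().take(max_files).copied().collect();
            let more = if files.len() > max_files { ", ..." } else { "" };
            format!("{}({}): {}{}", name, files.len(), shown.join(", "), more)
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Parsing `git status --porcelain` would be more robust against localized output; the long format is used here only because it is what the section names above refer to.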

Container Management (Docker)

Column-based extraction from tabular output:

  • docker_ps — Extracts NAME, IMAGE, STATUS columns
  • docker_images — Handles both classic and new DISK USAGE formats
  • docker_build — Parses step count, image ID/tag, warnings (classic + BuildKit)
  • docker_logs — Delegates JSON logs to JSON compressor
  • docker_pull — Shows status, digest, image reference; drops layer progress
  • docker_rmi — Shows deleted image names; drops SHA digests
  • docker_diff — Counts A/C/D changes; shows paths when few
  • docker_history — Extracts IMAGE, SIZE, CREATED BY; filters zero-size layers
  • docker_inspect — JSON parse + redact + deep compress
  • docker_network — Extracts NAME, DRIVER, SCOPE columns
  • docker_stats — Extracts NAME, CPU%, MEM, PIDs columns
  • docker_system — Passes through (already compact)
  • docker_top — Header + first 15 + last 5 processes
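Column-based extraction can be sketched by slicing every row at the header's column offsets, which keeps multi-word cells such as `Up 3 days` intact. `column_starts` and `extract_columns` are hypothetical helpers; the sketch assumes ASCII output whose header cells are separated by at least two spaces, as in `docker ps` tables.

```rust
/// Byte offsets where header cells begin: index 0, or a non-space after two spaces.
/// Single spaces (as in "CONTAINER ID") do not start a new column.
fn column_starts(header: &str) -> Vec<usize> {
    let b = header.as_bytes();
    (0..b.len())
        .filter(|&i| b[i] != b' ' && (i == 0 || (i >= 2 && b[i - 1] == b' ' && b[i - 2] == b' ')))
        .collect()
}

/// Keeps only the `wanted` columns (in that order) from every line, header included.
fn extract_columns(table: &str, wanted: &[&str]) -> Vec<String> {
    let Some(header) = table.lines().next() else { return Vec::new() };
    let starts = column_starts(header);
    // Each wanted column spans from its header offset to the next column start.
    let ranges: Vec<(usize, usize)> = wanted
        .iter()
        .filter_map(|name| header.find(*name))
        .map(|s| {
            let end = starts.iter().copied().find(|&x| x > s).unwrap_or(usize::MAX);
            (s, end)
        })
        .collect();
    table
        .lines()
        .map(|row| {
            let len = row.len();
            ranges
                .iter()
                .map(|&(s, e)| row.get(s.min(len)..e.min(len)).unwrap_or("").trim())
                .collect::<Vec<_>>()
                .join("  ")
        })
        .collect()
}
```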

Package Managers

Install/test/list compression across ecosystems:

  • npm_install — Summary line + audit info + warnings/errors
  • npm_list — Top-level deps + total package count
  • npm_test — Test suites/results + failure details
  • cargo_build — Errors/warnings with locations; "built ok" for clean
  • cargo_clippy — Errors in full; warnings grouped by lint rule
  • cargo_test — Test results + failure blocks; doc-test tracking
  • yarn — Done line + errors/warnings
  • pnpm — Package count + errors/warnings
  • bun — Installed line + errors
  • poetry_install — Operations + updates + installs + lock file
  • poetry_show — Package count summary
  • composer — Operations + autoload generation + errors
  • pip_install — Installed packages + warnings/errors
  • pip_list — Package count + first 10 names
  • brew — Section headers + install status + warnings; drops download noise
  • apt — Install/upgrade actions + summary; drops fetch noise
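The "warnings grouped by lint rule" step of `cargo_clippy` can be sketched as below, producing the `clippy::needless_return (3): src/lib.rs:42, ...` shape from the table near the top. `group_clippy_warnings` is hypothetical and assumes clippy's default human-readable output, where each warning has a `--> file:line:col` location and a help link ending in `index.html#<lint_name>`.

```rust
use std::collections::BTreeMap;

/// Pairs each warning location with the lint named in its help URL and
/// returns one summary line per lint, sorted by lint name.
fn group_clippy_warnings(output: &str, max_locations: usize) -> Vec<String> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    let mut location: Option<String> = None;
    for line in output.lines() {
        let line = line.trim();
        if let Some(loc) = line.strip_prefix("--> ") {
            // Keep "file:line" and drop the column number.
            let loc = loc.rsplit_once(':').map(|(file_line, _col)| file_line).unwrap_or(loc);
            location = Some(loc.to_string());
        } else if let Some((_, lint)) = line.split_once("index.html#") {
            // The help link closes a clippy warning; attach it to the last location.
            if let Some(loc) = location.take() {
                groups.entry(format!("clippy::{}", lint)).or_default().push(loc);
            }
        }
    }
    groups
        .into_iter()
        .map(|(lint, locs)| {
            let shown = locs.iter().take(max_locations).cloned().collect::<Vec<_>>().join(", ");
            let more = if locs.len() > max_locations { ", ..." } else { "" };
            format!("{} ({}): {}{}", lint, locs.len(), shown, more)
        })
        .collect()
}
```

Using a `BTreeMap` keeps the output order deterministic, which matters if summaries are compared across runs.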

Kubernetes & IaC

Resource-focused compression for infrastructure tools:

  • kubectl_get — Column filtering by resource type (pods, deployments, services, nodes)
  • kubectl_apply — Groups by action (created/configured/unchanged/deleted)
  • kubectl_describe — Strips verbose sections; keeps events (warnings first)
  • kubectl generic — First 5 + last 5 lines for unknown commands
  • terraform_plan — Parses plan summary + resource actions + errors
  • terraform_apply — Resource actions with timing + summary + errors
  • terraform_init — Provider versions + success status + errors
  • terraform generic — Plan/apply summary lines extraction
  • helm_install — Name, status, namespace, revision
  • helm_list — Name, namespace, status, chart per release
  • argocd_app_list — Name, status, health columns
  • argocd_app_sync — Resources + sync/health status
  • argocd_app_get — Key fields + errors/degraded warnings
  • flux_get — Name, ready, message columns; suspended/not-ready flags
  • flux_reconcile — Success/error lines; errors prioritized
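The `kubectl_apply` grouping can be sketched as follows. `group_apply` is hypothetical; it relies on each line of `kubectl apply` output ending with the action word (for example `deployment.apps/web created`) and keeps any other line, such as a warning, verbatim.

```rust
/// Buckets resources by the trailing action word, in a fixed action order.
fn group_apply(output: &str) -> String {
    let actions = ["created", "configured", "unchanged", "deleted"];
    let mut buckets: Vec<Vec<&str>> = vec![Vec::new(); actions.len()];
    let mut other: Vec<&str> = Vec::new();
    for line in output.lines().map(str::trim).filter(|l| !l.is_empty()) {
        match line.rsplit_once(' ') {
            Some((resource, action)) => match actions.iter().position(|a| *a == action) {
                Some(i) => buckets[i].push(resource),
                None => other.push(line),
            },
            None => other.push(line),
        }
    }
    let mut parts: Vec<String> = actions
        .iter()
        .zip(&buckets)
        .filter(|(_, resources)| !resources.is_empty())
        .map(|(action, resources)| format!("{}({}): {}", action, resources.len(), resources.join(", ")))
        .collect();
    // Unrecognised lines (warnings, errors) are kept as-is after the summary.
    parts.extend(other.into_iter().map(String::from));
    parts.join("\n")
}
```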

Cloud Providers

JSON-aware compression for cloud CLI output:

  • aws_s3_ls — Prefix grouping with file counts and human sizes
  • aws_ec2_describe — Instance summary with ID, state, type, name tag
  • aws_logs — Event deduplication with counts
  • aws generic — JSON compress or pass-through
  • gcloud_instances_list — Name, zone, status table
  • gcloud generic — JSON compress or table filter
  • azure_vm_list — Name, resource group, power state
  • azure_resource_list — Name, resource group, provisioning state
  • azure generic — JSON compress
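`aws_s3_ls`-style prefix grouping can be sketched as below. `group_s3_listing` and `human_size` are hypothetical; the input is assumed to be `aws s3 ls --recursive` output (date, time, size in bytes, key), grouped by the key's first path segment, and binary units (KiB, MiB) are an illustrative choice.

```rust
use std::collections::BTreeMap;

/// Formats a byte count with binary units.
fn human_size(bytes: u64) -> String {
    let units = ["B", "KiB", "MiB", "GiB", "TiB"];
    let mut size = bytes as f64;
    let mut unit = 0;
    while size >= 1024.0 && unit < units.len() - 1 {
        size /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{} B", bytes)
    } else {
        format!("{:.1} {}", size, units[unit])
    }
}

/// One "prefix/: N files, SIZE" line per first-level prefix, sorted by prefix.
fn group_s3_listing(output: &str) -> Vec<String> {
    let mut groups: BTreeMap<String, (usize, u64)> = BTreeMap::new();
    for line in output.lines() {
        let fields: Vec<&str> = line.split_whitespace().collect();
        // Skip "PRE prefix/" rows and anything without a numeric size column.
        if fields.len() < 4 {
            continue;
        }
        let Ok(size) = fields[2].parse::<u64>() else { continue };
        let key = fields[3..].join(" ");
        let prefix = match key.split_once('/') {
            Some((first, _)) => format!("{}/", first),
            None => "(root)".to_string(),
        };
        let entry = groups.entry(prefix).or_insert((0, 0));
        entry.0 += 1;
        entry.1 += size;
    }
    groups
        .into_iter()
        .map(|(prefix, (count, total))| {
            let noun = if count == 1 { "file" } else { "files" };
            format!("{}: {} {}, {}", prefix, count, noun, human_size(total))
        })
        .collect()
}
```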

Linters & Type Checkers

Error grouping by rule/code for static analysis tools:

  • eslint — Groups by rule; shows file:line locations; fixable count
  • tsc — Groups by error code; shows file count; tilde underline removal
  • mypy — Groups by error code; file:line locations; success detection
  • ruff — Groups by error code; file:line locations; fixable detection
  • rubocop — Groups by cop; file:line locations; autocorrect count
  • golangci_lint — Groups by linter; ANSI stripping; samples + more count
  • rspec — Summary + failed examples; execution time
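Grouping by rule or code follows the same pattern for each tool. As one example, here is a hypothetical `group_tsc_errors` for TypeScript compiler output. It counts diagnostics per `TSxxxx` code and the distinct files they occur in, accepts both the plain `file(line,col): error TS...` form and the pretty `file:line:col - error TS...` form, and ignores source excerpts and `~~~` underline lines because they contain no `error TS` marker.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Summarises tsc diagnostics as "TSxxxx: N error(s) in M file(s)".
fn group_tsc_errors(output: &str) -> Vec<String> {
    let mut groups: BTreeMap<&str, (usize, BTreeSet<&str>)> = BTreeMap::new();
    for line in output.lines() {
        let Some(idx) = line.find("error TS") else { continue };
        // Error code runs from "TS" up to the next ':'.
        let rest = &line[idx + "error ".len()..];
        let code = rest.split(':').next().unwrap_or(rest);
        // File path ends at the first '(' or ':' before the "error" marker.
        let file_end = line
            .find(|c: char| c == '(' || c == ':')
            .filter(|&e| e < idx)
            .unwrap_or(idx);
        let file = line[..file_end].trim();
        let entry = groups.entry(code).or_insert((0, BTreeSet::new()));
        entry.0 += 1;
        entry.1.insert(file);
    }
    groups
        .into_iter()
        .map(|(code, (count, files))| format!("{}: {} error(s) in {} file(s)", code, count, files.len()))
        .collect()
}
```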

Utilities

Cross-cutting concerns shared across filters:

  • json_compress — Deep JSON traversal with depth limit; string truncation; array/object summarization
  • log_pattern — Timestamp/level detection; line grouping by fingerprint; deduplication
  • redact — Sensitive key detection (19 keywords); header redaction; JSON recursive redaction
  • sanity_guards — Empty output prevention; filter result validation
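The fingerprint-and-deduplicate idea behind `log_pattern` can be sketched as follows. `fingerprint` and `dedupe_log` are hypothetical, and the fingerprint rule here (every run of digits replaced by `#`) is an assumption chosen for illustration; Choppa's actual rules are not documented in this README. Lines that differ only in timestamps, IDs, or counts collapse into one entry with a count, in first-seen order.

```rust
use std::collections::HashMap;

/// Replaces each run of ASCII digits with a single '#'.
fn fingerprint(line: &str) -> String {
    let mut out = String::with_capacity(line.len());
    let mut in_digits = false;
    for ch in line.chars() {
        if ch.is_ascii_digit() {
            if !in_digits {
                out.push('#');
            }
            in_digits = true;
        } else {
            out.push(ch);
            in_digits = false;
        }
    }
    out
}

/// Keeps the first line for each fingerprint, suffixed with "(xN)" when repeated.
fn dedupe_log(output: &str) -> Vec<String> {
    let mut order: Vec<String> = Vec::new();
    let mut counts: HashMap<String, (usize, &str)> = HashMap::new();
    for line in output.lines().filter(|l| !l.trim().is_empty()) {
        let fp = fingerprint(line);
        let entry = counts.entry(fp.clone()).or_insert_with(|| {
            order.push(fp.clone());
            (0, line)
        });
        entry.0 += 1;
    }
    order
        .iter()
        .map(|fp| {
            let (n, first) = counts[fp];
            if n > 1 { format!("{} (x{})", first, n) } else { first.to_string() }
        })
        .collect()
}
```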

Usage

As a Library

use std::collections::HashMap;

use choppa::{CommandCompressionConfig, CompressionLevel, CompressionMode, Compressor, Config};

fn main() {
    // Captured output of the command being compressed
    let output = "On branch main\nChanges not staged for commit:\n\tmodified:   src/main.rs\n";

    // Default config (conservative, medium level)
    let compressor = Compressor::default();

    // Custom config: aggressive mode at high level, `git status` disabled,
    // and a lighter override for `docker ps`
    let config = Config {
        mode: CompressionMode::Aggressive,
        default_level: CompressionLevel::High,
        disabled_commands: vec!["git status".to_string()],
        command_overrides: HashMap::from([(
            "docker ps".to_string(),
            CommandCompressionConfig {
                enabled: true,
                level: CompressionLevel::Low,
                custom_filters: vec![],
            },
        )]),
        ..Default::default()
    };
    let custom_compressor = Compressor::new(config);

    // Compress with the default compressor, since the custom one disables `git status`
    if let Some(result) = compressor.compress("git status", output, false) {
        println!("{}", result.compressed);
        println!("Ratio: {:.2}", result.compression_ratio);
        println!("Filter: {}", result.filter_used);
        println!("Duration: {}ms", result.duration_ms);
    }
}

CLI

# Basic usage: pipe the command's output in and pass the command as arguments
git status | choppa-cli git status

# With compression statistics
docker ps | choppa-cli --stats docker ps

# Help
choppa-cli --help

Running Benchmarks

Three benchmark suites measure different aspects:

# Filter performance
cargo bench --bench filter_benchmarks

# Compression overhead for small/medium/large output
cargo bench --bench compression_overhead

# Real-world compression scenarios (JSON, git status, logs)
cargo bench --bench compression_bench

Running Tests

# All tests
cargo test

# Specific test
cargo test test_git_status_compression

Tests cover:

  • filters_test.rs — Git status, log patterns, redaction, disabled mode, short output passthrough
  • cli_test.rs — CLI stats/no-stats/no-compression/help/error handling

Dependencies

  • serde (forked from git.sly.so/kade/serde) — Configuration serialization
  • serde_json (forked from git.sly.so/kade/serde_json) — JSON parsing and compression
  • regex (forked from git.sly.so/kade/regex) — Pattern matching in filters
  • lazy_static — Static regex compilation
  • thiserror (forked from git.sly.so/kade/thiserror) — Error types
  • tracing (forked from git.sly.so/kade/tracing) — Logging
  • smol (optional) — Async runtime for future extensions

Design Principles

  1. Command-specific first — Dedicated filters for known commands beat generic detection
  2. Auto-detect fallback — JSON, log patterns, and plain text handle unknown commands
  3. Safety — Sanity guards prevent empty output; redaction protects sensitive data
  4. Performance — Lazy-compiled regexes; efficient line processing; minimal allocations
  5. Extensibility — Add filters by implementing compress(&str, bool) -> Option<String>
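The extensibility point in principle 5 can be sketched as a trait with the stated signature. The `Filter` trait and the `HeadTail` filter are hypothetical, and this README does not document what the `bool` argument means; the sketch assumes it selects a more aggressive mode. Returning `None` is assumed to mean "no useful compression, pass the output through", in the spirit of the short-output passthrough the tests cover. The head/tail strategy resembles the `kubectl generic` first-5/last-5 behaviour.

```rust
/// Hypothetical trait shape for `compress(&str, bool) -> Option<String>`.
trait Filter {
    fn compress(&self, output: &str, aggressive: bool) -> Option<String>;
}

/// Example filter: keep the first and last `keep` lines of long output.
struct HeadTail {
    keep: usize,
}

impl Filter for HeadTail {
    fn compress(&self, output: &str, aggressive: bool) -> Option<String> {
        // Assumed semantics: aggressive mode keeps half as many lines.
        let keep = if aggressive { (self.keep / 2).max(1) } else { self.keep };
        let lines: Vec<&str> = output.lines().collect();
        if lines.len() <= keep * 2 {
            return None; // Short output: nothing worth removing.
        }
        let omitted = lines.len() - keep * 2;
        let mut kept: Vec<String> = lines[..keep].iter().map(|s| s.to_string()).collect();
        kept.push(format!("... ({} lines omitted)", omitted));
        kept.extend(lines[lines.len() - keep..].iter().map(|s| s.to_string()));
        Some(kept.join("\n"))
    }
}
```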

License

[License information to be added]