Hugging Face Xet data storage system client library

🤗 xet-core - xet client tech, used in huggingface_hub

Welcome

xet-core enables huggingface_hub to utilize xet storage for uploading and downloading to HF Hub. Xet storage provides chunk-based deduplication, efficient storage/retrieval with local disk caching, and backwards compatibility with Git LFS. This library is not meant to be used directly, and is instead intended to be used from huggingface_hub.

Key features

  • chunk-based deduplication implementation: avoids transferring and storing chunks that are shared across binary files (models, datasets, etc.).
  • 🤗 Python bindings: bindings for the huggingface_hub package.
  • network communications: concurrent communication with the HF Hub Xet backend services (CAS).
  • 🔖 local disk caching: chunk-based cache that sits alongside the existing huggingface_hub disk cache.
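
To make the deduplication idea concrete, here is a minimal sketch of content-defined chunking with a hash-addressed chunk store. It is an illustration only and does not reproduce xet-core's actual chunker, hash function, or storage format: the gear-style rolling hash, the size parameters, and the names chunk, dedup_store, and reconstruct are all assumptions made for this example.

```python
import hashlib
import random

# Illustrative parameters, NOT the values xet-core uses.
MASK = (1 << 10) - 1   # cut when the low 10 hash bits are zero
MIN_CHUNK = 256        # never cut before this many bytes
MAX_CHUNK = 8192       # always cut at this many bytes

# Fixed pseudo-random table for a gear-style rolling hash (hypothetical choice).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data at boundaries chosen from the content itself."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def dedup_store(files: dict[str, bytes]):
    """Return (store, manifests): unique chunks keyed by hash, and per-file hash lists."""
    store: dict[str, bytes] = {}
    manifests: dict[str, list[str]] = {}
    for name, data in files.items():
        hashes = []
        for c in chunk(data):
            digest = hashlib.sha256(c).hexdigest()
            store.setdefault(digest, c)  # a chunk already seen is stored only once
            hashes.append(digest)
        manifests[name] = hashes
    return store, manifests


def reconstruct(store: dict[str, bytes], manifest: list[str]) -> bytes:
    """Rebuild a file from its ordered list of chunk hashes."""
    return b"".join(store[h] for h in manifest)
```

Because boundaries depend on the bytes rather than on fixed offsets, two files that share long runs of content tend to produce many identical chunks, so only the chunks that differ need to be stored or transferred.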

Packages

This repository produces the following packages:

Rust Crates (crates.io)

| Crate | Description |
|---|---|
| hf-xet | High-level client library for uploading and downloading files with chunk-based deduplication |
| xet-client | HTTP client for communicating with Hugging Face Xet storage servers |
| xet-data | Data processing pipeline for chunking, deduplication, and file reconstruction |
| xet-core-structures | Core data structures including MerkleHash, metadata shards, and Xorb objects |
| xet-runtime | Async runtime, configuration, logging, and utility infrastructure |

Python Package (PyPI)

| Package | Description |
|---|---|
| hf-xet | Python bindings for the Xet storage system, used by huggingface_hub |

Built from the hf_xet/ directory using maturin.

CLI Binary

| Binary | Description |
|---|---|
| git-xet | Git LFS compatible command-line tool for Xet storage |

Built from the git_xet/ directory. Distributed via GitHub releases.

Contributions (feature requests, bugs, etc.) are encouraged & appreciated 💙💚💛💜🧡❤️

Please join us in making xet-core better. We value everyone's contributions. Code is not the only way to help: answering questions, helping each other, improving documentation, and filing issues all help immensely. If you are interested in contributing (please do!), check out the contribution guide for this repository.

Issues, Diagnostics & Debugging

If you encounter an issue with hf-xet, please collect diagnostic information and attach it when creating a new Issue.

The scripts/diag/ directory contains platform-specific scripts that download debug symbols, configure logging, and capture periodic stack traces and core dumps:

| OS | Script |
|---|---|
| Linux | scripts/diag/hf-xet-diag-linux.sh |
| macOS | scripts/diag/hf-xet-diag-macos.sh |
| Windows (Git-Bash) | scripts/diag/hf-xet-diag-windows.sh |

# prefix your failing command with the script for your OS, e.g.:
./scripts/diag/hf-xet-diag-macos.sh -- python my-script.py

See scripts/diag/README.md for full usage, output layout, dump analysis instructions, and how to install debug symbols manually.

Quick debugging environment variables:

RUST_BACKTRACE=full          # full Rust backtraces on panic
RUST_LOG=info                # enable hf-xet logging
HF_XET_LOG_FILE=/tmp/xet.log # write logs to a file (defaults to stdout)
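
When the failing command is launched from Python, the variables above can be applied to a child process as in the sketch below. The helper names xet_debug_env and run_with_xet_debug are hypothetical and not part of hf-xet; only the three variable names come from the list above.

```python
import os
import subprocess
import sys


def xet_debug_env(log_file=None, log_level="info"):
    """Copy the current environment and add the debugging variables listed above."""
    env = dict(os.environ)
    env["RUST_BACKTRACE"] = "full"   # full Rust backtraces on panic
    env["RUST_LOG"] = log_level      # enable hf-xet logging
    if log_file is not None:
        env["HF_XET_LOG_FILE"] = log_file  # write logs to this file
    return env


def run_with_xet_debug(cmd, log_file=None):
    """Run cmd (a list of arguments) with the debugging variables set."""
    return subprocess.run(cmd, env=xet_debug_env(log_file), check=False)


# Example usage (my-script.py is a placeholder for your own failing script):
# run_with_xet_debug([sys.executable, "my-script.py"], log_file="/tmp/xet.log")
```

Setting the variables on the child process rather than in the current one keeps the change scoped to the command being diagnosed.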

Local Development

Repo Organization

  • xet_pkg/ (hf-xet): High-level session API for uploading and downloading files with deduplication.
  • xet_client/ (xet-client): HTTP client for CAS and Hub backend services.
  • xet_data/ (xet-data): Chunking, deduplication, and file reconstruction pipeline.
  • xet_core_structures/ (xet-core-structures): MerkleHash, metadata shards, Xorb objects, and shared data structures.
  • xet_runtime/ (xet-runtime): Async runtime, configuration, logging, and utilities.
  • hf_xet/: Python bindings (maturin/PyO3), produces the hf-xet PyPI package.
  • git_xet/: Git LFS compatible CLI tool (git-xet).
  • wasm/: WebAssembly builds (hf_xet_wasm, hf_xet_thin_wasm).
  • simulation/: Simulation and benchmarking infrastructure.

Build, Test & Benchmark

To build xet-core, check the GitHub Actions CI workflow for the Rust toolchain version to install, then follow the Rust documentation to install rustup and that version of the toolchain. Use the following commands for building, testing, and benchmarking.

Many of us on the team use VSCode, so we have checked in some settings in the .vscode directory. Install the rust-analyzer extension.

Build:

cargo build

Test:

cargo test

Benchmark:

cargo bench

Linting:

cargo clippy -r --verbose -- -D warnings

Formatting (requires nightly toolchain):

cargo +nightly fmt --manifest-path ./Cargo.toml --all

Building Python package and running locally (on *nix systems):

  1. Create a Python 3 virtualenv: python3 -m venv ~/venv
  2. Activate the virtualenv: source ~/venv/bin/activate
  3. Install maturin and ipython: pip3 install maturin ipython
  4. Go to hf_xet crate: cd hf_xet
  5. Build: maturin develop
  6. Test:
ipython
import hf_xet as hfxet
hfxet.upload_files()
hfxet.download_files()
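
Before calling into the module, it can help to confirm that maturin develop installed it into the active virtualenv. The sketch below uses only the Python standard library; describe_hf_xet is a hypothetical helper name, and the distribution name hf-xet is taken from the PyPI package name above.

```python
import importlib.metadata
import importlib.util


def describe_hf_xet() -> str:
    """Report whether the hf_xet module is importable, and its installed version if known."""
    if importlib.util.find_spec("hf_xet") is None:
        return "hf_xet is not importable in this environment"
    try:
        version = importlib.metadata.version("hf-xet")
    except importlib.metadata.PackageNotFoundError:
        version = "unknown"
    return f"hf_xet is importable (version: {version})"


print(describe_hf_xet())
```

If the module is reported as not importable, check that the virtualenv from step 2 is still active in the shell where ipython is started.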

Developing with tokio console

Prerequisite: install tokio-console (cargo install tokio-console). See https://github.com/tokio-rs/console for details.

To use tokio-console with hf-xet, compile hf_xet with the following command:

RUSTFLAGS="--cfg tokio_unstable" maturin develop -r --features tokio-console

Then, while hf_xet is running (via an hf CLI command or huggingface_hub Python code), tokio-console will be able to connect.

Example:

# In one terminal:
pip install huggingface_hub
RUSTFLAGS="--cfg tokio_unstable" maturin develop -r --features tokio-console
hf download openai/gpt-oss-20b

# In another terminal
cargo install tokio-console
tokio-console

Building a universal wheel for macOS:

From hf_xet directory:

MACOSX_DEPLOYMENT_TARGET=10.9 maturin build --release --target universal2-apple-darwin --features openssl_vendored

Note: You may need to install the x86_64 target: rustup target add x86_64-apple-darwin

Testing

Unit tests are run with cargo test; benchmarks are run with cargo bench. Some crates have a main.rs that can be run for manual testing.

References & History