Benchmark Analysis #1

Open
opened 2026-04-21 20:17:49 +02:00 by kade · 0 comments
Owner

Benchmark Analysis

This analysis examines Mappy's benchmarking infrastructure and empirical performance data.

Benchmark Setup

Mappy uses Criterion.rs for benchmarking. The benchmark suite includes:

Benchmark File Purpose
basic_quotient_filter_benchmarks.rs Core quotient filter operations
quotient_filter_benchmarks.rs Quotient filter features
quotient_filter_benchmarks_slot_finding.rs Slot finding performance
simple_quotient_filter_benchmarks.rs Simple quotient filter operations
maplet_vs_std.rs Comparison with std collections
storage_benchmarks.rs Storage backend performance

Empirical Data

Insert Performance

Size Time (mean) Throughput Outliers
1,000 55.555 µs 18.000 Melem/s 6.00%
10,000 555.09 µs 18.015 Melem/s 11.00%
100,000 12.848 ms 7.783 Melem/s 0%

Insert performance scales linearly with dataset size. Throughput decreases at larger sizes due to cache effects and collision handling.

Query Performance

Size Time (mean) Throughput Outliers
1,000 20.840 µs 47.985 Melem/s 8.00%
10,000 246.45 µs 40.576 Melem/s 1.00%
100,000 8.8069 ms 11.355 Melem/s 14.00%

Query performance is faster than insert operations (2-3x). Throughput degradation at 100k items suggests cache pressure.

Delete Performance

Size Time (mean) Throughput Outliers
1,000 110.93 µs 9.014 Melem/s 3.00%
10,000 1.1630 ms 8.598 Melem/s 10.00%
100,000 35.773 ms 2.795 Melem/s 5.00%

Delete operations are slowest, requiring tombstone management and run reconstruction.

Hash Function Comparison

Hash Function Time (mean) Relative Speed Outliers
AHash 834.37 µs 1.00x (baseline) 7.00%
TwoX 939.38 µs 0.89x 2.00%
Fnv 742.28 µs 1.12x (fastest) 4.00%

Fnv performs best in this benchmark, though AHash is recommended for general use.

Slot Finding Performance

Size Time (mean) Throughput Outliers
1,000 16.401 µs 60.973 Melem/s 9.00%
10,000 179.97 µs 55.564 Melem/s 9.00%
100,000 4.3894 ms 22.782 Melem/s 0%

Slot finding is fast, benefiting from the quotient filter structure.

Multiset Operations

Size Time (mean) Throughput Outliers
1,000 41.814 µs 23.915 Melem/s 7.00%
10,000 478.81 µs 20.885 Melem/s 3.00%
100,000 5.2887 ms 18.908 Melem/s 7.00%

Multiset operations show stable throughput across sizes.

Memory Usage

Size Time (mean)
1,000 460.61 ns
10,000 3.8064 µs
100,000 36.554 µs
1,000,000 398.03 µs

Memory allocation time scales sub-linearly.

Mathematical Analysis

Space Complexity

S(n) = n \cdot (O(\log(1/\varepsilon)) + v)

Where:

  • n = number of items
  • \varepsilon = false-positive rate
  • v = value size in bits

Time Complexity

Insert: T_{insert}(n) = O(1) average case, O(\log n) worst case
Query: T_{query}(n) = O(1) average case, with \Pr[\text{false positive}] \leq \varepsilon
Delete: T_{delete}(n) = O(1 + \alpha) where \alpha is fraction of deleted entries

graph LR
    A[Quotient Filter Operations] --> B[Insert: 18M ops/s]
    A --> C[Query: 41M ops/s]
    A --> D[Delete: 9M ops/s]
    A --> E[Slot Finding: 56M ops/s]

Benchmarking Practices

Based on Rust API Guidelines and Criterion.rs documentation:

  • Criterion.rs is the de facto standard for Rust benchmarking
  • Micro-benchmarks operate at unit test level
  • Macro-benchmarks operate at integration test level
  • Benchmark at the lowest level of abstraction for maintainability
  • Include macro-benchmarks for critical path validation

Assessment

Strengths:

  • Uses Criterion.rs (industry standard)
  • Comprehensive micro-benchmarks for core operations
  • Multiple data sizes for scalability analysis
  • Statistical analysis with confidence intervals
  • Outlier detection and reporting

Areas for Improvement:

  • Limited macro-benchmarks (end-to-end scenarios)
  • No continuous benchmarking in CI
  • Outlier rates are high (6-14% in some benchmarks)
  • No performance regression detection
  • Missing comparative benchmarks with competitors

Recommendations

Immediate

  1. Reduce outliers by increasing warm-up time and using CPU frequency scaling governors
  2. Add CI integration with baseline tracking
  3. Set up performance regression detection using Bencher or similar tool

Long-term

  1. Add macro-benchmarks for end-to-end workflows
  2. Benchmark against Bloom filters, Cuckoo filters, and Redis
  3. Add performance characteristics to API docs
  4. Create performance tuning guide

References

# Benchmark Analysis This analysis examines Mappy's benchmarking infrastructure and empirical performance data. ## Benchmark Setup Mappy uses Criterion.rs for benchmarking. The benchmark suite includes: | Benchmark File | Purpose | |---------------|---------| | basic_quotient_filter_benchmarks.rs | Core quotient filter operations | | quotient_filter_benchmarks.rs | Quotient filter features | | quotient_filter_benchmarks_slot_finding.rs | Slot finding performance | | simple_quotient_filter_benchmarks.rs | Simple quotient filter operations | | maplet_vs_std.rs | Comparison with std collections | | storage_benchmarks.rs | Storage backend performance | ## Empirical Data ### Insert Performance | Size | Time (mean) | Throughput | Outliers | |------|-------------|------------|----------| | 1,000 | 55.555 µs | 18.000 Melem/s | 6.00% | | 10,000 | 555.09 µs | 18.015 Melem/s | 11.00% | | 100,000 | 12.848 ms | 7.783 Melem/s | 0% | Insert performance scales linearly with dataset size. Throughput decreases at larger sizes due to cache effects and collision handling. ### Query Performance | Size | Time (mean) | Throughput | Outliers | |------|-------------|------------|----------| | 1,000 | 20.840 µs | 47.985 Melem/s | 8.00% | | 10,000 | 246.45 µs | 40.576 Melem/s | 1.00% | | 100,000 | 8.8069 ms | 11.355 Melem/s | 14.00% | Query performance is faster than insert operations (2-3x). Throughput degradation at 100k items suggests cache pressure. ### Delete Performance | Size | Time (mean) | Throughput | Outliers | |------|-------------|------------|----------| | 1,000 | 110.93 µs | 9.014 Melem/s | 3.00% | | 10,000 | 1.1630 ms | 8.598 Melem/s | 10.00% | | 100,000 | 35.773 ms | 2.795 Melem/s | 5.00% | Delete operations are slowest, requiring tombstone management and run reconstruction. ### Hash Function Comparison | Hash Function | Time (mean) | Relative Speed | Outliers | |---------------|-------------|----------------|----------| | AHash | 834.37 µs | 1.00x (baseline) | 7.00% | | TwoX | 939.38 µs | 0.89x | 2.00% | | Fnv | 742.28 µs | 1.12x (fastest) | 4.00% | Fnv performs best in this benchmark, though AHash is recommended for general use. ### Slot Finding Performance | Size | Time (mean) | Throughput | Outliers | |------|-------------|------------|----------| | 1,000 | 16.401 µs | 60.973 Melem/s | 9.00% | | 10,000 | 179.97 µs | 55.564 Melem/s | 9.00% | | 100,000 | 4.3894 ms | 22.782 Melem/s | 0% | Slot finding is fast, benefiting from the quotient filter structure. ### Multiset Operations | Size | Time (mean) | Throughput | Outliers | |------|-------------|------------|----------| | 1,000 | 41.814 µs | 23.915 Melem/s | 7.00% | | 10,000 | 478.81 µs | 20.885 Melem/s | 3.00% | | 100,000 | 5.2887 ms | 18.908 Melem/s | 7.00% | Multiset operations show stable throughput across sizes. ### Memory Usage | Size | Time (mean) | |------|-------------| | 1,000 | 460.61 ns | | 10,000 | 3.8064 µs | | 100,000 | 36.554 µs | | 1,000,000 | 398.03 µs | Memory allocation time scales sub-linearly. ## Mathematical Analysis ### Space Complexity $$S(n) = n \cdot (O(\log(1/\varepsilon)) + v)$$ Where: - $n$ = number of items - $\varepsilon$ = false-positive rate - $v$ = value size in bits ### Time Complexity Insert: $T_{insert}(n) = O(1)$ average case, $O(\log n)$ worst case Query: $T_{query}(n) = O(1)$ average case, with $\Pr[\text{false positive}] \leq \varepsilon$ Delete: $T_{delete}(n) = O(1 + \alpha)$ where $\alpha$ is fraction of deleted entries ```mermaid graph LR A[Quotient Filter Operations] --> B[Insert: 18M ops/s] A --> C[Query: 41M ops/s] A --> D[Delete: 9M ops/s] A --> E[Slot Finding: 56M ops/s] ``` ## Benchmarking Practices Based on Rust API Guidelines and Criterion.rs documentation: - Criterion.rs is the de facto standard for Rust benchmarking - Micro-benchmarks operate at unit test level - Macro-benchmarks operate at integration test level - Benchmark at the lowest level of abstraction for maintainability - Include macro-benchmarks for critical path validation ## Assessment **Strengths:** - Uses Criterion.rs (industry standard) - Comprehensive micro-benchmarks for core operations - Multiple data sizes for scalability analysis - Statistical analysis with confidence intervals - Outlier detection and reporting **Areas for Improvement:** - Limited macro-benchmarks (end-to-end scenarios) - No continuous benchmarking in CI - Outlier rates are high (6-14% in some benchmarks) - No performance regression detection - Missing comparative benchmarks with competitors ## Recommendations ### Immediate 1. Reduce outliers by increasing warm-up time and using CPU frequency scaling governors 2. Add CI integration with baseline tracking 3. Set up performance regression detection using Bencher or similar tool ### Long-term 1. Add macro-benchmarks for end-to-end workflows 2. Benchmark against Bloom filters, Cuckoo filters, and Redis 3. Add performance characteristics to API docs 4. Create performance tuning guide ## References - Rust API Guidelines: https://rust-lang.github.io/api-guidelines/ - Criterion.rs Documentation: https://docs.rs/criterion/ - Bencher: https://bencher.dev/learn/benchmarking/rust/criterion/
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
kade/mappy#1
No description provided.