NetMediate Benchmark Results

GenDI pattern: The benchmark scenarios assume the current NetMediate ecosystem, where startup projects use NetMediate.SourceGeneration and supporting services can follow the GenDI [Injectable] + [Inject] model.

This document describes the performance characteristics of NetMediate under the current implementation, which uses compile-time source generation (no assembly scanning) and GenDI-based dependency registration. Benchmark handlers are registered as global singletons (ServiceLifetime.Singleton with ThreadIsolation = ThreadIsolationPolicy.None).


Reference benchmark environment

The table below is updated automatically by CI on every PR benchmark run. System info comes from the BenchmarkDotNet host environment.

| Key | Value |
|---|---|
| OS | Linux Ubuntu 24.04.4 LTS (Noble Numbat) |
| CPU | AMD EPYC 7763 2.82GHz, 1 CPU, 4 logical and 2 physical cores |
| .NET SDK | 10.0.300 |
| Runtime | .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3 |
| Last CI run | 2026-06-02 03:15 UTC |
| Branch | dependabot/nuget/dot-config/main/minor-and-patch-c5eb370c7b |
| Commit | c09fe43 |

🚀 Core dispatch throughput

Measured with BenchmarkDotNet (CoreDispatchBenchmarks) — no decorators, no resilience, no adapters registered. Mean is the BenchmarkDotNet ShortRun mean (ns/op). Throughput is the derived ops/s. Alloc Δ compares per-call allocation bytes against the baseline — allocations are deterministic and unaffected by CPU load, making this the most reliable regression signal. The vs timing column compares dispatch time against stored target-branch values (±10% = no change on shared CI hardware; ✅ = improved, ⚠️ = degraded).

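| Benchmark | Mean | Error | Gen0 | Gen1 | Gen2 | Allocated | Alloc Δ | Throughput | vs timing |
|---|---|---|---|---|---|---|---|---|---|
| Command Send | 71.49 ns | ±1.779 ns | 0 | 0 | 0 | - | ✅ -48 B | ~14.0M msg/s | ✅ improved (-21.0%) |
| Notification Notify | 124.40 ns | ±11.928 ns | 0.0151 | 0 | 0 | 256 B | ✅ -32 B | ~8.0M msg/s | ≈ (-3.5%) |
| Request Request | 71.94 ns | ±4.554 ns | 0 | 0 | 0 | - | ✅ -112 B | ~13.9M msg/s | ✅ improved (-20.0%) |
| Stream RequestStream ¹ | 140.76 ns | ±11.415 ns | 0.0076 | 0 | 0 | 128 B | ✅ -88 B | ~7.1M msg/s | ✅ improved (-28.2%) |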

¹ Stream measures complete stream invocations (3 items each). Higher throughput = better.

Note on stream vs other types: Stream invocations are inherently more expensive because each call allocates a new IAsyncEnumerator<T> and drives it through multiple MoveNextAsync cycles with Task.Yield() inside the handler. The per-invocation cost is higher by design.
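To make the cost described in the note concrete, here is a minimal sketch of a stream handler shape. `StreamDemo`, `ProduceAsync`, and `DrainAsync` are hypothetical names used only for illustration, not NetMediate's benchmark code; the sketch assumes an async iterator that yields three items with `Task.Yield()` between them.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

int count = await StreamDemo.DrainAsync(StreamDemo.ProduceAsync(3));
Console.WriteLine($"drained {count} items");

// Hypothetical stand-in for a stream handler; not part of NetMediate.
static class StreamDemo
{
    // Every call allocates a new async-iterator object (the enumerator).
    public static async IAsyncEnumerable<int> ProduceAsync(
        int itemCount,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        for (int i = 0; i < itemCount; i++)
        {
            cancellationToken.ThrowIfCancellationRequested();
            await Task.Yield(); // forces an asynchronous continuation per item
            yield return i;
        }
    }

    // Draining performs itemCount + 1 MoveNextAsync calls (the last one returns false).
    public static async Task<int> DrainAsync(IAsyncEnumerable<int> source)
    {
        int count = 0;
        await foreach (int item in source)
        {
            count++;
        }
        return count;
    }
}
```

A single invocation therefore pays for one enumerator allocation plus four `MoveNextAsync` round trips, whereas a command or request completes with one awaited call.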


BenchmarkDotNet project

For artifact-reproducible, statistically rigorous benchmarks including allocation data and GC gen0/1/2 counts, use the dedicated NetMediate.Benchmarks project:

# Standard JIT run (produces BenchmarkDotNet HTML/CSV artifacts in BenchmarkDotNet.Artifacts/)
dotnet run -c Release --project tests/NetMediate.Benchmarks/

# Quick dry-run to verify benchmark classes compile and can execute (no statistical warming)
dotnet run -c Release --project tests/NetMediate.Benchmarks/ -- --job Dry

# NativeAOT comparison — publish a native binary then run it
dotnet publish tests/NetMediate.Benchmarks/ -c Release -p:AotBenchmark=true -o /tmp/bench-aot
/tmp/bench-aot/NetMediate.Benchmarks

CoreDispatchBenchmarks covers the four core message types:

| Benchmark | Description |
|---|---|
| Command Send | IMediator.Send<BenchCommand>(), no decorators |
| Notification Notify | IMediator.Notify<BenchNotification>(), no decorators |
| Request Request | IMediator.Request<BenchRequest, BenchResponse>(), no decorators |
| Stream RequestStream (3 items/call) | IMediator.RequestStream<BenchStreamRequest, BenchStreamItem>(), drains 3 items per invocation |

BenchmarkDotNet output columns: Method, Mean, Error, StdDev, Gen0, Gen1, Gen2, Allocated. The --job Short flag runs 3 warmup + 3 measured iterations.


⚡ Hot-path throughput

Once warm, JIT and NativeAOT produce identical throughput for the same registration model. In the benchmark profile, handlers are registered as global singletons via GenDI (ThreadIsolation = ThreadIsolationPolicy.None), and runtime dispatch resolves non-keyed handlers from a cache.

| Aspect | JIT (CoreCLR) | NativeAOT |
|---|---|---|
| Warm throughput | Baseline | Same ¹ |
| Cold-start (first dispatch) | JIT compiles on first call | Pre-compiled binary; no JIT overhead |
| Startup overhead | None (explicit registration only) | None |
| Binary size | Standard | Larger (trimmed single-file) |
| Compatible registration | All | Explicit registration + source generator only |

¹ Identical because the hot path uses no reflection, no MakeGenericType calls, and no dynamic IL; all resolved types are closed generics fixed at compile time.

How to run the comparison

JIT (standard dotnet test):

NETMEDIATE_RUN_PERFORMANCE_TESTS=true \
dotnet test tests/NetMediate.Tests/ --configuration Release \
--filter "FullyQualifiedName~CoreDispatchThroughput OR FullyQualifiedName~BenchmarkSystemInfo" \
--logger "console;verbosity=detailed"

NativeAOT (publish then run the native binary):

# 1. Publish NativeAOT test host
dotnet publish tests/NetMediate.Tests/ \
--configuration Release \
-p:PublishAot=true \
-p:TrimmerRootAssembly=NetMediate.Tests \
--output /tmp/nativeaot-bench

# 2. Run the native binary with the performance flag
NETMEDIATE_RUN_PERFORMANCE_TESTS=true \
/tmp/nativeaot-bench/NetMediate.Tests \
--filter "CoreDispatchThroughput|BenchmarkSystemInfo"

Look for execution_mode=jit vs execution_mode=nativeaot in the output to confirm which runtime produced each result line.

Trimming without NativeAOT

Publishing with --self-contained -p:PublishTrimmed=true reduces binary size but does not change dispatch throughput. The source-generated registration model is trimmer-safe by design.


Implementation model

All handlers are registered through source generation and standard DI:

builder.Services.AddNetMediate();

The source generator emits the registration code at compile time; at startup that code registers each handler implementation directly against its interface. Cross-cutting concerns (logging, resilience, etc.) are applied via GenDI decorators using [DecoratorFor]:

[DecoratorFor<ICommandHandler<MyCommand>>]
public sealed class MyCommandDecorator(ICommandHandler<MyCommand> inner) : ICommandHandler<MyCommand>
{
    public async Task Handle(MyCommand message, CancellationToken cancellationToken = default)
    {
        // pre-processing
        await inner.Handle(message, cancellationToken);
        // post-processing
    }
}

GenDI registers the decorator chain in DI automatically. No MakeGenericType, no assembly scanning — fully NativeAOT-compatible.


Dispatch semantics

| Operation | Method | Semantics |
|---|---|---|
| Send | IMediator.Send<TMsg> | All ICommandHandler<TMsg> instances iterated sequentially |
| Request | IMediator.Request<TMsg, TResp> | Single IRequestHandler<TMsg, TResp> (first registered) |
| Notify | IMediator.Notify<TMsg> | Fire-and-forget per handler; all INotificationHandler<TMsg> instances started individually; exceptions logged |
| RequestStream | IMediator.RequestStream<TMsg, TResp> | All registered IStreamHandler<TMsg, TResp> instances, items merged sequentially |
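The difference between the Send and Notify rows can be sketched as follows. This is an illustration of the semantics only, not NetMediate's dispatcher: `DispatchSketch` is a hypothetical helper, and plain delegates stand in for the handler interfaces.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var log = new List<string>();
var handlers = new Func<string, CancellationToken, Task>[]
{
    (msg, _) => { log.Add($"first:{msg}"); return Task.CompletedTask; },
    (msg, _) => { log.Add($"second:{msg}"); return Task.CompletedTask; },
};

// Send: both handlers run in registration order.
await DispatchSketch.SendAsync(handlers, "ping");
Console.WriteLine(string.Join(", ", log));

// Notify: a failing handler does not stop the others; its exception is reported.
var errors = new List<Exception>();
var notifyHandlers = new Func<string, CancellationToken, Task>[]
{
    (_, _) => throw new InvalidOperationException("boom"),
    handlers[0],
};
DispatchSketch.Notify(notifyHandlers, "evt", errors.Add);
Console.WriteLine($"{errors.Count} error(s) reported, {log.Count} log entries");

// Hypothetical helper; delegates stand in for ICommandHandler / INotificationHandler.
static class DispatchSketch
{
    // Sequential: each handler is awaited before the next one starts.
    public static async Task SendAsync<TMsg>(
        IEnumerable<Func<TMsg, CancellationToken, Task>> handlers,
        TMsg message,
        CancellationToken ct = default)
    {
        foreach (var handler in handlers)
        {
            await handler(message, ct);
        }
    }

    // Fire-and-forget: each handler is started individually and never awaited by the caller.
    public static void Notify<TMsg>(
        IEnumerable<Func<TMsg, CancellationToken, Task>> handlers,
        TMsg message,
        Action<Exception> logError,
        CancellationToken ct = default)
    {
        foreach (var handler in handlers)
        {
            _ = RunSafelyAsync(handler, message, logError, ct);
        }
    }

    // Catches both synchronous throws and faulted tasks, then reports them.
    private static async Task RunSafelyAsync<TMsg>(
        Func<TMsg, CancellationToken, Task> handler,
        TMsg message,
        Action<Exception> logError,
        CancellationToken ct)
    {
        try { await handler(message, ct); }
        catch (Exception ex) { logError(ex); }
    }
}
```

In this sketch a Send caller observes the first exception and later handlers never run, while a Notify caller returns immediately and only sees failures through the error callback.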

🧬 DI lifetime profile (benchmark)

Benchmark handlers and benchmark message services are declared with:

  • ServiceLifetime.Singleton
  • ThreadIsolation = ThreadIsolationPolicy.None

This enforces a global singleton registration profile in benchmark runs, matching the GenDI pattern described at the top of this document.

Singleton/global registrations in the benchmark profile keep handler lifetime stable across runs. Non-keyed dispatch uses per-provider handler caches in `Mediator`/`Notifier`; keyed dispatch still resolves from DI on each call.

🧠 Cache strategy constraints

The current cache strategy must:

  • respect handler interface contracts and the developer-defined ServiceLifetime / ThreadIsolation
  • keep cache scope isolated per DI provider/container
  • preserve AOT and trimming compatibility
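One way to satisfy the per-provider isolation constraint is sketched below. This is an assumption-laden illustration, not the code in `Mediator`/`Notifier`: `ProviderScopedCache` and `FakeProvider` are hypothetical names, and caching resolved instances like this is only appropriate for singleton-lifetime handlers such as those in the benchmark profile.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.CompilerServices;

var providerA = new FakeProvider();
var providerB = new FakeProvider();

object a1 = ProviderScopedCache.GetOrResolve(providerA, typeof(string));
object a2 = ProviderScopedCache.GetOrResolve(providerA, typeof(string));
object b1 = ProviderScopedCache.GetOrResolve(providerB, typeof(string));

Console.WriteLine(object.ReferenceEquals(a1, a2)); // same provider: cached instance reused
Console.WriteLine(object.ReferenceEquals(a1, b1)); // different provider: separate cache

// Hypothetical cache; NetMediate's internals may differ.
static class ProviderScopedCache
{
    // Weak keys: a provider's cache becomes collectible together with the provider.
    private static readonly ConditionalWeakTable<IServiceProvider, ConcurrentDictionary<Type, object>> Caches = new();

    public static object GetOrResolve(IServiceProvider provider, Type serviceType)
    {
        var perProvider = Caches.GetValue(provider, static _ => new ConcurrentDictionary<Type, object>());

        // Only closed types known at compile time are passed in; no MakeGenericType is involved.
        return perProvider.GetOrAdd(
            serviceType,
            static (type, sp) => sp.GetService(type)
                ?? throw new InvalidOperationException($"No service registered for {type}."),
            provider);
    }
}

// Stand-in provider that returns a fresh object per resolve, so caching is observable.
sealed class FakeProvider : IServiceProvider
{
    public object? GetService(Type serviceType) => new object();
}
```

Keying the outer table on the provider instance keeps two containers in the same process from ever sharing a handler, and scoped or transient handlers would need to bypass such a cache entirely to respect their declared lifetime.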

How to reproduce benchmarks

NETMEDIATE_RUN_PERFORMANCE_TESTS=true \
dotnet test tests/NetMediate.Tests/ --configuration Release \
--filter "FullyQualifiedName~BenchmarkSystemInfo" \
--logger "console;verbosity=detailed"

Minimum CI assertions

| Test class | Scenario | Threshold |
|---|---|---|
| BenchmarkSystemInfoTests | System info print | always runs |

Thresholds are deliberately lenient to remain green on any CI hardware. The BenchmarkDotNet --job Short run on every PR provides the authoritative throughput numbers and regression gate.


Latest CI Benchmark Run

Run: 2026-06-02 03:15 UTC | Branch: dependabot/nuget/dot-config/main/minor-and-patch-c5eb370c7b | Commit: c09fe43

ℹ️ Timing baseline loaded from stored target-branch docs (different run — ±10% is noise).

System specification

Linux Ubuntu 24.04.4 LTS (Noble Numbat)
AMD EPYC 7763 2.82GHz, 1 CPU, 4 logical and 2 physical cores
.NET SDK 10.0.300
Runtime: .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3

Comparison vs baseline (main, median of ≤3 runs)

Timing: ✅ improved (>10% faster) | ≈ no change (±10%) | ⚠️ degraded (>10% slower)

Alloc Δ: ✅ same / ✅ −N B (less) / ⚠️ +N B (more)

| Benchmark | Baseline (main, median of ≤3 runs) | Current | Δ timing | Alloc Δ |
|---|---|---|---|---|
| Command Send | 90.55 ns | 71.49 ns | ✅ -21.0% | ✅ -48 B |
| Notification Notify | 128.85 ns | 124.40 ns | ≈ -3.5% | ✅ -32 B |
| Request Request | 89.91 ns | 71.94 ns | ✅ -20.0% | ✅ -112 B |
| Stream RequestStream | 196.07 ns | 140.76 ns | ✅ -28.2% | ✅ -88 B |
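The derived columns can be recomputed from the means in the table above. The sketch below assumes throughput is 1e9 divided by the mean in nanoseconds and that the timing delta is the relative change against the baseline, classified with the ±10% band from the legend. `BenchMath` is a hypothetical helper, not part of NetMediate or BenchmarkDotNet.

```csharp
using System;

// Values copied from the comparison table above (mean ns/op).
var rows = new (string Name, double BaselineNs, double CurrentNs)[]
{
    ("Command Send", 90.55, 71.49),
    ("Notification Notify", 128.85, 124.40),
    ("Request Request", 89.91, 71.94),
    ("Stream RequestStream", 196.07, 140.76),
};

foreach (var (name, baselineNs, currentNs) in rows)
{
    double opsPerSecond = BenchMath.ThroughputPerSecond(currentNs);
    double deltaPercent = BenchMath.TimingDeltaPercent(baselineNs, currentNs);
    Console.WriteLine(
        $"{name}: ~{opsPerSecond / 1e6:F1}M msg/s, {deltaPercent:F1}% ({BenchMath.Classify(deltaPercent)})");
}

// Hypothetical helper for illustration only.
static class BenchMath
{
    // ns/op to ops/s: one second is 1e9 ns.
    public static double ThroughputPerSecond(double meanNs) => 1e9 / meanNs;

    // Negative means the current run is faster than the baseline.
    public static double TimingDeltaPercent(double baselineNs, double currentNs) =>
        (currentNs - baselineNs) / baselineNs * 100.0;

    // Anything within ±10% is treated as noise on shared CI hardware.
    public static string Classify(double deltaPercent) =>
        deltaPercent < -10.0 ? "improved"
        : deltaPercent > 10.0 ? "degraded"
        : "no change";
}
```

For Command Send this gives 1e9 / 71.49 ≈ 13.99M ops/s and (71.49 − 90.55) / 90.55 ≈ −21.0%, which agrees with the reported values.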