Performance: KDTree benchmarks and guidance

This page summarizes how to measure KDTree performance in this repository and how to compare the two internal backends (Linear vs Gonum) that you can select at build/runtime.

How benchmarks are organized

Micro-benchmarks live in bench_kdtree_test.go, bench_kdtree_dual_test.go, and bench_kdtree_dual_100k_test.go and cover:
Nearest in 2D and 4D with N = 1k, 10k (both backends)
Nearest in 2D and 4D with N = 100k (gonum-tag job; linear also measured there)
KNearest(k=10) in 2D/4D with N = 1k, 10k
Radius (mid radius r≈0.5 after normalization) in 2D/4D with N = 1k, 10k
Datasets: Uniform and 3-cluster synthetic generators in normalized [0,1] spaces.
Backends: Linear (always available) and Gonum (enabled when built with -tags=gonum).

Run them locally:

# Linear backend (default)
go test -bench . -benchmem -run=^$ ./...

# Gonum backend (optimized KD; requires build tag)
go test -tags=gonum -bench . -benchmem -run=^$ ./...

GitHub Actions publishes benchmark artifacts on every push/PR: - Linear job: artifact bench-linear.txt - Gonum job: artifact bench-gonum.txt

Backend selection and defaults

Default backend is Linear.
If you build with -tags=gonum, the default switches to the optimized KD backend.
You can override at runtime:

// Force Linear
kdt, _ := poindexter.NewKDTree(pts, poindexter.WithBackend(poindexter.BackendLinear))
// Force Gonum (requires build tag)
kdt, _ := poindexter.NewKDTree(pts, poindexter.WithBackend(poindexter.BackendGonum))

Supported metrics in the optimized backend: L2 (Euclidean), L1 (Manhattan), L∞ (Chebyshev). Cosine/Weighted-Cosine currently use the Linear backend.

What to expect (rule of thumb)

Linear backend: O(n) per query; fast for small-to-medium datasets (≤10k), especially in low dims (≤4).
Gonum backend: typically sub-linear for prunable datasets and dims ≤ ~8, with noticeable gains as N grows (≥10k–100k), especially on uniform or moderately clustered data and moderate radii.
For large radii (many points within r) or highly correlated/pathological data, pruning may be less effective and behavior approaches O(n) even with KD-trees.

Interpreting results

Benchmarks output something like:

BenchmarkNearest_10k_4D_Gonum_Uniform-8   50000  12,300 ns/op   0 B/op   0 allocs/op

ns/op: lower is better (nanoseconds per operation)
B/op and allocs/op: memory behavior; fewer is better
KNearest incurs extra work due to sorting; Radius cost scales with the number of hits.

Improving performance

Normalize and weight features once; reuse across queries (see Build*WithStats helpers).
Choose a metric aligned with your policy: L2 usually a solid default; L1 for per-axis penalties; L∞ for hard-threshold dominated objectives.
Batch queries to benefit from CPU caches.
Prefer the Gonum backend for larger N and dims ≤ ~8; stick to Linear for tiny datasets or when using Cosine metrics.

Reproducing and tracking performance

Local (Linear): go test -bench . -benchmem -run=^$ ./...
Local (Gonum): go test -tags=gonum -bench . -benchmem -run=^$ ./...
CI artifacts: download bench-linear.txt and bench-gonum.txt from the latest workflow run.
Optional: add historical trend graphs via Benchstat or Codecov integration.