Tempo v3.0.0-rc.1 - Major Ingest Architecture Refactor
Tempo 3.0-rc.1 is a major release featuring a new ingest/write architecture, removal of deprecated 2.x components and ingester modules, TraceQL metrics improvements with comparison operators, and migration tooling for 2.x configs. It contains multiple breaking changes, including CLI flag restructuring, changed default retry behavior, and legacy overrides being disabled by default.
Tempo 3.0 is a major release candidate focused on the new ingest/write architecture, removal of deprecated 2.x components, migration tooling, TraceQL metrics improvements, and live-store/block-builder correctness and observability fixes.
Breaking Changes
- Remove duplicate "compaction" prefix from CompactorConfig CLI flags. Affected flags: `compaction.block-retention`, `compaction.max-objects-per-block`, `compaction.max-block-bytes`, `compaction.compaction-window` by @electron0zero in #6909
- Enable RetryInfo by default. `distributor.retry_after_on_resource_exhausted` now defaults to `5s` (was `0`) so OTLP clients receive a retry hint on `ResourceExhausted` errors by @electron0zero in #7088
  Set to `0` to disable cluster-wide, or set the per-tenant override `ingestion.retry_info_enabled: false` to disable for a single tenant.
- Centralize block and WAL config: `block_builder` and `live_store` now always use `storage.trace.block` settings; per-module block config fields are removed by @stoewer in #6647
- Remove OpenCensus receiver by @javiermolinar in #6523
- Remove legacy `mem-ballast-size-mbs` CLI flag by @orkhan-huseyn in #6403
- tempo-cli: Support relative time (`now`, `now-1h`) for start/end args and standardize on RFC3339 in all commands by @electron0zero in #6458
  The `query search` command no longer accepts timestamps without a timezone (e.g. `2024-01-01T00:00:00`); use RFC3339 (e.g. `2024-01-01T00:00:00Z`) or relative time instead.
- Consolidate read configuration for recent data cutoff. `query_frontend.search.query_ingesters_until` is removed in favor of only `query_frontend.search.query_backend_after` by @mapno in #6507
- Remove deprecated `querier.query_live_store` config. This field must be removed from configs on upgrade by @javiermolinar in #7048
- Optimize TraceQL AST by rewriting conditions on the same attribute to their array equivalent by @stoewer in #6353
  Slightly changes the array matching semantics of the `!=` and `!~` operators and introduces stricter rules for regex literals.
- Remove partition ring livestore config by @javiermolinar in #6981
- Remove ingester module by @javiermolinar in #6959
- Remove ingest.enabled config by @javiermolinar in #6873
- Disable legacy (flat, unscoped) overrides by default. Tempo will refuse to start if legacy overrides are detected. Set `enable_legacy_overrides: true` or `-config.enable-legacy-overrides=true` to opt back in temporarily. Legacy overrides will be removed in a future release by @electron0zero in #6741
- Remove remaining app ingester config by @javiermolinar in #6667
- Remove span-metrics leftovers and lazy-init generator clients by @javiermolinar in #6618
- Decommission livestore MetricsGenerator query service by @javiermolinar in #6615
- Remove metrics-generator localblocks processor and related local block storage plumbing by @javiermolinar in #6555
- Remove ingesters by @javiermolinar in #6504
- Remove ingesters and compactor alerts by @javiermolinar in #6369
- Removed `v2` block encoding and compactor component by @joe-elliott in #6273
  This includes the removal of the following CLI commands which were `v2` specific: `list block`, `list index`, `view index`, `gen index`, `gen bloom`.
- SpanMetricsSummary is removed and querier code simplified by @javiermolinar in #6496 and #6510
- Sets the `all` target to be 3.0 compatible and removes the `scalable-single-binary` target by @joe-elliott in #6283
- Clean up enterprise jsonnet by @javiermolinar in #6505
Changes
- Stop publishing 32-bit ARM binary archives. Release artifacts continue to include amd64 and arm64 binaries by @javiermolinar in #7106
- Upgrade Tempo to Go 1.26.0 by @stoewer in #6443
- Allow duplicate dimensions for span metrics and service graphs. This is a valid use case if using different instrumentation libraries, with spans having "deployment.environment" and others "deployment_environment", for example by @carles-grafana in #6288
- Update default max duration for TraceQL metrics queries up to one day by @javiermolinar in #6285
- Set TraceQL query metrics checks by default in Vulture by @javiermolinar in #6275
- Make Tempo single-binary example use the local backend by @javiermolinar in #7033
- Bump ingestion limits by @javiermolinar in #7034
- TraceQL metrics - change default step intervals to align with new vParquet5 timestamp columns by @mdisibio in #6413
- Remove all traces of ingesters from the dashboards by @javiermolinar in #6352
- jsonnet: Add emptyDir data volume to block-builder StatefulSet by @mapno in #6648
- Add quick checks to tempo mixin runbook by @javiermolinar in #6696
- Deprecate metrics-generator no-local-blocks by @javiermolinar in #6707
- Own local block and partition ring helpers by @javiermolinar in #6808
- Track invalid trace and span id discards by @javiermolinar in #6799
- Deprecate `query_frontend.rf1_after` and query all blocks regardless of replication factor for non-metrics paths. Simplifies 2.x to 3.0 migration by @mapno in #6969
- Flush blocks to backend storage from the Live store in single binary mode by @javiermolinar in #6941
- Remove stale config from the examples by @javiermolinar in #6980
- tempo-cli: Rewrite `migrate overrides-config` and add `migrate overrides-per-tenant` command to help migrate legacy flat overrides to the new scoped format by @electron0zero in #6793
- Decouple livestore from metrics-generator by @javiermolinar in #6506 and #6535
- Expose otlp http and grpc ports for Docker examples by @javiermolinar in #6296
Features
- Add span profiling support via otelpyroscope. Enable with `span_profiling: true` (or `-span-profiling` CLI flag) to attach pprof labels to OTel spans by @simonswine in #7063
- Add `tempo-cli migrate config` command for migrating Tempo 2.x configs to 3.0 by @mapno in #6982
- jsonnet: Add KEDA-based horizontal pod autoscaling support for microservices deployment by @mapno in #6970
- Add automemlimit support for automatic GOMEMLIMIT configuration. Enable with `memory.automemlimit_enabled: true` by @oleg-kozlyuk-grafana in #6313
- Support comparison operators in TraceQL Metrics queries by @ruslan-mikhailov in #6474
- metrics-generator: Add span filtering to service graphs through `filter_policies` by @javiermolinar in #6453
- Add new `include_any` filter policy for spanmetrics filter by @javiermolinar in #6392
- Add `span_multiplier_key` to overrides. This allows tenants to specify the attribute key used for span multiplier values to compensate for head-based sampling by @carles-grafana in #6260
- metrics-generator: Add per-label limiter to control cardinality by @electron0zero in #6414
  Adds the `max_cardinality_per_label` per-tenant override and new metrics that estimate per-label cardinality demand.
- Add an extension mechanism for per-tenant overrides by @stoewer in #6758
- Extend `TraceRedactor` interface to support hiding complete traces via `ErrTraceHidden` by @stoewer in #6811
- Single-binary mode: push distributor local ingest directly to live-store and metrics-generator without Kafka by @javiermolinar in #6729
Enhancements
- Support OR conditions for tag name and tag value autocomplete (search tags v2) by @ie-pham in #6827
- Expose MinIO retry settings via S3 config by @rwhitty in #6561
- Reduce default livestore WAL size and align query defaults: `max_block_duration` `1m` to `30s`, `max_block_bytes` `100MiB` to `50MiB`, `complete_block_timeout` `1h` to `20m`, metrics `query_backend_after` `30m` to `15m` by @zhxiaogg in #6974
- Enable native histogram emission for all promauto-registered histograms, including `tempo_request_duration_seconds`. Both classic and native formats are emitted simultaneously; existing scrapers are unaffected by @zalegrala in #6910
- tempo-cli: Add `--header` flag to `query api` commands for custom headers by @Nouuu in #6768
- tempo-cli: add `redact` command to submit trace redaction jobs to the backend scheduler by @zalegrala in #6832
- Block builder: deduplicate spans within traces during block creation and track removed duplicates via `tempo_block_builder_spans_deduped_total` metric by @zhxiaogg in #6539
- metrics-generator: Support extracting span multiplier from W3C tracestate OTel probability sampling threshold via `enable_tracestate_span_multiplier` config option by @csmarchbanks in #6684
- Add new alerts and runbooks entries by @javiermolinar in #6276
- Double the maximum number of dedicated string columns in vParquet5 and update tempo-cli to determine the optimum number for the data by @mdisibio in #6282
- TraceQL metrics - experimental faster read path for most metrics queries, accessible behind the query hint `spanonly_fetch=true` when `unsafe_query_hints` is enabled by @mdisibio in #6359
- TraceQL metrics - add new per-tenant override to opt-in or opt-out of the new experimental faster read path for most metrics queries by @mdisibio in #6849
- Vulture: extend data consistency checks to include more strings, integers, and blobs, at resource/span/event scopes, and perform deeper trace content check by @mdisibio in #6731
- Improve attribute truncating observability by @javiermolinar in #6400
- Log truncated oversized attributes by @carles-grafana in #6467
- livestore: make `trace_too_large` log line an insight by @carles-grafana in #6371
- Remove live-store partition owner from ring on shutdown to prevent stale owner entries by @oleg-kozlyuk-grafana in #6409
- Improved live store readiness check and added `readiness_target_lag` and `readiness_max_wait` config parameters. If `readiness_target_lag` is set, live store will not report `/ready` until Kafka lag is brought under the specified value by @oleg-kozlyuk-grafana and @ruslan-mikhailov in #6238 and #6405
- Expose a new histogram metric to track the jobs per query distribution by @javiermolinar in #6343
- Do deep validation for filter policies in user configurable overrides API by @electron0zero in #6407
- Allow span_name_sanitization to be set via user-configurable overrides API by @Logiraptor in #6411
- Add `fail_on_high_lag` parameter to allow live-store to fail if it is lagged by @ruslan-mikhailov and @carles-grafana in #6363, #6567 and #7066
- Add support for per-tenant left-padding of trace IDs by @mapno in #6489
- Add new metric for generator ring size: `tempo_distributor_metrics_generator_tenant_ring_size` by @zalegrala in #5686
- Remove explicit `runtime.GC()` calls in vParquet5 compactor/block creation and CLI by @oleg-kozlyuk-grafana in #6603
- Reduce allocations in `extendReuseSlice` growth path during WAL writes and block creation by @mapno in #6863
- Implemented anti-affinity for pods in same livestore zone by @zhxiaogg in #6757
- Livestore: skipped WAL complete op during shutdown by @zhxiaogg in #6839
- Add metric to track livestore block cut reasons by @zhxiaogg in #6922
- Enable async parquet read mode for WAL completion path by @zhxiaogg in #6967
- metrics-generator: add `leave_consumer_group_on_shutdown` to send LeaveGroup on shutdown for immediate partition reassignment instead of waiting for session timeout by @zalegrala in #6575
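
For readers unfamiliar with the tracestate-based span multiplier mentioned above, the following Go sketch shows how an OTel probability sampling threshold can be turned into a multiplier. It assumes the OpenTelemetry convention that the `ot` tracestate entry carries a `th` rejection threshold of 1 to 14 hex digits (trailing zeros omitted) over a 2^56 space, so the sampling probability is (2^56 - th) / 2^56 and the multiplier is its inverse. `multiplierFromTracestate` is a hypothetical name; this is not Tempo's implementation, and Tempo's handling of malformed values may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// maxThreshold is 2^56, the size of the OTel sampling threshold space.
const maxThreshold = uint64(1) << 56

// multiplierFromTracestate is a hypothetical helper, not Tempo's code.
// It finds the "th" sub-key inside the "ot" tracestate entry and returns
// 1/probability, where probability = (2^56 - th) / 2^56.
func multiplierFromTracestate(tracestate string) (float64, error) {
	for _, member := range strings.Split(tracestate, ",") {
		value, ok := strings.CutPrefix(strings.TrimSpace(member), "ot=")
		if !ok {
			continue
		}
		for _, kv := range strings.Split(value, ";") {
			th, ok := strings.CutPrefix(kv, "th:")
			if !ok {
				continue
			}
			if len(th) == 0 || len(th) > 14 {
				return 0, errors.New("th must be 1 to 14 hex digits")
			}
			// Trailing zeros are omitted on the wire; pad back to 14 digits.
			padded := th + strings.Repeat("0", 14-len(th))
			rejected, err := strconv.ParseUint(padded, 16, 64)
			if err != nil {
				return 0, fmt.Errorf("invalid th %q: %w", th, err)
			}
			// rejected is at most 2^56-1, so the divisor is never zero.
			return float64(maxThreshold) / float64(maxThreshold-rejected), nil
		}
	}
	return 0, errors.New("no ot th value in tracestate")
}

func main() {
	for _, ts := range []string{"ot=th:0", "ot=th:8", "vendor=x,ot=th:c;rv:0123456789abcd"} {
		m, err := multiplierFromTracestate(ts)
		if err != nil {
			fmt.Println(ts, "error:", err)
			continue
		}
		fmt.Printf("%s -> multiplier %g\n", ts, m)
	}
}
```

Under these assumptions, `th:8` corresponds to 50% sampling, so each retained span would stand in for two spans when computing metrics.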
Bugfixes
- Fix tempo-vulture ignoring `-tempo-push-tls` flag in normal operating mode by @zachfi in #6976
- livestore: check readiness before lag for SearchRecent and QueryRange queries by @zhxiaogg in #6911
- Fix integer overflow in query parameters by using `strconv.ParseUint` instead of `strconv.Atoi`/`strconv.ParseInt` for unsigned integer fields by @ricardbejarano in #6612
- Fix live-store SearchTagValuesV2 disk cache never being populated on complete blocks by @mapno in #6858
- Fix dedicated columns fallback in `block_builder` and `live_store` to use `storage.trace.block.parquet_dedicated_columns` when not set via overrides by @stoewer in #6647
- Force live-store to rehydrate from Kafka lookback period when local data is missing (e.g. PVC wipe, new node) instead of resuming from the committed consumer group offset by @oleg-kozlyuk-grafana in #6428
- fix: reload span_name_sanitization overrides during runtime by @electron0zero in #6435
- fix: live store honor the config options for block and WAL versions by @mdisibio in #6509
- fix: block builder honor the global storage block config for block and WAL versions by @Harry-kp in #6532
- fix: normalize allowlist headers when building the allowlist map by @javiermolinar in #6481
- fix: bug related to dedicated column filtering by @stoewer in #6586
- fix: compactor deduped spans metric uses wrong type (gauge instead of counter) by @bejaratommy in #6576
- metrics-generator: Fix active-series counter underflow in local series limiter when overflow series are deleted by @carles-grafana in #6568
- fix: skip per-label limiter and sanitizer for target_info and host_info metrics in metrics-generator by @electron0zero in #6660
- fix(traceql): err on division by zero by @Proximyst in #6580
- fix(traceql): stop intPow from hanging by @Proximyst in #6581
- fix(traceql): Fix incorrect search results for some queries on new blob columns by @mdisibio in #6815
- fix(vparquet5): Fix buffer-reuse bug where event attributes in dedicated columns could be persisted on additional spans and events by @mdisibio in #6914
- fix: race condition where the `remove_owner_on_shutdown` flag was set too late, after context cancellation had already triggered the lifecycler's shutdown, causing the partition owner to remain in the ring by @oleg-kozlyuk-grafana in #6693
- Return 400 instead of 500 when query_range or query_instant requests have unparseable start/end parameters by @ruslan-mikhailov in #6694
- fix: correct block-builder fetch metrics to use counters instead of gauges by @WinterCabbage in #6578
- Log tenant on receiver push errors by @javiermolinar in #6780
- Fix race conditions in WAL block by @ruslan-mikhailov in #6773
- metrics-generator: Fix `target_info` being skipped when resource attributes have empty values by @carles-grafana in #6774
- metrics-generator: Drain old series on metric replacement to prevent limiter leak and permanent overflow by @carles-grafana in #6653
- live-store: fixed unsuccessful deregistering from membership/partition rings during shutdown by @zhxiaogg in #6848
- fix: respect context cancellation when reading WAL block iterator by @zhxiaogg in #6928
- Complete lifecycler shutdown on errors by @javiermolinar in #6906
- livestore: fix concurrent WAL writes from periodic and shutdown flushes by @zhxiaogg in #6972
- live-store: fix race conditions for tag values endpoint by @ruslan-mikhailov in #7000
- live-store: correct backoff duration calculation by @ruslan-mikhailov in #6999
- vulture: fix for recent traces when query_end_cutoff is enabled by @ruslan-mikhailov in #7018
- Fix live-store producing WAL blocks exceeding max_block_bytes when flushing large batches of idle traces by @ruslan-mikhailov in #6971
- live-store: skip lookback replay when partition is Inactive during scaling down by @zhxiaogg in #7101
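
The query parameter overflow fix listed above swaps signed parsing for `strconv.ParseUint`. The Go sketch below shows why that matters for an unsigned field. The function names and the choice of a 32-bit field are assumptions made for illustration; they are not taken from Tempo's code.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseLimitUnsafe shows the overflow-prone pattern: parse as a signed int,
// then convert to uint32. Negative and out-of-range inputs wrap silently.
func parseLimitUnsafe(s string) (uint32, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, err
	}
	return uint32(n), nil
}

// parseLimit parses directly into the target width (bitSize 32), so negative
// and out-of-range inputs are reported as errors instead of wrapping.
func parseLimit(s string) (uint32, error) {
	n, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return 0, err
	}
	return uint32(n), nil
}

func main() {
	for _, in := range []string{"20", "-1", "4294967296"} {
		unsafeVal, unsafeErr := parseLimitUnsafe(in)
		safeVal, safeErr := parseLimit(in)
		fmt.Printf("input=%s unsafe=(%d, %v) safe=(%d, %v)\n",
			in, unsafeVal, unsafeErr, safeVal, safeErr)
	}
}
```

With the signed-then-convert pattern, an input of `-1` silently becomes the largest `uint32`, whereas `strconv.ParseUint` returns an error the handler can surface to the client.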
New Contributors
Thanks to the following first-time contributors:
- @evan361425 made their first contribution in #5968
- @mihaelmiklec made their first contribution in #6442
- @Harry-kp made their first contribution in #6532
- @bejaratommy made their first contribution in #6576
- @jasuade made their first contribution in #6610
- @antonio-mazzini made their first contribution in #6609
- @orkhan-huseyn made their first contribution in #6403
- @ricardbejarano made their first contribution in #6612
- @rwhitty made their first contribution in #6561
- @WinterCabbage made their first contribution in #6578
- @csmarchbanks made their first contribution in #6684
- @gounthar made their first contribution in #6756
- @Nouuu made their first contribution in #6768
- @EoinTrial made their first contribution in #6905
- @sethmccombs made their first contribution in #7108
Full Changelog: v2.10.0-rc.0...v3.0.0-rc.1