OpenTelemetry Is Great, but Who the Hell Is Going to Pay for It?

by thunderbong on 6/30/25, 6:28 PM with 37 comments
by stego-tech on 6/30/25, 7:50 PM

Excellent critique of the state of observability, especially for us IT folks. We’re often the first - and last, until the bills come - line of defense for observability in orgs lacking a dedicated team. SNMP traps get us 99% of the way there with anything operating in a standard way, but OTel/Prometheus/New Relic/etc all want to get “in the action” in a sense, and hoover up as many data points as possible.

Which, sure, if you’re willing to pay for it, I’m happy to let you make your life miserable. But I’m still going to be the Marie Kondo of IT and ask whether that specific data point brings you joy. Does having data points at per-second intervals actually improve response times and diagnostics for your internal tooling, or does it just make you feel big and important while checking off a box somewhere?

Observability is a lot like imaging or patching: a necessary process to be sure, but do you really need a Cadillac Escalade (New Relic/Datadog/etc) to go to the grocery store when a Honda Accord (self-hosted Grafana + OTel) will do the same job more efficiently for less money?

Honestly, I regret not picking the brain of the observability lead at BigCo when I had the chance. What little he showed me (self-hosted Grafana for $90/mo in AWS ECS for the corporate infrastructure of a Fortune 50? With OTel agents consuming 1/3 to 1/2 the resources of New Relic agents? Man, I wish I had jumped down that specific rabbit hole) was amazingly efficient and informative. Observability done right.

by denysvitali on 6/30/25, 6:40 PM

I don't think the comparison is correct. For sure OTEL adds some overhead, but if you're ingesting raw JSON today, then even with that overhead the total volume is probably going to shrink, since internally the system talks OTLP - which is often (always?) encoded with protobuf and most of the time sent over gRPC.

It's then obviously your receiving end's job to take the incoming data and store it efficiently - for example by grouping it by resource attributes (since you probably don't want to store the same metadata 10 times over). But especially thanks to the flexibility of attaching all the surrounding metadata (rather than just shipping the single log line), you can do magic things like routing metrics to different tenants / storage classes, or dropping them entirely.
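
Rough sketch of what that looks like from the Node SDK side - package layout and the Resource API move around between @opentelemetry releases, and the service name and attribute values here are made up:

    import { NodeSDK } from '@opentelemetry/sdk-node';
    import { Resource } from '@opentelemetry/resources';
    import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';

    // Resource attributes describe the emitting process once; OTLP groups
    // spans under that resource instead of repeating the metadata per event.
    const sdk = new NodeSDK({
      resource: new Resource({
        'service.name': 'checkout',            // hypothetical service
        'deployment.environment': 'prod',
      }),
      // protobuf-encoded OTLP, shipped over gRPC to a local collector
      traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4317' }),
    });
    sdk.start();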

Having said that, OTEL is both a joy and an immense pain to work with - but I still love the project (and still hate the fact that every release has breaking changes and 4 different version identifiers).

Btw, one of the biggest wins in the otel-collector would be to use the new Protobuf Opaque API, as it will most likely save lots of CPU cycles (see https://github.com/open-telemetry/opentelemetry-collector/is...) - PRs are always welcome, I guess.

by maplemuse on 6/30/25, 6:59 PM

The part about SNMP made me laugh. I remember integrating SNMP support into an early network security monitoring tool about 25 years ago, and how clunky it seemed at the time. But it has continued to work well, and be supported, all these years. It was a standard with very broad tool support, so you weren't locked into a particular vendor.

by njpatel on 7/1/25, 1:34 PM

I'm not sure the log message vs. structured message comparison makes sense - most log and trace events are batched and compressed before being transported, so the size difference isn't really an issue. On the receiving side, most services have some kind of column store as their main datastore, so there's no issue there either. The benefits of structured logging are worth it.
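
For what it's worth, both of those knobs are exposed right in the SDKs. A sketch with the Node packages (the endpoint and batch settings are illustrative, and exact option names vary by version):

    import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
    import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
    import { CompressionAlgorithm } from '@opentelemetry/otlp-exporter-base';

    // Spans queue up and ship in batches, and each OTLP/gRPC export is
    // gzipped, so the per-event size of structured data mostly washes out.
    const processor = new BatchSpanProcessor(
      new OTLPTraceExporter({
        url: 'http://collector:4317',          // hypothetical endpoint
        compression: CompressionAlgorithm.GZIP,
      }),
      { maxExportBatchSize: 512, scheduledDelayMillis: 5000 },
    );
    // then hand the processor to your tracer provider when constructing it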

Regarding pricing - there is an option for every kind of usage out there; it's mostly a solved problem that comes down to choosing two of large scale, low cost, and low latency. For instance, we serve large scale and low cost.

by cortesoft on 6/30/25, 8:18 PM

Setting up a self-hosted Prometheus and Grafana stack is pretty trivial when starting out. I run a Cortex cluster handling metrics for 20,000 servers, and it requires very little maintenance.

Self-hosting metrics at any scale is pretty cost effective.

by pojzon on 6/30/25, 7:54 PM

The difference between OTel and previous standards is that OTel was created by “modern” engineers who don’t care about resource consumption, or don’t even understand it. Which is funny, because that’s exactly what the tool is about.

So yeah, the cost of storage and network traffic is only going to balloon.

There is room for improvement, and I can already see new projects that will most likely gain traction in the upcoming years.

by hermanradtke on 6/30/25, 7:48 PM

New Relic, Datadog, etc. are selling their original offerings, but now with OTel marketing.

I encourage the author to read the Honeycomb blog and try to grok what makes OTel different. If I had to sum it up in two points (rough sketch after the list):

- wide rows with high cardinality

- sampling
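
A rough sketch of both points with the Node SDK - the names, numbers, and sampling ratio are all made up, and exports shift between package versions:

    import { trace } from '@opentelemetry/api';
    import {
      NodeTracerProvider,
      TraceIdRatioBasedSampler,
    } from '@opentelemetry/sdk-trace-node';

    // Head-sample ~10% of traces; each kept request emits one wide span
    // carrying as many high-cardinality fields as you care to attach.
    // (Add a span processor/exporter to actually ship them anywhere.)
    const provider = new NodeTracerProvider({
      sampler: new TraceIdRatioBasedSampler(0.1),
    });
    provider.register();

    const span = trace.getTracer('checkout').startSpan('POST /charge');
    span.setAttributes({
      'user.id': 'u_81422',                    // high cardinality, on purpose
      'cart.total_cents': 12999,
      'db.rows_examined': 4821,
    });
    span.end();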

by ris on 6/30/25, 9:23 PM

The logging examples given don't appear to be too different from what any structured & annotated logging mechanism would give you. On top of that, it's normally protobuf-encoded and sent over gRPC, so that's already one up on basic JSON-encoded structured logs.

The main difference I see with otel is the ability to repeatedly aggregate/decimate/discard your data at whatever tier(s) you deem necessary using opentelemetry-collector. The amount of data you end up with is up to you.

by ramon156 on 6/30/25, 9:22 PM

All I want is spanned logs in JS. Why do I need OTEL? Why can't pino do this for me?
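
(For what it's worth, pino can get most of the way there, assuming something - the OTel SDK, or an instrumented framework - has already started a span: its mixin hook can stamp the active span's IDs onto every line. A sketch; the field names are my choice:)

    import pino from 'pino';
    import { trace } from '@opentelemetry/api';

    // pino calls mixin() on every log line; pull IDs off whatever span
    // is currently active so logs can be joined to traces later.
    const logger = pino({
      mixin() {
        const ctx = trace.getActiveSpan()?.spanContext();
        return ctx ? { trace_id: ctx.traceId, span_id: ctx.spanId } : {};
      },
    });

    logger.info('charging card');
    // => {"level":30,...,"trace_id":"...","span_id":"...","msg":"charging card"}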

by sheerun on 6/30/25, 11:07 PM

Trapped in his time, I see.

by invalidname on 7/1/25, 2:21 AM

Cloud companies will charge you if you use their features without limits: news at 11...

Observability solutions of this type are for the big companies that can typically afford the bill shock. These companies can also afford the routine auditing process that makes sure the level of observability is sufficient and not needlessly expensive. Smaller companies can just log into a few servers, or pull up a dashboard, to get a sense of what's going on. They don't need something at this scale.

Pretty much every solution listed lets you fine-tune what data you observe and retain, down to a very deep level. I'm personally more versed in OneAgent than OTel, and there you can control data ingestion at a very fine granularity.

by dboreham on 6/30/25, 7:37 PM

Uhhh. The point of OTel is that you can host it yourself. And you should, imho, unless you're part of a VC money-laundering scheme where they want to puff up the numbers of NR or DD or whichever portfolio company.