SigNoz is a good product. That is not why we left it.
We left because for infrastructure monitoring — hosts, containers, databases, hypervisors — the modular Grafana stack fits the job better. And Grafana Alloy replaced two separate agents with one.
TL;DR
We ran SigNoz with the OpenTelemetry Collector on our private cloud infrastructure. We migrated to Prometheus, Loki, Alertmanager, and Grafana, with Grafana Alloy as the single collection agent on every host. The modular stack gives us a better exporter ecosystem, more expressive alerting, and one fewer agent per machine.
How We Got Here
When we started building ZCP — our private cloud platform running on Proxmox and Apache CloudStack — we needed observability fast. SigNoz was the obvious choice: a single service that ingests OpenTelemetry signals natively, stores traces, metrics, and logs in one place, and provides a unified query interface.
We deployed it, enrolled our first nodes, and it worked. For the bootstrapping phase, that was enough.
What SigNoz Does Well
To be clear about why we chose it:
- OpenTelemetry-native. Exporters speak OTLP directly to the collector. No translation layer.
- Unified backend. One service for traces, metrics, and logs. One URL.
- Good default dashboards. Infrastructure views land immediately after enrollment.
- Self-hosted. Data stays in the DC.
For tracing distributed applications (microservices, APIs, request paths across services), SigNoz is excellent. Trace correlation is where it earns its reputation.
Where It Started Feeling Wrong
Infrastructure monitoring is a different workload than application tracing.
The things we needed day-to-day: is this host healthy, what is this disk doing, why did this container restart, show me the last 24 hours of auth logs from these three nodes. SigNoz answered those questions. But the workflow was tuned for tracing paths through services, not drilling into host metrics across a fleet.
We also hit practical friction in three areas:
Exporter ecosystem
The Prometheus exporter ecosystem is larger and more mature. mysqld_exporter, pve_exporter, and Ceph's mgr prometheus module are all designed to be scraped by Prometheus, not to push OTLP. Running them through an OTel Collector pipeline works, but it adds a translation step for every exporter, and each new exporter means a new pipeline to wire up.
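To make "scraped directly" concrete, here is a sketch of what wiring two of those exporters looks like on the Prometheus side. Hostnames are illustrative; the ports are the exporters' defaults.

```yaml
# prometheus.yml (fragment) -- each exporter already exposes /metrics,
# so adding one is a new scrape job, not a new pipeline.
scrape_configs:
  - job_name: mysql
    static_configs:
      - targets: ["db01.internal:9104"]       # mysqld_exporter default port
  - job_name: proxmox
    static_configs:
      - targets: ["monitor01.internal:9221"]  # pve_exporter default port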
Alerting
SigNoz has alert rules. Alertmanager has been the standard for infrastructure alerting for a decade. The routing logic (group by severity, route by host role, send to the right channel) is more expressive in Alertmanager for fleet-scale operations. It is also plain YAML: version-controlled and reviewable.
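A sketch of the kind of route tree we mean; the receiver names and the `host_role` label are ours, not Alertmanager built-ins:

```yaml
# alertmanager.yml (fragment) -- grouping and severity/role routing.
route:
  group_by: ["alertname", "host_role"]
  receiver: ops-default              # fallback channel
  routes:
    - matchers: ['severity="critical"']
      receiver: ops-pager            # criticals page someone
    - matchers: ['host_role="database"']
      receiver: dba-channel          # database alerts go to the DBA channel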
Log queries
LogQL in Loki, paired with Grafana dashboards, fits infrastructure log queries better than the SigNoz logs tab. This is especially true when correlating logs against metrics on the same panel with time-synced variables and shared filters.
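The "last 24 hours of auth logs from these three nodes" question from earlier is a one-liner in LogQL. The label names here (`unit`, `host`) are the ones our shipper attaches; yours may differ:

```logql
# Auth failures from three specific nodes:
{unit="ssh.service", host=~"node0[1-3]"} |= "Failed password"

# Same data as a per-host rate, chartable next to node metrics on one panel:
sum by (host) (
  count_over_time({unit="ssh.service", host=~"node0[1-3]"} |= "Failed password" [5m])
)
```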
Why Grafana Alloy Instead of Promtail
The standard answer to "how do I ship logs to Loki" used to be Promtail. Promtail is a purpose-built Loki log shipper. It works. It is also a second agent to deploy and maintain alongside the OTel Collector or whatever else you are using for metrics.
Grafana Alloy changes that. Alloy is the next-generation collection agent from Grafana Labs. It replaces both the OpenTelemetry Collector and Promtail in a single binary, using a component-based pipeline config that wires inputs, processors, and outputs together explicitly.
One agent per host. One config. One service to monitor.
For our fleet (Proxmox nodes, LXC containers, CloudStack KVM compute nodes, database nodes), Alloy ships logs from journald and /var/log, scrapes node metrics, and forwards everything to Loki and Prometheus from a single process. One Ansible playbook pass to roll out across the fleet.
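The per-host Alloy config that does this is short. The sketch below uses current Alloy component names; the endpoint URLs are illustrative, and the Prometheus side assumes the remote-write receiver is enabled (`--web.enable-remote-write-receiver`):

```alloy
// Journald logs -> Loki.
loki.source.journal "system" {
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://monitor01.internal:3100/loki/api/v1/push"
  }
}

// Embedded node_exporter -> scraped in-process -> Prometheus remote write.
prometheus.exporter.unix "node" { }

prometheus.scrape "node" {
  targets    = prometheus.exporter.unix.node.targets
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://monitor01.internal:9090/api/v1/write"
  }
}
```

One file like this per host, templated by Ansible, is the whole agent story.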
The Stack We Landed On
| Component | Role |
|---|---|
| Prometheus | Metrics storage and scraping |
| Loki | Log aggregation and storage |
| Alertmanager | Alert routing and notification |
| Grafana | Dashboards and unified query interface |
| Grafana Alloy | Collection agent: logs and metrics, one binary per host |
Everything runs as Docker containers on a dedicated monitoring node in our management VLAN. Alloy runs on each host in the fleet.
What We Gained
Exporter compatibility
Every Prometheus exporter works out of the box. Prometheus scrapes them directly, no translation required. Adding a new exporter is one new job in the scrape config.
Alertmanager routing
Alert rules live in Prometheus. Routing lives in Alertmanager. The separation is clean: define the condition once, route it differently depending on severity, environment, and host role. All version-controlled YAML.
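As an example of that separation: the condition is a Prometheus rule, and the `severity` label on it is the hook Alertmanager routes on. The threshold and label values here are ours:

```yaml
# rules.yml (fragment) -- defined once; routing decided elsewhere.
groups:
  - name: host-health
    rules:
      - alert: HostDiskAlmostFull
        expr: node_filesystem_avail_bytes{fstype!~"tmpfs"}
                / node_filesystem_size_bytes < 0.10
        for: 15m
        labels:
          severity: warning        # Alertmanager matches on this
        annotations:
          summary: "{{ $labels.instance }} {{ $labels.mountpoint }} below 10% free"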
Dashboard variables
Grafana template variables ($host, $site, $role) let us build one dashboard that works across the entire fleet. Filter by Proxmox cluster, by customer, or by environment. The same dashboard covers different scopes. SigNoz dashboards are more static by comparison.
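In practice a fleet-wide panel query just interpolates those variables. This assumes scrape targets carry matching labels (our relabeling adds `role`; it is not a node_exporter default):

```promql
# CPU busy % for whatever $host/$role the dashboard is currently scoped to:
100 - avg by (instance) (
  rate(node_cpu_seconds_total{mode="idle", instance=~"$host", role=~"$role"}[5m])
) * 100
```

Change the variable dropdowns and the same panel covers a Proxmox cluster, a customer, or an environment.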
Single agent
Alloy on each host replaces the OTel Collector and Promtail. Less to deploy, less to upgrade, less to break.
What We Gave Up
Traces. SigNoz trace correlation is genuinely good. The Grafana stack's answer is Tempo, which handles distributed traces natively, but we have not stood it up yet, so for now we have no tracing at all.
For infrastructure monitoring at our current scale, we have no distributed tracing requirements. When application workloads on ZCP need it, we will add Tempo to the existing stack rather than running two separate backends.
When SigNoz Is Still the Right Call
- You are running distributed microservices and trace correlation across services is a daily workflow
- You want a single product rather than assembling components
- Your team is starting fresh on OpenTelemetry and wants everything in one place
- You need traces now and Grafana Tempo feels like more assembly than you want to manage
SigNoz is not a wrong choice. It is a different trade-off.
ZSoftly operates private cloud infrastructure for Canadian businesses under ZCP, our platform built on Apache CloudStack, Proxmox, and Ceph. If you are evaluating private cloud or building observability for on-premises infrastructure, talk to us.


