Why We Moved From SigNoz to the Grafana Stack

Staff at ZSoftly
6 min read
Observability · Monitoring · Grafana · Prometheus · Loki · OpenTelemetry · Private Cloud · Infrastructure · Self-Hosted
SigNoz is a good product. That is not why we left it.

We left because for infrastructure monitoring — hosts, containers, databases, hypervisors — the modular Grafana stack fits the job better. And Grafana Alloy replaced two separate agents with one.


TL;DR

We ran SigNoz with the OpenTelemetry Collector on our private cloud infrastructure. We migrated to Prometheus, Loki, Alertmanager, and Grafana, with Grafana Alloy as the single collection agent on every host. The modular stack gives us a better exporter ecosystem, more expressive alerting, and one fewer agent per machine.


How We Got Here

When we started building ZCP — our private cloud platform running on Proxmox and Apache CloudStack — we needed observability fast. SigNoz was the obvious choice: a single service that ingests OpenTelemetry signals natively, stores traces, metrics, and logs in one place, and provides a unified query interface.

We deployed it, enrolled our first nodes, and it worked. For the bootstrapping phase, that was enough.


What SigNoz Does Well

To be clear about why we chose it:

  • OpenTelemetry-native. Exporters speak OTLP directly to the collector. No translation layer.
  • Unified backend. One service for traces, metrics, and logs. One URL.
  • Good default dashboards. Infrastructure views land immediately after enrollment.
  • Self-hosted. Data stays in the DC.

For tracing distributed applications (microservices, APIs, request paths across services), SigNoz is excellent. Trace correlation is where it earns its reputation.


Where It Started Feeling Wrong

Infrastructure monitoring is a different workload than application tracing.

The things we needed day-to-day: is this host healthy, what is this disk doing, why did this container restart, show me the last 24 hours of auth logs from these three nodes. SigNoz answered those questions. But the workflow was tuned for tracing paths through services, not drilling into host metrics across a fleet.

We also hit practical friction in three areas:

Exporter ecosystem

The Prometheus exporter ecosystem is larger and older. mysqld_exporter, pve_exporter, and the Ceph mgr Prometheus module are all designed to be scraped by Prometheus, not to push OTLP. Running them through an OTel Collector pipeline works, but it adds a translation step for every exporter. Each new exporter is a new pipeline to wire.
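That translation step looks roughly like this in an OTel Collector config: the collector scrapes the exporter with an embedded Prometheus receiver, then re-exports over OTLP. A minimal sketch, assuming a mysqld_exporter on its default port and a hypothetical `signoz` endpoint:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: mysqld
          static_configs:
            - targets: ['db01:9104']   # mysqld_exporter default port; host is illustrative

exporters:
  otlp:
    endpoint: signoz:4317              # hypothetical SigNoz OTLP endpoint

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlp]
```

Every new exporter means another entry in this scrape config plus the OTLP hop, compared with a single scrape job in native Prometheus.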

Alerting

SigNoz has alert rules. Alertmanager has been the standard for infrastructure alerting for a decade. The routing logic (group by severity, route by host role, send to the right channel) is more expressive in Alertmanager for fleet-scale operations. It is also declarative YAML that lives in version control.
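The routing tree described above can be sketched in Alertmanager config. This is a minimal illustration, not our production file; the receiver names and webhook URLs are placeholders:

```yaml
route:
  receiver: ops-default            # fallback receiver
  group_by: ['alertname', 'severity']
  routes:
    - matchers: ['severity="critical"']
      receiver: oncall             # page someone
    - matchers: ['role="database"']
      receiver: dba-channel        # route by host role

receivers:
  - name: ops-default
    webhook_configs:
      - url: http://example.internal/hooks/ops     # placeholder
  - name: oncall
    webhook_configs:
      - url: http://example.internal/hooks/oncall  # placeholder
  - name: dba-channel
    webhook_configs:
      - url: http://example.internal/hooks/dba     # placeholder
```

Grouping, role-based routing, and channel fan-out are all first-class here, and the whole tree diffs cleanly in a pull request.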

Log queries

LogQL in Loki, paired with Grafana dashboards, fits infrastructure log queries better than the SigNoz logs tab. This is especially true when correlating logs against metrics on the same panel with time-synced variables and shared filters.
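As a concrete example of the kind of query we mean, a LogQL sketch pulling failed SSH auth attempts from a few nodes over the dashboard's time range (label names are assumptions; they depend on how the agent labels journald streams):

```logql
{job="systemd-journal", unit="sshd.service", host=~"node0[1-3]"}
  |= "Failed password"
```

Turning that into a rate for a panel alongside CPU or network metrics is one wrapper away: `count_over_time(... [5m])` on the same selector, sharing the dashboard's time range and host variable.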


Why Grafana Alloy Instead of Promtail

The standard answer to "how do I ship logs to Loki" used to be Promtail. Promtail is a purpose-built Loki log shipper. It works. It is also a second agent to deploy and maintain alongside the OTel Collector or whatever else you are using for metrics.

Grafana Alloy changes that. Alloy is the next-generation collection agent from Grafana Labs. It replaces both the OpenTelemetry Collector and Promtail in a single binary, using a component-based pipeline config that wires inputs, processors, and outputs together explicitly.

One agent per host. One config. One service to monitor.

For our fleet (Proxmox nodes, LXC containers, CloudStack KVM compute nodes, database nodes), Alloy ships logs from journald and /var/log, scrapes node metrics, and forwards everything to Loki and Prometheus from a single process. One Ansible playbook pass to roll out across the fleet.
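An Alloy pipeline for that per-host role can be sketched with a handful of components. This is a simplified illustration, assuming a `monitoring` hostname for the Loki and Prometheus endpoints, not our exact config:

```alloy
// Ship journald logs to Loki
loki.source.journal "system" {
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://monitoring:3100/loki/api/v1/push"  // assumed endpoint
  }
}

// Built-in node-level metrics (node_exporter equivalent)
prometheus.exporter.unix "host" { }

prometheus.scrape "host" {
  targets    = prometheus.exporter.unix.host.targets
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://monitoring:9090/api/v1/write"      // assumed endpoint
  }
}
```

Logs and metrics, one config file, one systemd unit to watch per host.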


The Stack We Landed On

Component        Role
Prometheus       Metrics storage and scraping
Loki             Log aggregation and storage
Alertmanager     Alert routing and notification
Grafana          Dashboards and unified query interface
Grafana Alloy    Collection agent: logs and metrics, one binary per host

Everything runs as Docker containers on a dedicated monitoring node in our management VLAN. Alloy runs on each host in the fleet.
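A Compose sketch of that monitoring node, using the stock upstream images; volumes, retention settings, and auth are omitted, and ports are the defaults:

```yaml
services:
  prometheus:
    image: prom/prometheus
    ports: ["9090:9090"]
    volumes: ["./prometheus:/etc/prometheus"]
  loki:
    image: grafana/loki
    ports: ["3100:3100"]
  alertmanager:
    image: prom/alertmanager
    ports: ["9093:9093"]
  grafana:
    image: grafana/grafana
    ports: ["3000:3000"]
```

A production deployment would pin image versions and put persistent volumes behind each service; this only shows the shape of the node.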


What We Gained

Exporter compatibility

Every Prometheus exporter works out of the box. Prometheus scrapes them directly, no translation required. Adding a new exporter is one new job in the scrape config.
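"One new job in the scrape config" means exactly this, shown here with a mysqld_exporter on its default port (the hostname is illustrative):

```yaml
scrape_configs:
  - job_name: mysqld
    static_configs:
      - targets: ['db01:9104']   # mysqld_exporter default port
```

Reload Prometheus and the metrics are queryable. No pipeline, no translation.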

Alertmanager routing

Alert rules live in Prometheus. Routing lives in Alertmanager. The separation is clean: define the condition once, route it differently depending on severity, environment, and host role. All version-controlled YAML.
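The split looks like this in practice. A Prometheus rule defines the condition and attaches labels; Alertmanager's routing tree decides where it goes. A minimal illustrative rule, not one of ours verbatim:

```yaml
groups:
  - name: host-health
    rules:
      - alert: HostDown
        expr: up{job="node"} == 0
        for: 5m
        labels:
          severity: critical        # routing key consumed by Alertmanager
        annotations:
          summary: "{{ $labels.instance }} unreachable for 5 minutes"
```

The same `HostDown` condition can page for production and only post to a channel for staging, purely through labels and routes, without touching the rule itself.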

Dashboard variables

Grafana template variables ($host, $site, $role) let us build one dashboard that works across the entire fleet. Filter by Proxmox cluster, by customer, or by environment. The same dashboard covers different scopes. SigNoz dashboards are more static by comparison.
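For instance, a single disk panel can serve the whole fleet by keying its query on a variable. The label names here are assumptions based on a standard node_exporter setup:

```promql
node_filesystem_avail_bytes{instance=~"$host", mountpoint="/"}
```

With `$host` defined from a label-values query such as `label_values(node_uname_info, instance)`, the one panel covers any node the variable can select.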

Single agent

Alloy on each host replaces the OTel Collector and Promtail. Less to deploy, less to upgrade, less to break.


What We Gave Up

Traces. SigNoz trace correlation is genuinely good, and we do not have a replacement for it in the Grafana stack today. Grafana Tempo handles distributed traces natively, but we have not stood it up yet.

For infrastructure monitoring at our current scale, we have no distributed tracing requirements. When application workloads on ZCP need it, we will add Tempo to the existing stack rather than running two separate backends.


When SigNoz Is Still the Right Call

  • You are running distributed microservices and trace correlation across services is a daily workflow
  • You want a single product rather than assembling components
  • Your team is starting fresh on OpenTelemetry and wants everything in one place
  • You need traces now and Grafana Tempo feels like more assembly than you want to manage

SigNoz is not a wrong choice. It is a different trade-off.


ZSoftly operates private cloud infrastructure for Canadian businesses under ZCP, our platform built on Apache CloudStack, Proxmox, and Ceph. If you are evaluating private cloud or building observability for on-premises infrastructure, talk to us.