Infra Atlas · Toolbox

Observability & Logs.

The terminal side of observability: not Grafana dashboards, but the tools you reach for when SSH'd into a box at 2am. Log navigation that doesn't make you scroll forever, JSON queries that don't require a manual, a process monitor that actually uses colour. Plus the log pipeline and benchmarking tools that belong in every platform team's playbook.

Form
Language
Picks
lnav replaces tail -f / less on log files

Log file navigator: open one or several log files and get syntax highlighting, automatic format detection (syslog, access_log, JSON, strace, many others), a search bar, and the ability to jump by time. lnav /var/log/nginx/access.log is immediately better than less with no configuration. Handles gzipped files and multiple files merged by timestamp.
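
To make "merged by timestamp" concrete, here is a minimal Python sketch of the idea. It is not lnav's implementation. It assumes every line starts with an ISO 8601 timestamp and that each input is already in time order; the sample lines and the names parse_ts and merge_logs are made up for illustration.

```python
import heapq
from datetime import datetime


def parse_ts(line: str) -> datetime:
    # Assumption: the line begins with an ISO 8601 timestamp, then a space.
    return datetime.fromisoformat(line.split(" ", 1)[0])


def merge_logs(*sources):
    """Lazily interleave several time-ordered logs into one time-ordered stream."""
    return heapq.merge(*sources, key=parse_ts)


# Hypothetical sample data standing in for two log files.
app_log = [
    "2024-05-01T02:00:01 app starting",
    "2024-05-01T02:00:05 app ready",
]
web_log = [
    "2024-05-01T02:00:03 GET /health 200",
]

merged = list(merge_logs(app_log, web_log))
for line in merged:
    print(line)
```

heapq.merge only ever holds one pending line per input, which is why this approach scales to large files as long as each file is sorted.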

TUI C++ mature SRE
tstack/lnav
btop replaces htop / top

Resource monitor for CPU, memory, disk, and network with smooth graphs, mouse support, and a genuinely readable layout. Better than htop for anything involving disk I/O or network throughput, which it graphs per disk and per interface. The graphs make it clear at a glance whether a machine is CPU-bound, memory-bound, or saturating a disk, which is faster than parsing vmstat numbers. Works over SSH.

TUI C++ mature SRE
aristocratos/btop
jq replaces grep / sed for JSON log queries

The standard for processing JSON on the command line. jq 'select(.level == "error") | {ts: .timestamp, msg: .message}' app.log reads like a query language, not a hack. Pipe-composable, scriptable, present on almost every engineering machine. If you work with structured logs or any API that returns JSON, jq is as fundamental as grep.
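
For readers new to jq, the filter above does two things: it keeps records whose level is "error", then reshapes each one to two fields. A rough Python equivalent is sketched below, assuming the log is one JSON object per line. The function name error_events and the sample records are hypothetical.

```python
import json


def error_events(lines):
    """Mimic: select(.level == "error") | {ts: .timestamp, msg: .message}"""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        record = json.loads(raw)
        if record.get("level") == "error":
            # Like jq, a missing field becomes null (None in Python).
            yield {"ts": record.get("timestamp"), "msg": record.get("message")}


# Hypothetical JSON-lines input standing in for app.log.
sample = [
    '{"level": "info", "timestamp": "2024-05-01T02:00:01Z", "message": "started"}',
    '{"level": "error", "timestamp": "2024-05-01T02:00:04Z", "message": "db timeout"}',
    '{"level": "error", "timestamp": "2024-05-01T02:00:09Z"}',
]

events = list(error_events(sample))
for event in events:
    print(json.dumps(event))
```

The jq one-liner is shorter for the same result, which is the point of the entry.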

CLI C mature dev
jqlang/jq
Vector replaces Logstash / Fluentd

High-performance observability data pipeline: collect logs, metrics, and traces; transform them (parse, filter, enrich, route); deliver to any destination (S3, Elasticsearch, Datadog, Loki, InfluxDB, Kafka). Written in Rust — substantially faster and lighter than Logstash or Fluentd with comparable functionality. Single binary. Used as a sidecar, a node agent, or a standalone aggregator.

daemon Rust mature platform
vectordotdev/vector
GoAccess replaces AWStats / manual access log analysis

Real-time web log analyser for nginx, Apache, CloudFront, and other access log formats. Renders a live TUI dashboard in the terminal or an HTML report. goaccess /var/log/nginx/access.log --log-format=COMBINED gives request counts, top IPs, 4xx/5xx breakdown, bandwidth, and visitor trends — in seconds, without shipping logs anywhere. Invaluable for a quick traffic audit on any server you can SSH into.
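
As a rough picture of the kind of tallying involved, the sketch below parses lines in the Combined Log Format and counts requests per IP, responses per status class, and bytes sent. It is an illustration only, not GoAccess's parser. The regular expression covers the common case, and the sample lines (using documentation IP ranges) and the name summarize are made up.

```python
import re
from collections import Counter

# Combined Log Format: ip ident user [time] "request" status size "referer" "agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Return (requests per IP, responses per status class, total bytes)."""
    ips, classes, total_bytes = Counter(), Counter(), 0
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # ignore lines that do not fit the format
        ips[m["ip"]] += 1
        classes[m["status"][0] + "xx"] += 1
        if m["size"].isdigit():  # size is "-" when no body was sent
            total_bytes += int(m["size"])
    return ips, classes, total_bytes


# Hypothetical access log lines.
sample = [
    '203.0.113.7 - - [01/May/2024:02:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"',
    '203.0.113.7 - - [01/May/2024:02:00:02 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0"',
    '198.51.100.4 - - [01/May/2024:02:00:03 +0000] "POST /api HTTP/1.1" 502 0 "-" "python-requests/2.31"',
]

ips, classes, total_bytes = summarize(sample)
print(ips.most_common(1), dict(classes), total_bytes)
```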

TUI C mature SRE
allinurl/goaccess
hyperfine replaces manual time / bash timing loops

Command-line benchmarking tool: runs a command repeatedly, optionally performs warmup runs first (--warmup N) so caches are hot, warns about statistical outliers, and reports mean, standard deviation, min, and max with a progress bar. hyperfine 'grep -r foo .' 'rg foo .' gives you a statistically sound comparison. Essential for validating that a "performance improvement" actually improves performance before shipping it.
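
The measurement loop is simple to sketch. The Python below shows the general warmup-then-sample pattern and the summary statistics. It is a simplified illustration, not hyperfine's code: it times an in-process callable rather than a shell command, and the name bench and its default parameters are assumptions.

```python
import statistics
import time


def bench(fn, runs=10, warmup=3):
    """Time fn() over several runs after discarding warmup passes."""
    for _ in range(warmup):
        fn()  # warm caches; these timings are thrown away

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    return {
        "mean": statistics.mean(samples),
        "stddev": statistics.stdev(samples),  # needs at least 2 runs
        "min": min(samples),
        "max": max(samples),
    }


result = bench(lambda: sum(range(10_000)))
print({name: f"{seconds * 1e6:.1f} µs" for name, seconds in result.items()})
```

Reporting the spread alongside the mean is what separates this from a single time invocation: a 5% speedup means little if the standard deviation is 10%.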

CLI Rust mature dev
sharkdp/hyperfine
ctop replaces docker stats

Container-level resource monitor: a real-time TUI that shows CPU, memory, network I/O, and block I/O per container in a compact table. ctop is to Docker what btop is to the host: a single-pane view of what's actually consuming resources. Particularly useful on a host running many containers, where docker stats outputs an unwieldy flat list. Supports Docker and runC. No config required.

TUI Go SRE
bcicen/ctop
OpenTelemetry Collector replaces Logstash / Fluentd for OTEL pipelines

The vendor-neutral OpenTelemetry agent for receiving, processing, and exporting telemetry. Runs as a sidecar or a node agent; accepts OTLP, Jaeger, Zipkin, Prometheus scrape, and Fluent Forward (the protocol Fluent Bit and Fluentd speak); exports to any backend (Datadog, Grafana Cloud, New Relic, Honeycomb, Jaeger, S3, Kafka). There are two main distributions: otelcol (core, a minimal set of components) and otelcol-contrib (the full set of community components). If you're instrumenting anything with an OpenTelemetry SDK, the Collector is how you decouple your app from the backend, so you can change vendors without touching code.

daemon Go SRE platform
open-telemetry/opentelemetry-collector
Tracetest replaces manual trace validation

Trace-based testing tool: run an HTTP request, capture the distributed trace it produces (via OTEL), and assert against span attributes, durations, and service graph structure. Write tests like "the database query span must be under 200ms" or "no N+1 queries — there must be exactly 1 SQL SELECT span". Integrates with Jaeger, Tempo, and any OTLP-compatible backend. Bridges the gap between functional testing ("did it return 200?") and performance/architecture testing ("did it behave efficiently?").
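
The two example assertions can be expressed as plain checks over a list of spans. The Python sketch below illustrates the idea only. It is not Tracetest's test syntax, the Span class and check_trace are hypothetical, and it assumes database spans carry a db.operation attribute.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    duration_ms: float
    attributes: dict = field(default_factory=dict)


def check_trace(spans, max_db_ms=200.0):
    """Return a list of failure messages; an empty list means the trace passes."""
    # Assumption: SQL spans are tagged with db.operation == "SELECT".
    selects = [s for s in spans if s.attributes.get("db.operation") == "SELECT"]
    failures = []
    if len(selects) != 1:
        failures.append(f"expected exactly 1 SQL SELECT span, found {len(selects)}")
    for s in selects:
        if s.duration_ms >= max_db_ms:
            failures.append(f"{s.name} took {s.duration_ms}ms, limit is {max_db_ms}ms")
    return failures


# Hypothetical traces: one healthy, one with an extra slow query.
good = [
    Span("GET /orders", 42.0),
    Span("select orders", 12.5, {"db.operation": "SELECT"}),
]
n_plus_one = good + [Span("select items", 250.0, {"db.operation": "SELECT"})]

good_result = check_trace(good)
bad_result = check_trace(n_plus_one)
print(good_result, bad_result)
```

Both traces could belong to a request that returned 200; only the trace-level checks tell them apart.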

CLI Go dev
kubeshop/tracetest
*

This department covers the terminal side of observability. Full-stack observability (Prometheus, Grafana, Loki, Tempo) involves self-hosted servers and is a deployment decision, not a laptop install, so those are out of scope here. The tools listed are either local CLI/TUI tools or lightweight daemons you'd run as a sidecar. Vector is the exception: it is a real pipeline service, but its one-binary-per-node model makes it toolbox-appropriate.