Metrics
ISM collects metrics for servers and for client tasks, using a separate collector for each. Both collectors are part of the agent and are exposed as Prometheus metrics via HTTP. Server metrics are collected on demand, while task metrics are tracked continuously.
Server metrics: agent/monitoring.go
Server metrics are collected only for:
- health, ism_server_status, gauge
- connected clients, ism_server_clients, gauge
- running sessions, ism_running_sessions, gauge
A server is removed from the exposed metrics when the underlying server instance is terminated.
The agent also exposes the desired pool size as the static ism_server_prime metric.
This can be used to assess how the current number of running servers compares to the desired number of running servers.
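On-demand collection can be sketched as follows. This is a minimal illustration, not the code in agent/monitoring.go: it uses only the standard library and writes the Prometheus text format by hand, whereas the agent may well use a Prometheus client library. The ServerState type, the server label, the 1/0 encoding of health, and the function names are all assumptions made for the example; ism_running_sessions is left out for brevity.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// ServerState is a hypothetical snapshot of one running server.
type ServerState struct {
	ID      string // assumed label value; the real label set is not documented here
	Healthy bool   // assumed to map to 1 (healthy) or 0 (unhealthy)
	Clients int
}

// renderServerMetrics builds the text exposition for the given snapshot.
// Only servers present in the snapshot are written, so a terminated server
// disappears from the output on the next scrape.
func renderServerMetrics(servers []ServerState, desired int) string {
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]ServerState(nil), servers...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })

	var b strings.Builder
	b.WriteString("# TYPE ism_server_status gauge\n")
	for _, s := range sorted {
		status := 0
		if s.Healthy {
			status = 1
		}
		fmt.Fprintf(&b, "ism_server_status{server=%q} %d\n", s.ID, status)
	}
	b.WriteString("# TYPE ism_server_clients gauge\n")
	for _, s := range sorted {
		fmt.Fprintf(&b, "ism_server_clients{server=%q} %d\n", s.ID, s.Clients)
	}
	// The desired pool size is static and does not depend on the snapshot.
	b.WriteString("# TYPE ism_server_prime gauge\n")
	fmt.Fprintf(&b, "ism_server_prime %d\n", desired)
	return b.String()
}

// metricsHandler takes a fresh snapshot via list() on every scrape,
// which is what "on demand" means in this sketch.
func metricsHandler(list func() []ServerState, desired int) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, renderServerMetrics(list(), desired))
	}
}

func main() {
	servers := []ServerState{
		{ID: "srv-2", Healthy: false, Clients: 0},
		{ID: "srv-1", Healthy: true, Clients: 3},
	}
	fmt.Print(renderServerMetrics(servers, 3))
}
```

With output of this shape, counting the ism_server_status series and comparing the result to ism_server_prime shows whether the pool is at its desired size.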
Task metrics: agent/tasks.go
The task recorder is under active development, and its metrics are likely to change. All durations are in milliseconds.
For task durations, the task recorder distinguishes between the first call of a command for a client and subsequent calls.
This improves observability of first-startup times, e.g., the duration of the first use_theories call, during which an ML heap is built.
- first call duration, ism_first_task_duration, summary
- subsequent call duration, ism_subsequent_task_duration, summary
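The first/subsequent split can be sketched like this. It is an illustration under stated assumptions rather than the code in agent/tasks.go: the TaskRecorder type, its Record method, and the choice to key "first call" on the (client, command) pair are hypothetical, and plain slices stand in for the two Prometheus summaries.

```go
package main

import (
	"fmt"
	"sync"
)

// callKey identifies a command as seen by one client (assumed granularity).
type callKey struct {
	client  string
	command string
}

// TaskRecorder is a hypothetical stand-in for the task recorder.
// The two slices take the place of ism_first_task_duration and
// ism_subsequent_task_duration.
type TaskRecorder struct {
	mu         sync.Mutex
	seen       map[callKey]bool
	first      []float64 // durations in milliseconds
	subsequent []float64 // durations in milliseconds
}

func NewTaskRecorder() *TaskRecorder {
	return &TaskRecorder{seen: make(map[callKey]bool)}
}

// Record stores a duration in milliseconds and reports whether this was the
// first call of the command for the client.
func (r *TaskRecorder) Record(client, command string, ms float64) bool {
	r.mu.Lock()
	defer r.mu.Unlock()

	k := callKey{client: client, command: command}
	if !r.seen[k] {
		r.seen[k] = true
		r.first = append(r.first, ms)
		return true
	}
	r.subsequent = append(r.subsequent, ms)
	return false
}

func main() {
	r := NewTaskRecorder()
	// The first use_theories call is slow in this made-up example;
	// later calls for the same client are recorded separately.
	fmt.Println(r.Record("client-1", "use_theories", 42000))
	fmt.Println(r.Record("client-1", "use_theories", 350))
	fmt.Println(r.Record("client-2", "use_theories", 41000))
}
```

Keeping the two series apart prevents a handful of slow first calls from skewing the quantiles reported for ordinary calls.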
Note on circular imports
The agent imports the gateway package, so the gateway cannot import the agent in turn.
A workaround is therefore needed to let the gateway handlers communicate with the agent.
The current implementation uses an interface exposed by the gateway (gateway.PrometheusTaskRecorder), which is implemented by the task recorder (agent.PrometheusTaskRecorder).
The recorder is then supplied to the gateway acceptor in ism/main.go.
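The pattern can be sketched in a single file, with comments marking which package each part would belong to. Only the names gateway.PrometheusTaskRecorder and agent.PrometheusTaskRecorder come from the text above; the RecordTask method, the Acceptor type, and the NewAcceptor and Handle functions are hypothetical and chosen for illustration.

```go
package main

import "fmt"

// --- would live in package gateway ---

// PrometheusTaskRecorder is the interface the gateway exposes. The gateway
// depends only on this interface, never on the agent package.
// The RecordTask method is an assumed signature.
type PrometheusTaskRecorder interface {
	RecordTask(client, command string, ms float64)
}

// Acceptor is a hypothetical stand-in for the gateway acceptor.
type Acceptor struct {
	recorder PrometheusTaskRecorder
}

func NewAcceptor(r PrometheusTaskRecorder) *Acceptor {
	return &Acceptor{recorder: r}
}

// Handle pretends to serve a request and reports its duration.
func (a *Acceptor) Handle(client, command string, ms float64) {
	a.recorder.RecordTask(client, command, ms)
}

// --- would live in package agent ---

// agentRecorder plays the role of agent.PrometheusTaskRecorder. It satisfies
// the gateway interface implicitly, so the agent can import the gateway
// without the gateway importing the agent.
type agentRecorder struct {
	calls []string
}

func (r *agentRecorder) RecordTask(client, command string, ms float64) {
	r.calls = append(r.calls, fmt.Sprintf("%s/%s: %.0f ms", client, command, ms))
}

// --- would live in ism/main.go ---

func main() {
	rec := &agentRecorder{}
	acc := NewAcceptor(rec) // the recorder is injected into the acceptor
	acc.Handle("client-1", "use_theories", 1500)
	fmt.Println(rec.calls)
}
```

Because Go interfaces are satisfied implicitly, the dependency points in one direction only: the agent knows about the gateway, and the wiring happens at the top level.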