Service management

Warning

This is an outdated draft of the service management documentation. It will be updated soon.

Introduction

To manage Isabelle server instances, ISM uses Linux namespaces, similar to container runtimes. These namespaces are then attached to systemd transient units1 to run the services within a confined environment.

1

A transient unit differs from a "normal" unit mainly by not being backed by a unit file. This means it only exists ephemerally and will not be, e.g., restarted after a reboot or otherwise persisted. However, artifacts of the unit, such as journal logs, are still kept.
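As an illustration, the following is a minimal sketch of how such a confined transient unit could be started by shelling out to systemd-run. The unit name, the sandboxing properties, and the exact isabelle server invocation are assumptions for the example; the real agent may set up its namespaces differently (e.g., via unshare or the systemd D-Bus API).

```python
import subprocess
import uuid


def start_confined_server(instance_id: str | None = None) -> str:
    """Start an Isabelle server inside a systemd transient unit (sketch).

    The properties below are illustrative examples of namespace-based
    confinement; ISM's actual sandboxing configuration may differ.
    """
    instance_id = instance_id or uuid.uuid4().hex[:8]
    unit_name = f"ism-isabelle-{instance_id}.service"
    subprocess.run(
        [
            "systemd-run",
            f"--unit={unit_name}",
            # Transient units are not backed by a unit file, so they
            # disappear once stopped; journal logs are still kept.
            "--property=PrivateTmp=yes",        # private mount namespace for /tmp
            "--property=ProtectSystem=strict",  # read-only view of the host
            "isabelle", "server", "-n", instance_id,
        ],
        check=True,
    )
    return unit_name
```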

Startup

The agent initializes (primes, like a pump) a pool of server instances at application startup. At regular intervals, the agent cleans up unhealthy server instances and replenishes the pool, with the goal of always having the requested number of primed instances available.
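A rough sketch of this prime-and-replenish cycle is shown below. The Pool and ServerInstance types, the target size, and the 30-second interval are hypothetical; only the overall behaviour (fill the pool to the target at startup, then periodically drop unhealthy instances and refill) follows the description above.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ServerInstance:
    """Stand-in for a managed Isabelle server (fields are illustrative)."""
    unit_name: str
    healthy: bool = True

    def is_healthy(self) -> bool:
        return self.healthy


def start_instance() -> ServerInstance:
    # Placeholder: the real agent would start a transient unit here.
    return ServerInstance(unit_name="ism-isabelle-<id>.service")


@dataclass
class Pool:
    target_size: int  # requested number of primed instances
    instances: list[ServerInstance] = field(default_factory=list)

    def prime(self) -> None:
        # Startup: fill the pool up to the requested number of instances.
        while len(self.instances) < self.target_size:
            self.instances.append(start_instance())

    def maintenance_cycle(self) -> None:
        # Regular interval: remove unhealthy instances, then replenish.
        self.instances = [i for i in self.instances if i.is_healthy()]
        self.prime()


if __name__ == "__main__":
    pool = Pool(target_size=3)   # target size is illustrative
    pool.prime()
    while True:
        time.sleep(30)           # maintenance interval is illustrative
        pool.maintenance_cycle()
```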

Pool management

Each server has an attached monitoring connection that is used to extract metrics, such as the number of running sessions. This connection is also used to detect behavior that might indicate a corrupted state, e.g., a dropped monitoring connection or a timeout for list_sessions.
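The following sketch illustrates such a health probe. The wire format, the timeout value, and the raw socket handling are assumptions; only the failure conditions (a dropped monitoring connection or a list_sessions timeout) come from the description above.

```python
import socket

LIST_SESSIONS_TIMEOUT_S = 5.0  # assumed value; the real timeout may differ


def probe_instance(monitor_sock: socket.socket) -> bool:
    """Return True if the server answers list_sessions in time (sketch).

    A dropped connection or a timeout is treated as a possible sign of a
    corrupted server state, so the instance is reported unhealthy and
    will be replaced during the next pool maintenance cycle.
    """
    try:
        monitor_sock.settimeout(LIST_SESSIONS_TIMEOUT_S)
        monitor_sock.sendall(b"list_sessions\n")   # simplified wire format
        reply = monitor_sock.recv(65536)
        return bool(reply)   # empty reply means the peer closed the connection
    except (socket.timeout, OSError):
        return False
```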

Each server instance has a static maximum number of (client) connections2. A new server instance is started if all servers in the pool have exhausted their connections. ISM tries to distribute clients evenly over the pool; however, it does not yet transfer clients away from an instance that holds an above-average number of them (see the sketch at the end of this section).

Info

This is a naive attempt to behave like a load balancer over the server pool. Our goal is to reduce the impact of inevitable partition failures.

2

Configured via isabelle.max_connections
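To illustrate the assignment policy described above, here is a sketch of least-loaded selection bounded by isabelle.max_connections. The ServerInstance shape and the limit value are hypothetical; returning None stands for the case where every instance is exhausted and a new server instance must be started. Because clients are never moved afterwards, the distribution is only balanced at assignment time.

```python
from dataclasses import dataclass

MAX_CONNECTIONS = 10  # illustrative value for isabelle.max_connections


@dataclass
class ServerInstance:
    unit_name: str
    client_count: int = 0


def pick_instance(pool: list[ServerInstance]) -> ServerInstance | None:
    """Assign a new client to the least-loaded instance with spare capacity."""
    candidates = [i for i in pool if i.client_count < MAX_CONNECTIONS]
    if not candidates:
        return None  # pool exhausted: caller starts a new server instance
    chosen = min(candidates, key=lambda i: i.client_count)
    chosen.client_count += 1
    return chosen
```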