Merge remote-tracking branch 'supervisor/thebutlah/prepare-foss'
TheButlah committed Aug 6, 2024
2 parents b39ee1c + e197dd8 commit e0005d4
Showing 19 changed files with 1,536 additions and 0 deletions.
65 changes: 65 additions & 0 deletions orb-supervisor/CHANGELOG.md
@@ -0,0 +1,65 @@
# Changelog

## 0.4.1

### Added

+ Private proxy for getting notified of service registration `org.worldcoin.OrbSupervisor1`
+ Version pinning for GitHub actions

### Changed

+ Upon booting, `orb-supervisor` permits `update-agent` to begin downloading
immediately without throttling until a signup starts
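A minimal sketch of the boot-time rule above, assuming the supervisor keeps the time of the most recent `SignupStarted` event as an `Option<Instant>` that stays `None` until the first signup. The function name and the `quiet_period` parameter are hypothetical; this entry does not state the actual threshold used once a signup has happened.

```rust
use std::time::{Duration, Instant};

/// Hypothetical predicate behind `BackgroundDownloadsAllowed`.
/// `last_signup` is `None` until the first signup after boot.
fn background_downloads_allowed(
    last_signup: Option<Instant>,
    now: Instant,
    quiet_period: Duration,
) -> bool {
    match last_signup {
        // No signup since boot: let update-agent download without throttling.
        None => true,
        // Otherwise require `quiet_period` without a new signup.
        // `saturating_duration_since` yields zero if `now` precedes `started`.
        Some(started) => now.saturating_duration_since(started) >= quiet_period,
    }
}
```

Modelling "no signup yet" as `None` rather than as a fake timestamp keeps the boot case explicit instead of depending on clock arithmetic.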

## 0.4.0

### Added

+ Proxy for logind method `org.freedesktop.login1.Manager.ScheduleShutdown`
  + enables `orb-core` and `update-agent` to shut down or restart the device without
    needing elevated privileges/suid
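For reference, logind's `ScheduleShutdown` takes a type string (such as `"poweroff"` or `"reboot"`) and an absolute wall-clock time in microseconds since the UNIX epoch. The sketch below shows how a proxy might validate the requested type and build those two arguments. `ShutdownKind`, its accepted values, and `schedule_shutdown_args` are hypothetical, and the actual D-Bus call is omitted.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical whitelist of shutdown types the proxy forwards to logind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ShutdownKind {
    Poweroff,
    Reboot,
}

impl ShutdownKind {
    /// Accepts only the types this sketch allows; everything else is rejected.
    fn parse(s: &str) -> Option<Self> {
        match s {
            "poweroff" => Some(ShutdownKind::Poweroff),
            "reboot" => Some(ShutdownKind::Reboot),
            _ => None,
        }
    }

    /// The type string logind expects as the first argument.
    fn as_logind_str(self) -> &'static str {
        match self {
            ShutdownKind::Poweroff => "poweroff",
            ShutdownKind::Reboot => "reboot",
        }
    }
}

/// Builds the `(type, usec)` pair for `ScheduleShutdown`, where `usec` is the
/// absolute wall-clock time in microseconds since the UNIX epoch.
fn schedule_shutdown_args(kind: ShutdownKind, delay: Duration, now: SystemTime) -> (&'static str, u64) {
    let when = now + delay;
    let usec = when
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_micros() as u64;
    (kind.as_logind_str(), usec)
}
```

Restricting callers to a fixed set of types is one way a proxy could keep the privileged surface small; whether `orb-supervisor` filters types this way is an assumption.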

## 0.3.0

`orb-supervisor` no longer shuts down `orb-core` immediately when an update happens
but waits until no new signups have been started for a while.

### Added

+ Upon receiving a `RequestUpdatePermission` request, `orb-supervisor` only shuts
down `orb-core` after 20 minutes of inactivity (meaning that no signups have been
performed for 20 minutes). This timer is reset every time a new signup starts.
Once the timer is up, `orb-supervisor` schedules `update-agent` to immediately run again.
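A minimal sketch of this inactivity timer, assuming the 20-minute window starts when `RequestUpdatePermission` is received and is pushed back on every `SignupStarted` event. `CoreStopTimer` and its methods are hypothetical names. The constant mirrors `DURATION_TO_STOP_CORE_AFTER_LAST_SIGNUP` from `src/consts.rs`, redeclared with `std::time::Duration` (which `tokio::time::Duration` re-exports) so the example is self-contained.

```rust
use std::time::{Duration, Instant};

/// Same value as `DURATION_TO_STOP_CORE_AFTER_LAST_SIGNUP` in `src/consts.rs`.
const DURATION_TO_STOP_CORE_AFTER_LAST_SIGNUP: Duration = Duration::from_secs(20 * 60);

/// Hypothetical helper deciding when `orb-core` may be stopped for an update.
struct CoreStopTimer {
    deadline: Instant,
}

impl CoreStopTimer {
    /// Starts the inactivity window when an update permission request arrives.
    fn start(now: Instant) -> Self {
        Self { deadline: now + DURATION_TO_STOP_CORE_AFTER_LAST_SIGNUP }
    }

    /// Resets the window on every `SignupStarted` event.
    fn on_signup_started(&mut self, now: Instant) {
        self.deadline = now + DURATION_TO_STOP_CORE_AFTER_LAST_SIGNUP;
    }

    /// True once 20 minutes have passed without a new signup.
    fn should_stop_core(&self, now: Instant) -> bool {
        now >= self.deadline
    }
}
```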

### Changed

+ `orb-supervisor` now returns custom `MethodError`s to report why an update was denied,
bringing it more in line with DBus conventions.

## 0.2.0 (October 20, 2022)

`orb-supervisor`'s integration with systemd and journald is improved by using
journald conventions and writing directly to the journald socket.

### Added

+ `orb-supervisor` detects whether it's attached to an interactive TTY using `STDIN`:
  + if not attached to a TTY, it writes to the journald socket;
  + if attached to a TTY, it writes to stdout/stderr;
+ `orb-supervisor` identifies itself as `worldcoin-supervisor` using `SYSLOG_IDENTIFIER`;
  + use `journalctl -t worldcoin-supervisor` to filter journald entries
    (`-u worldcoin-supervisor`, however, is still the preferred way);
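A minimal sketch of the TTY check using only the standard library (`std::io::IsTerminal`, stable since Rust 1.70). `LogSink`, `sink_for`, and `detect_sink` are hypothetical names; in the real binary the `Journald` branch would presumably be backed by the `tracing-journald` crate listed in `Cargo.toml`.

```rust
use std::io::IsTerminal;

/// Where log output should go.
#[derive(Debug, PartialEq, Eq)]
enum LogSink {
    /// Structured entries written to the journald socket.
    Journald,
    /// Human-readable output on stdout/stderr.
    Stdio,
}

/// Pure decision, kept separate so it can be tested without a real TTY.
fn sink_for(stdin_is_tty: bool) -> LogSink {
    if stdin_is_tty {
        LogSink::Stdio
    } else {
        LogSink::Journald
    }
}

/// Inspects the actual STDIN of the current process.
fn detect_sink() -> LogSink {
    sink_for(std::io::stdin().is_terminal())
}
```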

## 0.1.0 (August 31, 2022)

This is the first release of `orb-supervisor`.

### Added

+ Expose dbus property `org.worldcoin.OrbSupervisor1.Manager.BackgroundDownloadsAllowed`;
  + tracks how much time has passed since the last
    `org.worldcoin.OrbCore1.Signup.SignupStarted` event;
+ Expose dbus method `org.worldcoin.OrbSupervisor1.Manager.RequestUpdatePermission`;
  + attempts to shut down `worldcoin-core.service` through
    `org.freedesktop.systemd1.Manager.StopUnit`;
24 changes: 24 additions & 0 deletions orb-supervisor/Cargo.toml
@@ -0,0 +1,24 @@
[package]
name = "orb-supervisor"
version = "0.4.1"
edition = "2021"

[dependencies]
color-eyre = "0.6.3"
libc = "0.2.135"
listenfd = "1.0.0"
tokio = { version = "1.21.2", features = ["macros", "net", "rt-multi-thread"] }
tokio-stream = "0.1.11"
tracing = { version = "0.1.37", features = ["attributes"] }
tracing-subscriber = { version = "0.3.16", features = ["env-filter"] }
zbus = { version = "3.9.0", default-features = false, features = ["tokio"] }
zbus_systemd = { version = "0.0.8", features = ["systemd1", "login1"] }
thiserror = "1.0.37"
futures = "0.3.24"
once_cell = "1.15.0"
tap = "1.0.1"
tracing-journald = "0.3.0"

[dev-dependencies]
dbus-launch = "0.2.0"
tokio = { version = "1.25.0", features = ["sync", "test-util"] }
15 changes: 15 additions & 0 deletions orb-supervisor/PROCESS.md
@@ -0,0 +1,15 @@
# Guideline for Supervised Process Development

Examples of SuPrs (**Su**pervised **Pr**ocesses):
- update-agent
- orb-core
- fan-controller
- ...

## Expectations

Whether through `signal_hook` or otherwise, we expect components to adhere to UNIX signal best practices, specifically around shutdown signals.
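As an illustration of the SuPr side of this contract, here is a minimal standard-library-only sketch: the `SIGTERM` handler only sets an atomic flag, and the process's main loop is expected to poll it and exit cleanly. It calls the C library's `signal(2)` directly and assumes Linux signal numbers; with the `signal_hook` crate, `signal_hook::flag::register` provides the same flag-based pattern. `install_sigterm_handler` and `shutdown_requested` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Linux value of SIGTERM (see signal(7)).
const SIGTERM: i32 = 15;

/// Set from the signal handler; polled by the main loop.
static SHUTDOWN_REQUESTED: AtomicBool = AtomicBool::new(false);

extern "C" {
    // signal(2) from the C library that std already links on Linux.
    // The return value (the previous handler) is treated as an opaque usize.
    fn signal(signum: i32, handler: extern "C" fn(i32)) -> usize;
}

// Only an atomic store happens here, which is async-signal-safe.
extern "C" fn on_sigterm(_signum: i32) {
    SHUTDOWN_REQUESTED.store(true, Ordering::SeqCst);
}

/// Returns false if the handler could not be installed (SIG_ERR == usize::MAX).
fn install_sigterm_handler() -> bool {
    unsafe { signal(SIGTERM, on_sigterm) != usize::MAX }
}

/// Checked by the main loop to begin an orderly shutdown.
fn shutdown_requested() -> bool {
    SHUTDOWN_REQUESTED.load(Ordering::SeqCst)
}
```

Keeping the handler down to a single atomic store avoids calling anything that is not async-signal-safe; the real cleanup runs on the normal thread once `shutdown_requested()` returns true.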

### Shutdown Flow

The supervisor _decides_ it must shut down. It iterates over the list of supervised processes, reads each one's PID file, and issues a [SIGTERM](https://man7.org/linux/man-pages/man7/signal.7.html), giving the application **SOME DEFINED SECONDS** to shut down. After that time has elapsed, the supervisor re-reads the SuPr PID files and sends a [SIGKILL](https://man7.org/linux/man-pages/man7/signal.7.html).
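A minimal sketch of this flow using only the standard library and the C library's `kill(2)`. It assumes each PID file holds a single decimal PID, uses Linux signal numbers, and takes the grace period (the **SOME DEFINED SECONDS** above, left unspecified here) as a parameter. `read_pid`, `signal_all`, and `shutdown_supervised` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::Duration;

// Linux signal numbers (see signal(7)).
const SIGTERM: i32 = 15;
const SIGKILL: i32 = 9;

extern "C" {
    // kill(2) from the C library that std already links on Linux.
    fn kill(pid: i32, sig: i32) -> i32;
}

/// Reads a PID file assumed to contain one decimal PID.
fn read_pid(path: &Path) -> io::Result<i32> {
    let text = fs::read_to_string(path)?;
    text.trim()
        .parse::<i32>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Sends `sig` to every process named in `pid_files`, skipping unreadable files.
fn signal_all(pid_files: &[PathBuf], sig: i32) {
    for path in pid_files {
        match read_pid(path) {
            // pid <= 0 would target process groups (or every process), so refuse it.
            Ok(pid) if pid > 0 => {
                // SAFETY: kill(2) takes plain integers and touches no Rust memory.
                if unsafe { kill(pid, sig) } != 0 {
                    eprintln!("kill({pid}, {sig}) failed: {}", io::Error::last_os_error());
                }
            }
            Ok(pid) => eprintln!("ignoring invalid PID {pid} in {}", path.display()),
            Err(err) => eprintln!("cannot read {}: {err}", path.display()),
        }
    }
}

/// SIGTERM everything, wait `grace`, then re-read the PID files and SIGKILL.
fn shutdown_supervised(pid_files: &[PathBuf], grace: Duration) {
    signal_all(pid_files, SIGTERM);
    thread::sleep(grace);
    signal_all(pid_files, SIGKILL);
}
```

Re-reading the PID files before `SIGKILL` does not protect against PID reuse: if a SuPr exits during the grace period and its PID is recycled, the second pass could signal an unrelated process. Checking that the process is still the expected one, or holding a pidfd, would close that gap.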
77 changes: 77 additions & 0 deletions orb-supervisor/README.md
@@ -0,0 +1,77 @@
# Orb Supervisor

Orb supervisor is a central IPC server that coordinates device state and external UX across independent agents (binaries/processes).

## Table of Contents

- Minimum viable product (MVP)
- Why this is necessary
  - Managing device health
  - Consistent UX
  - Separation of concerns
- Relevant components

## MVP

### Initial release

- supervisor running [tonic gRPC](https://github.com/hyperium/tonic) over UDS (Unix Domain Sockets)
- supervisor can broadcast shutdown message
- component apps (orb-core, update-agent) listen for broadcast and shutdown
- supervisor can update SMD **through sub-process**
- supervisor can display front LED patterns
- IPC (Inter-Process Communication) client library supporting defaults for process shutdown handlers
  - set up the bidirectional communication and the listener for broadcast messages

### Immediate follow-up release

- supervisor can play sounds
- supervisor can engage in bidirectional communication with orb-core for signup permission; orb-core must not run a signup if:
  - an update is scheduled;
  - the device is shutting down;
  - the SSD is full (coordinate with @AI on signup extensions);
- fan-controller PoC
  - spin fans up/down depending on temperature/temperature analogs
  - watch IOPS on the NVMe as an indicator of SSD temperature (can be replaced by reading out SMART data once kernel 5.10 is deployed)
- supervisor can update SMD **through nvidia-smd crate**
  - implement an Nvidia SMD parser as a crate (other people may want this)

## Why this is necessary

There are three reasons that make the orb supervisor necessary:

1. Managing device health (heat, updates)
1. Consistent UX (updates w/ voice, LEDs)
1. Separation of concerns

### Managing device health

Device health must be ensured at all times, whether the device is updating or in the middle of a signup. Furthermore, you want this to be maximally isolated to avoid a scenario where, through a vulnerability in a monolithic application, an attacker acquires fan control and overheats the device.

> **Scenario**: _A non-security critical update is running in the background and writing large blobs of data to the NVMe SSD_ while _orb-core is running and signups are being performed. An attacker uses a vulnerability in the QR code processing to deadlock a thread. They then proceed to garble the incoming network traffic causing the download to be repeatedly retried and data to be constantly written to the SSD while thermal management is stuck in the blocked runtime. This can feasibly fry an Orb._

### Consistent UX

By necessity, the update agent service must have heightened privileges. Under no circumstances can we extend these to the entire orb-core process. At the same time, the operator must receive feedback on the status of an update. For certain updates, orb-core will not run during the update. In this scenario there is currently no mechanism to give feedback to the operator.

Thus, an independent service that owns UX is a necessary condition for operator feedback.

### Separation of concerns

Breaking components down allows us to:

+ Reduce attack surfaces by restricting the responsibilities of privileged services;
+ Employ the best pattern for each job (a fan monitoring service looks different from an update agent, which in turn looks different from orb-core);
+ Reduce engineering load (understanding a 500 LoC binary and finding bugs in it _is_ easier than doing so in a 10k LoC monolith);
+ Run integration tests far more easily outside of complex runtimes.

It is best industry practice to write dedicated services *where possible*, where coupling is low and where solutions already exist. This applies especially on a full Linux host and will reduce engineering load.

## Relevant components

+ update agent
+ fan monitor & control
+ wifi management
+ UX controller, split into:
  + Sound
  + LED
+ library for basic and repeatable "component"
5 changes: 5 additions & 0 deletions orb-supervisor/src/consts.rs
@@ -0,0 +1,5 @@
use tokio::time::Duration;

pub const WORLDCOIN_CORE_UNIT_NAME: &str = "worldcoin-core.service";
pub const DURATION_TO_STOP_CORE_AFTER_LAST_SIGNUP: Duration =
Duration::from_secs(20 * 60);