Adding a page about the maintenance window (#2227)
Adding a page about the maintenance window

---------

Co-authored-by: Steve Fenton <99181436+steve-fenton-octopus@users.noreply.github.com>
Co-authored-by: Iryna Melnyk <92701928+irynamelnyk-octopus@users.noreply.github.com>
3 people authored May 6, 2024
1 parent c05a07b commit f0f0c33
Showing 6 changed files with 85 additions and 7 deletions.
1 change: 1 addition & 0 deletions package.json
@@ -30,6 +30,7 @@
"hast-util-from-selector": "^3.0.0",
"html-to-text": "^9.0.5",
"keyword-extractor": "^0.0.28",
"optional": "^0.1.4",
"remark-directive": "^3.0.0",
"remark-heading-id": "^1.0.1",
"sharp": "^0.33.3"
7 changes: 7 additions & 0 deletions pnpm-lock.yaml

Some generated files are not rendered by default.

@@ -1,14 +1,14 @@
---
layout: src/layouts/Default.astro
pubDate: 2023-01-01
modDate: 2023-08-18
modDate: 2024-04-05
title: Maintenance Mode
description: You can put Octopus Server into maintenance mode so you can safely perform server maintenance or other administrative activities.
navOrder: 1
---

:::div{.hint}
Maintenance Mode is only available for self-hosted customers. [Octopus Cloud](/docs/octopus-cloud) instances will be updated in their specified [maintenance window](/docs/octopus-cloud/#set-the-outage-window).
Maintenance Mode is only available for self-hosted customers. [Octopus Cloud](/docs/octopus-cloud) instances will be updated in their specified [maintenance window](/docs/octopus-cloud/maintenance-window).
:::

From time to time you will need to perform certain administrative activities on your Octopus Server, like [upgrading Octopus](/docs/administration/upgrading/) or [applying operating system patches](/docs/administration/managing-infrastructure/applying-operating-system-upgrades). Typically you will want to schedule a maintenance window where you perform these activities, and Octopus Server helps with this by switching to **Maintenance Mode**.
7 changes: 4 additions & 3 deletions src/pages/docs/octopus-cloud/index.mdx
@@ -1,7 +1,7 @@
---
layout: src/layouts/Default.astro
pubDate: 2023-01-01
modDate: 2023-01-01
modDate: 2024-04-05
title: Octopus Cloud
navTitle: Overview
navSection: Octopus Cloud
@@ -91,9 +91,10 @@ Users are only added to the Octopus Cloud instance after they sign in for the fi
1. Select **Add Member**
1. Click **Add** and then **Save**.

## Set the outage window \{#set-the-outage-window}
## Set the maintenance window \{#set-the-outage-window}

In order to keep your instance of Octopus Cloud updated and running the latest version, we will occasionally need to take it offline to update the software. You can let us know the best time for this to do this by setting the outage window.
To keep Octopus Cloud running smoothly, we must perform occasional [maintenance](/docs/octopus-cloud/maintenance-window), including updates and optimizations on your instance.
Please pick a two-hour maintenance window in the control center. Set a time outside of your normal business hours that is unlikely to include any scheduled deployments. The daily two-hour window provides some scheduling flexibility to ensure that all of our Cloud instances can be kept up-to-date and in good health by running a variety of maintenance tasks.

1. Log in to your Octopus account.
1. Select your cloud instance.
69 changes: 69 additions & 0 deletions src/pages/docs/octopus-cloud/maintenance-window.md
@@ -0,0 +1,69 @@
---
layout: src/layouts/Default.astro
pubDate: 2023-01-01
modDate: 2024-04-05
title: Octopus Cloud Maintenance Window
navOrder: 55
description: Details about the Octopus Cloud maintenance window
---

We are dedicated to keeping Octopus Cloud running smoothly and providing a reliable, scalable, and secure service. In order to do that, we must perform occasional maintenance, including updates and optimizations on your instance.
Most of these won't affect your instance's availability, but occasionally, we might need to take it offline briefly for tasks like software upgrades or infrastructure improvements.


:::div{.hint}
We don’t need to perform actions on your instance daily, and most of our maintenance actions won’t take your instance offline. At most, you might notice a performance impact. The steps that require an outage typically only take a short time to complete.
:::

At the time of publishing this (April 2024), our maintenance tasks that require downtime average 15 minutes per week.
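
To put that figure in context, the arithmetic below converts 15 minutes of weekly downtime into an availability percentage. It is a rough sketch only: the 15 minutes is an average, so any given week may differ, and the function name `uptime_percent` is made up for this example.

```python
def uptime_percent(downtime_minutes: float, period_minutes: float) -> float:
    """Share of the period, in percent, during which the instance is up."""
    return 100.0 * (1.0 - downtime_minutes / period_minutes)


MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080

# 15 minutes of downtime in a week, the average quoted above.
weekly_uptime = uptime_percent(15, MINUTES_PER_WEEK)
print(f"{weekly_uptime:.2f}%")  # prints "99.85%"
```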



## You’re in control of the schedule
You get to choose a two-hour time slot for maintenance activities. Pick a time outside your regular business hours to minimize potential impact.
You can adjust your maintenance window anytime, but make sure to do it before your current window begins to avoid interrupting ongoing maintenance tasks.


## View or change your maintenance window
Setting up your maintenance window to suit your business needs is easy. Just follow these steps:

1. Log in to your Octopus account.
2. Select your cloud instance.
3. Click **Configuration**.
4. Scroll down to the **Outage Window** section.
5. Select the time in UTC, providing a window of at least two hours, and click **Save Outage window**.
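
Because the window is entered in UTC, it can help to convert your preferred local start time first. The following is a minimal sketch, assuming a fixed UTC offset and a whole-hour local start time; the helper `window_in_utc` is hypothetical and not part of any Octopus tooling. It does not account for daylight saving changes, so you may want to revisit the setting when your local offset shifts.

```python
from datetime import datetime, timedelta, timezone


def window_in_utc(local_start_hour: int, utc_offset_hours: float,
                  length_hours: int = 2) -> tuple[str, str]:
    """Return the (start, end) of a daily window as 'HH:MM' strings in UTC."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # Any date works here; only the time of day matters for a daily window.
    start_local = datetime(2024, 4, 5, local_start_hour, 0, tzinfo=local_tz)
    start_utc = start_local.astimezone(timezone.utc)
    end_utc = start_utc + timedelta(hours=length_hours)
    return start_utc.strftime("%H:%M"), end_utc.strftime("%H:%M")


# A 01:00 local start in a UTC+10 time zone.
print(window_in_utc(1, 10))  # prints "('15:00', '17:00')"
```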



## During a maintenance window

At the start of each window, an evaluation is performed to determine which maintenance operations need to be performed on each Octopus Cloud instance. There may be several operations that need to be performed in sequence on your instance during a single maintenance window.

Those tasks include (but are not limited to) the following:
- Database maintenance. This involves reindexing and compacting your Octopus Cloud instance database so that it can perform at its best.
- Performing any Octopus Server software upgrades.
- Moving your instance to new infrastructure. These operations don't happen as often, but are required when we roll out improvements to the underlying infrastructure.
- Processing any billing events, such as applying the latest license key to the instance or changing the task cap.

Most maintenance operations, such as database maintenance, can be performed without taking the instance offline. Your instance may feel a little slower while online maintenance operations are running. For tasks that cause an outage, typically only a subset of steps requires the instance to be offline. For all the other steps, we keep the instance online.

Many of those tasks have guard clauses. For example, we won't defragment a database that has only 10% fragmentation. In addition, we only attempt to upgrade an instance if a new version exists.

It is important to note that most maintenance tasks do not start at the beginning of your maintenance window. We host thousands of customer instances. Because of that, we perform maintenance tasks in bulk. When we run a maintenance task, your instance might be the first, somewhere in the middle, or at the end of the list of instances. In some cases, by the time we finish processing other instances, your maintenance window is about to end. When that happens, your instance is skipped and that task won't be processed until the next day. That typically happens when performing upgrades.
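
The guard clauses and the end-of-window skip described above can be pictured with a short sketch. This is an illustration only, not the actual Octopus Cloud implementation: the `Instance` fields, the task names, the 15-minute cutoff, and the version strings are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Instance:
    name: str
    fragmentation_percent: float
    current_version: str
    window_end: datetime  # end of today's maintenance window, in UTC


def plan_tasks(instance: Instance, latest_version: str, now: datetime,
               min_remaining: timedelta = timedelta(minutes=15),
               fragmentation_threshold: float = 10.0) -> list[str]:
    """Return the task names that would run for this instance right now."""
    # Too little of the window is left: skip and try again the next day.
    if instance.window_end - now < min_remaining:
        return []
    tasks = []
    # Guard clause: only defragment above the (assumed) threshold.
    if instance.fragmentation_percent > fragmentation_threshold:
        tasks.append("database-maintenance")
    # Guard clause: only upgrade when a different version is available.
    if instance.current_version != latest_version:
        tasks.append("upgrade")
    return tasks


now = datetime(2024, 4, 5, 1, 0, tzinfo=timezone.utc)
busy = Instance("example", 35.0, "2024.1", now + timedelta(hours=1))
print(plan_tasks(busy, "2024.2", now))  # prints "['database-maintenance', 'upgrade']"
```

An instance reached only a few minutes before its window closes gets an empty plan, which mirrors the "skipped until the next day" behavior.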

:::div{.hint}
Upgrading an instance is the primary cause of outages. The most noticeable impact of an outage is that deployments and runbook runs will fail. We are actively working on [Resilient Scalable Deployments](https://roadmap.octopus.com/c/95-alpha-program-resilient-scalable-deployments-in-octopus-cloud) to allow the deployments and runbook runs to continue post-upgrade.
:::

## Taking your instance offline
If we need to take your instance offline to perform any maintenance:
- Your instance will be given a few minutes to shut down cleanly. This will allow any in-progress tasks to complete. Any tasks still running at the end of the timeout will be abandoned.
- A maintenance page will be displayed to users and any requests to the API will return a 503 Service Unavailable status code.
- The maintenance operations will be performed.
- Your instance will start up again and we will check that it is in a healthy state.
- The maintenance page is removed and your instance is accessible again. Any tasks that were paused during shut down will be resumed, and any tasks that were scheduled to start during the outage will be started.
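
Scripts that call the API during a maintenance window may want to treat a 503 response as temporary and retry. Below is a minimal sketch of that idea, assuming the caller supplies a `fetch` function that performs the request and returns the HTTP status code. The function name, attempt count, and delays are illustrative choices, not recommendations from Octopus.

```python
import time
from typing import Callable


def call_with_retry(fetch: Callable[[], int], attempts: int = 5,
                    base_delay: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call fetch() until it returns a status other than 503 or attempts run out."""
    status = 503
    for attempt in range(attempts):
        status = fetch()
        if status != 503:
            return status
        # Back off before the next try, doubling the wait each time.
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status


# Simulated responses: two 503s while offline, then success.
responses = iter([503, 503, 200])
waits: list[float] = []
print(call_with_retry(lambda: next(responses), sleep=waits.append))  # prints "200"
print(waits)  # prints "[30.0, 60.0]"
```

Passing `sleep` as a parameter keeps the example testable without real waiting; in a real script the default `time.sleep` would apply.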


## How we communicate maintenance windows
- **Routine maintenance:** During a regular maintenance window, a maintenance page will be displayed to users, and any requests to the API will return a 503 Service Unavailable status code.
- **Other maintenance:** There may be rare occasions outside of your normal maintenance window where we need to perform maintenance on your instance. Our Support team will contact you in these scenarios to coordinate the work.
4 changes: 2 additions & 2 deletions src/pages/docs/octopus-cloud/uptime-slo.md
@@ -1,15 +1,15 @@
---
layout: src/layouts/Default.astro
pubDate: 2023-01-01
modDate: 2023-01-01
modDate: 2024-04-05
title: Octopus Cloud Uptime SLO
navOrder: 50
description: The uptime SLO for Octopus Cloud instances
---

Each Octopus Cloud customer has their own instance of the Octopus Server and can use [dynamic workers](/docs/infrastructure/workers/dynamic-worker-pools). As the name implies, these workers are assigned to a cloud instance dynamically and are spun up and down as required by the Deployment or Runbook executed. The following uptime SLO (service level objective), therefore, refers to the customer's Cloud instance.

Each customer's instance may experience its own series of maintenance operations and reprovisioning for operational and upgrade reasons. Therefore the 95th percentile of monthly uptime is used as the basis for the Octopus Cloud uptime SLO. Operational downtime is, other than in exceptional circumstances, scheduled in the customer's [maintenance window](/docs/octopus-cloud/#set-the-outage-window). All downtime (unplanned and planned) is used in the determination of the uptime SLO.
Each customer's instance may experience its own series of maintenance operations and reprovisioning for operational and upgrade reasons. Therefore the 95th percentile of monthly uptime is used as the basis for the Octopus Cloud uptime SLO. Operational downtime is, other than in exceptional circumstances, scheduled in the customer's [maintenance window](/docs/octopus-cloud/maintenance-window). All downtime (unplanned and planned) is used in the determination of the uptime SLO.

## Uptime SLO
Monthly uptime SLO: 99.5%
