diff --git a/docs/Platform-Strategy/service-deployment-strategy.md b/docs/Platform-Strategy/service-deployment-strategy.md new file mode 100644 index 0000000..ec874ba --- /dev/null +++ b/docs/Platform-Strategy/service-deployment-strategy.md @@ -0,0 +1,84 @@ +--- +title: Service Deployment Strategy +summary: +uri: https://defra.github.io/adp-documentation/Platform-Strategy/service-deployment-strategy/ +authors: + - Dan Rozkowski +date: 2024-04-04 +--- + +# Platform service Deployment Strategy + +## Guidance and Context + +This article outlines the Platform service deployment strategies available. Development teams should read the Platform Versioning and Git strategy document before reading this. ADP’s primary deployment strategy is **Rolling Deployments** on [AKS](https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/) with [HELM](https://atlassian.github.io/data-center-helm-charts/userguide/upgrades/HELM_CHART_UPGRADE/#3-define-the-upgrade-method) and [FluxCD](https://fluxcd.io/flagger/usage/deployment-strategies/). This provides Platform services with a zero-downtime deployment strategy. This allows applications to achieve high availability with low/no business impact to live service. This is important for services that need 24/7 availability and allows the capability to deploy to production multiple times a day. In the future, we will support other deployment strategies, such as Blue-Green and Canary deployments. + +## Deployment Strategies - ADP Rolling Updates + +ADP uses AKS (Kubernetes) with [HELM Charts](https://helm.sh/docs/topics/charts/) and Flux to perform rolling deployments. The default strategy applied to all services is rolling deployments, unless otherwise specified in the deployment YAML. We recommend starting with this strategy. This strategy allows for applications to be incrementally updated without downtime. +There are 3 core parts to a Service deployment/upgrade, which are done in the following order: + +1. 
App Configuration, including Secrets, +2. Service Infrastructure, +3. Database upgrade and Web Application + +The deployment process flow: + +1. A new deployment is triggered via the [CI & CD Pipelines](https://github.com/DEFRA/ado-pipeline-common?tab=readme-ov-file) for the Service: + 1. New app Secrets are imported/updated/deleted* in the Key Vault and are mastered in the Azure DevOps (ADO) Secret Library Groups for the service. + 2. New App Configuration keys and values are imported/updated/deleted in the Service Config Maps & App Configuration Service from the Service’s ‘appConfig.yaml’ files. Note: The sentinel key is not updated yet. +2. The new images and artefacts are pushed to the environment Container Registry (ACR) (via pipeline deployment) and Flux updates the [Services repository](https://github.com/DEFRA/adp-flux-services) with the new version to be deployed: + 1. This can be a higher version (new image & release) or a lower version (existing/rollback). + +3. Flux [reconciles](https://learn.microsoft.com/en-us/azure/azure-arc/kubernetes/tutorial-use-gitops-flux2?tabs=azure-cli#apply-a-flux-configuration) the Cluster with the requested Web App code and Infrastructure versions using a rolling update. Any infrastructure updates take precedence over Application (Infra > App). + **Application deployment:** + + 1. The deployment will incrementally add new Pods (web applications) onto the Nodes in the Cluster. These Pods will automatically pick up the new App Config/Secret updates on startup. + 2. The AKS deployment will wait for those new Pods (apps) to start successfully, using the configured/default (5m) wait times and health check endpoints. + 3. Once the new Pods are started and reporting healthy via the endpoint(s), traffic will be gracefully directed to the new Pods (updated app) via the internal load balancer/NGINX. + 4. 
The old Pods (previous version) will be deleted incrementally once the new Pods have started successfully and all traffic has drained gracefully. + 5. If the new App/Pod does not start successfully, the deployment will time out and fail after a set period of health check retries (5m), but the previous app version (Pods) will remain in place and continue accepting traffic. The previous version’s App Config will remain unchanged on the non-upgraded Pods. + 1. Unhealthy Pods will be removed if an upgrade fails. + + **Infrastructure deployment:** + + 6. The new infrastructure will be deployed (created, updated, or deleted). This can be Queues, Topics, Datastores, Identities, etc. + 7. Once the infrastructure upgrade is successful, the App (and database, if applicable) can be deployed/upgraded. + + **Database deployment:** + + 8. If a new DB Schema is to be deployed (migration required), this will be done before the Web Application is deployed. + 9. [Liquibase](https://docs.liquibase.com/start/tutorials/postgresql/postgresql-azure-database.html) will perform the PostgreSQL migration using a [Flux pre-deploy](https://fluxcd.io/flux/use-cases/running-jobs/) job. + 10. If database deployment/migration fails, the App will not be upgraded. + +4. If a user has requested the deployment of App Config/Secrets only via the [Flag in the build.yaml](https://github.com/DEFRA/ado-pipeline-common/blob/main/docs/AppBuildAndDeploy.md#buildyaml-for-nodejs-app), the App and Infra will not be deployed on this release: +a. The App Config & Secrets will be updated via the Pipeline, including the Sentinel Key with the Build ID – which triggers the configuration update. +b. The [Reloader](https://github.com/stakater/Reloader) service will perform a rolling, zero-downtime [upgrade](https://github.com/stakater/Reloader?tab=readme-ov-file#problem) (restart Pods) of the Service to consume the new App configuration (incremental Pod restarts). + +!!! 
note + All releases / deployments are promoted via the Common CI and CD Pipelines using Azure DevOps as the orchestrator. We promote continuous delivery with automated checks and tests in preference to manual intervention and approvals. Approval gates can optionally be added to Azure Pipelines to gate the promotion of code. + +## Deployment and App Configuration Guidance / Context +All services will have the following settings defaulted (changeable if required): +- maxSurge – maximum additional Pods created at one time (50%). +- maxUnavailable – maximum Pods not available (25%) +- podDisruptionBudget – allowed disruptions for a Pod (application) (25% or at least 1) +- min and max replicas – number of replicas of the application in the Cluster. Minimum of 3 for production for high availability. +- All deployments of business apps are on the User/Apps Node Pools. Platform/System apps are on the System Node Pool. Taints and tolerations are applied to that effect. +- Autoscaling via HPA is enabled. +- All services will have their own dedicated AKS Namespaces for their own team. + +Constraints +- Infrastructure is always deployed first if changed; database Schema migrations are second and App code is last (associated Config & Secrets consumed at that point) +- Database updates, if using PostgreSQL, will require development teams to deploy non-breaking changes and/or manage their schema updates appropriately with their app deployment to prevent downtime. + - Shutter pages will be included in phase 2 / Post MVP if required. +- Development teams must set health endpoints correctly for an effective rolling update. +- App Config, Infrastructure and Application Code are tied (versioned) together as an immutable unit. + - They are versioned using the SemVer strategy defined in the versioning article. +- App Secrets in Key Vault/ADO library group are not versioned with the App/Code or Infra; they are fully independent and can be rotated periodically. 
+ - All secret rotations must have an overlap in expiry periods to ensure zero-downtime upgrades. Secrets should not be tied to versions, because rotating them is good practice. +- The Platform has defined minimum replicas/availability to meet Defra SLAs. +- The Platform **Reloader** Service will automatically drain and replace the Pods in the Cluster with a rolling upgrade when it detects new App Config or new App Secrets via the Sentinel Key update. +- All HELM Deployments are full CRUD operations – add, update, or delete. This includes Apps, Infra and Databases. Warning: You can delete your own infrastructure and configuration! +- All App Configuration updates are full CRUD operations – create, update, or delete. +- Secrets are add/update only for MVP. *Delete will be added post-MVP. diff --git a/docs/Platform-Strategy/service-versioning-strategy.md b/docs/Platform-Strategy/service-versioning-strategy.md index 4a6e09c..174e5e2 100644 --- a/docs/Platform-Strategy/service-versioning-strategy.md +++ b/docs/Platform-Strategy/service-versioning-strategy.md @@ -6,55 +6,82 @@ authors: - Dan Rozkowski date: 2024-03-14 --- + # Platform service versioning strategy. -This document outlines a two-phase versioning strategy for services on ADP with the goal to support ephemeral environments by phrase 2. In Phase 1, before ephemeral environments, feature branch builds fetch the version from the main branch's package.json. If the versions are the same, an error is thrown; if the feature branch version is higher, it's tagged with alpha and the build ID. The main branch version is pushed to ACR on deployment after merging into main, taking precedence over all feature (alpha) candidates of the same major/minor/patch. In Phase 2, once ephemeral environments are in place, the process remains the same for feature branches. For PR builds, if the package.json is not updated, a validation error is thrown; if it's updated, it's tagged with a release candidate (-RC) and the build ID. 
The main branch version still takes precedence over all feature (alpha & RC) candidates. The Build ID is unique and automatically increases on every CI on every image requested to be deployed. Users must increment Major, Minor, or Patch at least once on a feature branch build or PR build, and to merge into main for a successful release and validation. Once ephemeral environments are delivered, PR's and Feature releases will have their own dedicated infrastructure. +This article outlines a two-phase versioning strategy for services on ADP with the goal of supporting ephemeral environments by phase 2. + +**The following Git and Versioning strategies are in place and mandated:** + +- A [Semantic Versioning](https://semver.org/) (SemVer) strategy for all Platform and business services (app code and infrastructure) +- The [Trunk Based Development](https://trunkbaseddevelopment.com/) Git strategy for application development (code and infrastructure). + +In Phase 1, before ephemeral environments, Feature branch builds fetch the version from the main branch’s **package.json** file for Node and the **\*.csproj** file for C#. If the versions are the same, a validation error is thrown; if the feature branch version is higher, it's tagged with ‘**-alpha**’ and the pipeline build ID. When the main branch version is pushed to the ACR on deployment after merging into main, it will take precedence over all feature (alpha) candidates of the same major/minor/patch version. + +In Phase 2 with ephemeral environments, the process remains the same for Feature branches. For Pull Request (PR) builds, if the package.json/csproj is not updated, a validation error is thrown; if it is updated, the image/build is tagged with a release candidate (-RC) and the build ID. The main branch version takes precedence over all Feature (alpha & RC) candidates. With ephemeral environments, each feature deployment will deploy a unique pod (application & infrastructure). 
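The tagging rules in the two paragraphs above can be sketched in Python. This is an illustrative sketch only, not the ADP pipeline code: the function names `parse_version` and `image_tag`, the `build_kind` values (`"feature"`, `"pr"`, `"main"`) and the `phase` parameter are all hypothetical. It also assumes that a branch version lower than main is rejected in the same way as an equal one, since the merge version must always be above main.

```python
import re
from typing import Tuple

# Plain major.minor.patch, as read from package.json or a .csproj file.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split 'major.minor.patch' into a tuple of ints for comparison."""
    match = _VERSION.match(version.strip())
    if match is None:
        raise ValueError(f"Not a major.minor.patch version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def image_tag(main_version: str, branch_version: str, build_id: int,
              build_kind: str, phase: int = 2) -> str:
    """Return the image tag for a build (hypothetical helper).

    build_kind is 'feature', 'pr' or 'main'; phase is 1 or 2.
    """
    if build_kind == "main":
        # Main releases carry the bare version and outrank alpha/rc candidates.
        parse_version(branch_version)  # validate the format only
        return branch_version

    # Feature and PR builds must raise the version above main (assumption:
    # a lower version is rejected just like an unchanged one).
    if parse_version(branch_version) <= parse_version(main_version):
        raise ValueError(
            f"The increment is invalid '{main_version}' -> "
            f"'{branch_version}'. Please upgrade"
        )

    # Phase 1: PR builds are tagged like feature builds (alpha).
    # Phase 2: PR builds become release candidates (rc).
    suffix = "rc" if (build_kind == "pr" and phase == 2) else "alpha"
    return f"{branch_version}-{suffix}.{build_id}"
```

With the example values used in this article, `image_tag("4.2.30", "4.2.31", 511210, "feature")` gives `4.2.31-alpha.511210`, and the same version on a Phase 2 PR build with build ID 511211 gives `4.2.31-rc.511211`.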
+ +## Phase 1 Strategy – versioning logic (before ephemeral environments) + +**Feature branch build and deployments** + +1. Retrieve the version from the Main branch package.json for the repository (e.g.: 4.2.30) + 1. if main and feature branches are the same version (M/M/P) then: + 1. throw validation error message: "The increment is invalid. Users must increase the package.json version.". Do not continue CI. + 2. if main and feature branch versions are not the same (i.e., a developer has increased Major, Minor or Patch) and Feature Branch > Main branch version, then: + 1. Tag the image and build with ‘-alpha’ and build ID which becomes: 4.2.31-alpha.511210 and respect the supplied major/minor/patch. +2. Push this version to Container Registry (ACR) when a deploy is requested. + +**Pull Request (PR) builds and deployments** + +No change for Phase 1, including tagging and naming. The developer's merge (feature branch) version must always be above main. -## Phase 1 - requirements (before ephemeral environments) + +**Main branch build and deployments** -**Feature branch build/deploys** +1. New version example is: 4.2.31 (patch+1). Tag release in GitHub. +2. This version will be pushed to the ACR on deployment after merge into main. +3. The main branch version is the primary version which takes precedence over all feature (alpha) candidates of the same major/minor/patch. -- Get version from Main branch package.json for the repo (e.g. 4.2.30) - - if main and feature branch are the same version (M/M/P) then: - - throw validation error message that "increment is invalid. User must increase the package.json version" - - if main and feature branch version are not same (i.e user has supplied it and increased Major, Minor or Patch) and Feature Branch > Main branch then: - - tag it with alpha and buildID which becomes 4.2.31-alpha.511210 and respect users major/minor/patch -- push this version to ACR when a deploy is requested. 
+## Phase 2 versioning logic – (once ephemeral environments are in place) -**PR builds/deploys** +**Feature branch builds and deployments** -No change for phase 1. Above requirements as is. User's merge (feature branch) version must be above main at all times. +1. Retrieve the version from Main branch package.json/csproj for the repository (e.g. 4.2.30) + 1. if main and feature branch are the same version (M/M/P) then: + 1. throw validation error message: "The increment is invalid. Developers must increase the package.json version.". Do not continue CI. + 2. if main and feature branch versions are not the same (i.e., a developer has increased Major, Minor or Patch) and Feature Branch > Main branch then: + 1. Tag the image and build with ‘-alpha’ and ‘build ID’ which becomes 4.2.31-alpha.511210 and respect the supplied major/minor/patch. + 3. Push this version to ACR when a deploy is requested. -**Main branch** +**Pull Request (PR) - builds and deployments** -- 4.2.31 (example) -- push this version to ACR on deployment after merge into main -- the main branch version is the primary and priority version which takes precedence over and above all feature (alpha ) candidates of the same major/minor/patch. +1. If package.json/csproj is not updated in the repository then throw validation message: "The increment is invalid '4.2.30' -> '4.2.30'. Please upgrade". Do not continue CI. +2. If package.json/csproj is updated (e.g., 4.2.31) then tag the image and build with the release candidate (-RC) and build ID which becomes: **4.2.31-rc.511211** +3. Push this version to the Container Registry (ACR) when a deploy is requested. -## Phase 2 - requirements once ephemeral environments are in place! +**Main branch – build and deployments** -**Feature branch build/deploys** +1. New version example is: 4.2.31 (patch+1). Tag release in GitHub. +2. This version will be pushed to the Container Registry (ACR) on deployment after merge into main. +3. 
The main branch version is the primary version which takes precedence over and above all feature (alpha & RC) candidates of the same major/minor/patch. -- Get version from Main branch package.json for the repo (e.g. 4.2.30) - - if main and feature branch are the same version (M/M/P) then: - - throw validation error message that "increment is invalid. User must increase the package.json version" - - if main and feature branch version are not same (i.e user has supplied it and increased Major, Minor or Patch) and Feature Branch > Main branch then: - - tag it with alpha and buildID which becomes 4.2.31-alpha.511210 and respect users major/minor/patch -- push this version to ACR +## Guidance / Context -**PR builds/deploys** +- The Build ID is unique and is the ADO Pipeline build ID. It automatically increases on every CI on every image you request to be deployed (feature deployment). +- Developers must increment Major, Minor or Patch at least once, on a Feature branch build or PR build, to merge into main successfully. The build ID is automatically increased for subsequent deployments of the same version. +- The Main version takes priority over Alpha and RC candidates of the same major/minor/patch version. -- if package.json is not updated in team repo then throw validation message: "increment Invalid '4.2.30' -> '4.2.30'. Please upgrade" (as above requirement error too) -- if package.json is updated (4.2.31) then tag it with release candidate (-RC) and buildID which becomes 4.2.31-rc.511211 -- push this version to ACR on deployment -**Main branch** +## Constraints -- 4.2.31 -- push this version to ACR on deployment after merge into main - - the main branch version is the primary and priority version which takes precedence over and above all feature (alpha & RC) candidates of the same major/minor/patch. +- Feature deployments into Sandpit/Dev will overwrite the existing deployment in terms of app code, infrastructure, and databases in Phase 1. 
This can cause conflicts and constraints. +- Once ephemeral environments are delivered, PR and Feature deployments into Sandpit/Dev will each have their own dedicated infrastructure, including Application, Infra and Databases. +- SemVer and Trunk Based Development are mandated and designed into the Platform. +- All merges into ‘main’ are classed as releases and are tagged in GitHub as such with the application version supplied. +- Long-lived feature branches are not allowed. To deploy into an environment above Sandpit, you must merge into Main. -**Guidance/context** +# Platform service Deployment Strategy +## Guidance and Context -The Build ID is unique and is the ADO Pipeline build ID. It automatically increases on every CI on every image you request to be deployed (feature deployment). Users must increment Major, Minor or Patch at least once, on a Feature branch build or PR build, and to merge into main for a successful release and validation. Main takes priority over Alpha and RC candidates of the same major/minor/patch. Once ephemeral environments are delivered, PR's and Feature releases will have it's own dedicated infrastructure. In the interim, they will overwrite the "main" version. +This article outlines the Platform deployment strategy. Development teams should read the Platform Versioning and Git strategy document before reading this. ADP’s primary deployment strategy is **Rolling Deployments** with [HELM](https://atlassian.github.io/data-center-helm-charts/userguide/upgrades/HELM_CHART_UPGRADE/#3-define-the-upgrade-method), [AKS](https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/) and [FluxCD](https://fluxcd.io/flagger/usage/deployment-strategies/). This provides Platform services with a zero-downtime deployment strategy. This allows applications to achieve high availability with low/no business impact. 
This is important for services that need 24/7 availability and allows the capability to deploy to production multiple times a day. In the future, we will support other deployment strategies, including Blue-Green and Canary deployments for custom scenarios, including service ‘shutter pages’. \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index c1023a9..8e02310 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -89,6 +89,7 @@ nav: - ADP Platform Strategy: Platform-Strategy/adp-platform-strategy.md - Documentation Approach: Platform-Strategy/documentation-approach.md - Service Versioning Strategy: Platform-Strategy/service-versioning-strategy.md + - Service Deployment Strategy: Platform-Strategy/service-deployment-strategy.md # ------------------- Platform Architecture ------------------- - Platform Architecture: - Architecture Overview: Platform-Architecture/architecture-overview.md