Book Feedback #1
-
This reads really well, I skimmed the Ten Pillars content as I've already read that piece :) Some nitpicks: As the very first sentence in the book, this threw me. It's a bit of a garden-path sentence, my brain was expecting a second half, i.e.
Whereas I think you intend it more in the 'global we' sense:
Consider altering the sentence a bit? It's not clear if this is a direct quote or a paraphrase or something else? There's a few other italic sections later on that have a similar vibe. Tenets? Gold. However the image refers to 'projects' not 'artifacts'?
-
That is great feedback @cailyoung. I'll push some changes to address these issues.
-
Recently, I've been noodling on "The Value of Platform Engineering." I wanted to share my stream of thought to see if it had any value to add to the book.

During an internship in the early 2000s, I spent an overnight mapping network ports. I'd plug a device into a network port in the office, and the senior person would use a similar device to find the connection on the patch panel. No one else in the office would do that necessary but unglamorous task.

From that point through the mid-2010s, I frequently interacted with the operations team. They had different job titles: DBA, web admin, network engineer, etc. Some had certifications such as CCNP or MCSA. They learned all the low-level stuff, so I didn't have to. For web applications and custom services running on a Linux VM, the responsibility matrix was typically this:

The relationship between Developers and Operations varied from company to company. Some were good, some were terrible. Most were in a state of MAD (Mutually Assured Destruction) because each group had a "throw it over the wall" mentality. "DBA, here are the migration scripts I just wrote; please run them in Production." "Developer, your application crashed last night for the fifth time; fix it now."

DevOps helped break down that wall. But it didn't eliminate the responsibilities of each group. Tools such as Chef and Ansible were adopted because they help automate some of Operations' responsibilities. However, several of Operations' responsibilities cannot be automated. One of the challenges of a self-managed data center is the physical hardware management. Someone has to install and configure that SAN. Someone has to unbox the servers, place them into racks, and configure them.

Cloud providers and IaaS helped remove the physical hardware management responsibility from Operations. With that came new tools and terminology to learn. Terms such as EC2, vNet, Front Door, VPC, Route 53, and S3 became the norm.
Cloud providers created CLIs, script modules, and IaC functionality to speed up infrastructure creation (and teardown). These were all well documented. With some effort, they were easy to pick up. The line between operations and developers started to blur. Tasks normally performed by traditional operations teams could be done by developers. Development teams could self-manage. Some companies didn't even hire operations people! IIRC, we hired our first cloud architect / OPS person when we decided to build Octopus Cloud. Up to that point, people wore multiple hats. Everyone was an admin in our Azure and AWS accounts.

But that introduced risk. Cloud providers make it "easy enough" that most developers feel comfortable performing routine tasks. It's also easy to misconfigure an item. When I worked at Election Systems and Software, someone misconfigured an S3 bucket. That led to the City of Chicago's voter information being leaked in 2017. (Clarification: it was the voters' PII data, used so they could sign in at the polling location to vote. It wasn't who they voted for.)

Managed services such as Azure Web Apps, Azure SQL, Azure File Storage, AKS, ECS, ACS, S3, Lambdas, EKS, and RDS remove even more "traditional" responsibilities from Operations. What fills that place is a new knowledge gap. For example, deploying a container to K8s requires knowledge of the following topics:
Learning everything there is to know about Kubernetes is not an easy task. You end up with a knowledge gap.

The software delivery pipelines need to evolve. A lot remains the same with a few tweaks. Building a .NET application and testing it fundamentally remains the same regardless of the application host. There are some tweaks; the build artifact might be a container instead of a .zip or .nupkg file. How that build artifact is deployed has completely changed.

Deploying an application to Windows or Linux was like teaching a five-year-old how to make a peanut butter and jelly sandwich. You had to give explicit instructions or end up with a slice of bread with peanut butter on both sides. Deploying to platforms such as K8s is like telling your parents to make you a peanut butter and jelly sandwich. They know how to do it, but you have to tell them wheat or white bread, the brand of peanut butter, and the kind of jelly to get it exactly how you want it.

Standardization is required, as that leads to "turn-key" solutions. Do you want to deploy your migration scripts to SQL Server? Okay, you'll use Flyway with Octopus Deploy. Here is the pipeline and documentation on how to get started.

In my mind, Platform Engineering is Operations vNext. Gone are the days of operations being responsible for physically managing hardware. In its place are tools and platforms such as Terraform, Kubernetes, and PaaS. Platform engineers are responsible for:
Asking, "Why do we need a platform engineering team?" Or, "What is the value of platform engineering?" is a lot like asking, "Why do I need an operations team?" 15 years ago. I don't know if that helps, but I wanted to share where my head is at on this topic.
-
There were a few "pull quotes" that stood out for me in here. Things like the "coincidentally similar solution" -> which is somehow far worse than the problem of novel solutions as teams can appear to be standardizing when they aren't. At least with novel solutions the problem is obvious :) Here are some suggestions for areas to potentially cover.

Introduction

We could highlight the underlying issue that inspires platform engineering - I think we need to mention/explain cognitive overload as it's the top justification for this approach. We could optionally delve into Mihaly's Psychology of flow if we want to go deeper. Highlighting the reasons for platform engineering without talking about specific technical things is a good way to introduce the subject and gives teams something to share with their leaders.

What is Platform Engineering?

Maybe we should cover the evolution of a platform, from the minimum viable platform (a wiki page) through to self-service operations available through a UI and API (ideally with the wiki extended to explain all the cool new things).

Documentation and Training

The State of DevOps report has given us some evidence that documentation enhances all your technical capabilities (by increasing their impact on wellbeing and organizational performance) - the 2023 report might give you some extra pow for the importance of documentation. To add to the "any product that needs a manual..." quote: one key use of documentation is that the analytics will tell you which documents are being viewed the most, which is a signal that the area might not be as user friendly as you'd like.

Marketing
I'd add that you also need visibility of the versions out in the wild, so you can see the extent to which people upgrade. This is like a mini adoption signal - if most people don't upgrade to the newer version, maybe the features you're adding aren't the right ones.

Planning your Internal Developer Platform

IDP without a dedicated platform team... for smaller organizations, there can be benefits to building an internal developer platform even though the scale isn't yet there for a Platform Engineering team. This should be done through a community of practice (not a spare-time platform team). This prevents that awkward fragmentation problem you often get as you move from 1 team to 5 teams (and they all choose a different build server, deployment tool, cloud provider, etc). The community of practice that spans the teams to create a shared IDP might later seed the Platform Engineering team as the organization scales. The benefit here is that the first task of the future Platform Engineering team isn't to face down that gnarly fragmentation.

Platform Engineering Responsibility Models

Just before the three sections for customer responsibility, shared responsibility, and centralized responsibility it would be good to list the three kinds to prepare readers for these concepts.

We could also describe a potential solution to the artifact problem, which is to have customers write a DSL that the platform turns into an artifact. This minimizes customer change as you can keep backwards compatibility with "v1" of the DSL by using sensible defaults for each later extension. Customers only need to change the DSL file if they want to use a new feature. An example of this would be that instead of giving the customer a generated HCL file, you'd use a simpler format that generates the HCL file as part of the build. Some organizations have ridiculously simple DSL files, like:

    Language:
    Name: Python
    Size: Medium

And they generate everything based on this... but customers can add further entries to customize it:

    Language:
    Name: Python
    Version: 3.8.3
    Size: Medium

If a team doesn't like this abstraction, it's likely because this isn't the problem they need the IDP to solve. I hope some of this is useful.
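To make the DSL suggestion above a little more concrete, here is a minimal Python sketch of a platform-side generator that fills in defaults for anything the customer left out. Everything in it is an assumption for illustration: the `DEFAULTS` and `SIZES` tables, the `resolve` and `render` names, the choice to treat `Size` as a top-level key, and the plain-text output that stands in for a generated HCL file.

```python
from copy import deepcopy

# Hypothetical platform defaults; a later DSL extension (such as Version)
# gets a default here so older "v1" files keep working unchanged.
DEFAULTS = {
    "Language": {"Version": "3.12"},
    "Size": "Medium",
}

# Hypothetical mapping from the DSL's Size to concrete resource settings.
SIZES = {
    "Small": {"cpu": "250m", "memory": "256Mi"},
    "Medium": {"cpu": "500m", "memory": "512Mi"},
    "Large": {"cpu": "1", "memory": "1Gi"},
}


def resolve(spec: dict) -> dict:
    """Fill in every field the customer left out with the platform default."""
    resolved = deepcopy(DEFAULTS)
    for key, value in spec.items():
        if isinstance(value, dict) and isinstance(resolved.get(key), dict):
            # Nested section: customer entries win, defaults fill the rest.
            resolved[key].update(value)
        else:
            resolved[key] = value
    return resolved


def render(spec: dict) -> str:
    """Turn a customer spec into the generated artifact (plain text here)."""
    full = resolve(spec)
    lang = full["Language"]
    size = SIZES[full["Size"]]
    return "\n".join([
        f'runtime = "{lang["Name"].lower()}:{lang["Version"]}"',
        f'cpu     = "{size["cpu"]}"',
        f'memory  = "{size["memory"]}"',
    ])


# The minimal file and the customized file from the comment above.
v1_spec = {"Language": {"Name": "Python"}, "Size": "Medium"}
v2_spec = {"Language": {"Name": "Python", "Version": "3.8.3"}, "Size": "Medium"}

print(render(v1_spec))
print(render(v2_spec))
```

The point of the sketch is that the minimal file never has to change when the platform adds a new field: only customers who want to pin a version edit their file.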
-
One of the ways I have been thinking about discussing the idea of flow is to highlight that DevOps was never about removing specialties, just removing the silos between specialties. This means the DevOps lifecycle is not a list of tasks that everyone is supposed to focus on with equal priority. If we were to draw a heat map of priorities for different specialties, developers might look like this: Ops might look like this: Product managers might look like this: But one of the ways DevOps has been misunderstood is to assume everyone does everything, and you roll the dice when you show up to work each morning to see which section of the DevOps lifecycle you'll be focusing on today. This means people generally deliver the fastest solution to solve whatever fire they needed to put out that day, which in turn breaks flow because you never really immerse yourself in anything other than firefighting.
-
This is a good addition to the feedback section. I'll also add the MONK metrics noted at https://octopus.com/devops/metrics/monk-metrics/.
-
This is a good point, and ties into the evolution of a platform team from an internal wiki to a dedicated platform. This might be worth its own section, including the point about spare-time platform teams.
-
I'm wary of custom DSLs as IDP artifacts, mostly because the quality of a DSL is heavily dependent on the ability of the platform team to successfully distill a problem down into a simple abstraction. Platform teams may have the required knowledge after a few years of supporting their customers, but I'd be critical of any new platform team that claimed to know the correct level of abstraction at the start of their journey.

As an example, we've talked to teams that have built Kubernetes CRDs to define their deployments, but now have to dedicate two or more engineers to supporting the CRDs. I'd argue that a better approach would have been to provide Helm charts instead of CRDs. At the end of the day the set of values exposed by a Helm chart is basically a DSL. But if customers need to customize the centralised solution beyond what they can define in the Helm chart values, they can fork the entire Helm chart and implement whatever features they need. Most template solutions these days have the idea of modules or reusable components, and they all have the idea of supplying variables.

On the flip side, custom DSLs almost always imply that there is a second custom component that reads the DSL and does useful work with it. The DSL backend is either an opaque process, or complex enough that contributing to it is not a practical option for the customers.
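As a rough illustration of "the set of values exposed by a chart is basically a DSL", the Python sketch below layers customer overrides on top of a chart's default values and reports any override the chart does not expose, which is the point at which forking becomes the practical option. It does not reproduce Helm's own implementation; `CHART_DEFAULTS`, `merge_values`, and `unexposed_paths` are hypothetical names and values chosen for the example.

```python
# Hypothetical default values a platform team might ship with a chart.
CHART_DEFAULTS = {
    "replicaCount": 1,
    "image": {"repository": "example/app", "tag": "latest"},
}


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Return defaults with overrides applied, merging nested maps key by key."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = value
    return merged


def unexposed_paths(defaults: dict, overrides: dict, prefix: str = "") -> list:
    """List dotted paths in overrides that the chart's values do not expose."""
    paths = []
    for key, value in overrides.items():
        path = f"{prefix}{key}"
        if key not in defaults:
            paths.append(path)
        elif isinstance(value, dict) and isinstance(defaults[key], dict):
            paths.extend(unexposed_paths(defaults[key], value, f"{path}."))
    return paths


# A customer stays within the exposed values for two settings and reaches
# beyond them for a third (a hypothetical sidecar the chart knows nothing about).
customer_values = {
    "replicaCount": 3,
    "image": {"tag": "1.4.2"},
    "sidecar": {"enabled": True},
}

print(merge_values(CHART_DEFAULTS, customer_values))
print(unexposed_paths(CHART_DEFAULTS, customer_values))
```

Anything reported by `unexposed_paths` is a customization the shared template cannot express, so the customer either asks the platform team to expose it or forks the chart.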
-
@steve-fenton-octopus This has been useful though, thanks for the feedback!
-
@steve-fenton-octopus Thinking about the feedback section some more, I'm leaning towards breaking it down into three areas inspired by the MONK metrics and DevEx metrics:
This way we can merge Developer experience metrics and Internal platform metrics into a single concept for platform teams.
-
I finally got the chance to give the book a good perusal; some conscious stream of thought commentary incoming...
-
Suggestion: make this 3 bullets—e.g.:
So the following "And these are just three of the more popular perspectives on DevOps." is clear.
-
:chef-kiss: how this connects DevOps success with Developer Experience satisfaction.
This reminded me of the Toyota Andon cable. I don't know if there's value in referencing it here, but sharing it because this line made me think of it.
-
Suggestion: provide some opinions on the relationship between IDPs, platform engineering, and DevEx—e.g.:
-
Should we use a different term than "artifact" here—to distinguish it from software artifacts/packages?
Does "platform team" equal "platform engineering team" here?
-
This reminded me of the Project management triangle.
And the notion "Good, fast, cheap. Choose two." I don't know if there's value in referencing it here, but sharing it because this line made me think of it.
-
This did not read right to me. Suggestion—either:
I feel like there's a parallel here with the adoption of deployment patterns evolving with the maturity of the team—e.g.:
Is similar to:
-
This made me think of the parallels between developing the product and delivering the product—for physical things: Developing the product
Delivering the product
-
I think the reference to Octopus here is unintentional, correct? (Perhaps because this came from the blog)
-
For the table of contents and section headers, I noticed you could annotate them like this:
To be clear, I'm not necessarily suggesting they need to be annotated as such; I just appreciated how nicely the sections fit the what-why-how-who paradigm.
-
Feedback after reading the latest copy of the book. Overall, the book has the bones of a good book. The current layout of the book is:
Reading through the book, the ordering feels off. I think about the DevOps Handbook and how they broke it down into three "ways."
Each section built upon the other. I think this book could benefit from a similar approach. For example:
Maybe flesh out the "IDP Section" by also analyzing the state of the current tools. We don't have to mention specific tools by name, as they will become woefully out of date soon. But we could cover the pros and cons of leveraging open source tooling vs. commercial tooling in this space. What are some of the common pitfalls one could encounter when implementing an IDP today? How many custom scripts would they be expected to write?

Misc Feedback

For the purposes of this book, what is the definition of a DevOps team? When I first started reading the book, I was thinking of a DevOps engineer, someone who is responsible for ensuring tools like Octopus Deploy are always running and on the latest version. Similar to a Matt Richardson or a Cail Young at Octopus. The more I read the book, the more it felt like DevOps teams meant engineering teams, the folks responsible for adding new features and functionality to internally developed applications and pushing them to Production. Is it possible to clear that up? I tripped over the "what is DevOps not?" section a lot in the chapter.
-
This is the million dollar question. The point of the chapter "What is DevOps not?" is to call out that DevOps engineers and DevOps teams no longer have a meaningful external definition other than to say that they should implement the capabilities and practices that have been shown to improve their performance. What does any given individual in a DevOps team do? 🤷 What does a well structured DevOps team look like? 🤷 Are there clear capabilities teams must implement before they can call themselves a DevOps team? 🤷 That sense of frustration and ambiguity you have highlighted is exactly the point, and the best way to articulate it is to attempt to define DevOps through counterfactual arguments. To quote the book:
Perhaps the only definition of a DevOps team we can give is: Two or more people working together to deliver a technical solution.
-
This is fair. The chapters can be reorganised to tell a better story, and the order you suggested makes a lot of sense. Ideally I want the checklists at the end of each chapter to be something we can discuss on a whiteboard for teams looking to implement platform engineering,
I'm reluctant for this book to be tied to any specific tools. It would end up being superficial and quickly obsolete, and I doubt we have the capacity to give such an overview the attention it deserves. I will add a section on what an IDP is though. We can talk about the capabilities it must provide rather than talk about specific implementations that exist today.
-
That makes a lot of sense. In some of the K8s books I've been reading, they tended to shy away from getting into the nitty gritty of various tools. Their focus was more on decision criteria. For example, should you host your own K8s vs use a managed K8s provider like GKE, EKS, or AKS? What are the pros and cons of rolling your own vs. managed providers? And if you do roll your own, what architecture decisions must you consider?
-
What about something along the lines of: "DevOps has become the equal of Agile, Scrum, Kanban, and TDD. To be a successful software organization, you need to leverage DevOps practices. But as we witnessed with those other key initiatives, how they are implemented varies with each company. Some companies opt for dedicated engineers embedded within feature teams. Others opt for having senior engineers from each feature team work part time on DevOps goals. Still others have a team of dedicated 'DevOps engineers' supporting multiple teams. Each approach has its pros and cons, and is successful as long as DevOps practices are followed. In this book, 'DevOps teams' is an umbrella term we use for all those approaches."
-
Please add any comments about the book to this discussion.