Reading Docs Didn’t Prepare Me for Production Crashes
I didn’t learn cloud engineering by memorizing documentation. I learned it by fixing broken systems at 2 AM.
Welcome back to this new edition of What’s New Cloud Newsletter, where I share my insights on AWS cloud, AI/ML news, Terraform feature updates and DevOps Trends.
I used to rely on tutorials and docs. They look impressive on a resume, but when incidents happened at work, theory failed me.
Here’s what tutorials and docs didn’t teach me
Directions, but no patterns to follow when systems fail
Scripts that were never tested in a production-like environment
Build steps, but no clear system for debugging issues
This is what I did to reclaim my expertise
Reviewed postmortems of failures:
I went back to any postmortem documentation I could find, both within the company and externally, and studied it.
Reproduced issues in safe sandboxes:
Every time someone else fixed an issue, I recreated it in a sandbox environment. This helped me learn to troubleshoot it in case it happened again.
Documented fixes to prevent recurrence:
I actively contributed to documentation for each incident, ensuring the steps and lessons were clear for future reference.
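To make the sandbox step concrete, here is a minimal Python sketch, assuming the incident was a dependency that failed intermittently. It wraps a stand-in function with a fault injector so the failure can be triggered on demand and the recovery path exercised. The names `flaky`, `call_with_retry`, and `fetch_config` are hypothetical and chosen only for illustration; they are not from any real incident or library.

```python
import random


class InjectedFault(Exception):
    """Raised by the sandbox to imitate a failing dependency."""


def flaky(func, failure_rate, rng):
    """Wrap func so that each call fails with probability failure_rate."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def call_with_retry(func, attempts):
    """Call func up to `attempts` times; return (result, failures_seen)."""
    failures = 0
    for _ in range(attempts):
        try:
            return func(), failures
        except InjectedFault:
            failures += 1
    raise InjectedFault(f"gave up after {attempts} attempts")


def fetch_config():
    # Hypothetical stand-in for the real dependency that broke.
    return {"region": "us-east-1"}


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the failure sequence is repeatable
    sandboxed = flaky(fetch_config, failure_rate=0.5, rng=rng)
    result, failures = call_with_retry(sandboxed, attempts=10)
    print(f"recovered after {failures} injected failure(s): {result}")
```

Seeding the random generator is the design choice that matters here: the same sequence of failures replays on every run, so a fix can be checked against exactly the conditions that triggered the problem.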
This is what you can do if you’re learning to debug production issues
Build small test environments to simulate failure
Log all incidents and create knowledge artifacts
Pair debug sessions with teammates to cross-pollinate learning
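One lightweight way to turn incidents into a knowledge artifact is an append-only log that can be searched later. The Python sketch below assumes a JSON Lines file and a simple set of fields (title, symptoms, root cause, fix, tags). The `Incident` class, the field names, and the functions `log_incident` and `search_incidents` are all assumptions made for illustration, not a standard format.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class Incident:
    # Hypothetical schema; adjust the fields to match your own postmortems.
    title: str
    symptoms: str
    root_cause: str
    fix: str
    tags: list = field(default_factory=list)
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def log_incident(path, incident):
    """Append one incident to the log as a single JSON line."""
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(incident)) + "\n")


def search_incidents(path, keyword):
    """Return logged incidents that mention keyword (case-insensitive)."""
    keyword = keyword.lower()
    matches = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            haystack = " ".join(
                [
                    record["title"],
                    record["symptoms"],
                    record["root_cause"],
                    record["fix"],
                    *record["tags"],
                ]
            ).lower()
            if keyword in haystack:
                matches.append(record)
    return matches
```

During the next incident, a call such as `search_incidents("incidents.jsonl", "timeout")` (the file name is made up) would surface earlier cases with similar symptoms along with the fix that worked.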
My Take
If you can confidently troubleshoot incidents, you become invaluable as an engineer. Training alone doesn’t prepare you for the chaos of production; experience does.
That’s it for this week. Thanks for reading.
If you found this useful, like, comment, share and subscribe.
Have a question or topic you’d like me to cover in a future issue? Hit reply. I’d love to hear from you.
Until then, stay ahead of the cloud curve. I share AWS news, AI/ML updates, Terraform automation tips, and the biggest DevOps trends three times a week, all in one place.



Recreating issues in a sandbox after someone else fixes them is brilliant. I never thought about doing that systematically. Most people just move on once an incident is resolved, but you're building muscle memory for when it happens again. How often do you actually end up using those sandbox reproductions later?