Foundational Skillset for Cloud Engineers

Jai Chenchlani
5 min read · Dec 16, 2023

Cloud computing has transformed how we access and use computing resources. No longer chained to clunky on-premises servers, we can now tap into an on-demand pool of processing power, storage, and software, all delivered over the internet. From humble beginnings as a niche concept, the cloud has become the backbone of the digital age, powering everything from your ride to the airport to the next scientific breakthrough. Its potential for flexibility, scalability, and innovation is enormous.

If you aspire to become technically proficient in cloud, you will need to build a foundational skillset. Drawing on two decades of experience, I’ve put together a list of the conceptual and technology skills you need to be a smart and efficient platform, cloud, SRE, or DevOps engineer or architect.

Follow the links below for training content on the skills listed in this article.

https://everythingcloudplatform.com

https://www.youtube.com/channel/UCJlOrn-AEJo613ZCEh9xPrg/

Concepts

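Automation — Understand the importance of automation: replacing repetitive manual tasks with scripts and tools so that work becomes faster, repeatable, and less error-prone.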

DevOps — DevOps is a philosophy, practice, and set of tools that aims to bridge the gap between software development (Dev) and IT operations (Ops).

SRE — SRE, or Site Reliability Engineering, is a practice and philosophy that blends software engineering principles with IT operations to ensure the high reliability, scalability, and performance of software systems.

CICD — CICD, often written as CI/CD, stands for Continuous Integration and Continuous Delivery (or Deployment). It’s a set of practices and tools that automate the software development and release process, making it faster, more reliable, and more efficient.

IaC — IaC, or Infrastructure as Code, is the practice of managing and provisioning IT infrastructure using code, rather than manual configuration.
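To make the "infrastructure as code" idea concrete, here is a toy sketch of the declarative approach: you describe the state you want, and a tool works out the changes needed to get there. The `plan` function and the resource names are hypothetical and exist only for illustration; this is not how any particular IaC tool is implemented.

```python
def plan(desired, actual):
    """Compare desired and actual state and list the changes needed."""
    return {
        "create": sorted(name for name in desired if name not in actual),
        "update": sorted(
            name for name in desired
            if name in actual and desired[name] != actual[name]
        ),
        "delete": sorted(name for name in actual if name not in desired),
    }


# Desired state, as it might be written down in code (hypothetical resources).
desired = {
    "web-server": {"size": "medium"},
    "database": {"size": "large"},
}

# What currently exists in the environment (hypothetical).
actual = {
    "web-server": {"size": "small"},
    "old-cache": {"size": "small"},
}

changes = plan(desired, actual)
print(changes)
# {'create': ['database'], 'update': ['web-server'], 'delete': ['old-cache']}
```

Because the desired state lives in a file, it can be reviewed, versioned, and re-applied like any other code.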

FinOps — FinOps, short for financial operations, is a management practice that promotes shared responsibility for an organization’s cloud computing infrastructure and costs.

Networking basics — Understand IP addresses, CIDR ranges and Subnetting concepts.
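A quick way to build intuition for CIDR ranges and subnetting is Python's standard `ipaddress` module. The sketch below uses `10.0.0.0/16` and `10.0.1.25` purely as illustrative values; they are assumptions, not addresses from any real environment.

```python
import ipaddress

# An illustrative private range: a /16 leaves 16 bits for host addresses.
network = ipaddress.ip_network("10.0.0.0/16")
print(network.num_addresses)  # 65536

# Subnetting: split the /16 into smaller /24 networks.
subnets = list(network.subnets(new_prefix=24))
print(len(subnets))            # 256
print(subnets[0], subnets[1])  # 10.0.0.0/24 10.0.1.0/24

# Find which subnet a given IP address belongs to.
addr = ipaddress.ip_address("10.0.1.25")
owner = next(subnet for subnet in subnets if addr in subnet)
print(owner)                   # 10.0.1.0/24
```

The number after the slash is the count of bits fixed for the network, so every extra bit in the prefix halves the number of addresses in the range.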

Load Balancer — A load balancer is a network device or software that acts like a traffic cop for your website or application. It sits between your users and your servers, distributing incoming traffic evenly across all available servers in a pool.
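One simple way to spread traffic evenly is round-robin: hand each new request to the next server in the pool and wrap around at the end. The sketch below shows that one strategy under simplifying assumptions (no health checks, no weighting); the `RoundRobinBalancer` class and the server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers from a fixed pool in rotating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._pool = cycle(servers)

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._pool)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.pick() for _ in range(6)]
print(assignments)
# ['app-1', 'app-2', 'app-3', 'app-1', 'app-2', 'app-3']
```

Real load balancers typically layer health checks and other strategies (such as least connections) on top of this basic idea.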

Basic | Must Haves

GCP/AWS/Azure — Build your expertise in at least one of the top three cloud providers.

Python — Python is a popular, high-level, general-purpose programming language. It’s easy to learn and comes with a rich library ecosystem. It’s widely used as a scripting language too.

Linux — Linux is a family of open-source operating systems based on the Linux kernel. It’s one of the most prominent examples of free and open-source software (FOSS), meaning its source code is freely available for anyone to modify and distribute. This has led to a diverse ecosystem of Linux distributions, such as CentOS and Debian, each tailored for specific needs and preferences.

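Linux Package Managers — Each Linux distribution comes with a package manager for installing and updating software: apt-get on Debian-based distributions, and yum or dnf on Red Hat-based ones such as CentOS. Homebrew (brew) plays a similar role on macOS.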

vi — vi is a powerful modal text editor found on Unix-like systems such as Linux and macOS. Its widely used successor, Vim (short for “Vi IMproved”), adds many features on top of it.

Bash and cloud CLIs (gcloud/aws) — Bash is a powerful command-line interpreter, or shell, commonly found in Linux and macOS operating systems. It allows you to interact with your computer directly by entering commands to execute various tasks. It is also where you run your cloud provider’s command-line tools, such as gcloud and aws.

Web servers — Web servers are the unsung heroes of the internet, silently working behind the scenes to deliver the content you see on your screen. They are essentially software and hardware systems that receive requests from your web browser (client) and send back the requested resources (web pages, images, videos, etc.). Nginx and Apache are two of the most popular web servers.

Terraform — Terraform is an infrastructure as code (IaC) tool developed by HashiCorp. It allows you to define and manage your infrastructure (servers, networks, storage, etc.) using declarative configuration files, making it repeatable, versionable, and portable.

Git — Git is a distributed version control system (DVCS) that helps you track changes to files, primarily used for software development.

Argo — “Argo” isn’t a single tool, but rather a family of open-source projects hosted by the Cloud Native Computing Foundation (CNCF), including Argo CD, Argo Workflows, Argo Rollouts, and Argo Events. They focus on simplifying and automating workflows and deployments on Kubernetes.

SQL — Structured Query Language is a powerful language used to communicate with databases and manage the data stored within them. It allows you to retrieve, create, update, and delete data in an organized and efficient way.
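The four basic operations (create, retrieve, update, delete) can be tried out without installing a database server, using Python's built-in `sqlite3` module. The `servers` table and its rows below are made up for illustration.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Create a table and insert a few rows.
conn.execute("CREATE TABLE servers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO servers (name, region) VALUES (?, ?)",
    [("web-1", "us-east1"), ("web-2", "us-east1"), ("db-1", "europe-west1")],
)

# Update one row and delete another.
conn.execute("UPDATE servers SET region = ? WHERE name = ?", ("us-west1", "web-2"))
conn.execute("DELETE FROM servers WHERE name = ?", ("db-1",))

# Retrieve what is left.
rows = conn.execute("SELECT name, region FROM servers ORDER BY name").fetchall()
print(rows)  # [('web-1', 'us-east1'), ('web-2', 'us-west1')]

conn.close()
```

The `?` placeholders let the database driver handle quoting, which is safer than building SQL strings by hand.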

AI/ML — AI/ML stands for Artificial Intelligence and Machine Learning. It’s a broad field encompassing various techniques and technologies that enable computers to learn and perform tasks that typically require human intelligence.

Containerization / Kubernetes

Docker — Docker is a popular platform for developing, deploying, and running applications in containers. Think of it as a way to package up your application with all its dependencies (code, runtime, system tools, settings) into a standardized unit called a container, making it portable and self-contained. This allows you to run your application consistently and seamlessly across different environments, from your laptop to the cloud.

Kubernetes — Kubernetes, often abbreviated as K8s, is an open-source platform for automating the deployment, scaling, and management of containerized applications. It groups containers into logical units called pods, and then manages these pods across a cluster of servers.

kubectl — kubectl is a command-line tool used to interact with Kubernetes clusters. It allows you to manage your containerized applications deployed on Kubernetes.

KCC — Config Connector (commonly abbreviated KCC) is an open-source project from Google that allows you to manage Google Cloud Platform (GCP) resources like Cloud Pub/Sub topics, Cloud Storage buckets, and Cloud Spanner instances directly from your Kubernetes cluster.

Helm — Helm is a powerful package manager for Kubernetes. It simplifies the process of deploying, managing, and updating containerized applications on your Kubernetes cluster.

Kustomize — Kustomize is a Kubernetes configuration transformation tool that enables you to customize untemplated YAML files, leaving the original files untouched. Kustomize can also generate resources such as ConfigMaps and Secrets from other representations.

Istio — Istio is a powerful open-source service mesh that helps you manage the network communication between microservices in a distributed application. It sits as a dedicated layer on top of your existing infrastructure, providing an array of features that enhance the resilience, reliability, and security of your microservices architecture.

Nice to have

Packer — Packer is a powerful open-source tool from HashiCorp used to build and automate the creation of identical machine images for multiple platforms.

Least Privilege — The principle of least privilege is a fundamental security concept that states each entity (user, program, process) should be granted only the minimum level of access (permissions) necessary to perform its intended function.

Observability — Observability is a crucial concept in software development and system administration, essentially offering a comprehensive view of what’s happening inside your system.

logs — If you are an SRE, you ought to love logs.

Golden Signals — The Golden Signals are a set of four key metrics (Traffic, Errors, Latency, and Saturation) used to monitor the health and performance of a service or system from the user’s perspective.
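As a rough illustration, here is how the four signals could be computed from a handful of request records. Everything in this sketch is an assumption made for the example: the sample data, the 10-second window, the use of median latency, and CPU usage as the measure of saturation. In practice a monitoring system collects and aggregates these for you.

```python
from statistics import median

# Hypothetical sample: (HTTP status, latency in ms) for requests seen in a 10-second window.
requests = [
    (200, 120), (200, 95), (500, 310), (200, 101), (503, 450),
    (200, 88), (200, 130), (200, 99), (200, 105), (200, 110),
]
window_seconds = 10

# Hypothetical resource usage, used here as the saturation measure.
cpu_used_cores, cpu_total_cores = 3.0, 4.0

traffic = len(requests) / window_seconds                    # requests per second
error_rate = sum(1 for status, _ in requests if status >= 500) / len(requests)
latency_p50 = median(latency for _, latency in requests)    # median latency in ms
saturation = cpu_used_cores / cpu_total_cores               # fraction of capacity in use

print(traffic)      # 1.0
print(error_rate)   # 0.2
print(latency_p50)  # 107.5
print(saturation)   # 0.75
```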

SLI — Service Level Indicators. Measurable, quantitative metrics that reflect the health and performance of a service from the user’s perspective. Examples include availability, latency, throughput.

SLO — Service Level Objectives. Targets or commitments for the performance of an SLI over a specific period. They define the acceptable levels of service users expect.

Error Budget — A quota of acceptable “errors” or service delivery failures within an SLO timeframe.
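SLI, SLO, and error budget fit together with simple arithmetic: the error budget is whatever the SLO leaves over, so a 99.9% target allows 0.1% of requests to fail. The sketch below assumes a request-based availability SLI; the `error_budget` function and the numbers are hypothetical.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Return SLI and error-budget figures for a request-based availability SLO."""
    sli = (total_requests - failed_requests) / total_requests
    allowed_failures = (1 - slo_target) * total_requests
    return {
        "sli": sli,
        "slo_met": sli >= slo_target,
        "allowed_failures": allowed_failures,
        "budget_remaining": allowed_failures - failed_requests,
    }


# Illustrative numbers: a 99.9% SLO over one million requests, 400 of which failed.
report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)

print(f"SLI: {report['sli']:.4%}")
# SLI: 99.9600%
print(f"SLO met: {report['slo_met']}")
# SLO met: True
print(f"Budget remaining: {report['budget_remaining']:.0f} of {report['allowed_failures']:.0f} failed requests")
# Budget remaining: 600 of 1000 failed requests
```

When the remaining budget approaches zero, teams commonly slow down risky releases and focus on reliability work until it recovers.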

Grafana — Grafana is an open-source monitoring tool for querying, visualizing, and alerting on your metrics, logs, and traces. Other popular, industry-recognized tools in this space are Dynatrace, Splunk, and Datadog.

Prometheus — Prometheus is an open-source monitoring and alerting platform, widely adopted by many companies as a Kubernetes monitoring tool. It is also available as a managed service within Google Cloud.


Jai Chenchlani

Profession: Cloud Architect; Passion: Technology; Interests: Stand-up Comedy, Politics, Music.