AWS Cloud Provider

Important:

In Kubernetes 1.27 and later, you must use an out-of-tree AWS cloud provider. In-tree cloud providers have been deprecated, and the in-tree Amazon cloud provider has been removed completely, so it won't work after an upgrade to Kubernetes 1.27. The steps listed below are still required to set up an Amazon cloud provider. You can set up an out-of-tree cloud provider for RKE after creating an appropriate IAM role and configuring the ClusterID.

You can also migrate from an in-tree to an out-of-tree AWS cloud provider on Kubernetes 1.26 and earlier. All existing clusters must migrate before upgrading to v1.27 to stay functional.

The AWS cloud provider has no RKE configuration options. To enable it, you only need to set the name to aws. To use the AWS cloud provider, all cluster nodes must already be configured with an appropriate IAM role, and your AWS resources must be tagged with a cluster ID.

cloud_provider:
    name: aws

IAM Requirements

In a cluster with the AWS cloud provider enabled, nodes must have at least the ec2:Describe* action.

To use Elastic Load Balancers (ELBs) and EBS volumes with Kubernetes, the nodes need an IAM role with appropriate permissions.

IAM policy for nodes with the controlplane role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "ec2:DescribeInstances",
        "ec2:DescribeRegions",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVolumes",
        "ec2:CreateSecurityGroup",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:ModifyInstanceAttribute",
        "ec2:ModifyVolume",
        "ec2:AttachVolume",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:CreateRoute",
        "ec2:DeleteRoute",
        "ec2:DeleteSecurityGroup",
        "ec2:DeleteVolume",
        "ec2:DetachVolume",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:DescribeVpcs",
        "elasticloadbalancing:AddTags",
        "elasticloadbalancing:AttachLoadBalancerToSubnets",
        "elasticloadbalancing:ApplySecurityGroupsToLoadBalancer",
        "elasticloadbalancing:CreateLoadBalancer",
        "elasticloadbalancing:CreateLoadBalancerPolicy",
        "elasticloadbalancing:CreateLoadBalancerListeners",
        "elasticloadbalancing:ConfigureHealthCheck",
        "elasticloadbalancing:DeleteLoadBalancer",
        "elasticloadbalancing:DeleteLoadBalancerListeners",
        "elasticloadbalancing:DescribeLoadBalancers",
        "elasticloadbalancing:DescribeLoadBalancerAttributes",
        "elasticloadbalancing:DetachLoadBalancerFromSubnets",
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
        "elasticloadbalancing:ModifyLoadBalancerAttributes",
        "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
        "elasticloadbalancing:SetLoadBalancerPoliciesForBackendServer",
        "elasticloadbalancing:CreateListener",
        "elasticloadbalancing:CreateTargetGroup",
        "elasticloadbalancing:DeleteListener",
        "elasticloadbalancing:DeleteTargetGroup",
        "elasticloadbalancing:DescribeListeners",
        "elasticloadbalancing:DescribeLoadBalancerPolicies",
        "elasticloadbalancing:DescribeTargetGroups",
        "elasticloadbalancing:DescribeTargetHealth",
        "elasticloadbalancing:ModifyListener",
        "elasticloadbalancing:ModifyTargetGroup",
        "elasticloadbalancing:RegisterTargets",
        "elasticloadbalancing:SetLoadBalancerPoliciesOfListener",
        "iam:CreateServiceLinkedRole",
        "kms:DescribeKey"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

IAM policy for nodes with the etcd or worker role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeRegions",
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:GetRepositoryPolicy",
        "ecr:DescribeRepositories",
        "ecr:ListImages",
        "ecr:BatchGetImage"
      ],
      "Resource": "*"
    }
  ]
}
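As a rough way to sanity-check a policy document against the actions a node needs, the sketch below collects the actions from a policy's Allow statements and reports which required actions are not covered. The function names are hypothetical, the matching is a simplification (it ignores Deny statements, Resource, and Condition), and it assumes Statement is a list, as in the policies above.

```python
from fnmatch import fnmatchcase


def allowed_actions(policy):
    """Collect every action listed in the policy's Allow statements."""
    actions = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        listed = statement.get("Action", [])
        # IAM accepts either a single string or a list of strings here.
        if isinstance(listed, str):
            listed = [listed]
        actions.update(listed)
    return actions


def missing_actions(policy, required):
    """Return the required actions that no Allow entry covers.

    Policy entries may contain wildcards such as ec2:Describe*.
    Action names are compared case-insensitively.
    """
    granted = [pattern.lower() for pattern in allowed_actions(policy)]
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), pattern) for pattern in granted)
    ]


# A trimmed-down policy used only for illustration.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:DescribeInstances", "ec2:DescribeRegions"],
            "Resource": "*",
        }
    ],
}

print(missing_actions(example_policy, ["ec2:DescribeInstances", "ec2:DescribeVolumes"]))
```

With the trimmed-down policy, only ec2:DescribeVolumes is reported as missing. A policy that grants ec2:Describe* would cover both.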

Tagging AWS Resources

The AWS cloud provider uses tagging to discover and manage resources. The following resources are not automatically tagged by Kubernetes or RKE:

VPC: The VPC used by the cluster
Subnet: The subnets used by the cluster
EC2 instances: All nodes launched for the cluster
Security Groups: The security group(s) used by nodes in the cluster

Note: If you create a LoadBalancer service and more than one security group is attached to the nodes, you must tag only one of the security groups as owned so that Kubernetes knows which group to add rules to and remove rules from. A single untagged security group is allowed. However, sharing it between clusters is not recommended.
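The rule in this note can be checked before creating a LoadBalancer service. The sketch below is illustrative only: the function names are hypothetical, and the input is assumed to have the shape of the SecurityGroups list that the EC2 DescribeSecurityGroups API returns (a GroupId plus a Tags list of Key/Value pairs). It uses the kubernetes.io/cluster/<CLUSTERID> tag key described later in this section.

```python
def owned_security_groups(groups, cluster_id):
    """Return the IDs of the groups tagged as owned by this cluster."""
    key = f"kubernetes.io/cluster/{cluster_id}"
    owned = []
    for group in groups:
        tags = {tag["Key"]: tag["Value"] for tag in group.get("Tags", [])}
        if tags.get(key) == "owned":
            owned.append(group["GroupId"])
    return owned


def load_balancer_group(groups, cluster_id):
    """Pick the single group that the note's rule singles out.

    One attached group is accepted even if it is untagged. With several
    attached groups, exactly one must carry the owned tag.
    """
    if not groups:
        raise ValueError("no security groups attached to the node")
    if len(groups) == 1:
        return groups[0]["GroupId"]
    owned = owned_security_groups(groups, cluster_id)
    if len(owned) != 1:
        raise ValueError(f"expected exactly one owned group, found {len(owned)}")
    return owned[0]


# Hypothetical group IDs and cluster ID, for illustration only.
attached = [
    {
        "GroupId": "sg-aaa",
        "Tags": [{"Key": "kubernetes.io/cluster/my-cluster", "Value": "owned"}],
    },
    {"GroupId": "sg-bbb", "Tags": []},
]
print(load_balancer_group(attached, "my-cluster"))
```

Here the call returns sg-aaa. If both groups were tagged as owned, or neither was, the function would raise a ValueError instead of guessing.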

AWS Documentation: Tagging Your Amazon EC2 Resources

You must tag with one of the following:

Key	Value
kubernetes.io/cluster/`<CLUSTERID>`	shared

<CLUSTERID> can be any string you choose. However, the same string must be used on every resource you tag. Setting the tag value to owned informs the cluster that all resources tagged with the <CLUSTERID> are owned and managed by this cluster only.

If you do not share resources between clusters, you can change the tag to:

Key	Value
kubernetes.io/cluster/`<CLUSTERID>`	owned
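To keep the tag identical on every resource, it can help to build it in one place. The following is a minimal sketch. The helper name cluster_tag and the cluster ID my-cluster are hypothetical, and the returned dictionary uses the Key/Value shape that the EC2 tagging APIs expect.

```python
def cluster_tag(cluster_id, shared=True):
    """Build the cluster tag for a VPC, subnet, instance, or security group."""
    if not cluster_id:
        raise ValueError("cluster_id must be a non-empty string")
    return {
        "Key": f"kubernetes.io/cluster/{cluster_id}",
        # "shared" when resources are used by several clusters, "owned" otherwise.
        "Value": "shared" if shared else "owned",
    }


print(cluster_tag("my-cluster"))
print(cluster_tag("my-cluster", shared=False))
```

If you use boto3, the result could be passed as one element of the Tags list in a call such as ec2.create_tags(Resources=[...], Tags=[...]). That call is not shown here because it requires AWS credentials.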

Tagging for Load Balancers

When provisioning a LoadBalancer service, Kubernetes attempts to discover the correct subnets. This is also achieved with tags, and it requires additional subnet tags to ensure that internet-facing and internal ELBs are created in the correct subnets.

AWS Documentation: Subnet tagging for load balancers
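As a small illustration, the sketch below returns the role tag for a subnet. The tag keys kubernetes.io/role/elb and kubernetes.io/role/internal-elb with the value 1 are not stated on this page. They are taken from the AWS subnet tagging documentation referenced above, so confirm them there before relying on this. The function name is hypothetical.

```python
def subnet_role_tag(internet_facing):
    """Return the role tag for a public (internet-facing) or private subnet."""
    if internet_facing:
        key = "kubernetes.io/role/elb"  # subnets for internet-facing ELBs
    else:
        key = "kubernetes.io/role/internal-elb"  # subnets for internal ELBs
    return {"Key": key, "Value": "1"}


print(subnet_role_tag(internet_facing=True))
print(subnet_role_tag(internet_facing=False))
```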

Using the Out-of-Tree AWS Cloud Provider for RKE

Node name conventions and other prerequisites must be followed so that the cloud provider can find the instance. RKE provisioned clusters don't support configuring providerID.

Note:

If you use IP-based naming, the nodes must be named after the instance followed by the regional domain name (ip-xxx-xxx-xxx-xxx.ec2.<region>.internal). If you have a custom domain name set in the DHCP options, you must set --hostname-override on kube-proxy and kubelet to match this naming convention.
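A quick check of an IP-based node name might look like the sketch below. It deliberately verifies only the unambiguous parts of the convention: the ip-xxx-xxx-xxx-xxx prefix derived from the private IPv4 address and the .internal suffix. The function names, the sample address, and the sample region are assumptions for illustration.

```python
import ipaddress


def ip_based_prefix(private_ip):
    """Turn a private IPv4 address into the ip-xxx-xxx-xxx-xxx prefix."""
    address = ipaddress.IPv4Address(private_ip)  # raises ValueError if invalid
    return "ip-" + str(address).replace(".", "-")


def looks_ip_based(node_name, private_ip):
    """Check the prefix and the .internal suffix of a node name."""
    prefix = ip_based_prefix(private_ip) + "."
    return node_name.startswith(prefix) and node_name.endswith(".internal")


print(ip_based_prefix("10.0.1.23"))
print(looks_ip_based("ip-10-0-1-23.ec2.us-west-2.internal", "10.0.1.23"))
print(looks_ip_based("worker-1.example.com", "10.0.1.23"))
```

A node named after a custom DHCP domain, such as the last example, fails the check. That is the case where hostname-override must be set.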

Select the cloud provider.

Selecting external-aws sets --cloud-provider=external and allows setting use_instance_metadata_hostname. Enabling use_instance_metadata_hostname queries the EC2 metadata service at http://169.254.169.254/latest/meta-data/hostname and sets the returned value as hostname-override for kubelet and kube-proxy.

Enabling use_instance_metadata_hostname is required if hostname-override is empty or if hostname-override doesn't meet the node naming conventions mentioned above.

cloud_provider:
  name: external-aws
  use_instance_metadata_hostname: true # or false

Existing clusters that use an external cloud provider set --cloud-provider=external for Kubernetes components but don't set hostname-override by querying the EC2 metadata service.
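To make the metadata lookup concrete, here is a rough sketch of reading the hostname from the EC2 metadata service and turning it into a --hostname-override argument. This is not RKE's implementation. The function names are hypothetical, and the plain GET request assumes the instance still allows IMDSv1. Instances that enforce IMDSv2 require a session token, which is omitted here.

```python
from urllib.request import urlopen

METADATA_HOSTNAME_URL = "http://169.254.169.254/latest/meta-data/hostname"


def fetch_text(url, timeout=2.0):
    """Fetch a URL and return the body as text (works only on an EC2 instance)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metadata_hostname(fetch=fetch_text):
    """Return the hostname reported by the metadata service."""
    return fetch(METADATA_HOSTNAME_URL).strip()


def hostname_override_args(fetch=fetch_text):
    """Build the flag that kubelet and kube-proxy would receive."""
    return [f"--hostname-override={metadata_hostname(fetch)}"]


def fake_fetch(url):
    """Stand-in for the metadata service so the sketch runs anywhere."""
    return "ip-10-0-1-23.us-west-2.compute.internal\n"


print(hostname_override_args(fake_fetch))
```

The fetch function is passed in as a parameter so that the logic can be exercised off-instance with a stub, as in the last line.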

Install the AWS cloud controller manager after the cluster finishes provisioning. Note that the cluster isn't successfully provisioned and nodes are still in an uninitialized state until you deploy the cloud controller manager.
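One way to see which nodes are still waiting for the cloud controller manager is to look for the node.cloudprovider.kubernetes.io/uninitialized taint, the same key that the chart values tolerate. The sketch below assumes its input is the parsed output of kubectl get nodes -o json. The function name and the sample data are hypothetical.

```python
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def uninitialized_nodes(node_list):
    """Return the names of nodes that still carry the uninitialized taint."""
    names = []
    for node in node_list.get("items", []):
        # spec.taints is absent on nodes without taints.
        taints = node.get("spec", {}).get("taints") or []
        if any(taint.get("key") == UNINITIALIZED_TAINT for taint in taints):
            names.append(node["metadata"]["name"])
    return names


# Trimmed-down sample of a node list, for illustration only.
sample = {
    "items": [
        {
            "metadata": {"name": "node-a"},
            "spec": {
                "taints": [
                    {
                        "key": UNINITIALIZED_TAINT,
                        "value": "true",
                        "effect": "NoSchedule",
                    }
                ]
            },
        },
        {"metadata": {"name": "node-b"}, "spec": {}},
    ]
}

print(uninitialized_nodes(sample))
```

In the sample, only node-a is reported. Once the cloud controller manager has initialized every node, the list should be empty.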

Helm Chart Installation from CLI

Official upstream docs for Helm chart installation can be found on GitHub.

Add the Helm repository:

helm repo add aws-cloud-controller-manager https://kubernetes.github.io/cloud-provider-aws
helm repo update

Create a values.yaml file with the following contents, to override the default values.yaml:

# values.yaml
hostNetworking: true
tolerations:
  - effect: NoSchedule
    key: node.cloudprovider.kubernetes.io/uninitialized
    value: 'true'
  - effect: NoSchedule
    value: 'true'
    key: node-role.kubernetes.io/controlplane
nodeSelector:
  node-role.kubernetes.io/controlplane: 'true'
args:
  - --configure-cloud-routes=false
  - --use-service-account-credentials=true
  - --v=2
  - --cloud-provider=aws
clusterRoleRules:
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
      - update
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - '*'
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
  - apiGroups:
      - ""
    resources:
      - services
    verbs:
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - ""
    resources:
      - services/status
    verbs:
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - ""
    resources:
      - serviceaccounts
    verbs:
      - create
      - get
  - apiGroups:
      - ""
    resources:
      - persistentvolumes
    verbs:
      - get
      - list
      - update
      - watch
  - apiGroups:
      - ""
    resources:
      - endpoints
    verbs:
      - create
      - get
      - list
      - watch
      - update
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs:
      - create
      - get
      - list
      - watch
      - update
  - apiGroups:
      - ""
    resources:
      - serviceaccounts/token
    verbs:
      - create

Install the Helm chart:

helm upgrade --install aws-cloud-controller-manager -n kube-system aws-cloud-controller-manager/aws-cloud-controller-manager --values values.yaml

Verify that the Helm chart installed successfully:

helm status -n kube-system aws-cloud-controller-manager

If present, edit the DaemonSet to remove the default node selector node-role.kubernetes.io/control-plane: "":

kubectl edit daemonset aws-cloud-controller-manager -n kube-system

(Optional) Verify that the cloud controller manager update succeeded:

kubectl rollout status daemonset -n kube-system aws-cloud-controller-manager

Migrating to the Out-of-Tree AWS Cloud Provider for RKE

To migrate from an in-tree cloud provider to the out-of-tree AWS cloud provider, you must stop the existing cluster's kube controller manager and install the AWS cloud controller manager. There are many ways to do this. Refer to the official AWS documentation on the external cloud controller manager for details.

If some downtime is acceptable, you can switch to an external cloud provider, which removes the in-tree components, and then deploy charts to install the AWS cloud controller manager.

If your setup can't tolerate any control plane downtime, you must enable leader migration. This facilitates a smooth transition from the controllers in the kube controller manager to their counterparts in the cloud controller manager. Refer to the official AWS documentation on Using Leader Migration for more details.

Important:

The Kubernetes cloud controller migration documentation mentions that it is possible to migrate with the same Kubernetes version, but assumes that migration is part of a Kubernetes upgrade.

Refer to the Kubernetes documentation on migrating to use the cloud controller manager to see if you need to customize your setup before migrating. Confirm your migration configuration values. If your cloud provider provides an implementation of the Node IPAM controller, you also need to migrate the IPAM controller.

Update the cluster config to enable leader migration in cluster.yml:

services:
  kube-controller:
    extra_args:
      enable-leader-migration: "true"

Note that the cloud provider is still aws at this step:

cloud_provider:
  name: aws

Cordon the control plane nodes so that the AWS cloud controller pods run on the nodes only after they are upgraded to the external cloud provider.

kubectl cordon -l "node-role.kubernetes.io/controlplane=true"

To install the AWS cloud controller manager, you must enable leader migration in values.yaml and follow the same steps as when installing the chart on a new cluster. To enable leader migration, add the following to the container arguments in values.yaml:

- '--enable-leader-migration=true' 
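If you generate values.yaml from a script, the argument can be added and later removed without creating duplicates. The sketch below works on the args list as a plain Python list. The helper names are hypothetical, and the base arguments are the ones from the values.yaml shown earlier.

```python
LEADER_MIGRATION_ARG = "--enable-leader-migration=true"


def with_leader_migration(args):
    """Return a copy of args that contains the leader migration flag once."""
    if LEADER_MIGRATION_ARG in args:
        return list(args)
    return [*args, LEADER_MIGRATION_ARG]


def without_leader_migration(args):
    """Return a copy of args with the leader migration flag removed."""
    return [arg for arg in args if arg != LEADER_MIGRATION_ARG]


base_args = [
    "--configure-cloud-routes=false",
    "--use-service-account-credentials=true",
    "--v=2",
    "--cloud-provider=aws",
]

migrating = with_leader_migration(base_args)
print(migrating)
print(without_leader_migration(migrating) == base_args)
```

Both helpers return new lists, so the original arguments stay available for the final chart upgrade after the migration.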

Confirm that the chart is installed but that the new pods aren't running yet because the controlplane nodes are cordoned. After you update the cluster in the next step, RKE uncordons each node after upgrading it, and the aws-cloud-controller-manager pods are scheduled.

Update cluster.yml to change the cloud provider and remove the leader migration arguments from the kube-controller.

Enabling use_instance_metadata_hostname is required if hostname-override is empty or if hostname-override doesn't meet the node naming conventions.

cloud_provider:
  name: external-aws
  use_instance_metadata_hostname: true # or false

Remove enable-leader-migration from:

services:
  kube-controller:
    extra_args:
      enable-leader-migration: "true"
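The two cluster.yml changes in this step can be summarized as a small transformation. The sketch below operates on the configuration as a plain dictionary, as you would get after parsing cluster.yml with a YAML library of your choice. The function name is hypothetical, and this is an illustration of the edit, not a tool that RKE provides.

```python
import copy


def to_external_aws(cluster, use_instance_metadata_hostname):
    """Return a copy of the config switched to the external AWS provider."""
    migrated = copy.deepcopy(cluster)

    # Change the cloud provider from aws to external-aws.
    provider = migrated.setdefault("cloud_provider", {})
    provider["name"] = "external-aws"
    provider["use_instance_metadata_hostname"] = use_instance_metadata_hostname

    # Drop the leader migration flag from the kube-controller arguments.
    extra_args = (
        migrated.get("services", {}).get("kube-controller", {}).get("extra_args", {})
    )
    extra_args.pop("enable-leader-migration", None)
    return migrated


before = {
    "cloud_provider": {"name": "aws"},
    "services": {
        "kube-controller": {"extra_args": {"enable-leader-migration": "true"}}
    },
}

after = to_external_aws(before, use_instance_metadata_hostname=True)
print(after["cloud_provider"])
print(after["services"]["kube-controller"]["extra_args"])
```

The input dictionary is left untouched, which makes it easy to compare the configuration before and after the change.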

If you're upgrading the cluster's Kubernetes version, set the Kubernetes version as well.

Update the cluster. The aws-cloud-controller-manager pods should now be running.

(Optional) After the upgrade, leader migration is no longer required because only one cloud-controller-manager remains, so it can be removed. Upgrade the chart and remove the following section from the container arguments:

- --enable-leader-migration=true

Verify the cloud controller manager update was successfully rolled out with the following command:

kubectl rollout status daemonset -n kube-system aws-cloud-controller-manager
