Skip to content

WIP: Add support for dual stack load balancers#1313

Open
nrb wants to merge 11 commits intokubernetes:masterfrom
nrb:dual-stack-lb
Open

WIP: Add support for dual stack load balancers#1313
nrb wants to merge 11 commits intokubernetes:masterfrom
nrb:dual-stack-lb

Conversation

@nrb
Copy link
Contributor

@nrb nrb commented Dec 2, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:

Adds basic support for configuring the IP address family on Services.

This functionality overlaps with the ALBC somewhat, but the goal here is to provide much more basic functionality, not the full scope of ALBC.

Which issue(s) this PR fixes:

Fixes #1219

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Introduce `service.beta.kubernetes.io/aws-load-balancer-ip-address-type` to enable dual stack Services.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Dec 2, 2025
@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 2, 2025
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 2, 2025
@nrb
Copy link
Contributor Author

nrb commented Dec 2, 2025

/test pull-cloud-provider-aws-test

@damdo
Copy link
Member

damdo commented Dec 9, 2025

Hey @kmala do we know what's happening with the Netlify checks? It looks like they are broken across many open PRs. TY

@damdo
Copy link
Member

damdo commented Dec 10, 2025

/retest

1 similar comment
@nrb
Copy link
Contributor Author

nrb commented Dec 11, 2025

/retest

@nrb
Copy link
Contributor Author

nrb commented Dec 12, 2025

/retest

@nrb
Copy link
Contributor Author

nrb commented Dec 12, 2025

/test pull-cloud-provider-aws-test

One more run to see if this is a rate limiting issue or some sort of genuine issue with my code.

Copy link

@tthvo tthvo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested with a simple service below:

apiVersion: v1
kind: Service
metadata:
  name: wordpress-mysql
  labels:
    app: wordpress
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-ip-address-type: dualstack
spec:
  ipFamilyPolicy: PreferDualStack
  ports:
    - port: 3306
  selector:
    app: wordpress
  type: LoadBalancer

The CCM does create the dualstack NLB as expected 🚀

$ kubectl -n myns get svc/wordpress-mysql -o yaml | yq .status.loadBalancer.ingress
- hostname: <id>.elb.us-east-1.amazonaws.com

$ nslookup <id>.elb.us-east-1.amazonaws.com
# nslookup shows both public IPv4 EIP and IPv6 GUA addresses

$ aws elbv2 describe-load-balancers --names=<nlb-id> --query='LoadBalancers[0].[Type,IpAddressType,Scheme]'
[
    "network",
    "dualstack",
    "internet-facing"
]

@nrb
Copy link
Contributor Author

nrb commented Jan 16, 2026

/test pull-cloud-provider-aws-test

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 20, 2026
@nrb
Copy link
Contributor Author

nrb commented Jan 22, 2026

/test pull-cloud-provider-aws-test

@nrb
Copy link
Contributor Author

nrb commented Jan 22, 2026

/test pull-cloud-provider-aws-check

@nrb
Copy link
Contributor Author

nrb commented Jan 23, 2026

/test pull-cloud-provider-aws-test
/test pull-cloud-provider-aws-check

@nrb
Copy link
Contributor Author

nrb commented Jan 26, 2026

@kmala I'm going to do some manual testing with this PR starting today before I mark it as ready for review, but would you be able to give it a look by any chance?

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Feb 24, 2026
@nrb nrb marked this pull request as ready for review February 24, 2026 19:55
@nrb
Copy link
Contributor Author

nrb commented Feb 24, 2026

I've updated the change to only use the Kubernetes API fields in service.Spec rather than annotations. I think this keeps the implementation straightforward, and there should be nothing in this PR that blocks us from adding support for more granular settings via annotations in the future.

- Create load balancers according to the Kubernetes Service API
- Do not reconcile Services with spec.loadBalancerClass defined. These
  are handled by external controllers, and the CCM should ignore them.

Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
@nrb
Copy link
Contributor Author

nrb commented Feb 24, 2026

govulncheck notice will be resolved by pulling kubernetes/kubernetes#136820 in during a kube bumpl.

internalELB,
annotations,
securityGroups,
apiService,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if we are passing the entire service then we can skip passing other params like service name, annotation right?

Copy link

@tthvo tthvo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have a few questions 🙏 But overall, I think we are having great progress!

Comment on lines 2542 to 2546
err = c.updateInstanceSecurityGroupsForNLB(ctx, loadBalancerName, instances, subnetCidrs, sourceRangeCidrs, v2Mappings)
if err != nil {
klog.Warningf("Error opening ingress rules for the load balancer to the instances: %q", err)
return nil, err
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider/allow IPv6 CIDRs when updating the instance's security groups (to allow traffic from NLB), right? IIUC, there are 2 types of CIDRs here:

  • sourceRangeCidrs: Already defined on line 2396-2405
  • subnetCidrs: IPv6 CIDRs can be extracted from subnet info returned by AWS.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that means we need to update the following functions to set the Ipv6Range field correctly?

func (c *Cloud) updateInstanceSecurityGroupsForNLB(ctx context.Context, lbName string, instances map[InstanceID]*ec2types.Instance, subnetCIDRs []string, clientCIDRs []string, portMappings []nlbPortMapping) error {

func (c *Cloud) updateInstanceSecurityGroupForNLBTraffic(ctx context.Context, sgID string, sgPerms IPPermissionSet, ruleDesc string, protocol string, ports sets.Set[int32], cidrs []string) error {

Comment on lines +190 to +194
if service.Spec.IPFamilyPolicy != nil && *service.Spec.IPFamilyPolicy == v1.IPFamilyPolicySingleStack {
if serviceRequestsIPv6(service) {
return nil, fmt.Errorf("single stack IPv6 is not supported for network load balancers")
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can:

  • Place this validation in pkg/providers/v1/aws_validations.go
  • Additionally, we should add a validation that only allows dualstack settings (i.e. ipFamilies: [IPv4, IPv6], ipFamilyPolicy: RequireDualStack) when LB type is NLB.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, makes sense.


Some limitations to be aware of when using dual stack load balancers:

- The `spec.ipFamilies` field can have a second family added or removed, but the first entry is immutable after Service creation.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, we should mention what would happen if the secondary family is removed? Like what will be the state of:

  • The Load Balancer
  • The Security Groups (those for LB and those for instances)
  • Target Group
  • (Optional) Listeners

Comment on lines +117 to +118
- IPv6
- IPv4
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The list indentation is a lot more than usual. Let's format it a bit 😁

nrb added 3 commits February 25, 2026 16:10
remove loadbalancerclass check

Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
pass IPv6 source ranges to security group rule

Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
Set the IP address type on load balancer creation

Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
@damdo
Copy link
Member

damdo commented Feb 26, 2026

/retest

@damdo
Copy link
Member

damdo commented Feb 26, 2026

/test pull-cloud-provider-aws-e2e-kubetest2-quick

1 similar comment
@damdo
Copy link
Member

damdo commented Feb 27, 2026

/test pull-cloud-provider-aws-e2e-kubetest2-quick

nrb added 5 commits February 27, 2026 16:58
Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 2, 2026
Check for empty fields in validation logic

Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
@nrb
Copy link
Contributor Author

nrb commented Mar 3, 2026

/test pull-cloud-provider-aws-test

@nrb
Copy link
Contributor Author

nrb commented Mar 3, 2026

Current e2e failure is legit - the provisioned cluster is not set up for IPv6.

   STEP: creating LoadBalancer service in Kubernetes @ 03/03/26 17:45:37.338
  I0303 17:45:37.386843 32011 loadbalancer.go:508] Unexpected error: 
      <*errors.errorString | 0xc000321140>: 
      failed to create LoadBalancer Service "lbconfig-test-clb-ds": Service "lbconfig-test-clb-ds" is invalid: spec.ipFamilies[1]: Invalid value: "IPv6": not configured on this cluster
      {
          s: "failed to create LoadBalancer Service \"lbconfig-test-clb-ds\": Service \"lbconfig-test-clb-ds\" is invalid: spec.ipFamilies[1]: Invalid value: \"IPv6\": not configured on this cluster",
      }
  [FAILED] in [It] - /home/prow/go/src/k8s.io/cloud-provider-aws/tests/e2e/loadbalancer.go:508 @ 03/03/26 17:45:37.387
  STEP: Destroying namespace "cloud-provider-aws-593" for this suite. @ 03/03/26 17:45:37.387
  << Timeline 

Will work on getting support for that plumbed in to kubetest-ec2.

Remove LoadBalancerClass check

Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
@nrb nrb force-pushed the dual-stack-lb branch from 1b3815c to 2dd0dac Compare March 4, 2026 14:20
@k8s-ci-robot
Copy link
Contributor

@nrb: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cloud-provider-aws-e2e 2dd0dac link true /test pull-cloud-provider-aws-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dual-stack support for Service type-loadBalancer NLB

7 participants