@johnwhumphreys
Contributor

Summary:
In the current implementation, the hostname is set to the empty string in the k8s scheduler's describe function. This diff uses the k8s client to map the pod name to its IP, then generates and returns the pod cluster-DNS name <dashed-ip>.<namespace>.cluster.local. If that fails, it makes a best-effort attempt at a fallback.

Note: The fallback does not match prior functionality, but diff #4 in this stack makes it fall back to the empty string and adds tests to cover this diff.

Differential Revision: D88567449

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 7, 2025
meta-codesync bot commented Dec 7, 2025

@johnwhumphreys has exported this pull request. If you are a Meta employee, you can view the originating Diff in D88567449.

johnwhumphreys added a commit to johnwhumphreys/torchx that referenced this pull request Dec 8, 2025
…1173)

Summary:

In the current implementation, the hostname is set to the empty string in the k8s scheduler's describe function (basically it's an implementation gap).  This diff uses the k8s client to map the pod name to its IP, then generates and returns the pod cluster-DNS name <dashed-ip>.<namespace>.cluster.local.

If the pod IP is not available yet, this version of the diff simply returns an empty hostname, since the k8s hostname is derived from the IP.  The describe API is intended to quickly return the actual state from the scheduler's POV, so returning an empty hostname when the IP is not yet assigned is better than blocking until it is, which could take an arbitrary amount of time.
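
For illustration, the mapping described above can be sketched as a small pure function. This is a minimal sketch, not the code in this diff: the name `pod_ip_to_hostname` and the `domain_suffix` parameter are hypothetical, and the default suffix simply follows the format quoted in this summary. In a real scheduler the IP would presumably be read from the pod's status via the k8s client.

```python
from typing import Optional


def pod_ip_to_hostname(
    pod_ip: Optional[str],
    namespace: str,
    domain_suffix: str = "cluster.local",  # assumption: taken from the summary text
) -> str:
    """Build `<dashed-ip>.<namespace>.<domain_suffix>` from a pod IP.

    Returns an empty string when the IP has not been assigned yet, so the
    caller never blocks waiting for the pod to be scheduled.
    """
    if not pod_ip:
        # No IP yet (None or ""): report "unknown" instead of waiting.
        return ""
    # IP-based pod DNS names replace the dots in the IPv4 address with dashes.
    dashed_ip = pod_ip.replace(".", "-")
    return f"{dashed_ip}.{namespace}.{domain_suffix}"


if __name__ == "__main__":
    print(pod_ip_to_hostname("10.1.2.3", "default"))
    print(repr(pod_ip_to_hostname(None, "default")))
```

The suffix is left configurable because standard Kubernetes pod DNS records usually carry a `pod` segment (`<dashed-ip>.<namespace>.pod.<cluster-domain>`), so the exact string produced by the merged code may differ from the one shown here.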

Existing users of the API should be unaffected, as the hostname was not populated anyway.  New users will need to implement polling until the hostname is ready, if they have not already.
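
A caller-side polling loop along these lines is one way to handle the empty hostname. It is a sketch under stated assumptions: `wait_for_hostname` and its parameters are hypothetical names, and `get_hostname` stands in for whatever the caller uses to read the hostname out of a describe result.

```python
import time
from typing import Callable, Optional


def wait_for_hostname(
    get_hostname: Callable[[], str],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> Optional[str]:
    """Poll `get_hostname` until it returns a non-empty string.

    Returns the hostname, or None if `timeout_s` elapses first.
    `get_hostname` is a hypothetical callback, e.g. a wrapper that calls
    the scheduler's describe function and extracts the replica hostname.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        hostname = get_hostname()
        if hostname:
            return hostname
        if time.monotonic() >= deadline:
            # Give up rather than block indefinitely on an unscheduled pod.
            return None
        time.sleep(interval_s)


if __name__ == "__main__":
    # Simulated describe results: empty twice, then a populated hostname.
    results = iter(["", "", "10-1-2-3.default.cluster.local"])
    print(wait_for_hostname(lambda: next(results), timeout_s=5.0, interval_s=0.01))
```

Keeping the timeout on the caller's side matches the reasoning above: describe stays fast, and each user decides how long waiting for an IP is worth it.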

Differential Revision: D88567449
codecov-commenter commented Dec 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.71%. Comparing base (981dfcf) to head (092156f).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1173      +/-   ##
==========================================
+ Coverage   91.70%   91.71%   +0.01%     
==========================================
  Files          86       86              
  Lines        6653     6665      +12     
==========================================
+ Hits         6101     6113      +12     
  Misses        552      552              
Flag Coverage Δ
unittests 91.71% <100.00%> (+0.01%) ⬆️


johnwhumphreys added a commit to johnwhumphreys/torchx that referenced this pull request Dec 8, 2025
…1173)

Summary:

In the current implementation, the hostname is set to the empty string in the k8s scheduler's describe function (basically it's an implementation gap).  This diff uses the k8s client to map the pod name to its IP, then generates and returns the pod cluster-DNS name <dashed-ip>.<namespace>.cluster.local.

If the pod IP is not available yet, this version of the diff simply returns an empty hostname, since the k8s hostname is derived from the IP.  The describe API is intended to quickly return the actual state from the scheduler's POV, so returning an empty hostname when the IP is not yet assigned is better than blocking until it is, which could take an arbitrary amount of time.

Existing users of the API should be unaffected, as the hostname was not populated anyway.  New users will need to implement polling until the hostname is ready, if they have not already.

Reviewed By: AbishekS

Differential Revision: D88567449
johnwhumphreys added a commit to johnwhumphreys/torchx that referenced this pull request Dec 9, 2025
…1173)

Summary:

In the current implementation, the hostname is set to the empty string in the k8s scheduler's describe function (basically it's an implementation gap).  This diff uses the k8s client to map the pod name to its IP, then generates and returns the pod cluster-DNS name <dashed-ip>.<namespace>.cluster.local.

If the pod IP is not available yet, this version of the diff simply returns an empty hostname, since the k8s hostname is derived from the IP.  The describe API is intended to quickly return the actual state from the scheduler's POV, so returning an empty hostname when the IP is not yet assigned is better than blocking until it is, which could take an arbitrary amount of time.

Existing users of the API should be unaffected, as the hostname was not populated anyway.  New users will need to implement polling until the hostname is ready, if they have not already.

Note: adding extra test and resubmitting to avoid GitHub coverage block.

Reviewed By: AbishekS

Differential Revision: D88567449
@meta-codesync meta-codesync bot merged commit 1de66c5 into meta-pytorch:main Dec 9, 2025
22 checks passed

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported
