Skip to content

Conversation

@otaconix
Copy link
Contributor

@otaconix otaconix commented Mar 4, 2025

I saw #296 as a good first issue and thought I'd have a go at it.

A few things to note:

  • I named the argument the same as curl's (--noproxy), as that seems to be what you were aiming for. This could potentially be confusing for users, as it's similar to --no-proxy, which undoes any --proxy arguments.
  • reqwest's NoProxy's applies a wildcard (*) only to hostnames, not IP addresses. curl does both. So there's a little workaround to implicitly add IPv4 and IPv6 full ranges (respectively 0.0.0.0/0 and ::/0).
  • I wanted to add tests for IPv6 and hostnames, but didn't see a way to do this easily: IPv6 would require testing on a host that has an IPv6 interface (not sure lo has that by default), and for hostnames, I thought I could use --resolve, but that turns the hostname into an IP address, so the hostname never reaches reqwest. If needed, I can take a closer look to see if I can add these tests.

Copy link
Collaborator

@blyxxyz blyxxyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

For the name, maybe --disable-proxy-for? Bit of a mouthful though.

@otaconix
Copy link
Contributor Author

otaconix commented Mar 4, 2025

For the name, maybe --disable-proxy-for? Bit of a mouthful though.

Some of the ideas I had about the name:

  • --bypass-proxy
  • --ignore-proxy
  • curl's --noproxy

--disable-proxy-for is as good a suggestion as any, though as you say, a bit long.

Is there some "guideline"/design principle that's used to name flags? I originally added support for --proxy, but didn't give much thought to naming back then (I was just scratching my own itch).

@blyxxyz
Copy link
Collaborator

blyxxyz commented Mar 4, 2025

Most of the flags are inherited from HTTPie, so we don't have a lot of experience coming up with names unfortunately.

--noproxy feels too close to --no-proxy like you said. I don't have a strong opinion about bypass vs ignore vs disable, but I tacked on -for to make it clear that the option takes an argument. --disable-proxy sounds like it could be a binary flag that just toggles off all proxies. I'm not sure how confusing that is in practice.

Uhh, I'll use an LLM. I'll pretend these options already exist and ask it what they do and read what it hallucinates. That hopefully gets close to the intuition of real users. First I ask

What does xh's --proxy option do?

And it doesn't get the syntax quite right but it correctly guesses what the option is for.

Then I ask

Thanks! What about the --disable-proxy option?

And it gives examples where it doesn't take an argument, xh --disable-proxy https://example.com and xh --disable-proxy --auth user:password https://example.com. So it guessed incorrectly.

  • When I ask it about --disable-proxy-for it gets it exactly right, it even guesses that you can separate the hosts with commas: xh --proxy http://proxy.example.com:8080 --disable-proxy-for internal.example.com,api.local https://example.com.
  • For --noproxy it also guesses right, probably because it has been trained on lots of curl commands.
  • (For --no-proxy it guesses that it's a binary flag, so it picks up on the nuance! But it says that --no-proxy can override $http_proxy, which it can't, though maybe it should.)
  • For --ignore-proxy it incorrectly guesses a binary flag.
  • For --bypass-proxy it guesses correctly! I guess bypass makes it more clear than the other verbs that you're not completely disabling the proxies. This is a strong candidate.

I also asked it

Thanks! Is there an option for selectively bypassing the proxy for certain hosts?

And it came up with this, which is pretty interesting:
xh --proxy http://proxy.example.com:8080,no-proxy=localhost,no-proxy=127.0.0.1,no-proxy=example.com https://example.com

We have the https: and http: keys in the current syntax, so we could add a no-proxy:/noproxy:/no:/ignore: or something key. There's a pleasing pattern there, http: corresponds to $HTTP_PROXY, https: corresponds to $HTTPS_PROXY, so why not make no: correspond to $NO_PROXY? Then you could write --proxy 'http:http://localhost:8080,no:example.org'.

But maybe it's a little too cute, and I'd first want to check if the syntax for the --proxy option was invented for HTTPie or if it's also used somewhere else.

Finally I asked it

Thanks! I remember there's another option for selectively bypassing the proxy for certain hosts, what's it called?

and it hallucinated --proxy-ignore. Feels a little weird but it does more to suggest that it takes an argument than --ignore-proxy does.

Take all of this with a grain of salt of course.

@otaconix
Copy link
Contributor Author

otaconix commented Mar 5, 2025

@blyxxyz Can you take a look at the latest changes? And maybe also see if there's anything you'd like to add to the issue I raised at seanmonstar/reqwest#2579 ?

@blyxxyz blyxxyz self-requested a review March 5, 2025 19:17
@otaconix
Copy link
Contributor Author

otaconix commented Mar 5, 2025

I also asked it

Thanks! Is there an option for selectively bypassing the proxy for certain hosts?

And it came up with this, which is pretty interesting: xh --proxy http://proxy.example.com:8080,no-proxy=localhost,no-proxy=127.0.0.1,no-proxy=example.com https://example.com

We have the https: and http: keys in the current syntax, so we could add a no-proxy:/noproxy:/no:/ignore: or something key. There's a pleasing pattern there, http: corresponds to $HTTP_PROXY, https: corresponds to $HTTPS_PROXY, so why not make no: correspond to $NO_PROXY? Then you could write --proxy 'http:http://localhost:8080,no:example.org'.

But maybe it's a little too cute, and I'd first want to check if the syntax for the --proxy option was invented for HTTPie or if it's also used somewhere else.

This is getting into software archeology, but it kinda looks like httpie's syntax for proxies is a direct mapping of the underlying requests library: https://docs.python-requests.org/en/latest/user/advanced/#proxies

I do like the idea of adding something like no-proxy to the mini-DSL of the --proxy option, since --proxy is repeatable, and reqwest's NoProxy applies to a single proxy (it's just that as currently implemented in this PR, the --ignore-proxy-for is applied to every proxy). I'm just not sure how, if at all, xh will be able to map that to curl, since it doesn't look like curl supports anything like that out of the box.

@otaconix otaconix changed the title Add --noproxy argument to ignore proxy for hosts or IP addresses. Add --disable-proxy-for argument to ignore proxy for hosts or IP addresses. Mar 10, 2025
Copy link
Collaborator

@blyxxyz blyxxyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again! A few more notes.

Comment on lines +1533 to +1536
#[test]
fn disable_proxy_for_trims_whitespace() {
assert_eq!(DisableProxyFor::from("*"), DisableProxyFor::from(" * "));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run this via Cli::try_parse_from in case clap one day decides to stop using From<&str> or something?

/// The environment variable "NO_PROXY"/"no_proxy" can also be used, but its completely ignored
/// if --disable-proxy-for is passed.
#[clap(long, value_name = "no-proxy-list", value_delimiter = ',')]
pub disable_proxy_for: Vec<DisableProxyFor>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After sleeping on it I like your --bypass-proxy naming best, because it's shorter, because --no-bypass-proxy seems clearer than --no-disable-proxy-for, and because the "bypass" terminology is also used by other CLIs like chromium and netsh on Windows.

(Sorry for the churn!)

Comment on lines +308 to +309
/// if --disable-proxy-for is passed.
#[clap(long, value_name = "no-proxy-list", value_delimiter = ',')]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One effect of doing it this way is that --bypass-proxy=example.com --bypass-proxy=example.org is equivalent to --bypass-proxy=example.com,example.org. This is different from curl, but it might be desirable.

If we want to commit to it then we should have a test for it so it doesn't regress and we should maybe mention it in the docs.

curl's manpage says:

If --noproxy is provided several times, the last set value will be used.

We could say:

--bypass-proxy can be provided several times. The list can be cleared with --no-bypass-proxy.


Ok(proxy.no_proxy(
reqwest::NoProxy::from_string(&noproxy_comma_delimited)
.or_else(reqwest::NoProxy::from_env),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd also apply the * handling to the environment variable.

I'm also OK with deferring that for now since it's annoying to handle here and we haven't heard back from reqwest yet. In that case could you put a code comment?

Proxy::All(url) => reqwest::Proxy::all(url),
}?;

let mut noproxy_comma_delimited = disable_proxy_for.join(",");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes a weird effect where --disable-proxy-for='' doesn't override $NO_PROXY but --disable-proxy-for='' --disable-proxy-for='' does override it.

Perhaps:

  • if disable_proxy_for.is_empty(), always do .no_proxy(reqwest::NoProxy::from_env()).
  • Otherwise, do from_string() without a fallback.

Comment on lines +1144 to +1155
let mut proxy_server = server::http(|_| async move {
hyper::Response::builder()
.status(200)
.body("Proxy shouldn't have been used.".into())
.unwrap()
});
let mut actual_server = server::http(|_| async move {
hyper::Response::builder()
.status(200)
.body("".into())
.unwrap()
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut proxy_server = server::http(|_| async move {
hyper::Response::builder()
.status(200)
.body("Proxy shouldn't have been used.".into())
.unwrap()
});
let mut actual_server = server::http(|_| async move {
hyper::Response::builder()
.status(200)
.body("".into())
.unwrap()
});
let mut proxy_server = server::http(|_| async move {
hyper::Response::builder()
.status(200)
.body("Proxy response".into())
.unwrap()
});
let mut actual_server = server::http(|_| async move {
hyper::Response::builder()
.status(200)
.body("Non-proxy response".into())
.unwrap()
});

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also test the environment variable?

impl Proxy {
pub fn into_reqwest_proxy(
self,
disable_proxy_for: &[DisableProxyFor],
Copy link
Collaborator

@blyxxyz blyxxyz Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you instead (or additionally) have a function to convert &[DisableProxyFor] to NoProxy? Then we only have to process it once.

Bonus points if you do it in Cli::try_parse_from and store the NoProxy as a field in the Cli struct, since then the fully processed IpMatcher and DomainMatcher get printed by --debug.

@otaconix otaconix changed the title Add --disable-proxy-for argument to ignore proxy for hosts or IP addresses. Add --bypass-proxy argument to ignore proxy for hosts or IP addresses. Mar 12, 2025
@otaconix otaconix marked this pull request as draft May 9, 2025 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants