Skip to content

Fix GH-18753: file_get_contents() and file_put_contents() fail with data >=2GB on macOS & BSD #18755

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: PHP-8.3
Choose a base branch
from

Conversation

kiler129
Copy link

@kiler129 kiler129 commented Jun 4, 2025

Problem

On Linux read(3) syscall returns as many bytes as possible, up to requested maximum. Darwin/XNU and BSD kernels instead immediately returns EINVAL if the requested value is larger than native INT_MAX. This causes file_get_contents() to fail with files larger than 2147483647 bytes.

Fix

This patch applies the same clamping of buffer/read chunk to 2GB as previously on Windows. This fixes the file_get_contents(). There's still a way to trigger the problem using fread() with stream_set_read_buffer() set to 0/unbuffered, but I think this is more of an expected behavior. After this patch the later returns with EBADF instead of EINVAL.

Affected version

The bug stems affects 8.2 onwards d/t 6beee1a from #8547. However, as this is not a security issue I'm targeting 8.3 as it's latest actively supported version.

@DanielEScherzer
Copy link
Member

Please target the PHP-8.3 branch rather than the specific PHP-8.3.22 patch branch

@kiler129
Copy link
Author

kiler129 commented Jun 4, 2025

I just realized the problem also affects php_put_contents() as well, due to the same behavior of write(). Fixed the commit to cover that too.

@kiler129 kiler129 changed the title Fix GH-18753: file_get_contents() fails with files >=2GB on macOS & BSD Fix GH-18753: file_get_contents() and file_put_contents() fail with data >=2GB on macOS & BSD Jun 4, 2025
Copy link
Member

@NattyNarwhal NattyNarwhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little surprising that they do limit it to INT_MAX, but it is documented (at least in FreeBSD), nor is there an interface that lets you do larger (other than switching to kqueue for everything?). Weird.

@@ -53,12 +53,16 @@ extern int php_get_uid_by_name(const char *name, uid_t *uid);
extern int php_get_gid_by_name(const char *name, gid_t *gid);
#endif

#if defined(PHP_WIN32)
#if defined(PHP_WIN32) || defined(__APPLE__) || defined(__MACH__) || defined(BSD) || defined(__DragonFly__) || defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to do a configure-time check for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be just better to invert and allow it only for Linux

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure if Solaris can handle it either (can't be bother to check) so safest option should be to just do #ifdef __linux__ and invert the logic...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be salient to also add a comment explaining why then. (the BSD and Windows cause should explain pretty much all cases).

(FWIW, AIX does seem to allow reads as big as off_t on 64-bit systems...)

@@ -53,12 +53,16 @@ extern int php_get_uid_by_name(const char *name, uid_t *uid);
extern int php_get_gid_by_name(const char *name, gid_t *gid);
#endif

#if defined(PHP_WIN32)
#if defined(PHP_WIN32) || defined(__APPLE__) || defined(__MACH__) || defined(BSD) || defined(__DragonFly__) || defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be just better to invert and allow it only for Linux

--FILE--
<?php
echo "-- file_put_contents() --\n";
echo file_put_contents('bigfile', str_repeat('a', 2 ** 31));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it's quite a lot memory, I wonder if we should add some skip if there is not enough system memory. But it needs to work on Mac to be actually useful and not like https://github.com/php/php-src/blob/91becb304286942337ad65ec614cc0d1a3f79cd6/ext/standard/tests/file/file_get_contents_file_put_contents_5gb.phpt which is actually the reason why we didn't notice this (that test is always skipped on OSX and not sure if it works on FreeBSD either.

--FILE--
<?php
echo "-- file_put_contents() --\n";
echo file_put_contents('bigfile', str_repeat('a', 2 ** 31));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name needs to be test specific - something like gh18753.data for example

@@ -53,12 +53,16 @@ extern int php_get_uid_by_name(const char *name, uid_t *uid);
extern int php_get_gid_by_name(const char *name, gid_t *gid);
#endif

#if defined(PHP_WIN32)
#if defined(PHP_WIN32) || defined(__APPLE__) || defined(__MACH__) || defined(BSD) || defined(__DragonFly__) || defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure if Solaris can handle it either (can't be bother to check) so safest option should be to just do #ifdef __linux__ and invert the logic...

@devnexen
Copy link
Member

devnexen commented Jun 5, 2025

I did bother to try on illumos yesterday, I forgot to tell but it works without the change. However I can t vouch for solaris which starts to be more and more behind as the time goes so I would be in favor to do what @bukka said, much simpler too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants