Description
Recently a user notified us of a corrupted file after a composer install
on our platform while using PHP 8.2.1.
It seems that under some conditions _php_stream_copy_to_stream_ex
can corrupt files.
I am not a PHP expert, but here is the information I have found.
If I run the following code:
<?php
$archive = new PharData("./01fd644d9004da280847edbe58bb0346773cffaa.tar");
var_dump($archive->extractTo("./out", ["src/Versions/Version.php"]));
I get this output on some of our servers:
[14:36] Scalingo: /tmp/server-a $ php test.php
bool(true)
[14:37] Scalingo: /tmp/server-a $ sha512sum out/src/Versions/Version.php
ef8b2f794a94bbf45e4e4a1ed12a0edf5d403a6216cb23dcddb5fee894b7c6ee7ea73a92d5cc0a567343d9861a018f6a5439033f09e2d4ecbeb0345894b9c611 out/src/Versions/Version.php
But I expected this output instead:
[14:35] Scalingo: /tmp/server-b $ php test.php
bool(true)
[14:37] Scalingo: /tmp/server-b $ sha512sum out/src/Versions/Version.php
1fe10db185b013570e15147991377c36afe74bae2a2b799589aac0add6e51d33c943cfc80bf57da52c2bfdd393b632ba2658b289fe0df97ae72b93a8659a96fa out/src/Versions/Version.php
The file written on the first server is clearly corrupted, whereas the file on the second server matches the expected output.
After some investigation, I think we have traced the issue back to the copy_file_range call in streams/streams.c.
On systems where the corruption occurs, only part of the requested length is copied by that call, whereas on systems without corruption the entire requested length is copied.
Here are the parts of the strace that seem relevant to me.
The strace that results in a corrupted file:
openat(AT_FDCWD, "/tmp/server-a/out/src/Versions/Version.php", O_RDWR|O_CREAT|O_TRUNC, 0666) = 6
fstat(6, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
lseek(6, 0, SEEK_CUR) = 0
lseek(5, 73728, SEEK_SET) = 73728
lseek(5, 73728, SEEK_SET) = 73728
lseek(6, 0, SEEK_SET) = 0
copy_file_range(5, NULL, 6, NULL, 5667, 0) = 4096
fstat(5, {st_mode=S_IFREG|0600, st_size=99840, ...}) = 0
mmap(NULL, 5667, PROT_READ, MAP_SHARED, 5, 0x13000) = 0x7f6d150e6000
lseek(5, 83491, SEEK_SET) = 83491
write(6, "Step): void\n {\n $this-"..., 5667) = 5667
munmap(0x7f6d150e6000, 5667) = 0
close(6) = 0
The strace that results in a valid file:
openat(AT_FDCWD, "/tmp/server-b/out/src/Versions/Version.php", O_RDWR|O_CREAT|O_TRUNC, 0666) = 6
fstat(6, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
lseek(6, 0, SEEK_CUR) = 0
lseek(5, 73728, SEEK_SET) = 73728
lseek(5, 73728, SEEK_SET) = 73728
lseek(6, 0, SEEK_SET) = 0
copy_file_range(5, NULL, 6, NULL, 5667, 0) = 5667
close(6) = 0
chmod("./out/src/Versions/Version.php", 0644) = 0
When it resulted in a valid file, it seems that copy_file_range copied the entire requested length (5667 bytes).
But in the corrupted case, it looks like it copied only part of the file (4096 bytes) and then switched to the mmap method to copy the rest of the file.
I'm neither an expert on PHP internals nor on system calls, so the following might be completely wrong. But my understanding is that the mmap call copies too much data:
mmap(NULL, 5667, PROT_READ, MAP_SHARED, 5, 0x13000) = 0x7f6d150e6000
This call seems to request 5667 bytes from FD 5. But 4096 bytes were already copied from that FD, so shouldn't we only read the remaining 5667 - 4096 = 1571 bytes?
This seems to be confirmed by the seek that comes after.
The base offset seems to be 73728 (cf. the seek before copy_file_range), so the next seek should be to 73728 + 5667 = 79395.
But according to the strace, it seeks to 83491 (73728 + 5667 + 4096).
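For reference, the offset arithmetic can be checked with a small C sketch. The constants are copied from the failing strace; the helper names are made up for illustration and do not exist in PHP. It also shows that the mmap offset 0x13000 equals 73728 + 4096, so the mapping appears to start right after the bytes already copied and only its length looks too large.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the strace of the failing run. */
enum {
    BASE_OFFSET = 73728, /* lseek(5, 73728, SEEK_SET) before the copy */
    REQUESTED   = 5667,  /* length passed to copy_file_range */
    COPIED      = 4096   /* short count returned by copy_file_range */
};

/* Hypothetical helper: bytes still to copy after a partial copy. */
static int64_t remaining_after_partial(int64_t requested, int64_t copied)
{
    return requested - copied;
}

/* Hypothetical helper: source offset after consuming `consumed` bytes
 * starting at `base`. */
static int64_t offset_after(int64_t base, int64_t consumed)
{
    return base + consumed;
}

/* Re-derives the numbers quoted in the report from the trace values. */
static void check_trace_arithmetic(void)
{
    /* Only 1571 bytes were left to copy after the short copy. */
    assert(remaining_after_partial(REQUESTED, COPIED) == 1571);
    /* A correct copy would leave the source offset at 79395. */
    assert(offset_after(BASE_OFFSET, REQUESTED) == 79395);
    /* The mmap offset 0x13000 is the base plus the 4096 bytes copied. */
    assert(offset_after(BASE_OFFSET, COPIED) == 0x13000);
    /* The observed seek target counts the 4096 bytes twice. */
    assert(offset_after(BASE_OFFSET, COPIED + REQUESTED) == 83491);
}
```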
A solution would be to subtract the result of copy_file_range (the bytes already copied) from the length used for the mmap.
Another would be to call copy_file_range in a loop until the entire requested length has been copied.
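To make the two ideas concrete, here is a minimal C sketch of what such a copy could look like. It is not PHP's actual code: copy_exact and copy_exact_demo are hypothetical names, and the fallback uses plain read/write rather than mmap. The point it illustrates is that after a short copy_file_range, only the remaining length is requested, whichever path finishes the copy.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not PHP's actual code: copy `len` bytes from the
 * current position of src_fd to the current position of dst_fd.
 * Returns the bytes copied; less than `len` means EOF or an error. */
static ssize_t copy_exact(int src_fd, int dst_fd, size_t len)
{
    size_t done = 0;
    char buf[8192];

    /* copy_file_range may return a short count, so ask again for the
     * remaining bytes only instead of restarting with the full length. */
    while (done < len) {
        ssize_t n = copy_file_range(src_fd, NULL, dst_fd, NULL, len - done, 0);
        if (n > 0)
            done += (size_t)n;
        else if (n < 0 && errno == EINTR)
            continue;
        else
            break; /* EOF, or unsupported/failed: try read/write below */
    }

    /* Fallback path. Both file offsets have already advanced by `done`,
     * so only len - done bytes are still needed. */
    while (done < len) {
        size_t want = len - done < sizeof buf ? len - done : sizeof buf;
        ssize_t r = read(src_fd, buf, want);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            break;
        for (ssize_t off = 0; off < r;) {
            ssize_t w = write(dst_fd, buf + off, (size_t)(r - off));
            if (w < 0 && errno == EINTR)
                continue;
            if (w <= 0)
                return (ssize_t)done;
            off += w;
            done += (size_t)w;
        }
    }
    return (ssize_t)done;
}

/* Usage check on temporary files: copy 5667 bytes starting at offset 1000
 * and confirm both the data and the resulting source offset.
 * Returns 0 on success, -1 otherwise. */
static int copy_exact_demo(void)
{
    char src_path[] = "/tmp/copy_src_XXXXXX";
    char dst_path[] = "/tmp/copy_dst_XXXXXX";
    char data[10000], out[5667];
    int src = mkstemp(src_path), dst = mkstemp(dst_path);
    assert(src >= 0 && dst >= 0);

    for (size_t i = 0; i < sizeof data; i++)
        data[i] = (char)(i % 251);

    int ok = write(src, data, sizeof data) == (ssize_t)sizeof data
          && lseek(src, 1000, SEEK_SET) == 1000
          && copy_exact(src, dst, sizeof out) == (ssize_t)sizeof out
          && lseek(src, 0, SEEK_CUR) == 1000 + (off_t)sizeof out
          && pread(dst, out, sizeof out, 0) == (ssize_t)sizeof out
          && memcmp(out, data + 1000, sizeof out) == 0;

    close(src);
    close(dst);
    unlink(src_path);
    unlink(dst_path);
    return ok ? 0 : -1;
}
```

Tracking a single done counter across both loops is what keeps the fallback from re-reading the full length, which is the behaviour the failing strace seems to show.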
The thing we do not understand is why copy_file_range returns 4096 on some systems. On almost all systems it copies the entire file, even though the environments are very similar (both run Ubuntu 20.04, use the same kernel branch, run the code in a Docker container, ...). That's why it's a bit hard to give precise instructions on how to reproduce the issue.
Additional information
On both servers PHP version is:
PHP 8.2.1 (cli) (built: Jan 16 2023 18:08:45) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.2.1, Copyright (c) Zend Technologies
with Zend OPcache v8.2.1, Copyright (c), by Zend Technologies
PHP Version
PHP 8.2.1
Operating System
Ubuntu 20.04