Bug #77937 preg_match failed
Submitted: 2019-04-24 20:33 UTC Modified: 2019-05-07 21:26 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: v-altruo at microsoft dot com Assigned: cmb (profile)
Status: Closed Package: *General Issues
PHP Version: 7.3.5RC1 OS: Windows 10
Private report: No CVE-ID: None
 [2019-04-24 20:33 UTC] v-altruo at microsoft dot com
Description:
------------
The test fails regardless of whether OPcache is enabled or disabled, and on both TS and NTS builds.
Test file location: ext\pcre\tests\locales.phpt

Test script:
---------------
setlocale(LC_ALL, 'pt_PT', 'pt', 'pt_PT.ISO8859-1', 'portuguese');
var_dump(preg_match('/^\w{6}$/', 'aאבחיט'));


Expected result:
----------------
int(1)

Actual result:
--------------
int(0)

 [2019-04-24 20:45 UTC] [email protected]
-Status: Open +Status: Not a bug
 [2019-04-24 20:45 UTC] [email protected]
Last I knew Portuguese does not cover Hebrew characters.
 [2019-04-24 22:20 UTC] a at b dot c dot de
Adding the /u modifier to the pattern would help (assuming the source is encoded in UTF-8 - you can't even SAY "aאבחיט" in ISO 8859-1).
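A minimal sketch of that suggestion, assuming the script file itself is saved as UTF-8 and using the string from the actual test file ("aàáçéè"). With /u, PHP treats pattern and subject as UTF-8 and, as far as I know, also enables Unicode properties for \w, so the match should no longer depend on setlocale():

```php
<?php
// Assumption: this source file is saved as UTF-8.
$subject = 'aàáçéè';

// Without /u PCRE works byte by byte: the subject is 11 bytes long
// (one ASCII letter plus five two-byte sequences), and whether bytes
// >= 0x80 count as \w depends on the locale-derived character tables.
$bytes = strlen($subject);
var_dump($bytes);

// With /u the subject is decoded as UTF-8 and \w is evaluated per
// character, so the six letters are matched as six word characters.
$matched = preg_match('/^\w{6}$/u', $subject);
var_dump($matched);
```

This sidesteps the locale question rather than answering it; code that really has to work on ISO 8859-1 input still needs a working setlocale().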
 [2019-04-24 22:24 UTC] a at b dot c dot de
Incidentally, the test cited in the original report uses the string "aàáçéè", not Hebrew characters.
 [2019-04-25 09:14 UTC] [email protected]
-Status: Not a bug +Status: Re-Opened -Assigned To: +Assigned To: cmb
 [2019-04-25 09:14 UTC] [email protected]
Thanks for reporting!  I can reproduce the *test* *failure*. The
problem is that setlocale()[1] claims to support "pt_PT", but
does not actually apply it.  The locales that are really supported
would be "pt-PT" and "portuguese".

I'm not sure yet what to do about this.  Simply fixing the test
case for Windows would be an option, but that would not fix the
underlying issue which may affect existing userland code.

[1] <https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=vs-2019>
 [2019-04-25 10:03 UTC] [email protected]
Hmm, yes, it seems Windows will quite happily accept any "language" or "language_country" string regardless of whether either part exists, as long as the language code is 2 or 3 characters.

var_dump(setlocale(LC_ALL, "xjq_ASDF")); // returns xjq_ASDF
var_dump(setlocale(LC_ALL, "0")); // still xjq_ASDF

FFS.

So for maximum portability it seems you have to list Windows-specific strings before the normal strings. Or at least list the names Windows genuinely supports before any short codes.

setlocale(LC_ALL,
  "Portuguese_Portugal.28591", // windows okay (28591 is the codepage for ISO 8859-1), linux ignored
  "Portuguese_Portugal",       // windows okay, linux ignored
  "Portuguese",                // windows okay, linux ignored
  "pt_PT.ISO8859-1",           // windows ignored (bad codepage), linux okay
  "pt_PT",                     // windows okay (wrong), linux okay
  "pt"                         // windows okay (wrong), linux okay
);
 [2019-04-25 17:03 UTC] [email protected]
-Package: PCRE related +Package: *General Issues
 [2019-04-25 17:03 UTC] [email protected]
This issue is neither directly PCRE nor testing related.  Consider
the following script:

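    <?php
    var_dump(17.4);                       // decimal separator under the startup locale
    var_dump(setlocale(LC_ALL, 'pt_PT')); // locale name on success, false on failure
    var_dump(17.5);                       // decimal separator after the switch
    var_dump(ctype_alpha(224));           // 224 (0xE0) is "à" in ISO 8859-1
    ?>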

This outputs on my Windows system:

    float(17.4)
    string(5) "pt_PT"
    float(17,5)
    bool(false)

The first three lines indicate that pt_PT is properly supported,
but the failing ctype_alpha() shows that it is not really.

The following C program confirms that the issue is not directly
related to PHP:

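    #include <stdio.h>
    #include <ctype.h>
    #include <locale.h>

    int main(void)
    {
        /* Fetched before the locale is changed. */
        struct lconv *lc1 = localeconv();
        /* Returns the locale name on success, NULL on failure. */
        char *loc = setlocale(LC_ALL, "pt_PT");
        struct lconv *lc2 = localeconv();
        /* 224 (0xE0) is "à" in ISO 8859-1; nonzero means alphabetic. */
        int alpha = isalpha(224);
        printf("%s %s %s %d\n", lc1->decimal_point, loc, lc2->decimal_point, alpha);
        return 0;
    }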

Outputs on my Windows system (when built with VC15):

    . pt_PT , 0

Again, ctype fails to properly recognize the locale (which is the
reason for the failing test, since PCRE2 calls ctype functions to
build the character tables).

If I build with VC11, I get:

    . (null) . 0

Apparently, AppVeyor behaves either like this, or it properly
recognizes pt_PT for the ctype functions.
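For userland code, the ctype check above can be turned into a guard: try each candidate name and accept it only if a known accented byte is actually classified as alphabetic afterwards. This is a rough sketch, not part of any patch. The function name set_verified_locale() and the probe byte 0xE0 are my own assumptions, it needs the ctype extension, and it only makes sense for single-byte charsets such as ISO 8859-1:

```php
<?php
/**
 * Hypothetical helper: apply the first locale name that setlocale()
 * accepts AND that makes the probe byte alphabetic for ctype.
 * Returns the applied locale name, or null if no candidate qualifies.
 */
function set_verified_locale(array $candidates, string $probe = "\xE0"): ?string
{
    // "0" queries the current LC_CTYPE setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');

    foreach ($candidates as $name) {
        $result = setlocale(LC_CTYPE, $name);
        // Accepting the name is not enough: the character
        // classification must reflect the new locale as well.
        if ($result !== false && ctype_alpha($probe)) {
            return $result;
        }
    }

    // No candidate passed the check, so restore the earlier setting.
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }
    return null;
}

// The result depends on which locales are installed on the system.
$applied = set_verified_locale(
    ['Portuguese_Portugal.28591', 'pt_PT.ISO8859-1', 'pt_PT']
);
var_dump($applied);
```

Only LC_CTYPE is touched here, so number formatting stays as it was; whether that is desirable depends on the application.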
 [2019-05-07 21:26 UTC] [email protected]
I've made a patch[1] which is supposed to be as backward
compatible as reasonably possible.  To ease testing, respective
binary snapshots[2] are available as well.  I hope to get some
feedback on that before proceeding.

Thanks!

[1] <https://github.com/cmb69/php-src/commit/fa35882831010861aea3c4b2d12dd4d3d0fb64a7>
[2] <https://windows.php.net/downloads/snaps/ostc/77937/>
 [2019-05-16 09:11 UTC] [email protected]
The following pull request has been associated:

Patch Name: Fix #77937: preg_match failed
On GitHub:  https://github.com/php/php-src/pull/4169
Patch:      https://github.com/php/php-src/pull/4169.patch
 [2019-06-11 06:46 UTC] [email protected]
Automatic comment on behalf of [email protected]
Revision: http://git.php.net/?p=php-src.git;a=commit;h=f3ff72e54b2f6c2fa1ac924ad95455a5309099d5
Log: Fix #77937: preg_match failed
 [2019-06-11 06:46 UTC] [email protected]
-Status: Re-Opened +Status: Closed