php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #73948 Preg_match_all should return NULLs on trailing optional capture groups.
Submitted: 2017-01-16 14:24 UTC Modified: 2019-03-19 14:39 UTC
Votes:3
Avg. Score:4.7 ± 0.5
Reproduced:2 of 2 (100.0%)
Same Version:1 (50.0%)
Same OS:0 (0.0%)
From: tomasyorke at hotmail dot com Assigned: nikic (profile)
Status: Closed Package: PCRE related
PHP Version: 7.0.14 OS: Windows 7
Private report: No CVE-ID: None
 [2017-01-16 14:24 UTC] tomasyorke at hotmail dot com
Description:
------------
Using preg_match_all with the PREG_SET_ORDER flag and an optional capture group might return either an empty string or a missing element. 

This depends on whether there is a matched capture group after the non-matched capture group.

From an interface perspective, this means that I have to check for two representations to test whether an optional capture group was matched.

The test script demonstrates such a case.

When considering a fix, the first priority should be that whatever the representation is (NULL, an empty string or a missing element.),  it should be consistent, no matter if there is a matched capture group after or not.

If this cannot be fixed due to backwards compatibility Issues. Could we add a PREG_KEEP_NONMATCHES or PREG_SET_ORDER_2 flag?


Test script:
---------------
<?php

preg_match_all("#(a)?(b)(c)?#","b",$matches,PREG_SET_ORDER);

var_dump($matches);

Expected result:
----------------
array(1) {
  [0] => array(3) {
    [0] => string(1) "b" [1] => string(0) "" [2] => string(1) "b" [3] => string(0) ""
  }
}

Actual result:
--------------
array(1) {
  [0] => array(3) {
    [0] => string(1) "b" [1] => string(0) "" [2] => string(1) "b"
  }
}

Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2017-01-16 14:38 UTC] [email protected]
-Status: Open +Status: Duplicate -Package: *Regular Expressions +Package: PCRE related
 [2017-01-16 14:38 UTC] [email protected]
Duplicate of bug #61780, fixed in master. (unmatched groups will be NULL)
 [2017-01-16 14:40 UTC] [email protected]
Not sure if it's quite a duplicate. IIRC we now use null instead of "" if unmatched, but we still don't fill up null values at the end.
 [2017-01-16 14:57 UTC] [email protected]
-Status: Duplicate +Status: Feedback
 [2017-01-16 14:57 UTC] [email protected]
Hmm, yes, I misunderstood this report: 61780 is about turning empty strings from unmatched subpatterns into NULLs while this is about *adding* trailing unmatched subpatterns.
I don't see anything in the 61780 changes that clearly indicate this is also fixed, but some parts look like they might do it accidentally. Can anyone check? (It'd be nice if 3v4l could do master too...)
 [2017-01-16 15:50 UTC] [email protected]
-Summary: Preg_match_all should return empty strings on optional capture groups. +Summary: Preg_match_all should return NULLs on trailing optional capture groups. -Status: Feedback +Status: Open
 [2017-01-16 15:50 UTC] [email protected]
Actually I can. Yay for Linux on Windows!

~/php/master/bin# ./php
<?php var_dump(preg_match('/(a)?(b)?(c)?/', 'b', $matches), $matches);
int(1)
array(3) {
  [0]=>
  string(1) "b"
  [1]=>
  NULL
  [2]=>
  string(1) "b"
}

[1] is clearly NULL but there is no [3] for the unmatched (c)? group.
 [2017-01-17 13:28 UTC] [email protected]
-Status: Open +Status: Analyzed
 [2017-01-17 13:28 UTC] [email protected]
For PREG_PATTERN_ORDER unmatched subpatterns are already added[1];
something like that would also have to be done for PREG_SET_ORDER.

However, that would cause BC issues, and adding another flag isn't
appealing to me. I also think that NULL and unset are close enough
to stick with the current behavior for now; checking `isset()`
would suffice.

[1] <https://github.com/php/php-src/blob/PHP-7.1.0/ext/pcre/php_pcre.c#L862-L871>
 [2017-01-23 15:48 UTC] tomasyorke at hotmail dot com
As noted by cmb, this bug does not appear with the PREG_PATTERN_ORDER flag.
Thus I switched the flag and my code to use PREG_PATTERN_ORDER and implemented my own preg_set_order() function.
But I found a nearly identical bug with the PREG_OFFSET_CAPTURE flag.

Consider the code

  preg_match_all("#(a)?(b)(c)?#","b",$matches,PREG_OFFSET_CAPTURE);
  var_dump($matches);

Returns

  array(4) {
    [0] => array(1) {
      [0] => array(2) {
        [0] => string(1) "b" [1] => int(0)
      }
    } [1] => array(1) {
      [0] => array(2) {
        [0] => string(0) "" [1] => int(-1)
      }
    } [2] => array(1) {
      [0] => array(2) {
        [0] => string(1) "b" [1] => int(0)
      }
    } [3] => array(1) {
      [0] => string(0) ""
    }
  }

As you can see, the third optional capture group returns something different than the first optional capture group.
This is too a bug. Again, I don't care what representation is chosen. But it must be the same for any optional capture no matter if trailing or not.
 [2017-01-23 15:59 UTC] tomasyorke at hotmail dot com
Regarding BC
The proposed change would be to change the representation of missed capture groups  from 2 representations to 1 representation.
The only code that would break is code that supported only the more obscure representation that only appears with trailing groups. And if that code exists, then it will still break on non trailing missing groups even without the change.
 [2017-01-23 17:17 UTC] [email protected]
Regarding the BC break: when fixing <https://bugs.php.net/61780> I
would have preferred to unset unmatched captures, but that
appeared too much of a BC break (consider somebody is count()ing
the $matches). Therefore I changed the empty strings to NULL.

Adding unmatched trailing captures as NULL would still have the
same BC concerns – I don't think we can do that before PHP 7.2,
and even that might require the RFC process.
 [2017-01-23 17:43 UTC] tomasyorke at hotmail dot com
I'm glad we didn't go the unset route. Having fixed sized arrays is useful for looping by groups/matches independently of the Flag used.
You also can very easily know how many groups/matches the regex returned/contained.
By making the output array dynamicly sized. You only allow for an easy way to determine one of these things. While making the other more complex.
 [2017-01-23 17:53 UTC] tomasyorke at hotmail dot com
If you wish to delay this change until 7.2, that sounds fine.
But why would you need an RFC for a bugfix?
 [2017-01-23 18:31 UTC] [email protected]
> But why would you need an RFC for a bugfix?

An RFC might be over the top, but at least a PR to get some
attention appears to be appropriate.
 [2018-04-14 14:42 UTC] [email protected]
Since we now have PREG_UNMATCHED_AS_NULL, the solution appears to
be straight forward: fill up NULL values at the end, only if
PREG_UNMATCHED_AS_NULL is used.  Since this option is only
available as of PHP 7.2.0, the resulting BC break seems to be
acceptable.
 [2018-04-14 14:47 UTC] [email protected]
The filling should happen with or without PREG_UNMATCHED_AS_NULL. What it should control whether the filler is NULLs or empty strings.
 [2018-04-14 15:08 UTC] [email protected]
@requinix: That's impossible for BC reasons. I agree with @cmb that this should only be done for PREG_UNMATCHED_AS_NULL.
 [2018-06-09 22:51 UTC] php at tecnopolis dot ca
For those stuck on older PHP's, I'm wondering if the following is sufficient to work around this bug?

$pattern.='(.??)';
preg_match($pattern,$haystack,$matches);
array_pop($matches);

I can't yet find any instance where the fairly innocuous (.??) could possibly cause a problem, it should match in all cases, when there's chars left or not, and being non-greedy will not eat earlier matches' chars.
 [2019-03-19 13:04 UTC] [email protected]
-Status: Analyzed +Status: Assigned -Assigned To: +Assigned To: nikic
 [2019-03-19 13:04 UTC] [email protected]
PR for PHP 7.4: https://github.com/php/php-src/pull/3964
 [2019-03-19 14:39 UTC] [email protected]
> But I found a nearly identical bug with the PREG_OFFSET_CAPTURE flag.

Fixed this issue with https://github.com/php/php-src/commit/f53e7394eb61085c09928b47f84123bd90b64dfb on the PHP 7.4 branch.
 [2019-03-21 09:23 UTC] [email protected]
-Status: Assigned +Status: Closed
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Wed Jun 11 03:01:26 2025 UTC