Parentheses within custom match regexp #95

dizlexik · 2016-11-04T17:48:27Z

I'd like to be able to define groups within my custom match regexp like so:
/:foo(\\d+(\\.\\d+)?)

I would expect this to match both of these paths:
/123
/123.123

Currently this doesn't work, because the inner parentheses appear to be parsed as a new key. Any ideas on how I could achieve something like this?

My workaround for the moment is to define two separate paths like /:foo(\\d+) and /:foo(\\d+\\.\\d+), but I'd love to just have one like I defined above.

blakeembrey · 2016-11-04T18:55:23Z

You can't define groups within groups at this time as there's no defined way this should act. If you can come up with a proposal for how it behaves, we can look at implementing support for it.

dizlexik · 2016-11-05T17:10:28Z

I don't think there's any reason for groups within groups to have any special behavior outside of their usefulness for doing things like what I described in my original comment or, for another contrived example, like /:foo(b(ar|az)) to match /bar or /baz, etc.

I don't think inner groups should be parsed at all and should just be left alone as part of the custom regexp. Does that make sense?

blakeembrey · 2016-11-05T18:23:07Z

Not really. What do we do with the matches inside the parentheses? The example above would create two matches. What about other types of matching groups that don't have output in the RegExp? How are unbalanced brackets handled? You are welcome to submit a PR if you know how it should work.

Edit: I'd love to see it work, but the reason it does not exist is because it's not clear how/why.

dizlexik · 2016-11-05T18:29:36Z

I just spent a bit of time attempting to put together a pull request for this but it turns out it's probably going to be really difficult.

I found the regex that needs to be modified to accomplish what I've described (PATH_REGEXP), and the portion of it that requires modification is: \\(((?:\\\\.|[^\\\\()])+)\\)

The problem is JavaScript's lack of recursive regex support. Detecting inner parentheses would mean either drastically increasing the length and complexity of PATH_REGEXP to support an arbitrary (but fixed) number of levels of nested parentheses, or introducing a new dependency like XRegExp and using its matchRecursive method to support unlimited levels. I think I could implement the former, but it doesn't feel like a great solution. Using XRegExp seems like it could work, but the decision to pull in a new dependency for this somewhat edge case isn't mine to make.

It sure would be nice if this were supported but I can see why it might make more sense, at least for now, to leave it as it is and maybe include a note in the README about this limitation.

By the way I wrote a test for this case while working on my pull request. I'll include it here in case someone wants to move forward with this at some point:

[
  '/:test(\\d+(\\.\\d+)?)',
  null,
  [
    {
      name: 'test',
      prefix: '/',
      delimiter: '/',
      optional: false,
      repeat: false,
      partial: false,
      asterisk: false,
      pattern: '\\d+(\\.\\d+)?'
    }
  ],
  [
    ['/123', ['/123', '123']],
    ['/abc', null],
    ['/123/abc', null],
    ['/123.123', ['/123.123', '123.123']],
    ['/123.abc', null]
  ],
  [
    [{ test: 'abc' }, null],
    [{ test: '123' }, '/123'],
    [{ test: '123.123' }, '/123.123'],
    [{ test: '123.abc' }, null]
  ]
]

blakeembrey · 2016-11-05T18:34:57Z

Yeah, it's tricky. The problem really comes from internal group matches. There's no clearly defined way for an internal match to interact with the current support for tokens, and as a result it's also not clear how the reverse direction of path-to-regexp (creating a path from tokens) would work. The clearest option is to make them non-matching groups, but that's a little opinionated too, because maybe someone actually wants internal matches to be part of the result. The easiest way around it today is to put the two matches next to each other as separate groups, but I'd love to find the correct way to support this natively in the way people want.

Edit: The above example is a good case. Right now, keys is based on what you have above, but the above pattern has too many matches, which causes a mismatch between results and keys.
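The mismatch can be made concrete with plain JavaScript regexps. This is a sketch, not path-to-regexp itself: the two hand-written regexps below are assumed stand-ins for what a single :foo parameter would compile to. They count what exec returns beyond the full match, which is what has to line up with the one key.

```javascript
// Hand-written stand-ins for the regexp a '/:foo(...)' route would need.
// Only one key ("foo") exists, so exactly one capture is expected.
const capturing = /^\/(\d+(\.\d+)?)$/;      // inner group also captures
const nonCapturing = /^\/(\d+(?:\.\d+)?)$/; // inner group does not capture

function captures(re, path) {
  const m = re.exec(path);
  // Drop the full match at index 0; the rest must line up with the keys.
  return m === null ? null : m.slice(1);
}

console.log(captures(capturing, "/123.123"));    // two captures for one key
console.log(captures(nonCapturing, "/123.123")); // one capture for one key
```

With the capturing version, "/123" still yields two slots (the second is undefined). The shape of the result depends on the input pattern rather than on the keys.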

dizlexik · 2016-11-05T18:48:32Z

Yeah I was assuming they would be treated as non-matching somehow. I realize I'm probably coming off as really naive with this request but thanks for humoring me! You've clearly put a lot of thought into this yourself. I'll keep an eye on this to see where it goes, but thanks so much for all your hard work! Aside from this incredibly minor nitpick the library is amazing :)

blakeembrey · 2016-11-05T18:52:25Z

I wouldn't worry about the implementation, we can always make that work. The harder thing right now is deciding how it would work. Should internal brackets create a matching group? If they do, do we add a token to the list? If we add a token to the list, how do people use the reverse "create a path" from the tokens? I think that's a dead end, so a non-matching group is probably a better idea. In that case, we can replace ( with (?: so it becomes non-matching. After that, the next question is probably - how do we treat the outer "primary" parens? If the internal ones work with (?! and (?=, why wouldn't the outer ones?

I just know the issues around it, not the solutions, sorry 😄 Maybe the best way is just to allow (?:, (?= and (?! moving forward and add something to the tokens to indicate they are non-matching (not a key). Or we just keep it as is and leave those tokens for internal use only.

dizlexik · 2016-11-16T05:36:49Z

FWIW I'd be perfectly content with the non-matching group solution. The implementation is a little beyond me though since I think there's still the issue of balancing parentheses.

plandem · 2017-03-27T21:54:16Z

damn... I needed something like what you requested: /(list|edit/\d+), and I can't solve it :(

Woodya · 2017-05-05T18:03:24Z

@blakeembrey are you planning to add support for non-capturing groups?
those are quite integral to our use case. We're evaluating react-router, which uses this package, and this is a bit of a blocker for a "smooth" transition.

blakeembrey · 2017-05-06T00:43:45Z

I haven't any plans, but can take another look at it. I'd also accept a PR that just ignores non-matching groups.

sorahn · 2017-09-26T18:42:05Z

@plandem when using React Router recently (which just hands its path to this library), I discovered that this works for /foo and /bar

<Route path="/:_(foo|bar)" component={FooBar} />

And at this point I don't know if it's a bug or a feature.

Edit:
Oh I see what happened.
Looking here: https://github.com/pillarjs/path-to-regexp#custom-match-parameters I can see that what I did was create a custom param named _ that only matched that regex. Interesting.

sergeysova · 2018-02-09T21:10:40Z

Any progress here?

sergeysova · 2018-02-09T21:15:09Z

I want something like:

/:username{/|/i/:image}

match:

/somename
/somename/i/imagename

don't match:

/somename/asdasd

blakeembrey · 2018-02-20T06:51:10Z

@sergeysova Why don't you use an array or a regular expression? You can also do /:username/(i/[^/]+)? but I'd advocate for the array as it's easier to understand.

sergeysova · 2018-02-20T09:24:47Z

@blakeembrey I want to have a named parameter :image

And more subroutes will be added under /:username/

blakeembrey · 2018-03-07T05:34:37Z

@sergeysova Try an array? ['/:username', '/:username/i/:image']?
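For comparison, sergeysova's requirement can be written as a single plain JavaScript regexp with ES2018 named groups. The requirement is a named image segment that is only valid after /i/. This is native RegExp syntax, not path-to-regexp syntax. The [^\/]+ segment pattern and the optional trailing slash are assumptions.

```javascript
// Native RegExp with ES2018 named groups (not path-to-regexp syntax).
// "username" is required; "/i/<image>" is an optional suffix.
const profileRoute = /^\/(?<username>[^\/]+)(?:\/i\/(?<image>[^\/]+))?\/?$/;

function matchProfile(path) {
  const m = profileRoute.exec(path);
  // Copy the groups into a plain object (image is undefined when absent).
  return m === null ? null : { ...m.groups };
}

console.log(matchProfile("/somename"));             // username only
console.log(matchProfile("/somename/i/imagename")); // username and image
console.log(matchProfile("/somename/asdasd"));      // null
```

The array suggested above keeps the two shapes as separate, readable patterns. The single regexp shows what the combined matcher has to express.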

crdrost · 2018-06-27T15:11:57Z

I am surprised that the primary hangup here is the semantics. To me the semantics are obvious:

1. Internal groups never propagate up. That is, any inner (___) is rewritten to (?:___). Anything else is likely to be unbelievably confusing to everyone involved, and there is provably no loss in generality.
   - Suppose you are a consumer of this library and you do want to capture something into its own group. Your logic will be much easier to understand if you use path-to-regexp with a path that looks like ':param(' + myRegexString + ')' and afterwards run new RegExp(myRegexString).exec(results.param) to extract the group. Any 'convenience' that we try to provide beyond this is going to confuse the readers of their code ("wait, what does THAT do?") when the alternative is to make them write two more lines of code whose meanings are obvious to any reader. The only downside I can see is that TypeScript etc. will not be able to tell that the match results are not null, but it has various "I know it's not null, I swear!" constructs, so I'm not concerned.
2. The outer parens are part of the URL grammar, not the regexp. Writing :param(?=abc) throws a SyntaxError when the child RegExp is compiled, because ? cannot be the first character of any valid RegExp; likewise with ?! and ?:.
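The rule that internal groups never propagate up can be sketched as a small scanner rather than a regexp, which also touches the bracket-balancing problem. It walks the pattern once, skips escapes and character classes, and rewrites each capturing "(" (including named groups) to "(?:" while tracking nesting depth.

The helper name toNonCapturing and its error messages are hypothetical. This is not path-to-regexp code. It assumes the input is only the text inside the outer :param(...) parentheses.

```javascript
// Hypothetical helper: rewrite every capturing group in a custom parameter
// pattern to a non-capturing one, throwing on unbalanced parentheses.
function toNonCapturing(pattern) {
  let out = "";
  let depth = 0;
  let inClass = false;
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\") {
      // Copy an escape verbatim, so "\(" stays a literal parenthesis.
      out += ch + (pattern[i + 1] ?? "");
      i++;
      continue;
    }
    if (inClass) {
      // Inside [...] parentheses are literal characters.
      if (ch === "]") inClass = false;
      out += ch;
      continue;
    }
    if (ch === "[") {
      inClass = true;
      out += ch;
    } else if (ch === "(") {
      depth++;
      const rest = pattern.slice(i + 1);
      if (rest.startsWith("?<") && !rest.startsWith("?<=") && !rest.startsWith("?<!")) {
        // Named capturing group (?<name>...): drop the name as well.
        const close = pattern.indexOf(">", i);
        if (close === -1) throw new SyntaxError("Unterminated group name at " + i);
        out += "(?:";
        i = close;
      } else if (rest.startsWith("?")) {
        // (?: (?= (?! (?<= (?<! do not capture; keep them as written.
        out += ch;
      } else {
        out += "(?:";
      }
    } else if (ch === ")") {
      depth--;
      if (depth < 0) throw new SyntaxError("Unbalanced ')' at " + i);
      out += ch;
    } else {
      out += ch;
    }
  }
  if (inClass) throw new SyntaxError("Unterminated character class");
  if (depth !== 0) throw new SyntaxError("Unbalanced '(' in pattern");
  return out;
}

const rewritten = toNonCapturing("\\d+(\\.\\d+)?");
console.log(rewritten); // \d+(?:\.\d+)?
// Wrapping it in one outer capture leaves exactly one slot for the key.
const routeRe = new RegExp("^\\/(" + rewritten + ")$");
console.log(routeRe.exec("/123.123").length); // 2 (full match + one key)
```

A counting scanner like this is essentially the "lexer instead of a regexp" direction. A regexp alone cannot track arbitrary nesting depth.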

I am under a bit of a deadline crunch but I might be able to submit a PR which parses JS regexp along these lines, if they're acceptable to you?

crdrost · 2018-06-27T15:48:03Z

Also if I don't get to this ever, I wanted to mention for anyone reading this thread that unless you are repeating something in parentheses indefinitely, you probably do not need () in your regexp.

Just to break down the cases in this, the first comment starts with the regexp \d+(\.\d+)?. First observe that (xyz)? is the same as (|xyz), so this is \d+(|\.\d+). Now distribute that prefix over the rest, to get \d+|\d+\.\d+.
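The rewrite can be spot-checked in JavaScript by comparing the grouped and group-free regexps. This uses sample strings only, so it is not a proof. Both are anchored with ^(?:...)$ so each must match the whole segment. That anchoring is an assumption about how the pattern is used in a route.

```javascript
// Anchor both forms so each must match the whole string, as a route param would.
const grouped = /^(?:\d+(\.\d+)?)$/;
const flattened = /^(?:\d+|\d+\.\d+)$/;

const samples = ["123", "123.123", "abc", "123.abc", "123.", ".5", "1.2.3", ""];
for (const s of samples) {
  // The two regexps should agree on every sample.
  console.log(JSON.stringify(s), grouped.test(s), flattened.test(s));
}
```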

A more complicated case: if you wanted to match a JSON float, the regex for that is apparently -?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?. Yuck. You can see that it has three groups, two with ? and one with |. But there is no indefinite repetition of a parenthesized component! So you can go through the 2^3 = 8 cases:

-?0
-?0[eE][+-]?\d+
-?0\.\d+
-?0\.\d+[eE][+-]?\d+
-?[1-9]\d*
-?[1-9]\d*[eE][+-]?\d+
-?[1-9]\d*\.\d+
-?[1-9]\d*\.\d+[eE][+-]?\d+

Putting these together with | yields a regex without ().

A language with list comprehensions can do a lot of this for you. In Python you can convert (abc|def|ghi) to the tuple ("abc", "def", "ghi"). As mentioned above, (abc)? becomes (|abc), and hence ("", "abc"). Finally, you can combine them with a comprehension:

>>> print('|'.join(a + b + c + d
...     for a in ("-?",)
...     for b in ("0", r"[1-9]\d*")
...     for c in ("", r"\.\d+")
...     for d in ("", r"[eE][+-]?\d+")))
-?0|-?0[eE][+-]?\d+|-?0\.\d+|-?0\.\d+[eE][+-]?\d+|-?[1-9]\d*|-?[1-9]\d*[eE][+-]?\d+|-?[1-9]\d*\.\d+|-?[1-9]\d*\.\d+[eE][+-]?\d+
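For JavaScript readers, the same expansion as the Python comprehension above can be sketched with reduce and flatMap. The sketch also checks, on a handful of inputs, that the group-free alternation agrees with the original JSON-float regexp. The sample strings are arbitrary choices, not a proof of equivalence.

```javascript
// One list of choices per group; "" stands for the skipped optional group.
const choices = [
  ["-?"],
  ["0", "[1-9]\\d*"],
  ["", "\\.\\d+"],
  ["", "[eE][+-]?\\d+"],
];

// Cartesian product: extend every prefix with every option of the next group.
const alternatives = choices.reduce(
  (acc, options) => acc.flatMap((prefix) => options.map((o) => prefix + o)),
  [""]
);

const expanded = new RegExp("^(?:" + alternatives.join("|") + ")$");
const original = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?$/;

console.log(alternatives.length); // 8
for (const s of ["0", "-0.5", "12e3", "1.5E-7", "01", "1.", "-", "1e"]) {
  console.log(s, original.test(s), expanded.test(s));
}
```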

blakeembrey · 2018-06-27T20:12:11Z

You’re welcome to submit a PR, but it’s definitely not just semantics stopping an implementation. Balancing nested brackets requires a big rewrite of the existing code; if you’re able to do it simply without many changes, that’s fantastic, and please do submit a PR. Currently, I believe the logic needs to be rewritten to use a lexer instead of a regexp internally.

crdrost · 2018-06-27T20:15:31Z

Oh yeah definitely. The language of regular expressions is famously not regular.

MANTENN · 2018-10-06T20:23:03Z

What I found out is that if you are working with groups, you can put another set of parentheses after the first one instead of nesting them. This of course will not work with nested grouping.
For example, the pattern /([F][S]|[R][A])(\d{5}) will match /FS12345 or /RA12345, and it will not match /FA12345.

Alternatively, use (\d+\.*\d+) and then split the value on the dot inside the component.
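Both suggestions can be tried with plain JavaScript regexps. This sketch does not go through path-to-regexp; the anchors and the sample paths are assumptions. It shows that sibling groups each become their own capture, in order. It also shows the split approach and one quirk of \d+\.*\d+: it needs at least two digits and tolerates repeated dots.

```javascript
// Sibling (non-nested) groups: each becomes its own capture, in order.
const code = /^\/([F][S]|[R][A])(\d{5})$/;

console.log(code.exec("/FS12345")?.slice(1)); // [ 'FS', '12345' ]
console.log(code.exec("/RA12345")?.slice(1)); // [ 'RA', '12345' ]
console.log(code.exec("/FA12345"));           // null

// Split approach: capture the whole number, then split on the dot in code.
// Note: \d+\.*\d+ rejects single-digit values like "/1" and accepts "/1..2".
const num = /^\/(\d+\.*\d+)$/;
const [whole, fraction] = num.exec("/123.123")[1].split(".");
console.log(whole, fraction); // 123 123
```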

garrettmaring · 2018-10-28T20:59:39Z

I believe that there is a good use case for having more control over the group: base64 validation:

^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$

This cannot be done with the current restrictions on custom match groups.

crdrost · 2018-10-30T21:20:59Z

@garrettmaring yeah, that {4} followed by that * does indeed seem difficult or impossible. I would be surprised if / worked in the URL since it's also a path separator, but there is always RFC 4648 base64url if / doesn't work for you.

However, I would remind you and anyone else reading that these regexes are not, in my view, primarily security-related, so "validation" is a bit of a dicey subject. To my mind, the point of the regex is to say "this route looks like _____". The basic problem: you write an API that has both GET /widgets/:id and GET /widgets/report, and then discover that the router is not aware that report is anything other than an :id substitution, so you write /widgets/:id([0-9]+) to make it clear that the ID is a number, say.

Base64, on the other hand, is so permissive that it makes this very tricky, because it rules out so few other routes. If :id is Base64, then you have a problem with the endpoint GET /widgets/smallest, because smallest is the Base64 encoding of the byte string b2 66 a5 95 eb 2d. So once you start using Base64 IDs, I would really recommend rewriting to GET /widgets/by-id/:id, so that you have a separate namespace that avoids the route conflicts, and then you don't need the regexes anymore.
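The claim about smallest is easy to check with Node.js's built-in Buffer. Six bytes encode to exactly eight Base64 characters with no padding. That is why an ordinary lowercase English word can also be a valid Base64 ID.

```javascript
// Encode the six bytes from the comment, then decode "smallest" back to hex.
const bytes = Buffer.from([0xb2, 0x66, 0xa5, 0x95, 0xeb, 0x2d]);

console.log(bytes.toString("base64"));                          // smallest
console.log(Buffer.from("smallest", "base64").toString("hex")); // b266a595eb2d
```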

So I think you're 100% right that there is a thing you cannot do here. But even if you could do it, you would not buy much over just the regex [-A-Za-z0-9_]*=?=? to disambiguate the routes (maybe your other routes contain a ~ or something), followed by checking req.params.id.length % 4 == 0 in the actual client code for the rest of the validation.
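The split suggested here can be sketched in two parts: a loose route-level regexp that only keeps IDs apart from other routes, then a length check in handler code. The function name looksLikePaddedBase64Url is hypothetical. As the comment says, this is not full validation; for example, it does not verify that unused padding bits are zero.

```javascript
// Loose route-level pattern from the comment: base64url alphabet plus up to two "=".
const looseId = /^[-A-Za-z0-9_]*=?=?$/;

// Hypothetical handler-side check: padded base64/base64url length is a multiple of 4.
function looksLikePaddedBase64Url(id) {
  return looseId.test(id) && id.length % 4 === 0;
}

console.log(looksLikePaddedBase64Url("smallest")); // true
console.log(looksLikePaddedBase64Url("abc"));      // false (length 3)
console.log(looksLikePaddedBase64Url("YQ=="));     // true
console.log(looksLikePaddedBase64Url("a~b"));      // false (not in the alphabet)
```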

sorahn · 2018-10-31T17:23:31Z

@crdrost At that point, I think you'd just be looking for some application logic to do that for you right?

GET /widgets/:thing, then in the handler for that:

if (thing === 'reports') {
  // do it
  return;
} else {
  // handle thing as id (validate guid, etc.)
}

crdrost · 2018-11-01T15:32:43Z

Yeah, I'd say so. What's really at stake here is that URL paths come from a filesystem world but are a generic text format being used to transmit data in a very free-form format.

If we were to instead take a more structured view towards API development, where we have valid API "midpoints" as well as the endpoints when the full URL is constructed, we might see this better. So each midpoint instead comes from some sort of philosophy, "the API structures we care about are Records (well-known properties which I can access, GET /my-record/property-name), Lists (collections whose items I can access under numeric keys, so GET /my-list/123 for the 124th element), Dictionaries (collections whose items I can access under string keys, so GET /my-dict/my-key), and Tagged Unions or Enums (a structure which can be one of many well-known kinds and you have to assert what kind it is in order to use it, GET /my-union-item/as-widget failing because it was actually a foobar, but GET /my-union-item/as-foobar succeeding and granting me access to some internal foobar-style properties)."

Most people seem to get by without explicit lists or tagged unions; lists are in some sense subsumed by dictionaries but tagged unions really have their own genuine power to them.

The error that is being described above is an error of an API midpoint which does not know whether it is a Dictionary or a Record and thus is trying to be both, and the proposed solution is to make the midpoint firmly into a Record and add a property by-id which maps to the desired Dictionary.

garrettmaring · 2018-11-02T00:22:49Z

@crdrost delta! I agree that validation of that sort doesn't make sense in the URL as a regex. Thanks for the reply 👍

blakeembrey · 2019-11-11T23:52:37Z

Added support with 1327699, but opted to not transform anything people write. It'll instead throw an error and prompt you to use the non-capturing group directly. So (?: will be allowed to be nested in the next major release.
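The behaviour described in the final comment above can be sketched without a full parser: accept (?: and reject capturing groups with a prompt to use (?:. The trick is to let the JavaScript engine count capture groups. Appending an empty alternative makes the regexp match the empty string, and exec always returns one slot per capturing group plus the full match.

Both function names and the TypeError message are hypothetical. This illustrates the idea and is not the library's actual check.

```javascript
// Count capturing groups (named ones included) by matching "" via an empty alternative.
function countCapturingGroups(pattern) {
  return new RegExp(pattern + "|").exec("").length - 1;
}

// Hypothetical check: refuse capturing groups instead of silently rewriting them.
function checkCustomPattern(pattern) {
  const n = countCapturingGroups(pattern);
  if (n > 0) {
    throw new TypeError(
      `Pattern "${pattern}" has ${n} capturing group(s); use "(?:" instead`
    );
  }
  return pattern;
}

console.log(checkCustomPattern("\\d+(?:\\.\\d+)?")); // \d+(?:\.\d+)?
try {
  checkCustomPattern("\\d+(\\.\\d+)?");
} catch (err) {
  console.log(err.message);
}
```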

blakeembrey added the enhancement label Nov 5, 2016

blakeembrey mentioned this issue Aug 22, 2017: Slash is not optional when using multi parameters #113 (Closed)

LKay mentioned this issue Mar 6, 2018: Optional group with param compile #142 (Open)

blakeembrey mentioned this issue Nov 11, 2019: Impossible to validate a slug #205 (Closed)

blakeembrey closed this Nov 11, 2019

blakeembrey added a commit that referenced this issue Nov 11, 2019: Add test cases from #95 (943c907)

TPXP mentioned this issue Nov 21, 2019: Update/Fork path-to-regexp (v3 released Jan 13, 2019) ReactTraining/react-router#6899 (Open)
