The Wayback Machine - https://web.archive.org/web/20220525223804/https://github.com/JSQLParser/JSqlParser/pull/1382

Enhanced Keywords #1382

Open · wants to merge 44 commits into master

Conversation

@manticore-projects (Contributor) commented Oct 18, 2021

Rewrite the Reserved Keywords Logic:

  1. Automatically derive all Keyword Tokens from the Parser.
  2. Explicitly add Reserved Keywords to the Grammar and classify why/when each is reserved.
  3. Auto-generate the list of Allowed Keywords as the difference of 1) and 2) above.
  4. Provide a Gradle task to semi-automatically generate RelObjectNameWithoutValue() based on 3).
  5. Add parametrized Tests for Keywords, testing all keywords.
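The difference step 3) can be sketched in a few lines of Java; the class and token names here are illustrative, not the PR's actual code:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of step 3: the Allowed Keywords are whatever keyword
// tokens the grammar defines, minus the explicitly Reserved ones.
public class AllowedKeywordsSketch {
    public static Set<String> allowedKeywords(Set<String> allKeywords, Set<String> reservedKeywords) {
        Set<String> allowed = new TreeSet<>(allKeywords); // sorted, for stable code generation
        allowed.removeAll(reservedKeywords);
        return allowed;
    }
}
```

With such a set difference, a newly added token stays allowed automatically unless someone deliberately lists it as reserved.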

Advantage:
a) we now have clear documentation of which Keywords are reserved and for what reason, with proper tests
b) allowed keywords are generated (semi-)automatically, especially when more Tokens are added to the Grammar. New Tokens won't break SQL Statements that parsed fine before.

To-Dos:
c) Composite Tokens do not work well and still need to be added manually to ALL_KEYWORDS (we would need to refactor the GRAMMAR in order to avoid such composite tokens)
d) @wumpz: Clarify the meaning/purpose of the various RelObjectNamexxxx() Productions, so we can auto-generate them too.
e) Use the Gradle task to fully inject the RelObjectNamexxxx() Productions (instead of manually updating the code)

Resolves one more Special Oracle Test.
Fixes #1148
Fixes #1450
Fixes #1443
Fixes #1462
Fixes #1508
Fixes #1538

Add Keywords and document which keywords are allowed for what purpose
@wumpz (Member) left a comment

Again, are all those keyword additions to RelObjectNames needed? You talked about all those edge-case improvements. Are most of those additions not exactly that?

src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.jjt (Outdated)
@@ -1733,7 +2007,7 @@ Table TableWithAlias():
Alias alias = null;
}
{
table=Table() [alias=Alias() { table.setAlias(alias); }]
table=Table() [ LOOKAHEAD(2) alias=Alias() { table.setAlias(alias); }]
@wumpz (Member) Oct 20, 2021

It should be the aim to do fewer LOOKAHEADs, not more. All this for some keywords?

@manticore-projects (Contributor, Author) Oct 21, 2021

Yes, all this for some keywords (although the Alias() Production is affected mostly).
But why exactly would you deny Tokens which are not defined as keywords in the holy SQL:2016 standard, when there is no compelling technical reason?

Complaints about keywords feel like the second biggest concern of the users, right after quoting/escaping text and objects.

Derive All Keywords from Grammar directly
Generate production for Object Names (semi-) automatically
Add parametrized Keyword Tests
@manticore-projects (Contributor, Author) commented Nov 1, 2021

@wumpz: Please check out the great work on PR #1254. It adds a new token QUICK, which of course should not become a Reserved Keyword.

Of course the author @OlivierCavadenti has no way of knowing that he would need to add QUICK to the productions RelObjectNamexxxx() in order to enable this token as an Object Identifier. (And how should he? I figured that out only six months after my first contribution.)

This PR would solve this challenge reliably, preventing new tokens from breaking valid statements and easing the entry for new contributors.

@manticore-projects requested a review from wumpz Nov 28, 2021
@@ -227,6 +242,13 @@ task renderRR() {
}
}
}

task updateKeywords(type: JavaExec) {
@wumpz (Member) Nov 28, 2021

Since I do not use gradle, what does this do?

Maven is the main build engine. You changed some gradle build options. Does the maven build still run?

@manticore-projects (Contributor, Author) Nov 29, 2021

It generates and prints the source-code text of the method RelObjectName(), which you would then manually insert/replace in the JSQL Grammar file.

This step is optional and does not affect the Maven build. It is executed manually and on demand only.

The long-term goal, though, is to have a mechanism inside the build tool (Gradle and/or Maven) which, during the build:

  1. analyses the Keywords automatically
  2. modifies the Grammar file (methods RelObjectName() and friends)
  3. builds the Parser with the modified Grammar
  4. runs the Keyword Tests
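Conceptually, the generated text is just the allowed tokens joined into one JavaCC choice. This sketch shows the idea; the production layout and token names are assumptions for illustration, not taken from the PR:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hedged sketch of what the Gradle task conceptually prints: the source text
// of a production accepting any allowed keyword token as an object name.
public class ProductionGenerator {
    public static String relObjectNameProduction(List<String> allowedTokens) {
        String alternatives = allowedTokens.stream()
                .map(token -> "tk=<" + token + ">")
                .collect(Collectors.joining(" | "));
        return "String RelObjectNameWithoutValue() :\n"
                + "{ Token tk = null; }\n"
                + "{\n"
                + "    ( " + alternatives + " )\n"
                + "    { return tk.image; }\n"
                + "}\n";
    }
}
```

The printed text would then be pasted over the existing production in the grammar file, which is exactly the manual step the long-term goal above wants to automate.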

@wumpz (Member) commented Nov 28, 2021

The naming of these RelObjectName... methods is surely an issue that should be addressed in further improvements. So sorry for that: it's grown... ;)

@wumpz (Member) commented Nov 28, 2021

I am not a fan of injecting those keywords, since not every keyword can be introduced this way; some need other parts of the grammar to be modified as well. However, building this keyword testing using JUnit 5 is cool.

@manticore-projects (Contributor, Author) commented Nov 29, 2021

The naming of these RelObjectName... methods is surely an issue that should be addressed in further improvements. So sorry for that: it's grown... ;)

You would do me a great favor by documenting the purpose of those 5 methods verbosely. I tried my best to figure it out by trial and error, but I am still not exactly sure why we ended up with 5 methods. (And no worries about grown code.)

@manticore-projects (Contributor, Author) commented Nov 29, 2021

I am not a fan of injecting those keywords, since not every keyword can be introduced this way; some need other parts of the grammar to be modified as well. However, building this keyword testing using JUnit 5 is cool.

Honestly, I am not a big fan of that PR either -- but after spending many days on this matter trying various things, it was the best approach I found, and in my opinion it is still much better than the current state:

  1. At the moment, Keywords are just a mess and we get a lot of issues about keywords alone.
    Are you able to document right now which keywords are reserved and for what reason? I do not think so, and the PR would solve this documentation problem.

  2. Today, even when you have an opinion on keywords, we have no parametrized tests.
    You kindly admit that these tests are kind of cool, and I appreciate your feedback. Those parametrized tests, however, depend on a well-defined list of keywords, which the PR provides. (Although we can of course argue how/where exactly such a list should be defined.)

not every keyword could be introduced this way and needs other parts of the grammar to be modified

  1. While that is true for the PR, it is already true today. Also, we could easily enforce a policy of "Keywords are defined by simple Tokens" (not allowing complex Tokens for Keywords).

The workflow is to a) define the Tokens and Productions in the Grammar first, and then b) document any new RESERVED Keywords only in the List (emphasis on RESERVED).

The PR has one big advantage: by defining RESERVED keywords only, it will determine any allowed Keywords automatically and modify the Grammar semi-automatically.

@manticore-projects (Contributor, Author) commented Nov 29, 2021

Allow me to ask, please:

  1. What would be your preferred way to manage and document RESERVED KEYWORDs?

  2. What would be your preferred way to maintain any allowed Token/Keyword in RelObjectName()?

  3. What would be your preferred way to test all the possible Keywords (automatically)?

And can we have a Skype chat about this? :-)

@wumpz (Member) commented Apr 9, 2022

I took some days to think about this again. The goal is to automate all allowed-keywords tests, right? Maybe the documentation too.

So this externalization of keywords in getReservedKeywords is a helper method to build parameterized tests for all this stuff. But this results in this weird workflow of regenerating the grammar.

Wouldn't it be better to parse the grammar file and simply extract the list that getReservedKeywords provides? After that, all automated tests could be done like you propose and no regenerating of the grammar is needed.

By the way, renaming all RelObjectNames methods is a must.

@manticore-projects (Contributor, Author) commented Apr 9, 2022

I took some days to think about this again. The goal is to automate all allowed-keywords tests, right? Maybe the documentation too.

Yes, and also: to automatically allow all Tokens that are not reserved keywords (especially after adding new tokens).

So this externalization of keywords in getReservedKeywords is a helper method to build parameterized tests for all this stuff. But this results in this weird workflow of regenerating the grammar.

No. getReservedKeywords is needed to know which tokens to allow in RelObjectName() and friends.

Wouldn't it be better to parse the grammar file and simply extract the list that getReservedKeywords provides? After that, all automated tests could be done like you propose and no regenerating of the grammar is needed.

I think you still miss the main point: Tests and Documentation are nice side products.
The main goal, though, is to allow the permitted Tokens (as the difference between All Tokens and the Reserved Keywords) whenever new Tokens are added.

By the way, renaming all RelObjectNames methods is a must.

Fully agreed.

Can we have a call on this topic maybe? I feel we could clarify the approach within a few minutes if we just talk about it.

@manticore-projects (Contributor, Author) commented Apr 9, 2022

So this externalization of keywords in getReservedKeywords is a helper method to build parameterized tests for all this stuff. But this results in this weird workflow of regenerating the grammar.

Actually, not much has changed: in 99% of all cases you just build as before. Only when a new Token has been defined would you run gradle buildKeywords first, and even then it is optional (at the cost of Allowed Tokens and Documentation).

@wumpz (Member) commented Apr 9, 2022

You are right, but if we read those keywords directly from the grammar file, we never need to run this buildKeywords, because all tests always use the actual list. As always, the documentation should be much more improved, as well as the source-code documentation (if I may make this distinction).

# Conflicts:
#	build.gradle
#	src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.jjt
#	src/test/java/net/sf/jsqlparser/statement/select/SpecialOracleTest.java
# Conflicts:
#	build.gradle
#	ruleset.xml
- read Tokens per REGEX Matcher
- move Reserved Keywords from Grammar into ParserKeywordsUtils
- adjust the Tests
@manticore-projects (Contributor, Author) commented Apr 23, 2022

Greetings.

I have modified the approach and now read the Tokens directly from the Grammar file without invoking JJTree.
Thus all the Keywords have been moved out of the Grammar into ParserKeywordsUtils and we spare the redundant run.

It also handles the Composite Tokens.

Cheers.

@wumpz (Member) commented Apr 24, 2022

I did not look into your modifications, but I think I managed to parse the grammar using JavaCC's classes. (javacc/javacc#221)

@wumpz (Member) commented Apr 24, 2022

So if I understand your modifications right, you are scanning for all tokens defined in the token part. These different RelObjectName productions are used for different purposes. So depending on which one a token is used in, the tests should be different, right?

@manticore-projects (Contributor, Author) commented Apr 25, 2022

I did not look into your modifications, but I think I managed to parse the grammar using JavaCC's classes. (javacc/javacc#221)

Yes, I saw that. Thank you for your effort.
I am not sure if this is better or worse than parsing the Grammar per Regex, because again you engage the JavaCC Parser (now only from inside Java instead of from the Gradle/Maven plugin before).

At the same time I really do not care much, as long as we get the content of all tokens reliably (which seems to work both ways).

So if I understand your modifications right, you are scanning for all tokens defined in the token part.

Yes, but per REGEX (because I do not want to rely on the K_ or S_ prefixes and also want to get the keywords of the compound tokens).
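A minimal sketch of such a prefix-independent scan (the pattern here is illustrative and only covers simple string tokens, not the PR's full matcher):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: extract keyword images from a JavaCC TOKEN section by
// matching the token definitions themselves, so no K_/S_ naming convention
// is required. Simplified; composite/regex tokens would need more work.
public class TokenScanner {
    // matches simple definitions like  <K_SELECT: "SELECT">
    private static final Pattern TOKEN_PATTERN =
            Pattern.compile("<\\s*(\\w+)\\s*:\\s*\"([^\"]+)\"\\s*>");

    public static List<String> keywordImages(String grammarText) {
        List<String> keywords = new ArrayList<>();
        Matcher matcher = TOKEN_PATTERN.matcher(grammarText);
        while (matcher.find()) {
            keywords.add(matcher.group(2)); // the literal image, e.g. SELECT
        }
        return keywords;
    }
}
```

Because the pattern keys on the definition syntax rather than the token name, a token named QUICK or FOO is picked up the same as K_SELECT.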

These different RelObjectName productions are used for different purposes. So depending on which one a token is used in, the tests should be different, right?

Yes, that's why there are two tests so far.

@wumpz (Member) commented May 11, 2022

So these two tests are enough? So we could merge some of those RelObjectName productions?

How we extract the keywords does not matter that much. What matters to me is the direct use of keywords without rebuilding a list, regenerating the grammar, building the parser, and so on.

However, to read out all keywords correctly one has to rebuild this part of JavaCC's grammar parser, so why not reuse the original and not have trouble with some incomplete regular expressions?

@manticore-projects (Contributor, Author) commented May 12, 2022

So these two tests are enough? So we could merge some of those RelObjectName productions?

I see this PR as a first step to move us in a better direction. Certainly more fine-tuning will need to follow. Although I feel this PR is quite massive already, and I would like to see a Merge first before touching anything else.

This particular development step marries the best of both worlds:

  1. it demonstrates the idea and the benefits
  2. it is still fully compatible with what has been there before and does not break anything

Please understand that maintaining multiple PRs in parallel is quite a nightmare for any contributor, and a lot of effort goes into keeping up with origin/master, depending on your Merge order.

How we extract the keywords does not matter that much. What matters to me is the direct use of keywords without rebuilding a list, regenerating the grammar, building the parser, and so on.

However, to read out all keywords correctly one has to rebuild this part of JavaCC's grammar parser, so why not reuse the original and not have trouble with some incomplete regular expressions?

Fully agreed. I would provide a similar method using JavaCC instead of the RegEx (so we could switch on/off later on demand).
JavaCC would be the default, but we could still use the RegEx whenever we hit a snag.

Add some tests to ensure that all Keywords are found
@manticore-projects (Contributor, Author) commented May 13, 2022

I implemented the getAllKeywords() method based on JavaCC's tree classes and got it working.
There are pros and cons, though:

  1. We really get ALL keywords, even when not declared as a Token -- that's awesome!
  2. The implementation is quite complex and verbose.
  3. We now carry JavaCC as a dependency, although we need it only for the compilation -- I do not like that at all, but have no good idea what to do about it.

@wumpz (Member) left a comment

I still do not get why you want to hold this keyword list. Since we could extract all keywords, like you mentioned, and read all RelObj... methods, we have all the lists we need to get the reserved keyword list directly from the grammar, without rewriting it.

The dependency on JavaCC is only needed for tests. So this ParserKeywordsUtils list should be a test class only.

Sure, as I already mentioned, the RelObj methods need to be renamed to better reflect their meaning and use.

@@ -37,7 +37,7 @@ public static Stream<String> keyWords() {
List<String> keywords = new ArrayList<>();
try {
try {
keywords.addAll( ParserKeywordsUtils.getAllKeywords(file) );
keywords.addAll( ParserKeywordsUtils.getAllKeywordsUsingRegex(file) );
@wumpz (Member) May 21, 2022

So you use the JavaCC parser just to verify what your regex extracts? Why?

@manticore-projects (Contributor, Author) May 22, 2022

It took me a bit of effort to extract all keyword tokens based on your initial stub. I used the previously Regex-generated List as a reference. When both lists returned the same, I got confident that I had it working.

However, the JavaCC approach was based on trial and error and is not documented. So having a Test in the future might prove useful.

See it as the equivalent of your toString() vs. DeParser -- it is redundant, but provides a great way to test correctness.

@SuppressWarnings({"PMD.ExcessiveMethodLength"})
public static List<String> getReservedKeywords(int restriction) {
// Classification follows http://www.h2database.com/html/advanced.html#keywords
Object[][] ALL_RESERVED_KEYWORDS = {
@wumpz (Member) May 21, 2022

Couldn't those lists be derived directly from the grammar? At least RESTRICTED_ALIAS?

@manticore-projects (Contributor, Author) May 22, 2022

It is the other way around: this List will define how we rewrite RelObjectName.... This List is defined manually, and everything else follows from it.
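A sketch of how such a manually defined table with restriction flags could look; the flag names and keyword rows are invented for illustration, loosely following the H2 classification linked above:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a manually maintained reserved-keyword table.
// Each row carries bit flags telling WHERE the keyword is restricted;
// flag names and rows are examples, not the PR's actual contents.
public class ReservedKeywordsSketch {
    public static final int RESTRICTED_FUNCTION = 1;
    public static final int RESTRICTED_ALIAS = 2;
    public static final int RESTRICTED_SQL2016 = 4;

    private static final Object[][] ALL_RESERVED_KEYWORDS = {
            {"SELECT", RESTRICTED_SQL2016 | RESTRICTED_ALIAS},
            {"FROM",   RESTRICTED_SQL2016 | RESTRICTED_ALIAS},
            {"CAST",   RESTRICTED_FUNCTION},
    };

    public static List<String> getReservedKeywords(int restriction) {
        List<String> keywords = new ArrayList<>();
        for (Object[] row : ALL_RESERVED_KEYWORDS) {
            if (((Integer) row[1] & restriction) != 0) {
                keywords.add((String) row[0]);
            }
        }
        return keywords;
    }
}
```

Everything else (the allowed list and the rewritten productions) would then be derived from this one table.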

@manticore-projects (Contributor, Author) commented May 22, 2022

I still do not get why you want to hold this keyword list.

We want to:

  1. Create/Rewrite RelObject... automatically during the build process (at least whenever new tokens/productions have been added to the Grammar)

We will use the difference between ALL keywords and RESERVED keywords for this rewrite:
ALL keywords are parsed from the Grammar, RESERVED keywords come from the manually defined list.

Since we could extract all keywords, like you mentioned, and read all RelObj... methods, we have all the lists we need to get the reserved keyword list directly from the grammar, without rewriting it.

It is the other way around: we get the ALLOWED keywords and rewrite RelObjectNames... accordingly.

Sure, as I already mentioned, the RelObj methods need to be renamed to better reflect their meaning and use.

Certainly. I have asked you above for an explanation of what you had in mind when introducing the 4 methods. (The first 2 are clear to me, but I struggle to understand the methodology behind ...Ext and ...Ext2.)
