JSQLParser / JSqlParser Public
Enhanced Keywords #1382
base: master
Conversation
Add keywords and document which keywords are allowed for what purpose
Again, are all those keyword additions to relobjectnames needed? You talked about all those edge case improvements. Are most of those additions not exactly that?
@@ -1733,7 +2007,7 @@ Table TableWithAlias():
    Alias alias = null;
}
{
-    table=Table() [alias=Alias() { table.setAlias(alias); }]
+    table=Table() [ LOOKAHEAD(2) alias=Alias() { table.setAlias(alias); }]
It should be the aim to do less not more LOOKAHEADs. All this for some keywords?
Yes, all this for some keywords (although the Alias() production is affected mostly).
But why exactly would you deny tokens that are not defined as keywords in the holy SQL:2016 standard, when there is no compelling technical reason?
Complaints about keywords feel like the second biggest concern of users, right after quoting/escaping of text and objects.
- Derive all keywords from the grammar directly
- Generate the production for object names (semi-)automatically
- Add parametrized keyword tests
@wumpz: Please check out the great work on PR #1254. It adds a new token. Of course the author @OlivierCavadenti has no chance of knowing that he would also need to add it to the object-name productions. This PR would solve this challenge reliably, preventing new tokens from breaking valid statements, and ease the entry for new contributors.
- Parallel test execution
- Gradle caching
- Explicitly request latest JavaCC 7.0.10
# Conflicts: # src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.jjt
Update keywords
@@ -227,6 +242,13 @@ task renderRR() {
        }
    }
}

task updateKeywords(type: JavaExec) {
Since I do not use Gradle, what does this do?
Maven is the main build engine. You changed some Gradle build options. Does the Maven build still run?
It generates and prints the source code of the method RelObjectName(), which you would manually insert/replace in the JSqlParser grammar file.
This step is optional and does not affect the Maven build. It is executed manually and on demand only.
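For readers unfamiliar with the setup, a minimal sketch of what such a JavaExec task might look like. This is an assumption for illustration, not the PR's actual build code; in particular the main class location and the argument are guesses:

```groovy
// Hypothetical sketch: the task name matches the PR, main class and args are assumed.
task updateKeywords(type: JavaExec) {
    group = 'Build'
    description = 'Prints the generated RelObjectName() production for manual insertion into the grammar'

    classpath = sourceSets.main.runtimeClasspath
    mainClass = 'net.sf.jsqlparser.parser.ParserKeywordsUtils'

    // assumed argument: location of the grammar file to analyse
    args 'src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.jjt'
}
```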
Although the long-term goal was to have a mechanism inside the build tool (Gradle and/or Maven) which, during the build:
- analyses the keywords automatically
- modifies the grammar file (the RelObjectName() methods and friends)
- builds the parser with the modified grammar
- runs the keyword tests
The naming of these RelObjectName... methods is surely an issue that should be addressed in further improvements. So sorry for that: it's grown... ;)
I am not a fan of injecting those keywords, since not every keyword can be introduced this way; some need other parts of the grammar to be modified. However, building this keyword testing with JUnit 5 is cool.
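The idea behind those parametrized keyword tests, sketched here with plain Java instead of the actual JUnit 5 @ParameterizedTest setup. The keyword sample and the statement template are assumptions for illustration; the real test would hand each generated statement to the parser and fail on a ParseException:

```java
import java.util.List;

public class ParametrizedKeywordSketch {
    // Hypothetical sample of keywords expected to be usable as plain object names.
    static final List<String> ALLOWED_KEYWORDS = List.of("ACTION", "KEY", "TYPE");

    // Embeds a keyword into a statement template, as the parametrized tests do.
    static String toTestStatement(String keyword) {
        return "SELECT * FROM mytable AS " + keyword;
    }

    public static void main(String[] args) {
        for (String keyword : ALLOWED_KEYWORDS) {
            // The real JUnit 5 test would parse this and assert no exception is thrown.
            System.out.println(toTestStatement(keyword));
        }
    }
}
```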
You would do me a great favor by documenting the purpose of those 5 methods verbosely. I tried my best to figure it out by trial and error, but I am still not exactly sure why we ended up with 5 methods. (And no worries about grown code.)
Honestly, I am not a big fan of that PR either -- but after spending many days on this matter trying various things, it was the best approach I found, and in my opinion it is still much better than the current state:
The workflow is to: a) define the tokens and productions in the grammar first, and then b) document any new RESERVED keywords only in the list (emphasis on RESERVED). The PR has one big advantage: by defining RESERVED keywords only, it will automatically determine all allowed keywords and modify the grammar semi-automatically.
Allow me to ask please:
And can we have a Skype chat about this :-)
I took some days to think about this again. The goal is to automate all allowed-keyword tests, right? Maybe the documentation as well. So this externalization of keywords into ParserKeywordsUtils... Wouldn't it be better to parse the grammar file and simply extract those lists there? By the way, renaming all RelObjectName methods is a must.
Yes, and also: to automatically allow all tokens that are not reserved keywords (especially after adding new tokens).
No.
I think you still miss the main point: tests and documentation are nice side products.
Can we have a call on this topic maybe? I feel we could clarify the approach within a few minutes when we just talk about it.
Actually, not much has changed: in 99% of all cases, you just build as before. Only when a new token has been defined would you run the updateKeywords task.
You are right, but if we read those keywords directly from the grammar file, we never need to run this task at all.
# Conflicts: # build.gradle # src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.jjt # src/test/java/net/sf/jsqlparser/statement/select/SpecialOracleTest.java
# Conflicts: # build.gradle # ruleset.xml
- read tokens per regex matcher
- move reserved keywords from the grammar into ParserKeywordsUtils
- adjust the tests
Greetings. I have modified the approach and now read the tokens directly from the grammar file without invoking JJTree. It also handles the composite tokens. Cheers.
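The regex approach can be sketched roughly like this. The pattern, the token-name convention, and the sample grammar text are simplified assumptions, not the PR's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenExtractorSketch {
    // Simplified: matches token definitions like < K_SELECT: "SELECT" >.
    // Composite tokens and regular-expression tokens would need extra handling.
    static final Pattern TOKEN_PATTERN =
            Pattern.compile("<\\s*(K_\\w+)\\s*:\\s*\"([^\"]+)\"\\s*>");

    static List<String> extractKeywords(String grammarText) {
        List<String> keywords = new ArrayList<>();
        Matcher matcher = TOKEN_PATTERN.matcher(grammarText);
        while (matcher.find()) {
            keywords.add(matcher.group(2)); // the token's literal content
        }
        return keywords;
    }

    public static void main(String[] args) {
        String sample = "TOKEN:\n{\n  < K_SELECT: \"SELECT\" >\n| < K_FROM: \"FROM\" >\n}";
        System.out.println(extractKeywords(sample)); // [SELECT, FROM]
    }
}
```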
I did not look into your modifications, but I think I managed to parse the grammar using JavaCC's classes. (javacc/javacc#221)
So if I understand your modifications right, you are scanning for all tokens defined in the token part.
Yes, I saw that. Thank you for your effort. At the same time I really do not care much, as long as we get the content of all tokens reliably (which seems to work both ways).
These different RelObjectName productions are used for different purposes. So depending on which production a token is used in, the tests should be different, right?
So these two tests are enough? So we could merge some of those RelObjectName productions? How we extract the keywords does not matter that much. What matters to me is the direct use of keywords without rebuilding a list, rebuilding the grammar, building the parser, and so on. However, to read out all keywords correctly one has to rebuild this part of JavaCC's grammar parser, so why not reuse the original and avoid trouble with some incomplete regular expressions?
I do see this PR as a first step to move us in a better direction. Certainly more fine-tuning will need to follow. Although I feel this PR is quite massive already, and I would like to see a merge first before touching anything else. This particular development step marries the best of both worlds:
Please understand that maintaining multiple PRs in parallel is quite a nightmare for any contributor, and a lot of effort goes into keeping up with origin/master, depending on your merge order.
Fully agreed. I would provide a similar method using JavaCC instead of the regex (so we could switch on/off later on demand).
Add some tests to ensure that all keywords are found
I implemented them.
Including it into compileJavacc won't work, since it depends on compiling ParserKeywordsUtils.java first. Single-file compilation did not work.
I still do not get why you want to hold this keyword list. Since we could extract all keywords, like you mentioned, and read all RelObj... methods, we have all the lists we need to get the reserved keyword list directly from the grammar without rewriting it.
The dependency on JavaCC is only needed for tests. So this ParserKeywordsUtils list should be a test class only.
Sure, as I already mentioned, the RelObj... methods need to be renamed to better reflect their meaning and use.
@@ -37,7 +37,7 @@ public static Stream<String> keyWords() {
    List<String> keywords = new ArrayList<>();
    try {
        try {
-            keywords.addAll( ParserKeywordsUtils.getAllKeywords(file) );
+            keywords.addAll( ParserKeywordsUtils.getAllKeywordsUsingRegex(file) );
So you use the JavaCC parser to just verify what your regex extracts? Why?
It took me a bit of effort to extract all keyword tokens based on your initial stub. I used the previously regex-generated list as a reference. When both lists returned the same result, I got confident that I had it working.
However, the JavaCC approach was based on trial and error and is not documented. So having a test in place might prove useful in the future.
See it as the equivalent of your toString() vs. DeParser: it is redundant, but provides a great way to test correctness.
@SuppressWarnings({"PMD.ExcessiveMethodLength"})
public static List<String> getReservedKeywords(int restriction) {
    // Classification follows http://www.h2database.com/html/advanced.html#keywords
    Object[][] ALL_RESERVED_KEYWORDS = {
Couldn't those lists be derived directly from the grammar? At least RESTRICTED_ALIAS?
It is the other way around: this list will define how we rewrite RelObjectName.... This list will be defined manually, and everything else follows from it.
We want to:
We will use the difference between ALL keywords and RESERVED keywords for this rewrite.
It is the other way around: we get the ALLOWED keywords and rewrite the productions from them.
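The classification idea can be illustrated with a small self-contained sketch. The flag names and the keyword sample are assumptions loosely modelled on the PR's Object[][] table, not its actual contents:

```java
import java.util.ArrayList;
import java.util.List;

public class ReservedKeywordsSketch {
    // Hypothetical restriction flags; the PR uses a similar int-based classification.
    static final int RESTRICTED_SQL2016 = 1;    // reserved by the SQL:2016 standard
    static final int RESTRICTED_ALIAS = 2;      // must not be used as an alias
    static final int RESTRICTED_JSQLPARSER = 4; // reserved for grammar-technical reasons

    // keyword -> bitmask of restrictions (sample data only)
    static final Object[][] ALL_RESERVED_KEYWORDS = {
            {"SELECT", RESTRICTED_SQL2016 | RESTRICTED_ALIAS},
            {"FROM",   RESTRICTED_SQL2016 | RESTRICTED_ALIAS},
            {"XOR",    RESTRICTED_JSQLPARSER},
    };

    // Returns every keyword whose bitmask matches the requested restriction.
    static List<String> getReservedKeywords(int restriction) {
        List<String> keywords = new ArrayList<>();
        for (Object[] entry : ALL_RESERVED_KEYWORDS) {
            int flags = (Integer) entry[1];
            if ((flags & restriction) != 0) {
                keywords.add((String) entry[0]);
            }
        }
        return keywords;
    }

    public static void main(String[] args) {
        System.out.println(getReservedKeywords(RESTRICTED_ALIAS)); // [SELECT, FROM]
    }
}
```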
Certainly, I have asked you above for an explanation of what you had in mind when introducing the 4 methods. (The first 2 are clear to me, but I struggle to understand the methodology behind the others.)
Rewrite the reserved keywords logic: rewrite RelObjectNameWithoutValue() based on 3).

Advantages:
a) we now have clear documentation of which keywords are reserved for what reason, with proper tests
b) allowed keywords are generated (semi-)automatically, especially when more tokens are added to the grammar; new tokens won't break SQL statements which have been parsed fine before

To-dos:
c) composite tokens do not work well and still need to be added manually to ALL_KEYWORDS (we would need to refactor the grammar in order to avoid such composite tokens)
d) @wumpz: clarify the meaning/purpose of the various RelObjectNamexxxx() productions, so we can auto-generate them too
e) use the Gradle task to fully inject the RelObjectNamexxxx() productions (instead of manually updating the code)

Resolves one more Special Oracle Test.
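The (semi-)automatic generation described above boils down to a set difference between all tokens and the reserved keywords. A minimal sketch, where the token-name convention and the production template are illustrative, not the generator's real output:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.StringJoiner;

public class ProductionGeneratorSketch {
    // Builds the text of a RelObjectNameWithoutValue()-style production from the
    // difference between all grammar tokens and the reserved keywords.
    static String generateProduction(Set<String> allKeywords, Set<String> reserved) {
        Set<String> allowed = new LinkedHashSet<>(allKeywords);
        allowed.removeAll(reserved); // allowed = ALL minus RESERVED

        StringJoiner alternatives = new StringJoiner(" | ");
        for (String keyword : allowed) {
            alternatives.add("tk=<K_" + keyword + ">");
        }
        return "String RelObjectNameWithoutValue() :\n"
                + "{ Token tk = null; }\n{\n    ( " + alternatives + " )\n"
                + "    { return tk.image; }\n}";
    }

    public static void main(String[] args) {
        Set<String> all = new LinkedHashSet<>(List.of("SELECT", "CAST", "KEY"));
        Set<String> reserved = Set.of("SELECT");
        System.out.println(generateProduction(all, reserved));
    }
}
```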
Fixes #1148
Fixes #1450
Fixes #1443
Fixes #1462
Fixes #1508
Fixes #1538