The Wayback Machine - https://web.archive.org/web/20250522040311/https://github.com/github/codeql-go/pull/251
Skip to content
This repository was archived by the owner on Jan 5, 2023. It is now read-only.

Add taint-tracking for archive/tar and archive/zip #251

Merged
merged 7 commits into from
Jul 28, 2020

Conversation

gagliardetto
Copy link
Contributor

@gagliardetto gagliardetto commented Jul 8, 2020

Part of #167

Copy link
Contributor

@max-schaefer max-schaefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts about naming. I'm a bit unsure about some of the taint steps; see inline comments. I'll also kick off an evaluation to see what it does for performance and results.

@gagliardetto
Copy link
Contributor Author

Thanks for the review, @max-schaefer

I updated the naming, and the taint logic.

@@ -0,0 +1,210 @@
// WARNING: This file was automatically generated. DO NOT EDIT.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to make these tests a little less verbose?

I'm also growing increasingly fond of tests with inline annotations where the expected output of the test is in some way encoded in the test itself. For example, you are already using source and sink functions to model expected sources and sinks. If you give them an additional integer argument, you could establish the convention that taint from source(n) should flow to sink(n, ...). The test query would then simply check that there is such flow and no other flow from source(n) to an unrelated sink(n', ...).

This has the advantage that you can see the expected test output directly from the test itself and don't have to refer back and forth between the .go file and the .expected file. In fact, the .expected file would be completely empty.

You can see an example of this style in our call graph tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to make these tests a little less verbose?

Comments can be removed.

This has the advantage that you can see the expected test output directly from the test itself and don't have to refer back and forth between the .go file and the .expected file. In fact, the .expected file would be completely empty.

Does that mean that tests would only have the results that don't have the flow from a source(n) to a sink(n,...)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very cool. I'll implement that. I'm very fond of automation that removes manual steps.

@max-schaefer
Copy link
Contributor

Evaluation showed no changes in performance or results.

@@ -0,0 +1,105 @@
// WARNING: This file was automatically generated. DO NOT EDIT.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generated Go code should have a comment that matches ^// Code generated .* DO NOT EDIT\.$ as documented here: https://golang.org/pkg/cmd/go/internal/generate/; could you change the comment accordingly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also perhaps mention in the comment how they were autogenerated, so I know where to go if I want to alter them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll mention the repo. Then, I think the README file of the repo should be enough to reproduce.

For the taint-tracking of the packages in this PR, these are the commands:

codebox --out-dir=./generated/latest --pkg=/usr/local/go/src/archive/tar

# and

codebox --out-dir=./generated/latest --pkg=/usr/local/go/src/archive/zip

If you want to edit the taint logic, add --http and then go to http://127.0.0.1:8080/

codebox --out-dir=./generated/latest --pkg=/usr/local/go/src/archive/tar --http

# and

codebox --out-dir=./generated/latest --pkg=/usr/local/go/src/archive/zip --http

Comment on lines +66 to +68
source := newSource(0)
out := TaintStepTest_ArchiveTarFileInfoHeader_B0I0O0(source)
sink(0, out)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make the tests cleaner to just inline these functions; is there a reason you haven't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean, exactly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you have separate functions for each step? It seems like you could just write the step itself into these blocks, like so:

{
    source = newSource(0)
    var intoWriter881 tar.Writer
    intoWriter881.WriteHeader(source)
    sink(intoWriter881)
}

It would be much more compact and probably more informative when a test fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitely possible, but I prefer named functions; they keep things clean and ordered, and I can easily understand where and for what each function was generated (see the naming suffix).

Also, less visual fatigue in case of a review, not to mention a lot less scrolling in case of a 100+ test cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll draft an implentation of that tomorrow, and share the results of what it looks like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you still looking into this? I'd be fine with leaving the tests as is (though I don't necessarily agree with your reasoning; the generated function names do not look very pretty, and I find a list of 100+ functions at least as tiresome to review as a list of 100+ code blocks).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative suggestion:

Suggested change
source := newSource(0)
out := TaintStepTest_ArchiveTarFileInfoHeader_B0I0O0(source)
sink(0, out)
sink(0, TaintStepTest_ArchiveTarFileInfoHeader_B0I0O0(newSource(0))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might then even be able to rename newSource to source, which would be nicely parallel to sink.

@smowton
Copy link
Contributor

smowton commented Jul 15, 2020

Re: the CI failure, note we now have autoformatting for Go as well as QL. After a rebase, try make autoformat in the root of the repo

smowton
smowton previously approved these changes Jul 15, 2020
Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low-confidence review: lgtm

@gagliardetto
Copy link
Contributor Author

Re: the CI failure, note we now have autoformatting for Go as well as QL. After a rebase, try make autoformat in the root of the repo

The issue was a go file I edited from the browser on GH.

Fixed and pushed, but github.com seems to be having system issues.

@smowton
Copy link
Contributor

smowton commented Jul 15, 2020

Yeah, @gagliardetto you can get a notification when it's fixed from https://www.githubstatus.com/

Copy link
Contributor

@max-schaefer max-schaefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically LGTM, modulo an optional suggestion.

Comment on lines +66 to +68
source := newSource(0)
out := TaintStepTest_ArchiveTarFileInfoHeader_B0I0O0(source)
sink(0, out)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you still looking into this? I'd be fine with leaving the tests as is (though I don't necessarily agree with your reasoning; the generated function names do not look very pretty, and I find a list of 100+ functions at least as tiresome to review as a list of 100+ code blocks).

Comment on lines +66 to +68
source := newSource(0)
out := TaintStepTest_ArchiveTarFileInfoHeader_B0I0O0(source)
sink(0, out)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative suggestion:

Suggested change
source := newSource(0)
out := TaintStepTest_ArchiveTarFileInfoHeader_B0I0O0(source)
sink(0, out)
sink(0, TaintStepTest_ArchiveTarFileInfoHeader_B0I0O0(newSource(0))

Comment on lines +66 to +68
source := newSource(0)
out := TaintStepTest_ArchiveTarFileInfoHeader_B0I0O0(source)
sink(0, out)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might then even be able to rename newSource to source, which would be nicely parallel to sink.

@gagliardetto
Copy link
Contributor Author

Are you still looking into this? I'd be fine with leaving the tests as is (though I don't necessarily agree with your reasoning; the generated function names do not look very pretty, and I find a list of 100+ functions at least as tiresome to review as a list of 100+ code blocks).

I'm short on time to implement the modified code generation. I suggest moving on with the PRs, and then, at a certain point, I'll open a PR with the updated golang test code for the previous merged PRs.

@max-schaefer
Copy link
Contributor

OK, in that case I'll merge. As always, thanks very much for your contribution!

@max-schaefer max-schaefer merged commit e9ae697 into github:master Jul 28, 2020
owen-mc added a commit to owen-mc/codeql-go that referenced this pull request Aug 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants