Pod Certificates: Preliminary implementation of KEP-4317 #128010
Conversation
/sig auth
/assign liggit
@ahmedtd: GitHub didn't allow me to assign the following users: liggit. Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Force-pushed 6027570 to bda6204
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
/triage accepted
Force-pushed bda6204 to 6f72603
Force-pushed 9e39f17 to 035990e
pkg/apis/certificates/types.go
Outdated
// SignerName indicates the request signer.
SignerName string
Let's document the well-known signer.
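For context, a hedged sketch of what that documentation could look like. The field shape and the well-known signer name are assumptions drawn from KEP-4317, not confirmed by this thread:

```go
// Sketch only; not the PR's final generated code.
type PodCertificateRequestSpec struct {
	// SignerName indicates the requested signer.
	//
	// All signer names beginning with "kubernetes.io/" are reserved for the
	// Kubernetes project. The assumed well-known in-tree signer is
	// "kubernetes.io/kube-apiserver-client-pod", which issues certificates
	// that pods can use to authenticate to kube-apiserver.
	SignerName string
}
```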
// PodName is the name of the pod into which the certificate will be mounted.
PodName string
// PodUID is the UID of the pod into which the certificate will be mounted.
PodUID types.UID

// ServiceAccountName is the name of the service account the pod is running as.
ServiceAccountName string
// ServiceAccountUID is the UID of the service account the pod is running as.
ServiceAccountUID types.UID

// NodeName is the name of the node the pod is assigned to.
NodeName types.NodeName
// NodeUID is the UID of the node the pod is assigned to.
NodeUID types.UID
Indicate these are all required: +required (currently).
Done.
// If omitted, kube-apiserver will set it to 86400 (24 hours). kube-apiserver
// will reject values shorter than 3600 (1 hour).
Defaulting can be done like this https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/v1/defaults.go#L280-L285
Defaulting for all will be 24 hours. Custom signer users can specify this value if they need a non-default.
Done. Though I'm wondering if it might be cleaner to not do any explicit defaulting, but instead treat the field being absent as meaning 24 hours. That way people who just use a JSON / yaml parser won't be confused by missing the magical behavior of the Kubernetes parsing defaulter.
Since we check the issued certificate against maxExpirationSeconds during status update validation, we can guarantee that third-party signer implementations have to be aware of the "missing means 24 hours" behavior.
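For illustration, a minimal defaulting sketch following the linked defaults.go pattern. The function name and the pointer-typed field are assumptions about the final generated code:

```go
// SetDefaults_PodCertificateRequestSpec applies the agreed default of
// 86400 seconds (24 hours) when maxExpirationSeconds is omitted.
func SetDefaults_PodCertificateRequestSpec(obj *PodCertificateRequestSpec) {
	if obj.MaxExpirationSeconds == nil {
		def := int32(86400) // 24 hours
		obj.MaxExpirationSeconds = &def
	}
}
```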
// The PodCertificateMaxExpiration admission plugin, if enabled, will
// shorten the value to the maximum expiration configured for the requested
// signer.
Conditional defaulting has not aged well in the past. Let's not do mutating admission (or conditional mutating strategy).
Removed this.
I think I still see the original, did you push the changes?
// maxExpirationSeconds is the maximum lifetime permitted for the
// certificate.
Document the maximum allowable range: the Kubernetes signers will be 24 hours or less, non-kube signers will be 91 days or less. The shortest possible for all is 1 hour.
Done.
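A hedged sketch of that documented range as a standalone check. The helper name is hypothetical, and the real validation would report through field.ErrorList rather than a bool:

```go
import "strings"

const (
	minMaxExpirationSeconds     = int32(3600)    // 1 hour: floor for all signers
	kubeMaxExpirationSeconds    = int32(86400)   // 24 hours for kubernetes.io signers
	nonKubeMaxExpirationSeconds = int32(7862400) // 91 days for everyone else
)

func maxExpirationSecondsInRange(signerName string, secs int32) bool {
	upper := nonKubeMaxExpirationSeconds
	if strings.HasPrefix(signerName, "kubernetes.io/") {
		upper = kubeMaxExpirationSeconds
	}
	return secs >= minMaxExpirationSeconds && secs <= upper
}
```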
var allErrors field.ErrorList
allErrors = append(allErrors, ValidatePodCertificateRequest(newReq, opts)...)
allErrors = append(allErrors, apivalidation.ValidateObjectMetaUpdate(&newReq.ObjectMeta, &oldReq.ObjectMeta, field.NewPath("metadata"))...)
keep this
Done.
func ValidatePodCertificateRequestUpdate(newReq, oldReq *certificates.PodCertificateRequest, optsIn ValidatePodCertificateRequestOptions) field.ErrorList {
	opts := optsIn
	if bytes.Equal(newReq.Spec.PKIXPublicKey, oldReq.Spec.PKIXPublicKey) && bytes.Equal(newReq.Spec.ProofOfPossession, oldReq.Spec.ProofOfPossession) && newReq.Status.CertificateChain == oldReq.Status.CertificateChain {
		opts.SkipCryptoValidation = true
SkipCryptoValidation is no longer necessary.
Done.
}

var allErrors field.ErrorList
allErrors = append(allErrors, ValidatePodCertificateRequest(newReq, opts)...)
This doesn't delegate because the update is immutable.
Done.
}

// Bail out if the certificate chain hasn't been set.
if req.Status.CertificateChain == "" {
.status updates need to be a different function. .status updates must never fail on invalid .spec values.
Done.
leafCert, err := x509.ParseCertificate(leafBlock.Bytes)
if err != nil {
	allErrors = append(allErrors, field.Invalid(certChainPath, req.Status.CertificateChain, "leaf certificate does not parse as valid X.509"))
	return allErrors
}
@enj can you be sure this does the validation we desire?
Actually, for the entire status bits. Be sure you're happy. Discussion suggests it's not as obvious as I might have thought.
Hi @ahmedtd,
Force-pushed 035990e to 694bcad
Force-pushed 694bcad to b889242
/retest
Force-pushed b889242 to 085e872
Hello, @ahmedtd @deads2k
@deads2k This is ready for another review pass. Apologies, it took a lot longer than I expected to convert the Kubelet code to be informer-based, but I think it has significantly improved it. There's one change from the last time you reviewed that I should call out: I have locked down the allowable status conditions on PodCertificateRequest:
This simplified a lot of the validation logic, and seems OK because
All other changes are just implementing review feedback.
pkg/apis/certificates/types.go
Outdated
//
// Signer implementations do not need to support all key types supported by
// kube-apiserver and kubelet. If a signer does not support the key type
// used for a given PodCertificateRequest, it should deny the request, with
I still see "should", did you push the changes?
if pcr.ObjectMeta.CreationTimestamp.Time.After(time.Now().Add(-30 * time.Minute)) {
	return nil
}
would it make sense to have different retention for PCRs with at least one condition (=> addressed) vs PCRs with none (ignored for some reason, maybe faulty signer implementation)?
If it's a persistent problem, then there is always going to be one you can query and look at with no conditions near the end of its life. If it isn't persistent, there's nothing to look at.
It could be interesting to have a metric count of PCRs deleted without any conditions on them. Maybe separated by the kube signer and everyone else.
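A sketch of what such a metric could look like with k8s.io/component-base/metrics; the metric name and label are illustrative assumptions:

```go
import "k8s.io/component-base/metrics"

// Counts PCRs garbage-collected before any signer set a condition, split by
// signer class (e.g. "kubernetes.io" vs "other"). Sketch only.
var pcrsDeletedWithoutConditions = metrics.NewCounterVec(
	&metrics.CounterOpts{
		Name:           "podcertificaterequests_deleted_without_conditions_total",
		Help:           "PodCertificateRequests deleted without any status conditions.",
		StabilityLevel: metrics.ALPHA,
	},
	[]string{"signer_class"},
)
```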
}

func (m *IssuingManager) refreshOnePod(ctx context.Context, pod *corev1.Pod) error {
	for _, v := range pod.Spec.Volumes {
maybe you could map pod -> slice[credRecord]
or pod -> podCertProjectedVolume -> credRecord
so that we don't have to loop through the volumes every time?
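A sketch of that indexing, assuming credRecord is the manager's existing internal record type (types is k8s.io/apimachinery/pkg/types); the key mirrors the volume-name-plus-source-index scheme adopted later in this review:

```go
// credKey identifies one podCertificate projected volume source.
type credKey struct {
	volumeName  string
	sourceIndex int // index of the source within the projected volume
}

// credentialsByPod maps pod UID -> per-source records, letting refresh jump
// straight to a pod's records instead of rescanning pod.Spec.Volumes.
var credentialsByPod map[types.UID]map[credKey]*credRecord
```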
serviceAccount, err := m.kc.CoreV1().ServiceAccounts(pod.ObjectMeta.Namespace).Get(ctx, pod.Spec.ServiceAccountName, metav1.GetOptions{})
if err != nil {
	return fmt.Errorf("while fetching service account: %w", err)
}
one live fetch should do for the whole pod
I suspect there is a reason we don't do cached SA lookups, right?
Moved to be one fetch for the pod.
Ehhh, I am not sure of the importance of having a live lookup here. It's probably fine to work from a cache if I can figure out where Kubelet is caching service accounts.
Reviewed up to here so far. We start tomorrow with the kubelet.
	return allErrors
}

if len(req.Status.Conditions) > 1 {
rather than this, make the three known conditions mutually exclusive. Keeping the "must be true" is fine. This will allow us to add other conditions without having a two release dance of loosening validation one release and adding the condition in a second release.
OK, I think I've done this. The three known conditions are now mutually exclusive, but other conditions may be set.
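A sketch of that rule, assuming the three known condition types are "Issued", "Denied", and "Failed" (per KEP-4317) and that conditions use metav1.Condition:

```go
import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// At most one known condition may be present, it must be True, and unknown
// condition types remain allowed for future expansion. Sketch only.
func checkKnownConditionsExclusive(conds []metav1.Condition) error {
	known := 0
	for _, c := range conds {
		switch c.Type {
		case "Issued", "Denied", "Failed":
			known++
			if c.Status != metav1.ConditionTrue {
				return fmt.Errorf("condition %q must have status True", c.Type)
			}
		}
	}
	if known > 1 {
		return fmt.Errorf("conditions Issued, Denied, and Failed are mutually exclusive")
	}
	return nil
}
```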
pkg/kubelet/kubelet.go
Outdated
	clock.RealClock{},
)
klet.podCertificateManager = pcm
kubeInformers.Start(wait.NeverStop)
should be able to use ctx.Done
Done.
"k8s.io/utils/clock" | ||
) | ||
|
||
// TODO(KEP-4317): Tests for IssuingManager |
yeah, don't forget to add
Leaving a note for myself: This is probably easiest to test under the integration test framework, with a real kube-apiserver.
Namespace string | ||
PodName string | ||
PodUID types.UID | ||
ServiceAccountName string
weird to have the service account name as part of a pod identity.
I've removed PodIdentity --- it was vestigial after the IssuingManager refactor to be informer-driven
)

func (m *IssuingManager) Run(ctx context.Context) {
	if !cache.WaitForCacheSync(ctx.Done(), m.pcrInformer.HasSynced) {
wait for the node lister to be synced
Done.
uniquePath := source.PodCertificate.CredentialBundlePath
if uniquePath == "" {
	uniquePath = source.PodCertificate.KeyPath
}
if uniquePath == "" {
	uniquePath = source.PodCertificate.CertificateChainPath
}
this is different from how uniquePath is calculated in TrackPod and ForgetPod
I've switched all this to be based on the volume name and the index of the projected volume source.
pod.ObjectMeta.Namespace,
pod.ObjectMeta.Name, pod.ObjectMeta.UID,
pod.Spec.ServiceAccountName, serviceAccount.ObjectMeta.UID,
nit: you could just pass both pod's and SA's objectMeta
I kind of like the current spelling.
	volumeName: v.Name,
	path: uniquePath,
}
delete(m.validCredentials, key)
Are invalidCredentials ever cleaned up? Should we also remove pendingCredentials? I see they get removed in handlePCRDelete(), so I guess it might be fine to defer deletion till then.
Let's clean them all
I believe all our map entries should be dropped once the pod or podcertificaterequest they correspond to has been dropped.
}
newPCR := new.(*certificatesv1alpha1.PodCertificateRequest)

klog.InfoS("Handling PodCertificateRequest update", "pcr", key)
The logs in this method seem a bit too verbose. Probably debug logs? Some of them might be useful at some debug level.
Changed to V(2)
reason := "" | ||
message := "" | ||
for _, cond := range newPCR.Status.Conditions { | ||
// PCRs can have only one condition |
no longer true after https://github.com/kubernetes/kubernetes/pull/128010/files#r2016806775 is implemented
I've updated this to properly handle other conditions.
	v.Name, uniquePath,
	source.PodCertificate.SignerName, source.PodCertificate.KeyType,
)
if err != nil {
should we try to optimistically create all PCRs and only log errors? And eventually reattempt the PCR creates that failed?
}

// TrackPod visits all PodCertificate projected volume sources within the pod,
// creating an initial PodCertificateRequest for each one.
func (m *IssuingManager) TrackPod(ctx context.Context, pod *corev1.Pod) error {
I might be reading this wrong, but is this the case?
Conditions (either/or):
- the SA is not yet present in the NS
- a network blip during the SA fetch / PCR create
What happens:
- pendingCredentials is never populated, and the PCR create is never reattempted
- GetPodCertificateCredentialBundle() will always fail in backoff when the kubelet tries to populate the volume with files
If that's the case, shouldn't we have a retry mechanism built into the manager?
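One possible shape for such a retry, sketched under the assumption of a hypothetical podsNeedingCreate map guarded by the manager's lock; none of these names are the PR's actual code:

```go
import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/klog/v2"
)

// retryFailedCreates periodically re-runs TrackPod for pods whose initial
// PCR creation failed (SA not yet present, network blip, etc).
func (m *IssuingManager) retryFailedCreates(ctx context.Context) {
	wait.UntilWithContext(ctx, func(ctx context.Context) {
		m.lock.Lock()
		pending := make([]*corev1.Pod, 0, len(m.podsNeedingCreate))
		for _, p := range m.podsNeedingCreate {
			pending = append(pending, p)
		}
		m.lock.Unlock()
		for _, pod := range pending {
			if err := m.TrackPod(ctx, pod); err != nil {
				klog.ErrorS(err, "Retrying PodCertificateRequest creation", "pod", klog.KObj(pod))
			}
		}
	}, time.Minute)
}
```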
// 2) Kubelet created this PCR, but then restarted.
//
// In both cases, just do nothing.
klog.InfoS("PodCertificateRequest update was for stray", "pcr", key)
klog.InfoS("PodCertificateRequest update was for stray", "pcr", key) | |
klog.InfoS("PodCertificateRequest isn't tracked for any known pods", "pcr", key) |
Done.
// On delete, clean up the pending credential record corresponding
// to this PodCertificateRequest, if any exists.
func (m *IssuingManager) handlePCRDelete(old any) {
If I understand it correctly, kubectl delete podcertificaterequest -A --all will make all pending pods that request a cert never come up. Is that OK?
}

// Add informer functions. We don't follow the typical controller pattern
// where we use a workqueue, because we don't have to do any RPCs or state
this eliminates the de-duping capability of a workqueue.
comments to be addressed before our next review session.
allPods := m.podManager.GetPods()
for _, pod := range allPods {
	if err := m.refreshOnePod(ctx, pod); err != nil {
		klog.ErrorS(err, "Error while refreshing pod", "namespace", pod.ObjectMeta.Namespace, "name", pod.ObjectMeta.Name)
Does kubelet not use utilruntime.HandleError?
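For reference, a hedged sketch of that pattern (utilruntime is k8s.io/apimachinery/pkg/util/runtime); not the PR's actual code:

```go
import (
	"fmt"

	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
)

// Route the error through the shared handler instead of logging directly:
if err := m.refreshOnePod(ctx, pod); err != nil {
	utilruntime.HandleError(fmt.Errorf("while refreshing pod %s/%s: %w", pod.Namespace, pod.Name, err))
}
```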
uniquePath := source.PodCertificate.CredentialBundlePath
if uniquePath == "" {
	uniquePath = source.PodCertificate.KeyPath
}
if uniquePath == "" {
	uniquePath = source.PodCertificate.CertificateChainPath
}
Let's use the index of the projected volume and the index of the projected volume source as the key.
Done.
	// refresh.
	return
}
if m.clock.Now().After(credRecord.beginRefreshAt) {
also be sure that beginRefreshAt is not zero value.
API validation places strong constraints on BeginRefreshAt --- it must be between NotBefore and NotAfter.
What would a zero check guard against here?
	continue
}

serviceAccount, err := m.kc.CoreV1().ServiceAccounts(pod.ObjectMeta.Namespace).Get(ctx, pod.Spec.ServiceAccountName, metav1.GetOptions{})
Must document why this is done. Add beta criteria to the KEP to consider the alternative of single-item list/watch as we do for secrets and configmaps. We have another one of these in projected service account tokens for image pulls.
path := source.PodCertificate.CredentialBundlePath
if path == "" {
	path = source.PodCertificate.KeyPath
}
if path == "" {
	path = source.PodCertificate.CertificateChainPath
}
change to use indexes.
pkg/volume/projected/projected.go
Outdated
	Namespace: s.pod.ObjectMeta.Namespace,
	PodName: s.pod.ObjectMeta.Name,
	PodUID: s.pod.ObjectMeta.UID,
	ServiceAccountName: s.pod.Spec.ServiceAccountName,
There are two possible choices:
- SA name is enough; but if this is true, then it doesn't matter, because the SA name cannot change
- SA name is not enough and we need a UID, and we must collect the correct UID
Blocker for merge: there must be static pod integration tests. For static pods with pod certificate projected volumes, we want either:
- kubelet rejects the pod and does X to report this to someone, or
- kubelet runs the pod and doesn't create the projected volume.
We do NOT want a state where the kubelet runs the pod and cannot write the pod to the API.
Information to make the choice: what do we do with static pods with service accounts?
	return fmt.Errorf("while PEM-encoding private key: %w", err)
}

req := &certificatesv1alpha1.PodCertificateRequest{
Add owner reference to the pod so we clean up automatically. And test with the OwnerReferencesPermissionEnforcement admission plugin. You may have to add the deletion powers back, but that seems worth the tradeoff. This prevents build up of podcertificaterequests when pods are short lived.
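A sketch of the suggested owner reference, assuming req is the PodCertificateRequest under construction (ptr is k8s.io/utils/ptr). With a controller owner reference, the garbage collector deletes the PCR automatically when the pod is deleted:

```go
req.OwnerReferences = []metav1.OwnerReference{{
	APIVersion: "v1",
	Kind:       "Pod",
	Name:       pod.ObjectMeta.Name,
	UID:        pod.ObjectMeta.UID,
	Controller: ptr.To(true),
}}
```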
* Define feature gate
* Define and serve PodCertificateRequest
* Implement Kubelet projected volume source
* kube-controller-manager GCs PodCertificateRequests
* Add agnhost subcommand that implements a toy signer for testing

Change-Id: Id7ed030d449806410a4fa28aab0f2ce4e01d3b10
Change-Id: I9edc0f3d7080330b164f6be78bb8f16350869ffa
Force-pushed 085e872 to 4a9326a
@ahmedtd: The following tests failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
This PR implements KEP-4317 (Pod Certificates).
make update to do all the regeneration for the new types.
/kind feature
/kind api-change