-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrap all opaque types as interfaces #112
Conversation
f85ef81
to
7e28d72
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial change look good. The main question I have is whether to top-level library functions that operate ON other types make sense, or whether we should always use the functions off the types?
pkg/nvml/api.go
Outdated
} | ||
|
||
// Define package level methods as aliases to Interface methods of libnvml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have toolking to generate this automatically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at the moment. It was fairly easy to do though. I just grepped for func (l* library)
copied that in and did some search and replace on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tool has been added in the latest revision.
var DeviceGetComputeRunningProcesses = deviceGetComputeRunningProcesses_v1 | ||
var DeviceGetGraphicsRunningProcesses = deviceGetGraphicsRunningProcesses_v1 | ||
var DeviceGetMPSComputeRunningProcesses = deviceGetMPSComputeRunningProcesses_v1 | ||
var deviceGetComputeRunningProcesses = deviceGetComputeRunningProcesses_v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a silly question: Can you remind me why we don't have an nvml
prefix here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nvml "functions" in this block come from #defines in nvml.h. The underlying versioned functions that get assigned to these variables are generated as part of the bindings with the v1
, v2
, v3
, etc. prefixes and then assigned in updateVersionedSymbols
below.
The difference with these functions is that we need to wrap the underlying nvml*_v1
, nvml*_v2
, etc. functions with some extra, version specific logic. Because of this, we create a level of indirection before assinging them back out to the "unversioned" variable.
pkg/nvml/api.go
Outdated
@@ -279,7 +279,7 @@ type Interface interface { | |||
DeviceSetVgpuSchedulerState(Device Device, PSchedulerState *VgpuSchedulerSetState) Return | |||
DeviceGetVgpuSchedulerCapabilities(Device Device) (VgpuSchedulerCapabilities, Return) | |||
GpuInstanceGetComputeInstancePossiblePlacements(GpuInstance GpuInstance, Info *ComputeInstanceProfileInfo) ([]ComputeInstancePlacement, Return) | |||
GpuInstanceCreateComputeInstanceWithPlacement(GpuInstance GpuInstance, Info *ComputeInstanceProfileInfo, Placement *ComputeInstancePlacement, ComputeInstance *ComputeInstance) Return | |||
GpuInstanceCreateComputeInstanceWithPlacement(GpuInstance GpuInstance, Info *ComputeInstanceProfileInfo, Placement *ComputeInstancePlacement) (ComputeInstance, Return) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a bug? Does it make sense to do this in a separate PR so as to not have it go "missing" in the changelog?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is a bug. However, if I do it as a separate PR, then I can't make the updates in the commit that follows to turn ComputeInstance into an Interface. I'd rather have all the interface conversion in a single PR. I could look at maybe updating it in a "pre" PR, but I don't know how hard it will be to rebase if I do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to leave this in this PR as long as we remember to call it out in the release notes once we tag a new version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make sure and call it out in the release notes
7e28d72
to
81e362c
Compare
I'd be fine returning concrete types for everything (including the top level library) and moving the API definitions to the |
No, that's not what I meant -- although it is a valid question. I meant that we have, as an example, three functions that do the same thing:
My question was whether we would consider moving towards requiring the use of |
I would be inclined to keep them since that is what matches the actual NVML API. That said, we should add tooling to autogenerate the top level Interface as well as the package level aliases from the functions hanging off the |
8cfa643
to
c4bbc89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments on the generation toolking to start with.
gen/nvml/generateapi.go
Outdated
|
||
func getWriter(outputFile string) (io.Writer, func() error, error) { | ||
if outputFile == "-" { | ||
return os.Stdout, nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of nil
as the "Closer" can we return:
func() error { return nil }
then we don't need to check for nil at the close site?
Alternatively, we can return a io.WriteCloser
and implement a noopCloser
that we combine with os.Stdout:
type noopCloser struct {
io.Writer
}
func (n *noopCloser) Close() error {
return nil
}
And then our return statement becomes:
return &noopCloser{Writer: os.Stdout}, nil
(and we can just return os.Create(outputFile)
directly in the file case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with the first suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you know, I am not a fan of "noop code" and prefer to be explicit in the caller (which is why I hesitated even changing this at all), but this is minor, so its fine.
gen/nvml/generateapi.go
Outdated
return nil | ||
}) | ||
if err != nil { | ||
return nil, fmt.Errorf("walking the embed.FS path: %w\n", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
embed.FS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover from an old iteration, removing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
217cd99
to
b02ebca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment about where we put our Mock sytems implementations.
pkg/nvml/mock/dgxa100.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a thought. Does it make sense to have the "implementations" under something like mock/system
or mock/platform
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What value does the extra level of nesting bring? To me it seems better to say mock.DGXA100Server
or mock.A100Device
over system.A100Device
or platform.*
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy with a dgxa100
folder though. Then we could change the names to dgxa100.Server
and dgxa100.Device
, dgxa100.GpuInstance
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this change -- let me know what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What value does the extra level of nesting bring?
It's more about organisation than anything else. The mocks themselves are all generated wherease the systems are implemented.
b02ebca
to
0d26ac8
Compare
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
As part of this, redefine all package level methods as methods hanging off of the 'library' type and create aliases for all package level methods to those methods hanging off of the default 'libnvml' instance of the 'library' type. Signed-off-by: Kevin Klues <kklues@nvidia.com>
0d26ac8
to
8d282c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @klueska.
I have some local changes in the device plugin that could benefit from the "Mock" implementations already!
8d282c0
to
f343a06
Compare
f343a06
to
3937a82
Compare
7b57959
to
abf8e8e
Compare
@@ -85,6 +113,10 @@ func (l *library) load() (rerr error) { | |||
return fmt.Errorf("error opening %s: %w", l.path, err) | |||
} | |||
|
|||
// Update the errorStringFunc to point to nvml.ErrorString | |||
errorStringFunc = nvmlErrorString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to set this to the default once we close the library?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Looks good.
Minor comment regarding the error string.
} | ||
return nvmlErrorString(r) | ||
func (l *library) ErrorString(r Return) string { | ||
return r.Error() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We either need to keep the lookup logic as before or reset the default if the library is unloaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't leave it as before because in the r.Error() and r.String() methods I don't have access to l
. I will update it reset the function on unload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
pkg/nvml/lib.go
Outdated
l.dl = dl.New(o.path, o.flags) | ||
} | ||
|
||
func (l *library) GetExtendedInterface() ExtendedInterface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sold on GetExtendedInterface
and would prefer GetExtensions()
, but this is not a blocker. I'm happy to keep it as is and revisit this later once we have some usage examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated as suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I went for a hybrid -- the interface is called ExtendedInterface
to be symmetrical with Interface
, but the method call is Extensions()
(which is the only actual symbol anyone will see in practice), e.g.:
nvml.Extensions().LookupSymbol("symbol")
or
envml := nvml.Extensions()
envml.LookupSymbol("symbol")
Its possible that the type itself might be embedded in a struct, but I only see that happening in testing (if at all) and I like the symmetry of that with Interface
for the core API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds fine. I'm not too concerned about the type name, since as you mention that isn't visible at the call site.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
A new ComputeInstance should be returned, not passed in by reference Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
A new GpmSample should be returned, not passed in by reference. As part of this, add methods to hang of of the GpmSample type (which were missing previously). Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
We should eventually expand this package to include a unified mock server like we have in the go-nvlib testing and mig-parted testing. Signed-off-by: Kevin Klues <kklues@nvidia.com>
This code was pulled over (mostly) directly from 'mig-parted' Signed-off-by: Kevin Klues <kklues@nvidia.com>
The methods in this interface represent extensions to the core NVML API that are only accessible through calling GetExtensions() against the Interface in use (or at the package level for the default interface). Signed-off-by: Kevin Klues <kklues@nvidia.com>
abf8e8e
to
1fa43fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now!
No description provided.