Description
Unexpected errors can occur at the GPU driver level, so the driver must be able to recover the GPU from them while minimizing disruption to the user experience. GPU recovery is a complex topic, and it becomes even more challenging once the specific characteristics of each vendor's hardware are taken into account. Recovery also requires some level of interoperability between the kernel and userspace; ideally, both sides should work together to preserve the user experience.
This workshop is dedicated to discussing GPU recovery from kernel space to userspace (and vice versa). With these ideas in mind, we aim to cover topics such as:
- Per-component reset.
- User queues vs. kernel queues.
- Enforce isolation.
- Trap handlers.
Several patches on this topic are already circulating on the mailing lists, for example:
- https://lore.kernel.org/dri-devel/20230929092509.42042-1-andrealmeid@igalia.com/
- https://lore.kernel.org/dri-devel/20250204070528.1919158-1-raag.jadav@intel.com/
- https://lore.kernel.org/amd-gfx/20250701184451.11868-1-alexander.deucher@amd.com/T/#t
If you are interested in discussing any of the topics above, please join us for this workshop.
In-person or virtual presentation | In-person |
---|---|
Code of Conduct | Yes |
GSoC, EVoC or Outreachy | No |