Could this be used with V4L2/libcamera buffers on the Raspberry Pi 4 (Arm A72) #107
Comments
Thanks for the issue. I have only run it on ARM Cortex®-A53 (Xilinx Zynq Ultrascale+ MPSoC) and ARM Cortex®-A9 (Xilinx ZYNQ / Altera CycloneV SoC), so I don't know whether it works on ARM Cortex®-A72 (Raspberry Pi 4). It may work, since the A72 has the same arm64 architecture as the A53. If anyone has tried it, please share what you found.
udmabuf likely isn't a good way to pass the buffers, but if you're experiencing issues with mmap'ing buffers, it is likely because they are in uncached memory.
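One rough way to check whether a mapped buffer behaves like uncached memory is to time a sequential read over it and compare the figure with an ordinary heap buffer of the same size. The helper below is a minimal sketch of that idea; the function name `read_throughput_mbps` is hypothetical and not part of udmabuf or V4L2, and it assumes the buffer is 8-byte aligned (as `malloc` and `mmap` results are).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/*
 * Read every 64-bit word of buf once and return the read throughput in MiB/s.
 * len is rounded down to a multiple of 8 bytes. The sum of all words is stored
 * in *checksum (if non-NULL) so the compiler cannot drop the loop.
 * Returns 0.0 if the clock cannot be read.
 * Hypothetical helper for illustration only.
 */
double read_throughput_mbps(const void *buf, size_t len, uint64_t *checksum)
{
    assert(buf != NULL && len >= sizeof(uint64_t));

    const volatile uint64_t *p = buf;
    size_t words = len / sizeof(uint64_t);
    uint64_t sum = 0;
    struct timespec t0, t1;

    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return 0.0;
    for (size_t i = 0; i < words; i++)
        sum += p[i];
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return 0.0;

    if (checksum)
        *checksum = sum;

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        secs = 1e-9; /* guard against a coarse clock on tiny buffers */

    return ((double)(words * sizeof(uint64_t)) / (1024.0 * 1024.0)) / secs;
}
```

Calling it once on the mmap'ed V4L2 buffer and once on a `malloc`'ed buffer of the same length gives two numbers to compare; a large gap would be consistent with an uncached mapping, though it is not proof on its own.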
Here is a short explanation of why the cache is turned off.

Performance issue with V4L2 streaming I/O (V4L2_MEMORY_MMAP)

Introduction

V4L2 streaming I/O (V4L2_MEMORY_MMAP) is a V4L2 streaming I/O scheme that maps V4L2 buffers allocated in the V4L2 driver (in the kernel) to user space using the mmap mechanism, allowing user programs to access the V4L2 buffers. This method is used relatively often because it allows direct access to the V4L2 buffers from user space.

However, certain V4L2 drivers have a problem where caching is turned off when the buffers are mapped to user space with mmap, resulting in very slow memory access and poor performance. One V4L2 driver that causes this problem is Xilinx's Video DMA. This topic describes the mechanism.

Mechanism of cache turn-off

The problem lies in the mmap of dma-contig, one of the V4L2 buffer memory allocators, which in some cases turns off the cache.

Memory allocator for V4L2 buffer

There are three types of memory allocators for V4L2 buffers:

- vmalloc
- dma-sg
- dma-contig
Of these, the last one, dma-contig, is the most problematic.

vmalloc

vmalloc is a memory allocator for V4L2 drivers without DMA.

dma-sg

dma-sg is a memory allocator for devices with DMA supporting Scatter Gather, which allows DMA transfers even when buffers are not contiguous in physical memory space.

dma-contig

dma-contig is a memory allocator for devices with DMA that does not support Scatter Gather. It uses the kernel's DMA API to allocate memory.

mmap for dma-contig

vb2_dc_mmap()

The mmap for dma-contig is as follows:

static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma)
{
	struct vb2_dc_buf *buf = buf_priv;
	int ret;

	if (!buf) {
		printk(KERN_ERR "No buffer to map\n");
		return -EINVAL;
	}

	if (buf->non_coherent_mem)
		ret = dma_mmap_noncontiguous(buf->dev, vma, buf->size,
					     buf->dma_sgt);
	else
		ret = dma_mmap_attrs(buf->dev, vma, buf->cookie, buf->dma_addr,
				     buf->size, buf->attrs);
	if (ret) {
		pr_err("Remapping memory failed, error: %d\n", ret);
		return ret;
	}

	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
	vma->vm_private_data = &buf->handler;
	vma->vm_ops = &vb2_common_vm_ops;

	vma->vm_ops->open(vma);

	pr_debug("%s: mapped dma addr 0x%08lx at 0x%08lx, size %lu\n",
		 __func__, (unsigned long)buf->dma_addr, vma->vm_start,
		 buf->size);

	return 0;
}

Only the dma_mmap_attrs() path is considered here.

dma_mmap_attrs()

dma_mmap_attrs() is as follows.

https://elixir.bootlin.com/linux/v6.1.38/source/kernel/dma/mapping.c#L457

int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
		void *cpu_addr, dma_addr_t dma_addr, size_t size,
		unsigned long attrs)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);

	if (dma_alloc_direct(dev, ops))
		return dma_direct_mmap(dev, vma, cpu_addr, dma_addr, size,
				attrs);
	if (!ops->mmap)
		return -ENXIO;
	return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
}

On the arm64 architecture, dma_alloc_direct() is normally true, so dma_direct_mmap() is called.

dma_direct_mmap()

dma_direct_mmap() is as follows.

https://elixir.bootlin.com/linux/v6.1.38/source/kernel/dma/direct.c#L555

int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma,
		void *cpu_addr, dma_addr_t dma_addr, size_t size,
		unsigned long attrs)
{
	unsigned long user_count = vma_pages(vma);
	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
	unsigned long pfn = PHYS_PFN(dma_to_phys(dev, dma_addr));
	int ret = -ENXIO;

	vma->vm_page_prot = dma_pgprot(dev, vma->vm_page_prot, attrs);
	if (force_dma_unencrypted(dev))
		vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);

	if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
		return ret;
	if (dma_mmap_from_global_coherent(vma, cpu_addr, size, &ret))
		return ret;

	if (vma->vm_pgoff >= count || user_count > count - vma->vm_pgoff)
		return -ENXIO;
	return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
			user_count << PAGE_SHIFT, vma->vm_page_prot);
}

Here, the cache attribute is set by dma_pgprot().

dma_pgprot()

dma_pgprot() is as follows.

https://elixir.bootlin.com/linux/v6.1.38/source/kernel/dma/mapping.c#L415

#ifdef CONFIG_MMU
/*
 * Return the page attributes used for mapping dma_alloc_* memory, either in
 * kernel space if remapping is needed, or to userspace through dma_mmap_*.
 */
pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, unsigned long attrs)
{
	if (dev_is_dma_coherent(dev))
		return prot;
#ifdef CONFIG_ARCH_HAS_DMA_WRITE_COMBINE
	if (attrs & DMA_ATTR_WRITE_COMBINE)
		return pgprot_writecombine(prot);
#endif
	return pgprot_dmacoherent(prot);
}
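#endif /* CONFIG_MMU */

Note that dev_is_dma_coherent() is used here: the page protection is left unchanged only when it returns true for the device.

Conclusion

On the arm64 architecture, V4L2 drivers that use dma-contig turn off the cache on mmap when the device is not DMA coherent.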
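The decision made by dma_pgprot() can be restated as a small userspace model, which may make the three outcomes easier to see at a glance. This is only a sketch that mirrors the branches quoted above: the names `model_dma_pgprot`, `MODEL_ATTR_WRITE_COMBINE`, and the `PGPROT_*` enumerators are hypothetical and are not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's DMA_ATTR_WRITE_COMBINE attribute bit. */
#define MODEL_ATTR_WRITE_COMBINE (1UL << 2)

/* Which page-protection helper the quoted dma_pgprot() would end up applying. */
enum model_pgprot {
    PGPROT_UNCHANGED,    /* prot returned as-is (device is DMA coherent)      */
    PGPROT_WRITECOMBINE, /* pgprot_writecombine(prot)                         */
    PGPROT_DMACOHERENT   /* pgprot_dmacoherent(prot): the cache-off case here */
};

/*
 * Userspace model of the branch structure of dma_pgprot().
 * dev_coherent           models dev_is_dma_coherent(dev)
 * arch_has_write_combine models CONFIG_ARCH_HAS_DMA_WRITE_COMBINE being set
 * attrs                  models the DMA attribute bits passed by the allocator
 */
enum model_pgprot model_dma_pgprot(bool dev_coherent,
                                   bool arch_has_write_combine,
                                   unsigned long attrs)
{
    if (dev_coherent)
        return PGPROT_UNCHANGED;
    if (arch_has_write_combine && (attrs & MODEL_ATTR_WRITE_COMBINE))
        return PGPROT_WRITECOMBINE;
    return PGPROT_DMACOHERENT;
}
```

In this model, a non-coherent device with no write-combine attribute always lands in the `PGPROT_DMACOHERENT` case, which is the situation the explanation above identifies as slow.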
Hi all!

Firstly, I want to sincerely thank all the contributors to this project, and more specifically @ikwzm. It seems to be a really dynamic project, where all the issues get answers. This is great!

I am in a situation a bit like @octopus-russell's. I want to capture images from a camera using v4l2, and I need to write them at a given address in physical memory. Several solutions exist of course, but from my point of view using DMA is the best way. In v4l2 it corresponds to the [...]

So I've entered the dark world of DMA in Linux. After several hours/days of research around the web, I came across this device driver and thought for one glorious second that I had found the right way. But after several attempts, I finally found out that I cannot export dma-buffer file descriptors using [...]

Then I spent another day trying to find a tool that would allow me to export a dma-buffer file descriptor from a physical memory address... in vain. And a bit randomly, I came across this issue, which is really close to what I want to do! So maybe I can find the solution here.

@kbingham, you said that "udmabuf likely isn't a good way to pass the buffers", and I'm wondering if you could give me a hint on how to do what I want, i.e. using [...]

Sorry for this looooong text, and have a good day!

PS1: Unfortunately, since December 2022, the strategy consisting in using [...]

PS2: I am aware that this may not be the right place to ask. If that is the case, could you redirect me to the right place?
Thank you for your valuable information.
I did not know that the V4L2_MEMORY_USERPTR method was no longer available. It would be a shame if that is the case.

Nothing is decided yet, but I am currently trying to add the ability to export u-dma-bufs as dma-bufs.

https://github.com/ikwzm/udmabuf/tree/dma-buf-export-develop

I still have a long way to go, but I will make it public when it is ready.
You are welcome. It is still available but, as far as I understand, v4l2 refuses to use it if it ultimately involves writing directly to a physical memory address.

Wow, having [...]

Thank you for your answer.
Hi,
We've come across this driver as a potential way of passing a userspace dma buffer to V4L2 instead of V4L2's default mmap mode which is rather slow. Here I see someone's done this achieving a 15x speedup: #38
Do you know if this module supports the Raspberry Pi 4? (ARM A72, Debian bullseye, kernel 6.1.21)
Thanks
Russell