Description
I'm seeing very strange behaviour when processing relatively large buffers (87968 bytes). I have a test FIR filter:
https://github.com/dernasherbrezon/clDsp/blob/main/fir_filter.cl
It takes an input buffer, multiplies it by a (constant) buffer of filter taps, and writes the result to an output buffer.
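For reference, the same operation can be sketched on the CPU in plain C. This is only a minimal illustration, not the kernel from the repository: the function name `fir_filter_reference` and its parameters are hypothetical, and it assumes the taps are applied in the order they are stored and that the input holds `output_len + taps_len - 1` samples.

```c
#include <stddef.h>

/* Hypothetical CPU reference for a FIR filter (not the OpenCL kernel itself).
 * Assumption: input holds at least output_len + taps_len - 1 samples and
 * taps are applied in stored order (no reversal). */
void fir_filter_reference(const float *input, const float *taps, size_t taps_len,
                          float *output, size_t output_len) {
    for (size_t i = 0; i < output_len; i++) {
        float acc = 0.0f;
        /* Multiply a window of the input by the taps and accumulate. */
        for (size_t j = 0; j < taps_len; j++) {
            acc += input[i + j] * taps[j];
        }
        output[i] = acc;
    }
}
```

Comparing the GPU output against a reference like this is one way to confirm that a run produced valid results.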
The simple test ( https://github.com/dernasherbrezon/clDsp/blob/main/test/test_fir_filter.c ) works fine.
The performance test does not. When I execute the loop 10 times ( https://github.com/dernasherbrezon/clDsp/blob/main/test/perf_fir_filter.c#L47 ) it sometimes hangs while reading the data back. After that, any subsequent execution of any program on the GPU hangs as well, so only a reboot helps.
When I execute the performance loop only once, everything is fine and it produces valid results.
Another observation: running with "sudo" never hangs.
Another observation: running with "sudo" is ~10 times slower than running as the normal user ( "pi" ).
How can I troubleshoot the hang and the slowness? Could they be related to some memory constraints or some user-specific limits when working with /dev/mem?
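One way to compare user-specific limits between "pi" and root might be a small program like the one below, run once as each user. It is only a sketch: the helper names `print_limit` and `print_memory_limits` are hypothetical, the choice of limits is a guess at what could matter here, and nothing confirms that these limits are related to the problem. It uses the POSIX `getrlimit` call; an unlimited value (`RLIM_INFINITY`) shows up as a very large number.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: prints the soft and hard value of one resource limit.
 * Returns 0 on success, -1 if getrlimit fails. */
int print_limit(const char *name, int resource) {
    struct rlimit lim;
    if (getrlimit(resource, &lim) != 0) {
        perror(name);
        return -1;
    }
    printf("%s: soft=%llu hard=%llu\n", name,
           (unsigned long long) lim.rlim_cur,
           (unsigned long long) lim.rlim_max);
    return 0;
}

/* Prints a few memory-related limits for the current user.
 * Assumption: these are the limits worth comparing; others may matter more. */
int print_memory_limits(void) {
    int rc = 0;
    rc |= print_limit("RLIMIT_MEMLOCK", RLIMIT_MEMLOCK);
    rc |= print_limit("RLIMIT_AS", RLIMIT_AS);
    rc |= print_limit("RLIMIT_DATA", RLIMIT_DATA);
    return rc;
}
```

Calling `print_memory_limits()` from `main` as "pi" and again under "sudo" would show whether the two environments differ in these limits.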