Description
Hey everyone,
I have a question regarding the Transparent API / Preferable Target and hope someone can help me understand.
My Object Detection program takes a lot longer to process images when using
net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL_FP16)
or
image = cv2.imread(filePath, cv2.COLOR_BGR2RGB)
uMat = cv2.UMat(image)
I've created four benchmark programs that run sequentially, each processing the same 10 .jpg files.
My baseline is a standard OpenCV object detection program that uses neither setPreferableTarget nor the UMat class for images.
The second one sets setPreferableTarget to cv2.dnn.DNN_TARGET_OPENCL_FP16.
The third converts the images into UMat objects.
The fourth sets setPreferableTarget to cv2.dnn.DNN_TARGET_OPENCL_FP16 and converts the images into UMat objects (roughly sketched below).
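For context, the four variants differ only in this setup, roughly sketched below. The model/config paths, filePath and the helper names are placeholders, not my actual files or code:

import cv2

# Simplified sketch of how the four benchmark variants differ.
# "frozen_inference_graph.pb", "graph.pbtxt" and filePath are placeholders.
def load_net(use_fp16_target):
    net = cv2.dnn.readNet("frozen_inference_graph.pb", "graph.pbtxt")
    if use_fp16_target:
        net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL_FP16)
    return net

def load_image(filePath, use_umat):
    image = cv2.imread(filePath)
    return cv2.UMat(image) if use_umat else image

# Benchmark One:   load_net(False), load_image(path, False)
# Benchmark Two:   load_net(True),  load_image(path, False)
# Benchmark Three: load_net(False), load_image(path, True)
# Benchmark Four:  load_net(True),  load_image(path, True)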
For each run I measured the full processing time, starting before the image is read and ending after the labels are drawn (excluding writing the output image and the detection log), as well as the model inference time via
t, _ = net.getPerfProfile()
infTime = t / cv2.getTickFrequency()
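Concretely, each file is measured roughly like this. detect() and drawLabels() are just stand-ins for my actual detection and drawing code, filePath for the image path:

import cv2
import time

start = time.perf_counter()             # full processing time starts here

image = cv2.imread(filePath)            # read the image
outputs = detect(net, image)            # blob creation + net.forward(), placeholder
drawLabels(image, outputs)              # draw boxes and labels, placeholder

fullTime = time.perf_counter() - start  # ends after drawing the labels

t, _ = net.getPerfProfile()             # inference time reported by the DNN module
infTime = t / cv2.getTickFrequency()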
The collected output is as follows:
Benchmark One:   Full Processing Time 2.78063 s,     Model Inference Time 1.030843 s
Benchmark Two:   Full Processing Time 3.2567 s,      Model Inference Time 1.12314 s
Benchmark Three: Full Processing Time 12.76886 s,    Model Inference Time 10.83879 s
Benchmark Four:  Full Processing Time 13.43161047 s, Model Inference Time 11.27375169 s
Is there such a large gap between CPU and GPU execution because of the data transfer between the processing units? Am I missing something crucial?
If the gap can be explained by the data transfer, is there a way to "bundle" my workload to reduce the number of transfers?
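To make the "bundling" idea concrete, here is roughly what I imagine for ordinary image operations: upload once, keep intermediate results on the device, download once at the end. This is just a sketch with a placeholder path, not code I'm running:

import cv2

image = cv2.imread("example.jpg")   # placeholder path
u = cv2.UMat(image)                 # one upload to the OpenCL device

u = cv2.resize(u, (640, 640))       # result stays a UMat on the device
u = cv2.GaussianBlur(u, (3, 3), 0)  # still on the device, no download in between

result = u.get()                    # single download back to a numpy array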
I can provide the full code for these benchmark programs if it would be helpful.
Thanks in advance!