opengl performance improvements #1410

Open: wants to merge 9 commits into master

@gizahNL (Contributor) commented Nov 24, 2021

Low-hanging fruit: change glClientWaitSync behavior
Optimize fragment shader to discard when invisible (alpha < 0.01)
Change shader (less branching due to non-uniform flow control)
Using a separate shader program as a "fast path" and using the OpenGL blend func (executed on the GPU ROP, fewer texture reads) shaves off another ~5%
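
To make the discard and blend-func items above concrete, here is a minimal CPU-side sketch of what the fast path would compute per fragment. It is not taken from this PR: it assumes premultiplied alpha and a blend setting of `glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA)`, and the names `rgba`, `fragment_or_discard`, `blend_over` and `composite` are hypothetical.

```cpp
#include <cassert>
#include <optional>

// RGBA colour with premultiplied alpha, components in [0, 1] (assumption).
struct rgba { float r, g, b, a; };

// Models the early-out: a fragment whose alpha is below 0.01 is discarded,
// so it produces no output and the destination stays untouched.
inline std::optional<rgba> fragment_or_discard(const rgba& src)
{
    if (src.a < 0.01f) return std::nullopt; // GLSL equivalent: discard;
    return src;
}

// Models fixed-function blending under the assumed setting
//   glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA)
// i.e. out = src * 1 + dst * (1 - src.a), applied per component.
inline rgba blend_over(const rgba& src, const rgba& dst)
{
    assert(src.a >= 0.0f && src.a <= 1.0f);
    const float k = 1.0f - src.a;
    return { src.r + dst.r * k, src.g + dst.g * k,
             src.b + dst.b * k, src.a + dst.a * k };
}

// Fast path for one fragment: discard if invisible, otherwise let the
// blend stage combine it with what is already in the framebuffer.
inline rgba composite(const rgba& src, const rgba& dst)
{
    const auto frag = fragment_or_discard(src);
    return frag ? blend_over(*frag, dst) : dst;
}
```

With the blend stage doing the combination, the shader itself does not need to sample the destination as a texture, which is presumably where the "fewer texture reads" come from.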

Total improvement: GPU utilization drops from ~80-85% to ~50-55% when running 4 HD 50i channels with 2 layers each, which is close to the GPU utilization of 2.1.

Gijs Peskens added 4 commits November 24, 2021 15:01
Instead of repeatedly calling glClientWaitSync with a 1
nanosecond timeout, call it with a 20 ms timeout and a flush.

Decreases average GPU utilisation on my testbench by about
10% (~85% -> ~75%, 4 * 1080i5000 on a K620)
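
The difference between the two waiting strategies in this commit can be sketched without a GL context. The code below is an illustration only, not the code from the commit: `wait_result`, `client_wait_fn`, `wait_polling`, `wait_blocking` and `fake_fence` are hypothetical names, and the callable stands in for a wrapper around `glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, timeout_ns)`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Stand-in for the status values returned by glClientWaitSync.
enum class wait_result { already_signaled, condition_satisfied, timeout_expired };

// Hypothetical wrapper type. In real code the callable would forward to
//   glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, timeout_ns)
using client_wait_fn = std::function<wait_result(std::uint64_t timeout_ns)>;

// Shared loop: wait with the given timeout until the fence is signaled.
// Returns the number of wait calls that were issued.
inline int wait_with_timeout(const client_wait_fn& client_wait,
                             std::uint64_t timeout_ns, int max_calls)
{
    assert(max_calls > 0);
    int calls = 0;
    while (calls < max_calls) {
        ++calls;
        if (client_wait(timeout_ns) != wait_result::timeout_expired) break;
    }
    return calls;
}

// Old behaviour as described in the commit message: poll with a 1 ns timeout.
inline int wait_polling(const client_wait_fn& client_wait, int max_calls)
{
    return wait_with_timeout(client_wait, 1, max_calls);
}

// New behaviour: block for up to 20 ms per call.
inline int wait_blocking(const client_wait_fn& client_wait, int max_calls)
{
    return wait_with_timeout(client_wait, 20'000'000, max_calls);
}

// Fake fence for demonstration: it becomes signaled once the given amount
// of simulated time has been spent waiting on it.
struct fake_fence {
    std::uint64_t remaining_ns;

    wait_result wait(std::uint64_t timeout_ns)
    {
        if (remaining_ns == 0) return wait_result::already_signaled;
        if (timeout_ns >= remaining_ns) {
            remaining_ns = 0;
            return wait_result::condition_satisfied;
        }
        remaining_ns -= timeout_ns;
        return wait_result::timeout_expired;
    }
};
```

For a fence that needs 1000 ns of simulated time, the polling variant issues 1000 calls while the blocking variant issues one. The fake fence does not model driver behaviour, so this shows only the call count, not the utilisation figures reported above.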
@gizahNL changed the title from "[WIP] opengl linux performance" to "opengl performance improvements" Nov 25, 2021
When running it, do blending via OpenGL; this is a tad faster.
Gijs Peskens added 4 commits November 29, 2021 15:41
Since we keep filling the command buffer, there is no need
to flush, and we can safely forgo it.
This marginally improves performance.
@Julusian (Member) commented

Trying this with 4x 1080i50 channels (each playing 2 AMB) on Ubuntu 22.04 with a GTX 1060, I am seeing GPU usage go from 40-45% to 38-42%, which is not a significant improvement. What GPU and OS are you using?

On Windows, it gets stuck in an error loop when playing any media, with caspar::gl::ogl_invalid_framebuffer_operation_ext.

> Change shader (less branching due to non-uniform flow control)

It has been quite a while (~10 years) since I have had to think about optimising CUDA code, but from what I remember, branching is only an issue when threads in the same cluster take different routes. So for us, different branches being used for each frame being composited should have no major impact?
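
The reasoning above can be put into a toy cost model. This is a deliberately simplified sketch, not a measurement and not how any particular GPU schedules work: it assumes a group of threads running in lockstep pays for every distinct branch that at least one of its threads takes. `group_cost` and the cost numbers are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Simplified lockstep-execution model (assumption for illustration):
// the group executes every branch that any of its threads selects, so the
// cost is the sum over the distinct branches taken within the group.
inline int group_cost(const std::vector<int>& branch_per_thread,
                      const std::vector<int>& cost_per_branch)
{
    const std::set<int> taken(branch_per_thread.begin(), branch_per_thread.end());
    int cost = 0;
    for (int branch : taken) {
        assert(branch >= 0 && branch < static_cast<int>(cost_per_branch.size()));
        cost += cost_per_branch[branch];
    }
    return cost;
}
```

Under this model, a group in which all 32 threads take the branch costing 30 pays 30, while a group split between branches costing 10 and 30 pays 40. If the branch condition is the same for every fragment of a composited frame, each group falls into the first case, which is the point being made here.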

What is the cost of frequently switching shaders? Some layers on a channel could be on the fast shader and some on the slow one.

As it currently stands, I am not convinced that this will give a noticeable performance benefit to most users, so I doubt it is worth the extra complexity.
