I’m still polishing CamMuxer software for WeCan 2012. Up to now, data buffers reported by DirectShow graphs running intermittently were cached in the process space and then passed to VRAM by means of glTexSubImage2D() calls, executed from within the main rendering pipeline handler. This was far from perfect, for at least the following reasons:
- introduced unnecessary lags into main rendering pipeline – this might seem like a minor issue but there is a certain pitfall here. CamMuxer supports showing a slider (containing up to two lines of text) which slides in and out in a smoothstep manner. Any lag introduced into the pipeline delays the buffer swap operation, hence causes a clearly noticeable jerkiness we do not want.
- stalled the pipeline – glTex*Image*() calls are synchronous (unless you operate on buffer objects bound to pixel unpack targets instead of client-space pointers, in which cause the operation can take place without crossing GPU borders. This unfortunately is not the case in my software 🙂 )
The next step that I’ve been working on the last couple of days was rearranging things a bit by introducing worker threads. Each camera stream is assigned one worker thread. The thread creates an invisible window and binds a new OpenGL context to the window. The new context shares namespace with the main context. This allows us to move the hectic glTexImage*() calls to separate per-camera threads, effectively off-loading the main rendering pipeline and – in the end – providing a smoother watching experience.
I’ve already done some work with context sharing during my recent commercial activities, so I knew most of the wiles OpenGL could throw at me. However, there was one thing that caught me off-guard and that I thought would have been worthy to put up on the blog:
If you happen to work with multiple contexts working in multiple threads and you use threads that run in parallel to render off-screen textures using a shared program object, do make sure you serialize the process! No, really double-check.
The not-so-obvious catch in this case is that context sharing does not really mean you can access all OpenGL objects in all contexts that use common namespace. There are some object types that are not shared (like FBOs, for instance), no matter how hard you try. Right, so far so good.
The thing I’ve forgotten myself was that program objects (which are shared) are also assigned information regarding values of the uniforms associated with the program. Before running it, many times you happen to update at least one of the associated values. Boom! Good luck if you happen to be doing that from within multiple contexts running in multiple threads 🙂 Wrapping the program execution routines in process-wide critical sections did the trick for me.
SITUATION UPDATE: Turns out wrapping all the code using a program object inside a CS does not solve the problem, nor forced glFlush() and glFinish() calls do. Looks like a program object must explicitly be created in and used within a single context. If you don’t play ball, you get nasty uniform value leakages that appear to be caused by inexplicable forces. This might be NViDiA-specific, I don’t recall seeing explanation behind this kind of behaviour in GL3.3 core profile specification 🙁