How a cross-platform emulation layer for the WaitForSingleObject and WaitForMultipleObjects API entry-points, built on condition variables & binary semaphores, manages to run significantly faster than the native Windows functions is beyond me. Sure, I only support event & thread objects, as opposed to the many object types the real API handles, but..

Could it be the monitor thread, which acts as a spin-lock whenever at least one thread is waiting? I could work around this and let the CPU spend its cycles on something more productive, but there’s a handful of thread race conditions that could backfire if I don’t handle them carefully.
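
For readers who have not seen the idea before, here is a minimal sketch of an event object emulated with a condition variable. It is not Emerald's code: the `ManualResetEvent` class and its method names are made up for illustration, it only covers a manual-reset event waited on by a single-object wait, and it blocks in the condition variable instead of using a spinning monitor thread.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical manual-reset event, loosely mirroring Win32 event semantics.
class ManualResetEvent {
public:
    // Puts the event into the signaled state and wakes every waiter.
    void set() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_all();
    }

    // Returns the event to the non-signaled state.
    void reset() {
        std::lock_guard<std::mutex> lock(mutex_);
        signaled_ = false;
    }

    // Rough stand-in for a single-object wait with a timeout:
    // returns true if the event was signaled, false if the wait timed out.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate guards against spurious wake-ups.
        return cv_.wait_for(lock, timeout, [this] { return signaled_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};
```

A waiter calls `wait_for()` and some other thread calls `set()`. Waiting on several objects at once is where things get harder, since one wake-up source has to observe many events; that part is deliberately left out of the sketch.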

I was initially worried that Emerald’s performance on Linux would look embarrassing next to the Windows build’s. Turns out I may actually switch to the event emulation in the Windows world one day :-)

The march for the Linux port carries on..

VSM support: Finished!

If you google for VSM, you’ll get a great deal of web-sites describing the technique. What you’re likely to have more trouble finding are actual pictures showing the pros and cons of Variance Shadow Maps.

Therefore, instead of repeating all the content you can easily find elsewhere, let me just put up a few pictures, demonstrating VSM in action:

Scene 1:
Cylinders rotating at different speeds.
Ambient light + a single directional light.

Plain Shadow Mapping + PCF + normal-based bias adjustment:
Jaggy edges clearly visible.


Variance Shadow Mapping + ~10-tap gaussian blur of the SM:
Shadows definitely softer, but they come at a performance price.
Note the light bleeding problem in the deeper parts of the scene.

Tweaked VSM + ~10-tap gaussian blur of the SM:
Enforce a minimum value of p_max (the upper bound on the lit fraction that VSM computes) and renormalize the remaining range, and the problem’s gone. This comes at a price:

  • The shadows have become stronger and their penumbras are not as nice as in the picture right above.
  • The parameter needs to be tweaked for each camera cut.
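
For reference, the tweak described above can be sketched on the CPU side as follows. This is the commonly described form of VSM light-bleeding reduction, written as plain C++ rather than shader code, and it may differ from what the screenshots were rendered with; the function names and the `min_variance` / `min_p_max` parameters are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>

// Chebyshev upper bound on the lit fraction.
// mean    = E[d]   (first moment read from the blurred shadow map)
// mean_sq = E[d^2] (second moment read from the blurred shadow map)
// t       = depth of the receiver as seen from the light
float chebyshev_upper_bound(float mean, float mean_sq, float t, float min_variance) {
    assert(min_variance > 0.0f);
    if (t <= mean) {
        return 1.0f;  // receiver is in front of the average occluder: fully lit
    }
    // Clamping the variance hides numerical noise in the moments.
    const float variance = std::max(mean_sq - mean * mean, min_variance);
    const float d = t - mean;
    return variance / (variance + d * d);
}

// Light-bleeding reduction: anything below min_p_max becomes fully shadowed,
// and the range [min_p_max, 1] is stretched back to [0, 1].
float reduce_light_bleeding(float p_max, float min_p_max) {
    assert(min_p_max >= 0.0f && min_p_max < 1.0f);
    const float rescaled = (p_max - min_p_max) / (1.0f - min_p_max);
    return std::min(std::max(rescaled, 0.0f), 1.0f);
}
```

The rescale is also why the shadows get stronger: the soft tail of the penumbra, where p_max is small, is exactly the part that gets cut to zero, so `min_p_max` ends up being the value that needs tuning per camera cut.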


Scene 2:
Robot :-)
Ambient light + a directional light.

Plain Shadow Mapping + PCF + normal-based bias adjustment:
Yeah, well. Meh.

Variance Shadow Mapping + Gaussian Blur of the SM:
Note the damage the blurring has done to the tiny scene details.

If we increase the SM size, the details start to come back, but the light bleeding problem intensifies.

Even if you tweak the minimum/maximum allowed variance value, and modify the cut-off, it’s very difficult to find the right balance:
Note taken: VSMs are not a good fit for detailed scenes. Layered VSMs may work better in this case.

Scene 3:
Cube inside a cube.
Ambient light + a directional light + a point light

Plain Shadow Mapping

Variance Shadow Mapping (Dual Paraboloid SM) +
2-level Gaussian Blur of the 2-layer SM

The shadows on the cube look nicer and the projection is softened, but the performance cost is huge compared to the FPS counter of the Plain Shadow Mapping solution. That’s mostly due to the blur, which is currently executed separately for each SM layer. With multi-layered rendering, the performance could likely be improved by 40-50%, which would make the technique much more feasible than it is right now.

Variance Shadow Mapping (Cubical SM) +
2-level Gaussian Blur of the Cube-Map SM

It’s alive!

Apologies for the lack of any updates at all, but the last couple of months were quite hectic. I have changed my workplace, but – first and foremost – I have been busy bringing all the different pieces of my pet project jigsaw puzzle together.

There were quite a few things on my TO-DO list that were tedious to do, hence I had been putting them off for way too long. Things like the scene data compressor & decompressor, multiple scene loaders, a significant clean-up of various dark corners of the engine – these have finally been dealt with, and I am kind of proud to let the collective mind of the Internet know that my Emerald project has finally reached an important milestone. Things like:

  • asynchronous mesh, scene, texture data loaders.
  • lighting shaders for the forward rendering-based scene renderer, generated at run-time, based on the scene configuration and the per-mesh material settings.
  • shadow mapping support for three basic types of lights, with bias/filtering/technique properties adjustable in real-time

These are all finally there. Admittedly, the previous engine I wrote for Suxx and the other two demos supported significantly more features, but this is a good starting point, with the engine playing more of a tool-set role and not enforcing as many restrictions as in the past. However, what is perhaps most important is the fact that I finally have (nearly 😉 ) all the tools I needed to start working on some new stuff. Things will be happening in the upcoming months!

Okay, so without further ado, here is a link to a build that will let you run a test application that plays four scenes from Suxx in a loop. The player will cycle through available cameras, if more than one camera is defined for a particular scene. You can download the build here: *click click*

It may not look like it, but I actually do frustum culling in the shadow map generation pass. However, you can’t walk on water if some of the meshes span across the whole scene :-)
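
To illustrate why scene-spanning meshes defeat the culling, here is a generic AABB-versus-frustum test of the kind usually used for this. It is only a sketch: the `Plane` and `Aabb` structs, the function name and the plane convention (points inside satisfy n·p + d >= 0) are assumptions for the example, not Emerald's data structures.

```cpp
#include <cassert>

// Plane in the form n.p + d = 0; points with n.p + d >= 0 count as inside.
struct Plane { float nx, ny, nz, d; };

// Axis-aligned bounding box of a mesh, in the same space as the planes.
struct Aabb { float min[3], max[3]; };

// Returns false only if the box lies completely outside at least one plane.
// Conservative: a box near a frustum corner may be kept although it is invisible.
bool aabb_in_frustum(const Aabb& box, const Plane* planes, int plane_count) {
    assert(planes != nullptr && plane_count > 0);
    for (int i = 0; i < plane_count; ++i) {
        const Plane& p = planes[i];
        // Pick the box corner that lies furthest along the plane normal.
        const float px = (p.nx >= 0.0f) ? box.max[0] : box.min[0];
        const float py = (p.ny >= 0.0f) ? box.max[1] : box.min[1];
        const float pz = (p.nz >= 0.0f) ? box.max[2] : box.min[2];
        if (p.nx * px + p.ny * py + p.nz * pz + p.d < 0.0f) {
            return false;  // even the best corner is outside: cull the mesh
        }
    }
    return true;
}
```

A mesh whose bounding box covers the whole scene intersects every light frustum, so a test like this can never reject it, no matter how little of the mesh ends up in the shadow map.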

This build has been verified to work flawlessly on NVIDIA drivers (assuming you have a Fermi-generation GPU or newer). It *should* work on anything else that supports OpenGL 4.2 core profile contexts (which – to the best of my knowledge – include both AMD and Intel drivers), but if it doesn’t, I would be more than grateful if you could send me a log file, together with a short description of what happened.

If curious, you can find all the source code at my GitHub profile (

*psst*: Yeah, there is a memory leak somewhere which eats a few kilobytes of committed memory every second. My educated guess is that some matrices are not being released back to the matrix pool at some point, but that’s something I am planning to look at eventually, when we get closer to an actual release date of Elude’s next PC demo 😉

Pushed all my code to GitHub

Hey, been some time!

There’s quite a lot of stuff happening right now in my commercial life, so – while still on the wave – I’ve decided to open-source my rendering engine implementation to the community.

If you’re interested in having a look at what I’m currently working on, or simply need a reference base of some sort, feel free to have a look at :-)

My previous engine has not been made available (mostly because it would take a huge amount of time to clean it up, and – as you might’ve guessed already – free time is a precious and scarce resource for me at the moment). Same goes for CamMuxer and the three demos Elude released. I don’t think these will ever see the light of day, but hey – you never know!

The engine is built specifically around what we’ve learned as a team while working on the three demos. It’s definitely not aimed at game developers, and does not have a fancy UI (did I mention already that I despise writing those?). Emerald is basically a toolset that you can use to write tools, PoC applications or – you name it! – demos, and that’s the direction I think the development will take in the upcoming years.

You can create ES 3.1 & GL 4.2 contexts with Emerald in just a few lines of code, which lets you jump into writing prototypes in a matter of seconds. It supports reading a number of image formats, has built-in OpenCL inter-op support, and can read COLLADA files and store them in an engine-friendly format. There are also a few other pieces of functionality that were implemented along the way, while I was porting the previous engine to the latest concept Emerald is being written around. So yeah, that’s basically it.

The engine has not been tested on non-NVIDIA platforms. I do believe that there will come a day when I’ll have to spend a few days, fixing shaders-n-stuff, but for now there are more interesting things on the horizon I’d like to take care of, first. Consider yourselves warned :)

Quick update

This blog is not dead :-) I’ve recently been very busy with various commercial activities that leave me literally zero time for writing any posts. I’ll try to pencil in some time soon to write something about a rather tangled piece of OpenGL functionality. But until then..

Fingers crossed!

Particle emitter (3)

Seems like I’m done with a proof-of-concept implementation of an OpenCL+OpenGL particle collision tester :)

For the curious, white line segments represent the direction in which a collision would have been considered for the frame (but not necessarily one affecting the particle movement). Blue lines represent velocity vectors (unnormalized).

Particle emitter (2)

One of the things which are very peculiar about graphics development is how tricky it is to hunt bugs down. You often have to resort to visual tracing in order to actually be able to experience the “ah-hah!” Archimedes moment.

I have recently spent a great deal of my free time tracking down the problem that was causing my particles to fall behind geometry. Were it not for the visual hints, I would probably not have been able to tell for another month that it was the AABB/ray intersection test in my OpenCL code that was returning negative values for cases which should have been reported as collisions.
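
For context, a ray/AABB test is typically done with the "slab" method, sketched below in C++ rather than OpenCL. This is not the code from the post and it does not claim to reproduce the actual bug; `Vec3` and `ray_aabb` are names made up for the example. It does show one place where negative distances naturally appear: when the ray starts inside the box or the box lies behind the ray, the raw entry distance is negative and has to be clamped or rejected.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>

struct Vec3 { float x, y, z; };

// Slab-method ray/AABB test. On a hit, t_hit receives the distance along the
// ray (in units of the direction's length) to the entry point, or 0 if the
// ray starts inside the box. Boxes entirely behind the origin are misses.
bool ray_aabb(const Vec3& origin, const Vec3& dir,
              const Vec3& box_min, const Vec3& box_max, float& t_hit) {
    assert(dir.x != 0.0f || dir.y != 0.0f || dir.z != 0.0f);
    const float o[3]  = {origin.x, origin.y, origin.z};
    const float d[3]  = {dir.x, dir.y, dir.z};
    const float lo[3] = {box_min.x, box_min.y, box_min.z};
    const float hi[3] = {box_max.x, box_max.y, box_max.z};

    float t_near = 0.0f;  // starting at 0 rejects intersections behind the origin
    float t_far  = std::numeric_limits<float>::infinity();

    for (int i = 0; i < 3; ++i) {
        if (d[i] == 0.0f) {
            // Ray is parallel to this slab: it must already be between the planes.
            if (o[i] < lo[i] || o[i] > hi[i]) return false;
            continue;
        }
        float t0 = (lo[i] - o[i]) / d[i];
        float t1 = (hi[i] - o[i]) / d[i];
        if (t0 > t1) std::swap(t0, t1);
        t_near = std::max(t_near, t0);
        t_far  = std::min(t_far, t1);
        if (t_near > t_far) return false;  // the slab intervals do not overlap
    }
    t_hit = t_near;
    return true;
}
```

For a particle moving `velocity * dt` in a frame, passing that displacement as `dir` means a collision happens within the frame exactly when the function returns true and `t_hit <= 1`.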

Oh well. Let’s move on.

(gray-to-white line segments indicate detected collisions, blue ones represent velocity vectors, not normalized)

Random thoughts on 3D programming