
How a cross-platform emulation layer for the WaitForSingleObject and WaitForMultipleObjects API entry-points, built on condition variables & binary semaphores, manages to run significantly faster than the native Windows functions is beyond me. Sure, I only support event & thread objects, rather than the many object types the native API handles, but..

Could it be the monitor thread, which works as a spin-lock whenever at least one thread is waiting? I could work around this and let the CPU spend its cycles on something more productive, but there’s a handful of race conditions that could backfire if I don’t handle them carefully.
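For illustration only, here is a minimal sketch of how a manual-reset event can be emulated with a condition variable, so that waiters sleep instead of spinning. The ManualResetEvent class and its method names are hypothetical; this is not Emerald's actual implementation and it leaves out thread objects and multi-object waits entirely.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical manual-reset event, loosely mirroring the Windows event object.
class ManualResetEvent
{
public:
    // Marks the event as signalled and wakes every waiting thread.
    void set()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signalled_ = true;
        }
        cv_.notify_all();
    }

    // Returns the event to the non-signalled state.
    void reset()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        signalled_ = false;
    }

    // Blocks until the event is signalled or the timeout elapses.
    // Returns true if the event was signalled, false on timeout.
    bool wait_for(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate guards against spurious wake-ups.
        return cv_.wait_for(lock, timeout, [this] { return signalled_; });
    }

private:
    std::mutex              mutex_;
    std::condition_variable cv_;
    bool                    signalled_ = false;
};
```

Because the flag is only ever read and written under the mutex, a set() that lands between a waiter's check and its sleep cannot be lost, which is the kind of race that needs care here.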

I was initially worried that Emerald’s performance on Linux would be embarrassing if someone compared it to the Windows build’s. Turns out I may actually switch to the event emulation in the Windows world one day 🙂

The march for the Linux port carries on..

VSM support: Finished!

If you google for VSM, you’ll get plenty of web-sites describing the technique. What you’ll find harder to come by are actual pictures showing the pros and cons of Variance Shadow Maps.

Therefore, instead of repeating all the content you can easily find elsewhere, let me just put up a few pictures, demonstrating VSM in action:

Scene 1:
Cylinders rotating at different speeds.
Ambient light + a single directional light.

Plain Shadow Mapping + PCF + normal-based bias adjustment:
Jaggy edges clearly visible.

Variance Shadow Mapping + ~10-tap gaussian blur of the SM:
Shadows definitely softer, but they come at a performance price.
Note the light bleeding problem in the deeper parts of the scene.

Tweaked VSM + ~10-tap gaussian blur of the SM:
Enforce a minimum value of p_max and renormalize the range, and the problem’s gone. This comes at a price:

  • The shadows have become stronger and their penumbras are not as nice as in the picture right above.
  • The parameter needs to be tweaked for each camera cut.

Scene 2:
Robot 🙂
Ambient light + a directional light.

Plain Shadow Mapping + PCF + normal-based bias adjustment:
Yeah, well. Meh.

Variance Shadow Mapping + Gaussian Blur of the SM:
Note the damage the blurring has done to the tiny scene details.

If we increase the SM size, the details start to come back, but the light bleeding problem intensifies.

Even if you tweak the minimum/maximum allowed variance value, and modify the cut-off, it’s very difficult to find the right balance:
Note taken: VSMs are not a good fit for detailed scenes. Layered VSMs may work better in this case.

Scene 3:
Cube inside a cube.
Ambient light + a directional light + a point light

Plain Shadow Mapping

Variance Shadow Mapping (Dual Paraboloid SM) +
2-level Gaussian Blur of the 2-layer SM

The shadows on the cube look nicer and the projection is softened, but the performance cost is huge if you compare the FPS counter with the one in the Plain Shadow Mapping picture. That’s mostly due to the blur, which is currently executed separately for each SM layer. With multi-layered rendering, the performance could likely be improved by 40-50%, which would make the technique much more feasible than it is right now.

Variance Shadow Mapping (Cubical SM) +
2-level Gaussian Blur of the Cube-Map SM

It’s alive!

Apologies for the lack of any updates at all, but the last couple of months were quite hectic. I have changed my workplace, but – first and foremost – I have been busy bringing all the different pieces of my pet project jigsaw puzzle together.

There were quite a few tedious things on my TO-DO list that I had been putting off for way too long. Things like the scene data compressor & decompressor, multiple scene loaders, significant clean-up of various dark corners of the engine – these have finally been dealt with, and I am kind of proud to let the collective mind of the Internet know that my Emerald project has finally reached an important milestone. Things like:

  • asynchronous mesh, scene, texture data loaders.
  • lighting shaders for the forward rendering-based scene renderer, generated at run-time, based on the scene configuration and the per-mesh material settings.
  • shadow mapping support for three basic types of lights, with bias/filtering/technique properties adjustable in real-time

These are all finally there. Admittedly, the previous engine I wrote for Suxx and the other two demos provided support for significantly more features, but this is a good starting point, with the engine playing more of a tool-set role and not enforcing as many restrictions as in the past. However, what is perhaps most important is the fact that I finally have (nearly 😉 ) all the tools I needed to start working on some new stuff. Things will be happening in the upcoming months!

Okay, so without further ado, here is a link to a build that will let you run a test application that plays four scenes from Suxx in a loop. The player will cycle through available cameras, if more than one camera is defined for a particular scene. You can download the build here: *click click*

It may not look like it, but I actually do frustum culling at the shadow map generation pass. However, you can’t walk on water if some of the meshes span across the whole scene 🙂

This build has been verified to work flawlessly on NVIDIA drivers (assuming you have a Fermi-generation GPU or newer). It *should* work on anything else that supports OpenGL 4.2 core profile contexts (which – to the best of my knowledge – include both AMD and Intel drivers), but if it doesn’t, I would be more than grateful if you could send me a log file, together with a short description of what happened.

If curious, you can find all the source code at my GitHub profile (https://github.com/kbiElude/Emerald).

*psst*: Yeah, there is a memory leak somewhere which eats a few kilobytes of committed memory every second. My educated guess is that some matrices are not being released back to the matrix pool, but that’s something I am planning to look at eventually, when we get closer to an actual release date of Elude’s next PC demo 😉

Quick update

This blog is not dead 🙂 I’ve recently been very busy with various commercial activities that leave me literally zero time for writing any posts. I’ll try to pencil in some time soon to write something about a somewhat tangled piece of OpenGL functionality. But until then..

Fingers crossed!

Particle emitter (1)

Experimenting a little bit with particle emitters..

The particles you can see above bounce off an infinite plane and a sphere located at the origin of the scene. This is simple Transform Feedback at work – it costs literally nothing (as the numbers tell) and the renderer is capped at 60 FPS.
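To give an idea of what such a per-particle update can look like, here is a CPU-side sketch of one simulation step. It is only an assumption of how the bounce might be computed: the real update runs in a vertex shader whose outputs are captured with Transform Feedback, and all names, the gravity constant, the plane height and the restitution factor below are made up for the example.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  add(Vec3 a, Vec3 b)    { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3  scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float dot(Vec3 a, Vec3 b)    { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mirrors velocity v around the unit normal n (like GLSL's reflect()),
// then damps it by the restitution factor.
static Vec3 bounce(Vec3 v, Vec3 n, float restitution)
{
    return scale(add(v, scale(n, -2.0f * dot(v, n))), restitution);
}

struct Particle { Vec3 position; Vec3 velocity; };

// One integration step. Assumed colliders: the horizontal plane y = plane_y
// and a sphere of radius sphere_radius centred at the origin.
Particle step(Particle p, float dt, float plane_y, float sphere_radius, float restitution)
{
    const Vec3 gravity = {0.0f, -9.81f, 0.0f};

    p.velocity = add(p.velocity, scale(gravity, dt));
    p.position = add(p.position, scale(p.velocity, dt));

    // Plane: bounce only when below the plane and still moving into it.
    if (p.position.y < plane_y && p.velocity.y < 0.0f)
    {
        p.position.y = plane_y;
        p.velocity   = bounce(p.velocity, {0.0f, 1.0f, 0.0f}, restitution);
    }

    // Sphere: reflect along the outward normal and push the particle
    // back onto the surface.
    float dist = std::sqrt(dot(p.position, p.position));
    if (dist > 0.0f && dist < sphere_radius)
    {
        Vec3 n = scale(p.position, 1.0f / dist);
        if (dot(p.velocity, n) < 0.0f)
            p.velocity = bounce(p.velocity, n, restitution);
        p.position = scale(n, sphere_radius);
    }
    return p;
}
```

Each particle only reads its own previous state, which is why the update maps so well onto a vertex shader that is run once per particle.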

For the next stop, I’m planning to have the particles bounce over a mesh represented with a Kd tree.

Framebuffer blitting catch

Sorry for the recent lack of updates – I’ve been putting the pedal to the metal for some time now in my commercial life, in which an interesting and challenging project emerged and took control of all my development mana. What can you do.

I don’t expect the situation to change till the end of July, so since I lack the time it would take to prepare some more engaging posts, I decided to share some quickly-hackable GL/GLES hints that caught me off-guard @ work. Hope they save somebody some effort!

Here’s a piece that is far from obvious (at least it was a far cry from being one for me! 🙂 ) and which had me blocked for a day or two before I found out what the actual reason for the rather baffling misbehavior was:

Framebuffer blits are affected by scissor test.

In other words, if you care about having your data transferred from one attachment point of your read framebuffer to another attachment of your draw framebuffer as a whole, make sure you either use a sufficiently large scissor box (fragile) OR disable the scissor test before you call glBlitFramebuffer() (safer). Trust me, it won’t hurt and can potentially save you a lot of hassle.
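To make the effect concrete, the sketch below computes which part of the destination rectangle can actually be written while the scissor test is enabled. It is plain CPU code with hypothetical names (Rect, effective_blit_region), not a GL call; it uses corner coordinates, whereas glScissor() takes x, y, width and height, and it assumes a non-flipped destination rectangle.

```cpp
#include <algorithm>

// Axis-aligned pixel rectangle covering [x0, x1) x [y0, y1).
struct Rect { int x0, y0, x1, y1; };

// Region of the draw framebuffer a blit can touch while the scissor test
// is enabled: the destination rectangle clipped to the scissor box.
// Returns an all-zero rectangle if nothing survives the clipping.
Rect effective_blit_region(Rect dst, Rect scissor)
{
    Rect r = { std::max(dst.x0, scissor.x0), std::max(dst.y0, scissor.y0),
               std::min(dst.x1, scissor.x1), std::min(dst.y1, scissor.y1) };
    if (r.x1 <= r.x0 || r.y1 <= r.y0)
        r = Rect{0, 0, 0, 0};
    return r;
}
```

With a 1024x768 destination and a stale 256x256 scissor box left over from an earlier pass, only the 256x256 corner is updated and no error is raised. With GL_SCISSOR_TEST disabled, the whole destination rectangle is written.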

Static Code Analysis in VS2010 Express

If you install Visual Studio 2010 Express, what you may find missing is support for static code analysis which is claimed to be only available in paid editions.

However, it’s relatively quick & legitimate to make the magic happen. The steps are:

1) Install Microsoft Windows SDK v7. Make sure the Compilers option is ticked!
2) Now you can install updated compilers by downloading this patch.
3) EDIT: Oh yes, you might also be interested in downloading a patch for a patch, so that your patch is patched. 😉

Now it only takes an /analyze argument to convince the compiler to take a closer look at the files it actually digests 🙂

Spherical Harmonics (7): Battle-field report

I’ve recently taken a few days off from work, which means I can spend some more time on my pet projects 🙂 Unfortunately, for various reasons we won’t be releasing a PC demo this Revision (yeah, well..), so instead I decided to put the free hours before the party to good use.

I’ve already found a few quirks in my SH implementation which caused the bounces and intersections to work.. well, a bit clumsily and incorrectly. However, this was only a side-effect of my actual task which was introducing support for multiple bounces for the inter-reflected transfer case:

Seems to work pretty neatly – multiple ray bounces give an interesting fairy-tale look 🙂

Transform feed-back: watch out for an output array variable catch!

Transform feed-back is one of those OpenGL 3.x features that come in very handy if you need to perform calculations of an iterative nature and care more about performance than about the flexibility OpenCL offers. Watch out though, as the latter solution might prove to be more reliable – as it turns out.

Consider the following GLSL snippet:

#version 330 core

out result
{
    vec3 result[16];
} Out;

void main()
{
    for (int n = 0; n < 16; ++n)
    {
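        // Block members are accessed through the instance name, and an
        // int does not convert to vec3 implicitly.
        Out.result[n] = vec3(float(n));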
    }
}

Trivial so far. Now, after using this shader for a transform feed-back operation configured to capture the result varying, I would expect to find a region of the target buffer object filled with a sequence of numbers.

WRONG! (..most of the time)

At least with my favourite NVIDIA driver, that is 🙂 The latest version (314.21) does one of three things:

  • Buffer object is filled with QNaNs; (seems to happen 45% of the time)
  • Buffer object is filled with zeroes; (45% again)
  • Buffer object is filled with expected values (remainder of the attempts)

In case you’re curious – no, no errors are reported in the process.

Let’s take a quick look at OpenGL 3.3 Core Profile specification. Here’s a nice excerpt from p. 98:

When an individual point, line, or triangle primitive reaches
the transform feedback stage while transform feedback
is active, the values of the specified varying variables
of the vertex are appended to the buffer objects bound
to the transform feedback binding points.

The attributes of the first vertex received after
BeginTransformFeedback are written at the starting offsets
of the bound buffer objects set by BindBufferRange, and
subsequent vertex attributes are appended to the buffer
object.

(..).

When writing varying variables that are arrays, individual
array elements are written in order.

For multi-component varying variables or varying array
elements, the individual components are written in order.

Well then, seems like theory is to practice as practice is to theory. The only solution I could come up with is to revert to le good ol’ gl_VertexID friend. At least it works 🙂
