All posts by kbi

Particle emitter (1)

Experimenting a little bit with particle emitters..

The particles you can see above bounce over an infinite plane and a sphere that is in the origin of the scene. This is simple Transform Feedback at work – it costs literally nothing (as the numbers tell) and the renderer is capped at 60 FPS.

For the next stop, I’m planning to have the particles bounce over a mesh represented with a Kd tree.

dlopen() catch

Trivia for today:

Q: What happens if you dlopen() a library that has already landed in the process space because the process was dynamically linked against it?

A: Obvious, stupid! The library will load in, overwriting already initialized library-specific variables and making massive waves of doom and destruction. What’s worse, if your current life’s feature set comes with a really bad karma, the library that gets loaded may even come from a different location than the one you linked with, causing ultimate ragnarok and software demise.

Reference counters? lol!
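If you want to find out whether a library is already resident before you go and dlopen() it for real, one option is to probe with RTLD_NOLOAD. Below is a minimal sketch in C. It assumes a loader that supports RTLD_NOLOAD (a glibc extension that some other platforms also provide, not something POSIX promises), and is_library_resident() is a made-up helper name. Whether a copy of the library sitting at a different path gets recognised as "the same" library depends on the platform, so treat a negative answer with suspicion.

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the dynamic loader reports `name`
 * as already mapped into this process, 0 otherwise.
 *
 * RTLD_NOLOAD asks dlopen() to look the library up without loading it.
 */
int is_library_resident(const char *name)
{
    void *handle = dlopen(name, RTLD_NOLOAD | RTLD_LAZY);

    if (handle == NULL) {
        return 0; /* not resident (or not recognised under this name) */
    }

    /* A successful lookup still hands out a handle; give it back. */
    dlclose(handle);
    return 1;
}
```

Calling is_library_resident() with the exact name the process was linked against gives the check the best chance of matching. On older glibc versions the program has to be linked with -ldl.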

Framebuffer blitting catch

Sorry for the recent lack of updates – I’ve been pushing pedal to the metal for some time now in my commercial life, in which an interesting and challenging project emerged and took control of all my development mana. What can you do.

I don’t expect the situation to change till the end of July, so in absence of time I’d have to spend to prepare some more engaging posts, I decided to share some quickly-hackable GL/GLES hints that caught me off-guard @ work. Hope they save somebody’s effort!

Here’s a piece that is far from obvious (at least it was a far cry from being one for me! 🙂 ) and which had me blocked for a day or two before I found out what the actual reason for the rather baffling misbehavior was:

Framebuffer blits are affected by scissor test.

In other words, if you care about having your data transferred from one attachment point of your read framebuffer to another attachment of your draw framebuffer as a whole, make sure you either use a sufficiently large scissor window (slippy) OR disable the scissor test before you call glBlitFramebuffer() (safer). Trust me, it won’t hurt and can potentially save you a lot of hassle.
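To get a feeling for how much of a blit can silently go missing, here is a tiny CPU-side model in C of the region that actually gets written while the scissor test is on: the destination rectangle intersected with the scissor box. This is only an illustration, not GL code; the rect type and the helper names are made up for the example, and rectangles are assumed to be half-open ([x0, x1) by [y0, y1)).

```c
/* Hypothetical rectangle type: covers [x0, x1) x [y0, y1). */
typedef struct {
    int x0, y0, x1, y1;
} rect;

rect make_rect(int x0, int y0, int x1, int y1)
{
    rect r = { x0, y0, x1, y1 };
    return r;
}

static int max_int(int a, int b) { return a > b ? a : b; }
static int min_int(int a, int b) { return a < b ? a : b; }

/* Part of the blit destination that survives an enabled scissor test. */
rect blit_written_region(rect dst, rect scissor)
{
    rect r;

    r.x0 = max_int(dst.x0, scissor.x0);
    r.y0 = max_int(dst.y0, scissor.y0);
    r.x1 = min_int(dst.x1, scissor.x1);
    r.y1 = min_int(dst.y1, scissor.y1);

    /* No overlap at all: collapse to an empty rectangle. */
    if (r.x1 < r.x0) r.x1 = r.x0;
    if (r.y1 < r.y0) r.y1 = r.y0;

    return r;
}

int rect_area(rect r)
{
    return (r.x1 - r.x0) * (r.y1 - r.y0);
}
```

With a 1024x768 destination and a leftover 256x256 scissor box, only 65536 of the 786432 pixels get written, and nothing complains. On the GL side, wrapping the blit in glDisable(GL_SCISSOR_TEST) and, if the test was on before, glEnable(GL_SCISSOR_TEST) avoids the whole problem.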

Static Code Analysis in VS2010 Express

If you install Visual Studio 2010 Express, what you may find missing is support for static code analysis which is claimed to be only available in paid editions.

However, it’s relatively quick & legitimate to make the magic happen. The steps are:

1) Install Microsoft Windows SDK v7. Make sure the Compilers option is ticked!
2) Now you can install updated compilers by downloading this patch.
3) EDIT: Oh yes, you might also be interested in downloading a patch for a patch, so that your patch is patched. 😉

Now it only takes an /analyze argument to convince the compiler to take a closer look at the files it actually digests 🙂

Spherical Harmonics (7): Battle-field report

I’ve recently taken a few days off from work which implies I can spend some more time on my pet projects 🙂 Unfortunately, for various reasons we won’t be releasing a PC demo this Revision (yeah, well..), so instead I decided to put the free hours before the party to some fruitful and good use.

I’ve already found a few quirks in my SH implementation which caused the bounces and intersections to work.. well, a bit clumsily and incorrectly. However, this was only a side-effect of my actual task which was introducing support for multiple bounces for the inter-reflected transfer case:

Seems to work pretty neatly – multiple ray bounces give an interesting fairy-tale look 🙂

Transform feed-back: watch out for an output array variable catch!

Transform feed-back is one of the features in OpenGL 3.x that are very handy if you need to perform any calculations which are of iterative nature and are more interested in performance, rather than in flexibility that OpenCL offers. Watch out though, as the latter solution might prove to be more reliable – as it turns out.

Consider the following GLSL snippet:

#version 330 core

out result
{
    vec3 result[16];
} Out;

void main()
{
    // Fill every array element with its own index.
    for (int n = 0; n < 16; ++n)
    {
        Out.result[n] = vec3(float(n));
    }
}

Trivial so far. Now, what I would expect to see after using this shader for a transform feed-back operation configured to capture the result varying is a certain region of the target buffer object filled with a sequence of numbers.

WRONG! (..most of the time)

At least with my favourite NVIDIA driver, that is 🙂 The latest version (314.21) does one of three things:

  • Buffer object is filled with QNaNs (seems to happen 45% of the time);
  • Buffer object is filled with zeroes (45% again);
  • Buffer object is filled with expected values (remainder of the attempts).

In case you’re curious – no, no errors are reported in the process.

Let’s take a quick look at OpenGL 3.3 Core Profile specification. Here’s a nice excerpt from p. 98:

When an individual point, line, or triangle primitive reaches
the transform feedback stage while transform feedback
is active, the values of the specified varying variables
of the vertex are appended to the buffer objects bound
to the transform feedback binding points.

The attributes of the first vertex received after
BeginTransformFeedback are written at the starting offsets
of the bound buffer objects set by BindBufferRange, and
subsequent vertex attributes are appended to the buffer
object.


When writing varying variables that are arrays, individual
array elements are written in order. 

For multi-component varying variables or varying array
elements, the individual components are written in order.

Well then, seems like theory is to practice as practice is to theory. The only solution I could come up with is to revert to le good ol’ gl_VertexID friend. At least it works 🙂


Spherical Harmonics (5): Battle-field report

There was a lot of general engine-centric work that had to be done before I could introduce per-vertex albedo support to my SH implementation, mostly focused on getting the multi-surfaced meshes to work with both OpenCL and OpenGL. After a few rather laborious days I finally wrote all new parts that were needed, so here: have a screen-shot proof 🙂

Things are starting to look really good! Moving on to the most interesting part, which is interreflected transfer!


Programming pouts: Process spawn trap!

Spawning child processes under Windows and worried about opened handles’ counter going through the roof, even though they all gently quit and are no longer listed in system process list?

Oh you should!

I’ve been working on a certain piece of software for over a year now, and it wasn’t until today that we spotted and fixed a gruesome bug hiding in a location that is executed very, very often. It would probably have stayed there and lived happily ever after if it weren’t for the stress tests that we finally got to run in their full glory for the very first time. I don’t want to (and am not allowed to) go into too much detail, so let’s just say we were extremely surprised to find out that our lovely & shiny piece of software caused Windows 7 to come to a complete halt after 2 hours of non-stop tests (having first crashed Google Chrome and Process Explorer, which I usually keep open). Worse, the use case we were stress-testing was expected to happen every single time somebody attempted to launch our tool. If they were to use the product for longer than an hour, it wouldn’t lead anywhere good.


Topping it all off, it’s probably not going to come as a surprise that we had only 6 work-days left till project dead-line 🙂 That’s life.

Having panicked for an hour or two and having tested a rather hacky workaround that could awfully backfire and lead to ugly side effects, not to mention it would require off-site person’s effort, I launched the tests again and started sniffing around. Deferred crash, slow-down effect over time, overall system damage – why? What could be the reason?

At moments like these it’s very useful to have Process Explorer at hand (okay, a cup of coffee is also highly recommended), just to look at a variety of counters it shows you. They provide you with a lot of insight into how every single active process behaves. Recently they even implemented a new feature that allows you to check GPU activity and memory usage! If you happen to have a spare monitor that you don’t know what to do with (right 🙂 ), keep the tool full-screen. Trust me: I learned my lesson the hard way today 🙂

Any road, when I focused on the stress-test process that had been running for a few minutes, I instantly noticed that there were thousands and thousands of zombie process and thread handles. The processes they represented were long gone, not to mention the threads which should have died a long time ago, but no. How come! You don’t need to release a child process handle under Windows, you don’t get one when doing a CreateProcessA() call, do you?

Well, for Linux programmers it’s probably an easy spot – yes, you do. However, under Windows, the handle is not handed back as the function’s return value, so it’s a potential banana skin. Instead, you need to dive into the PROCESS_INFORMATION structure, which you are expected to provide a pointer to as one of the arguments, and there you’ll find it.

Also, judging by what the MSDN page for the function says, it’s a good idea to release the main thread handle of the newly spawned process as well, because otherwise reference counting may hit you hard.
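For the record, this is roughly what the fix boils down to. It is a minimal sketch in C and not the code from the project in question: spawn_and_release() is a made-up name, the child is started with default settings, and on anything that isn’t Windows the function simply reports failure so that the snippet still compiles.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Hypothetical helper: spawns a child process and immediately releases
 * both handles that CreateProcessA() hands back.
 * Returns 0 on success, -1 on failure (or on non-Windows builds).
 */
int spawn_and_release(char *command_line)
{
#ifdef _WIN32
    STARTUPINFOA        startup_info;
    PROCESS_INFORMATION process_info;

    if (command_line == NULL) {
        return -1;
    }

    ZeroMemory(&startup_info, sizeof(startup_info));
    ZeroMemory(&process_info, sizeof(process_info));
    startup_info.cb = sizeof(startup_info);

    if (!CreateProcessA(NULL, command_line, NULL, NULL, FALSE, 0,
                        NULL, NULL, &startup_info, &process_info)) {
        return -1;
    }

    /* Closing the handles does not kill the child; it only drops
     * our references so the kernel objects can go away later. */
    CloseHandle(process_info.hThread);
    CloseHandle(process_info.hProcess);
    return 0;
#else
    (void) command_line;
    return -1;
#endif
}
```

If you need to wait for the child or read its exit code, do that before the CloseHandle() calls and keep the process handle around until you are done with it.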

After we started giving the child process’ handles the care they deserved, the problem disappeared and we could call it a day 🙂

All in all, the lesson learned today is that stress tests are vital. And the easier you automate them, the better.


NViDiA drivers & OpenCL (1): Getting a CL_OUT_OF_RESOURCES error?

Having passed an unsigned char* argument to your kernel and cast it to a different type in your kernel, and now getting a CL_OUT_OF_RESOURCES whenever you try to map the buffer to user-space after the kernel has executed with (alleged) success? Ensure you’re doing a proper address space cast! For instance, instead of doing:

float3 albedo = vload3(0, (float*) input_data);

the cast should be more precise, for instance as presented below:

float3 albedo = vload3(0, (__global float*) input_data);

Sadly, NViDiA drivers will not tell you where the real problem is but just drop dead with an error whenever you try to have a look at the result buffer data..