
Spherical Harmonics (5): Battle-field report

There was a lot of general engine-centric work that had to be done before I could introduce per-vertex albedo support to my SH implementation, mostly focused on getting multi-surfaced meshes to work with both OpenCL and OpenGL. After a few rather laborious days I finally wrote all the new parts that were needed, so here: have a screen-shot as proof 🙂

Things are starting to look really good! Moving on to the most interesting part, which is interreflected transfer!

[Screenshot: scrshot_sh3]

*after-update sigh*

Hello, latest public NVIDIA drivers (310.70, for what it's worth): your argument is invalid..

[Screenshot: scrshot_nvidia]
Please click on the screenshot to open it in a new tab – the full-size version breaks the WordPress layout

Edit: I worked out what the warning is about. In my case, the drivers seem to get worried whenever they come across textures that lack mip-maps, even though the minification filter I use for sampling the texture does not require them to be present! This makes some sense, since the default maximum level for textures is 1000 (at least for GL3.3). What I therefore had to do was set the GL_TEXTURE_MAX_LEVEL property to 0 for the texture; having done that, the notification is now gone.
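For reference, this is all it takes on the C side – a minimal sketch, assuming a regular 2D texture object (texture_id is a made-up handle):

/* The texture only ever holds a base mip-map level, so tell GL that level 0   */
/* is also the maximum level – this is what silences the driver notification.  */
glBindTexture  (GL_TEXTURE_2D, texture_id);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, 0);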

All in all, the error makes *some* sense, but the value it calls a texture identifier is a far cry from one. I think it might be referring to a texture unit instead, but I haven't checked.

Edit 2: The value referred to in the warning message is a texture unit, not the texture id.

LW exporter plug-in (1): Battlefield report

It's been quite a few busy days recently. Owing to the fact that I'm currently rewriting Elude's demo engine from scratch, it's mostly quite boring, visually unexciting odds and sods that you expect every engine out there to have but never think about coding by hand. To keep this blog alive, though, I'll be posting some screen-shots every now and then. 🙂

Currently, it's the LightWave exporter plug-in and the uber-shader that are in the works. After a couple of days, I finally managed to get to a point where the simplest single-layer normal/vertex data can be exported from LW, as well as loaded into the app and rendered to the buffers. No rocket science, obviously, but a good starting point to take a break and do some further Spherical Harmonics experiments before moving on to even more exciting details like UV mapping export 🙂 (agreed: boring in the sense that once you've been there, it's a piece of cake, but before that it's a freaking nightmare).

[Screenshot: scrshot_plain]
Yes, a touch of FXAA would do it some justice here..

Spherical Harmonics (3): Why rotate?

During the last couple of days, I have had the indescribable joy of learning in practice how tiny typos, understatements or mistakes in scientific papers can turn a lots-of-typing-but-that's-it kind of algorithm into purgatory. And hell, do I mean it 🙂

Rotating spherical functions expressed as spherical harmonics coefficients has been described by some academics as simple, but don't take their word for it – that's cynical irony.. unless all you intend to do is a little bit of rotation around the Z axis, in which case I will humbly agree and pass you the microphone. If your plans are more ill-willed, or if you are striving for a generic XYZ solution which works for any reasonable number of bands, well then – bad news: prepare for a bumpy ride; things are going to hit the roof.

Now, there are some very important works on the net that you'll definitely have bumped into if you have ever shown interest in the subject. These include papers like:

* “Rapid and stable determination of rotation matrices between spherical harmonics by direct recursion” by Ivanic et al;
* “Spherical Harmonic Lighting: The Gritty Details” by Green, which is an excellent starting point – but.. bear with me :);
* “Fast Approximation to Spherical Harmonic Rotation” by Krivanek et al.

What you quickly learn from reading these papers is that naive XYZ rotation of SH coefficients is a no-go if you intend to do it in a per-fragment manner – it's just too many calculations and instructions for the average person's hardware. Instead, you need to hack your way through, tabulating data and doing a bit of wacky stuff.

My intent today is not to give you an overview of the available techniques – you can read about them yourself, assuming you do not quickly get worn out when shown integrals (or even double ones, I dare you!). Since this blog is – at least for the time being – mostly about showing off stuff that is a far cry from anything visually pleasing 😉 and giving a heads-up to people about the rather unobvious coding problems I have faced in my programming career, please: let's have a closer look at the “but” word in the bullet-point list above.

Robin Green, who is the person behind that “but”-ed article, did an excellent job giving a quick walkthrough of the various topics connected with working with Spherical Harmonics. Really, hats off. What I'm missing, however, is an errata list, which is basically nowhere to be found on the net.
One thing that I have spotted so far that is not flawless is the last appendix, which contains the various formulas necessary to calculate rotation matrices for all SH bands. If you have implemented the algorithm, you'll have noticed that your implementation accesses illegal array elements during the very first iteration! It seems as if the assumption:

-l <= m’ <= l

does not hold at all! Depending on how you implement the algorithm, in the worst case you may not even notice it and spend countless hours trying to debug the rotations, going mad while watching your hair slowly recede into oblivion.

What seems to be a solution in this case (seems, as I have not yet had a chance to visually verify it) is to recursively calculate the rotation SH coefficients for all <l, m, m'> triples that are.. well, illegal. 🙂 What I know at this moment is that this approach does not cause a stack overflow and seems to be the way taken by other people as well (see this as an example).

Update:

Finally got the shaders right. From the looks of it, the right way to cope with the bug in the aforementioned paper is to use zeroes for coefficients that fall outside the allowed domain. If you try to do the calculations anyway, you'll end up with invalid floating-point values due to divide-by-zero operations getting in your way. Hope this saves somebody's day 🙂
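In code, the workaround boils down to a guarded accessor – a minimal sketch, assuming the per-band rotation matrices are stored as dense, row-major (2l+1) x (2l+1) float arrays (the function and variable names are made up):

/* Returns the (m, m') entry of the band-l rotation matrix, or zero whenever the */
/* recursion asks for an entry outside the -l <= m <= l, -l <= m' <= l domain.   */
float get_rotation_coeff(const float* band_matrix, int l, int m, int m_prime)
{
    if (m < -l || m > l || m_prime < -l || m_prime > l)
    {
        return 0.0f;
    }

    return band_matrix[(m + l) * (2 * l + 1) + (m_prime + l)];
}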

Spherical Harmonics: Fooling around (2)

Added SH-encoded light data preview – things are finally starting to look quite nice!

What I intend to work on next is modifying my existing LW exporter plug-in to use the LW raytracing API, so that I can encode visibility & ray-bounce data and merge it with the regular scene data. By doing so, we would have a pretty nice toolchain for static background parts which we could use for our upcoming demo!

But then, frankly, I wonder if we'll actually need it in the end.. I've recently stumbled upon an uber-interesting paper from Cyril Crassin et al on dynamic energy baking using voxel cone tracing (here, have a read!), and it landed extremely high on my to-do list. It looks amazing; if you've seen the recent showreel of the latest UDK from Epic Games, that's the approach they used for GI. Aw yiss!

I'm not sure if the technique can be used without compute shaders at hand (which the NVIDIA drivers appear to lack, at least for the GL3.3 core profile that I currently use in Emerald), but if I have to raise the requirements in the end – hell, it's worth it!

For the time being, here – a fresh screenshot from the poor man's SH-based AO solution. 😉

GLSL hint: Preventing unrolling

Having rewritten all the low-level parts of the engine with a completely new approach (details available at beer gardens at demo-scene parties I'll be attending in the near future 🙂 ), I finally started working on a new set of effects I intend to use in Elude's next PC demo.

My current area of interest is a GPU-only, Spherical Harmonics-based lighting implementation. Since the approach is more or less based on calculating approximations of a gazillion+1 integrals, the pre-calc shaders tend to run long loops. The problem I found myself facing today was that after issuing a transform feedback-enabled glDrawArrays() call with the “rasterizer discard” mode enabled, the user-mode layer of the OpenGL implementation started sucking in lots of memory (>1 GB and climbing..) and blocked on the call. This is quite unusual, given the rather asynchronous nature of NVIDIA's OpenGL implementation, which tends to enqueue incoming commands in the command buffer and return control flow to the caller in an instant!
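For context, the call in question follows the usual GL3.3-level transform feedback pattern – a rough sketch below, with the program/buffer setup omitted and made-up object names (the program is assumed to have had its feedback varyings declared via glTransformFeedbackVaryings() before linking):

glEnable        (GL_RASTERIZER_DISCARD);                          /* no fragments needed for the pre-calc pass */
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, precalc_bo_id); /* captured varyings land in this buffer     */

glBeginTransformFeedback(GL_POINTS);
glDrawArrays            (GL_POINTS, 0, n_vertices);               /* the call the driver was choking on        */
glEndTransformFeedback  ();

glDisable(GL_RASTERIZER_DISCARD);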

Let me interrupt at this moment and shed some light on a different aspect that made the problem even weirder: Emerald stashes GL program blobs on the hard drive after linking finishes successfully, so that future linking requests will not (theoretically) require a full recompilation of the attached shaders – assuming their hashes have not changed, of course, in which case the engine goes down the usual path.

Coming back to the problem: I played around with the shader and tried commenting out various bits and pieces, checking how the modifications affected the driver's behaviour. It wasn't long till I discovered it was caused by the large number of iterations I was doing in the loop. One way to get around this would be to rework the approach and break the loop down into multiple passes, but that didn't sound like a good idea. I expect the implementation to be rather demanding in terms of hardware, and introducing CPU interventions where they are not strictly necessary is something I'd prefer to avoid.

After doing some hunting on them nets, I found a solution that appears to be NVIDIA-only. It works for me, but I have no AMD hardware at hand to check whether their compiler also supports the pragma that altered the behaviour, so I can't tell if it's portable – I can't see any OpenGL extension which would describe the feature, so that's probably a bad omen..

Turns out that what needs to be done is to use:

#pragma optionNV(unroll none) 

right before the loop in question, which prevents the shader compiler from endlessly trying to unroll it, forcing it into the “Just don't!” line of thinking. The behaviour has a very large OOM potential, and I'd really like to learn why NVIDIA decided to go with such deep unrolls. I can only imagine how painful that decision must be for all the folks working on fractal visualisations 🙂

An excerpt from the chronicles of x64 programming..

Looking for a solution to the following problem?

fatal error LNK1112: module machine type 'X86' conflicts with target machine type 'AMD64' 

Guess what: it's brain-dead simple to solve! Just head over to Project Properties / Linker / Command Line. See the “Additional Options” box at the bottom? It's got a “/MACHINE:X86” entry which shouldn't be there. Just get rid of it and you're set.

I've hit this problem so many times now that I think this post might actually save some of those precious hours you don't have to spend hunting down such trivial bugs..

On performance bottlenecks..

A quick thought for this evening:

Pushing 36,000 KB of data per second from VRAM to RAM by means of 180 glReadPixels() calls per second is a bad, bad idea..

*gasp*. I've always known that reading pixels off the color buffer of either type of FBO is not the gentleman's way of behaving, but – to be honest – it's the first time I'm hitting the dreaded bus throughput problem. The issue I'm getting is that the rendering output appears jerky, and the jittering seems to happen at rather random intervals. Yes, random enough to ruin the whole smooth experience 🙂 Oh well, with 60 FPS set as the desired frame-rate, it was bound to happen.

The problem is, CamMuxer is already complex enough for the little pet project I intended it to be at the beginning, but it looks like there's no other way to overcome the performance bottlenecks I'm seeing than by introducing PBOs into the pipeline, making the solution even more twisted than it already is.
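For the record, the PBO route would look more or less like the sketch below – assuming two pre-allocated GL_PIXEL_PACK_BUFFER objects (allocated with glBufferData() and GL_STREAM_READ usage; the names are made up) used in a ping-pong fashion, so that glReadPixels() only kicks off an asynchronous transfer and the actual data is mapped a frame later:

/* Frame N: ask GL to copy the color data into the "write" PBO – the call returns immediately. */
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo_ids[n_frame % 2]);
glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, NULL);

/* Frame N: map the other PBO, holding the data requested a frame earlier. */
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo_ids[(n_frame + 1) % 2]);

const void* frame_data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);

if (frame_data != NULL)
{
    /* ..push the pixels to wherever the muxer needs them.. */

    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
}

glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);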

The funny thing about it is that the jerky updates only come into play if you start moving the cursor around. My guess is that it could have something to do with more frequent, system-enforced context switching occurring due to the necessary window repaints. Could it be that Microsoft has finally started to hardware-accelerate GDI with the advent of Windows 7?