Spherical Harmonics (4): Battle-field report

Finally got my GPU-based Kd-tree traversal implementation to work! With the tool at hand, there’s nothing stopping me from calculating visibility for model vertices in an off-line manner, converting the data to an SH representation and then using it for AO calculation within my engine 🙂 Unfortunately, my GTX 570 is far too slow to do the calculations in real-time – a rather typical test scene presented below, consisting of ~25k vertices, takes about 1 minute to process (with 2500 rays shot from every point). I estimate the remaining optimisation potential at roughly 50%, which leaves us with 30 seconds of precalculation time; okay for a demo loader, but nowhere near real-time. I’m using OpenCL for the purpose – sticking to a shader-based approach would probably have been a bit faster, but I’d lose the neat extendability I get with CL.
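For the curious, the projection step itself is pleasantly simple once the per-ray hit/miss data is available. A minimal CPU-side sketch – the sh_eval() helper and the data layout are illustrative assumptions, not Emerald’s actual API:

/* Hypothetical helper, assumed to evaluate the index-th SH basis
 * function for a normalized direction (3 floats). */
float sh_eval(int index, const float* direction);

/* Monte Carlo projection of a single vertex's visibility function onto
 * n_coeffs SH basis functions. visibility[s] holds 1.0 if ray s left
 * the scene without hitting anything, 0.0 otherwise. */
void project_visibility(const float* visibility,
                        const float* sample_directions, /* xyz per sample */
                        int          n_samples,
                        int          n_coeffs,
                        float*       out_coeffs)
{
   /* Uniformly distributed samples: each covers 4*pi / n_samples sr. */
   const float weight = 4.0f * 3.14159265f / (float) n_samples;

   for (int c = 0; c < n_coeffs; ++c)
   {
      out_coeffs[c] = 0.0f;

      for (int s = 0; s < n_samples; ++s)
      {
         out_coeffs[c] += visibility[s] * sh_eval(c, sample_directions + s * 3);
      }

      out_coeffs[c] *= weight;
   }
}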

Compared to my previous engine (which laid the groundwork for Three Fourth, Futuricon and Suxx), the current state of the scene importing layer in Emerald is that it lacks nearly everything 😉 Having finished the visibility precalc work, I will now be able to move on with support for indirect bounces, so expect some more interesting screen-shots in a week or two!

scrshot_sh1

 

scrshot_sh

CamMuxer and Tokyo Demo Fest 2013

For the last two weeks I have been working with Zavie on bringing CamMuxer to the next level of perfection by making it slightly more compatible with various hardware configurations, not just with my box, which – quite accidentally – played the role of the streaming server at WeCan 2012 🙂 When run on a configuration that meets all the weird requirements I enforced in Emerald engine for various reasons (including, but not limited to: laziness, time management efficiency, NViDiA fanboyism – pick any or all options of your interest 😉 ), the tool is – how unhumbly of me! – quite stable, owing to the fact that no client-side/video memory allocations are done within the rendering pipeline (excluding “camera plugged in” event handlers, but let’s not go too hardcore). The tool is also quite fast, since it has been written with multi-threading in mind from the very beginning – each camera gets its own rendering context, updates its texture objects in its own time slice, and is completely separated from the main rendering thread (which refreshes at a 60 FPS frame-rate).

Zavie contacted me a couple of weeks ago and asked if I could give him a hand with setting up a web-stream for Tokyo Demo Fest 2013, his demo-scene party, which is happening next week-end in Japan. We started off with the tool crashing hard when used with nearly all the cameras Zavie had at hand (and the time zone differences were not helping either, as Japan is 8 hours ahead of Poland!). After a few days it became clear that if you are ever to work with web-cameras and DirectShow, you *need* to have access to various models, as there’s close to no information on what to expect from the cameras out there on the market. Here’s a couple of sample assumptions I made.. which turned out to be absolutely *wrong*:

Axiom: All web cameras out there obviously expose RGB24 feeds;
Truth:

Some of the web cameras we had a chance to work with only exposed YUY2 feeds. While YUV is not particularly difficult to tackle (as in: to convert to RGB format), this assumption was probably the most deadly of them all, as it soon proved to have been but the tip of an iceberg. As we continued to melt the ice, it didn’t take much time to find out there were some pretty nasty bugs down there, related to switching between feed formats, which took quite some time to debug and fix remotely.
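For reference, unpacking YUY2 itself is the easy part. A sketch using the usual BT.601 integer formulas – this is not CamMuxer’s actual code, and note that DirectShow’s RGB24 is really stored blue-first and bottom-up, which I’m glossing over here:

#include <stdint.h>

static uint8_t clamp_u8(int value)
{
   return (uint8_t) (value < 0 ? 0 : (value > 255 ? 255 : value));
}

/* Converts one row of YUY2 data (Y0 U Y1 V per two pixels) to RGB24. */
void yuy2_to_rgb24_row(const uint8_t* src, uint8_t* dst, int width)
{
   for (int x = 0; x < width; x += 2, src += 4)
   {
      const int d = src[1] - 128; /* U, shared by both pixels */
      const int e = src[3] - 128; /* V, shared by both pixels */

      for (int i = 0; i < 2; ++i)
      {
         const int c = src[i * 2] - 16; /* Y0, then Y1 */

         *dst++ = clamp_u8((298 * c           + 409 * e + 128) >> 8); /* R */
         *dst++ = clamp_u8((298 * c - 100 * d - 208 * e + 128) >> 8); /* G */
         *dst++ = clamp_u8((298 * c + 516 * d           + 128) >> 8); /* B */
      }
   }
}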

Axiom: All feeds that web cameras expose can be freely launched;
Truth: 

One of the web cameras we had the pleasure to work with happily advertised a feed at twice Full-HD resolution, available at 30 FPS. Considering the bandwidth of the USB 2.0 bus and the amount of traffic the feed would generate per second, this seemed rather immoral. And, well, guess what – it wasn’t real at all. In fact, if you requested it, the driver instantly reported an error. Still, that feed must have made for quite a couple of good screen-shots (marketing-wise 😉 )
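The takeaway: enumerate what the driver advertises, but treat every entry as a rumour until SetFormat actually succeeds. Roughly – error handling trimmed, and stream_config is assumed to be an IAMStreamConfig* obtained from the capture pin:

int n_capabilities = 0;
int caps_size      = 0;

stream_config->GetNumberOfCapabilities(&n_capabilities, &caps_size);

for (int i = 0; i < n_capabilities; ++i)
{
   AM_MEDIA_TYPE*           media_type_ptr = NULL;
   VIDEO_STREAM_CONFIG_CAPS caps;

   if (FAILED(stream_config->GetStreamCaps(i, &media_type_ptr, (BYTE*) &caps)))
   {
      continue;
   }

   /* The driver advertising a format does not mean it will accept it.. */
   if (SUCCEEDED(stream_config->SetFormat(media_type_ptr)))
   {
      /* ..but this resolution/format combination actually works. */
   }

   DeleteMediaType(media_type_ptr);
}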

Axiom: Switching between media formats is a piece of cake;
Truth:

Switching between different resolutions for the same format does work flawlessly. However, if you ask the Media Control interface to switch to a feed using a different data type, things take a different turn. What you *usually* get is an error – can’t do, nope. The only action you can take from this point is to re-instantiate the whole DirectShow graph.. and that’s slippery from the stability point of view 🙁

Axiom: No need to care about the native feed’s data format – I can always ask DirectShow to create a graph that will feed the sample grabber with RGBxx data;
Truth:


I agree that it is usually worth asking as many questions as you can (assuming they make the picture clearer, that is), but DirectShow does not have a magic wand at hand. Some of the cameras we had access to exposed an H.264 feed, which is quite unfunny to work with if you need to quickly create a GPU-specific solution. These feeds could not be converted in software to RGB24 – perhaps we missed some filters. But since the rule of thumb is to always assume the worst-case scenario, I quickly dropped support for the feed format.

All in all, CamMuxer v2 is now finished and it’s very likely going to be used for streaming purposes at the party. I’m starting to seriously consider publishing the tool for all the party organizers out there who would like to see a multi-camera feed broadcast over the network but don’t have the necessary tools of the trade (or monies to buy the licenses) to do so. If you happen to be looking for a similar solution, drop me a line – the only reason I haven’t put it up on the web-site yet is that I’m afraid I wouldn’t be able to handle all the bug reports that might start to flow in the moment more than just one or two folks start to play with it. Looking at the number of problems we had with just a handful of cameras, it’s.. inevitable 🙂

While trying to get the software to run, I came up with some new ideas on how to expand the tool so that it’s even more useful to party organisers, so you can expect some more posts about CamMuxer in the coming months 🙂

Bug-fixing: DirectShow and Flash Player integration

Trying to integrate your DirectShow input filter with Adobe Flash Player and can’t get it to work? Chances are you started out with Vivek’s sample implementation, which features a not-so-trivial omission that causes Chrome to simply refuse playback of the video stream, even though no error is shown. What’s worse, other DirectShow-dependent applications (Adobe Live Encoder, Skype, VirtualDub, etc.) will continue to work – it seems as if there was something wrong with Flash.. but then other drivers work – how come?

The problem is – as usual – banal 🙂 It’s one of those issues that are difficult to spot but trivial to fix, once you know where to look. In our case, we need to focus on:

HRESULT STDMETHODCALLTYPE CCamMuxerStream::SetFormat(AM_MEDIA_TYPE* media_type_ptr)
{
   IPin* pin_ptr = NULL;

   /* Flash Player probes the filter with a NULL media type – report
    * success instead of dereferencing the pointer below. */
   if (media_type_ptr == NULL)
   {
      return S_OK;
   }

   m_mt = *media_type_ptr;

(...)

Easy peasy – the original example does not do a null-pointer check! Return S_OK whenever media_type_ptr is NULL (as shown above) and things go extra fine! Just like in the gif animation below:

1357589524021

OpenCL and NViDiA: Pitfalls (1)

Q: What are some of the potential causes for this error, reported verbosely by NViDiA drivers using context notification call-backs, apart from the obvious catches mentioned in the OpenCL 1.1 specification?

opencl_heaven

A: Well, if you ever meet the funny fellow, please make double sure you:

– meet type alignment requirements (no, really, single floats must be aligned to 4 bytes in your memory buffer!);
– do not cast global-space pointers of type A to private-space pointers of type B (private is the default address space if you skip the __global keyword!);
– edit: initialize all private variables. Yes. I have just fixed an issue that was giving awfully weird and totally unexpected side effects, simply because a variable was never initialized and I had the audacity to query its value.
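To illustrate the two kernel-side catches in one place – a made-up kernel, obviously:

__kernel void pitfall_demo(__global float4* data)
{
   /* Catch: dropping the address space qualifier when casting. Without
    * __global, the target pointer is __private by default, so this
    * would reinterpret a global address as a private one:
    *
    *    float* wrong_ptr = (float*) data;
    */
   __global float* ok_ptr = (__global float*) data;

   /* Catch: private variables are NOT zero-initialized – always assign
    * a value before reading one back. */
   float accumulator = 0.0f;

   accumulator += ok_ptr[get_global_id(0) * 4];

   data[get_global_id(0)].x = accumulator;
}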

It’s just so EASY to ignore these rules; the driver acts like an attercop, being anything but helpful, and then you suddenly have that “Oh shit, no!” moment out of nowhere, when you finally learn the lesson and feel as depicted below.

1355989327558

LW exporter plug-in (2): How about we cast some rays?

The last couple of days were quite fun, thanks to a rather weird quirk in the Lightwave SDK. Apparently, the software does its best to convince you that wishing to throw a couple of rays through the scene is never a good idea, unless you’re into volumetrics or you happen to be writing a shader implementation. If that’s the case – congratulations, you may stop holding your breath now 🙂

However, if you ever happen to be in a position where you’d like to cast a few rays around the scene from within that export plug-in of yours, especially one that you have already spent a couple of hours on, coding it in native C instead of LWScript in order to get access to lower-level stuff, let me cut to the chase and get to the rotten core.
The question is: will it happen? Even if I’m only interested in “has it intersected with any kind of geometry?” information, and I don’t really give a single penny about the color of the triangle that I have bombed with my ray? Well, the answer is:

Just Chuck Testa :)

And no, the logical trick as depicted below:

does not work either 😉 Ray-casting in Lightwave appears to be reserved for a very narrow set of plug-ins only, and there’s nothing you can do about it. That is, unless you are desperate enough to hack your way through by writing yet another plug-in, completely outside the scope of your original interest, that would share the functionality with your export plug-in by means of some sort of inter-plugin request queueing. Let’s be honest though – it’s really last-resort stuff that I personally would only reserve for commercial projects. It’s very likely to be a time-consuming, dull and painful experience, and pet projects are supposed to be fun, right?

The limitation was a very nasty surprise and, frankly, Lightwave caught me off guard for a moment. I simply needed the visibility information on a per-vertex basis in order to be able to do SH-based AO for meshes that I’d like to export using the tool, and the last thing that would have crossed my mind is that such functionality is a no-go for export plug-ins.
Given that the triangle count for some of the scenes is very likely to exceed a million or two, and that I need to be able to cast at least 100,000 rays for every single point in a reasonable amount of time, I certainly didn’t want to go with the naive, brute-force approach of testing every ray against every triangle – at a million triangles, that’s on the order of 10^11 ray-triangle tests per point, which is clearly hopeless.

So, the cunning plan for the next week that I came up with is:

* Implement CPU-based Kd tree generation for meshes (a possible node layout is sketched below);
* Implement a parallelized GPU-based ray-triangle intersection kernel that makes use of the filled structure and allows you to throw a lot of rays in one go, filling a hit/didn’t-hit cell for each ray;
* Using the two features above, implement visibility information exporting in the plug-in.
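As a teaser for the GPU part: a compact, flat node layout along these lines tends to travel well to OpenCL – this is an illustrative sketch in the spirit of the well-known PBRT layout, not necessarily what Emerald will end up using:

/* 8-byte Kd-tree node. A flat array of these can be handed over to an
 * OpenCL kernel as-is. */
typedef struct
{
   /* Bits 0..1: split axis (0 = X, 1 = Y, 2 = Z), or 3 for a leaf;
    * bits 2..31: index of the right child (interior nodes) or of the
    * first triangle (leaves). */
   unsigned int flags_and_offset;

   union
   {
      float        split_position; /* interior nodes */
      unsigned int n_triangles;    /* leaves         */
   } data;
} kd_tree_node;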

The first bullet-point is now ready 🙂 To prove it, here’s a quick screen-shot:

kdtree1

More are yet to come, of course – I’m really interested in seeing what the GPU performance is going to be like for this task!

 

*after-update sigh*

Hello, latest public NViDiA drivers (310.70, for what it matters), your argument is invalid..

scrshot_nvidia
Please click on the screenshot to open it in a new tab – the full-size version breaks the WordPress layout

Edit: I worked out what the warning is about. In my case, the drivers seem to be worried whenever they bump into textures that lack mip-maps, even though the minification filter I use for sampling the texture does not require those to be present! Makes some sense, since the default maximum level for textures is 1000 (at least for GL3.3). What I therefore had to do was set the GL_TEXTURE_MAX_LEVEL property to 0 for the texture; having done that, the notification is gone.
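In code, the fix is a one-liner (shown here for a GL_TEXTURE_2D target):

/* The texture only carries the base level; without this, GL assumes
 * levels up to the default GL_TEXTURE_MAX_LEVEL of 1000 may follow. */
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, 0);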

All in all, the error makes *some* sense, but the value it calls a texture identifier is a far cry from one. I think it might be referring to a texture unit instead, but I haven’t checked.

Edit 2: The value referred to in the warning message is a texture unit, not the texture id.

LW exporter plug-in (1): Battlefield report

It’s been quite a few busy days recently. Owing to the fact that I’m currently rewriting the Elude demo engine from scratch, it’s mostly quite boring, visually lacking odds and sods – the kind that you expect every engine out there to have but never think about coding by hand. To keep this blog alive though, I’ll be posting some screen-shots every now and then 🙂

Currently, it’s the Lightwave exporter plug-in and the uber-shader that are in the works. After a couple of days, I finally managed to get to a point where the simplest single-layer normal/vertex data can be exported from LW, as well as loaded into the app and rendered to the buffers. No rocket science, obviously, but a good starting point to take a break and do some further Spherical Harmonics experiments before moving on to even more exciting details like UV mapping export 🙂 (agreed: boring in the sense that once you’ve been there, it’s a piece of cake – but before that, it’s a freaking nightmare).

scrshot_plain
Yes, a touch of FXAA would do some justice here..

Spherical Harmonics (3): Why rotate?

During the last couple of days, I have had the indescribable joy of learning in practice how tiny typos, understatements or mistakes in scientific papers can turn a lots-of-typing-but-that’s-it kind of algorithm into purgatory. And hell do I mean it 🙂

Rotating spherical functions expressed as spherical harmonics coefficients has been described by some academic people as simple, but don’t take their word for it – it’s cynical irony.. unless all you intend to do is a little bit of rotation around the Z axis, in which case I will humbly agree and pass you the microphone. If your plans are more ill-willed, or if you are striving for a generic XYZ solution which works for any reasonable number of bands, well then – bad news: prepare for a bumpy ride, things will hit the roof.
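To show what I mean by the easy case: rotation around Z boils down to independent 2D rotations within each band. A sketch – the coeff[l * (l + 1) + m] layout is an assumption, and the signs flip depending on your basis and handedness conventions:

#include <math.h>

/* Rotates SH coefficients by angle alpha around the Z axis. Within
 * band l, the (m, -m) pair rotates by m * alpha; m == 0 is untouched. */
void sh_rotate_z(const float* in, float* out, int n_bands, float alpha)
{
   for (int l = 0; l < n_bands; ++l)
   {
      const int center = l * (l + 1);

      out[center] = in[center];

      for (int m = 1; m <= l; ++m)
      {
         const float c = cosf((float) m * alpha);
         const float s = sinf((float) m * alpha);

         out[center + m] = c * in[center + m] + s * in[center - m];
         out[center - m] = c * in[center - m] - s * in[center + m];
      }
   }
}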

Now, there are some very important works on the net that you’ll definitely have bumped into if you ever showed interest in the subject. These include papers like:

* “Rapid and stable determination of rotation matrices between spherical harmonics by direct recursion” by Ivanic et al.;
* “Spherical Harmonics: The Gritty Details” by Green, which is an excellent starting point but.. bear with me :);
* “Fast Approximation to Spherical Harmonic Rotation” by Krivanek et al.

What you quickly learn from reading these papers is that naive XYZ rotation of SH coefficients is a no-go if you intend to do it in a per-fragment manner – it’s just too many calculations and instructions for the hardware of the man on the street. Instead, you need to hack your way through, tabulating data and doing a bit of wacky stuff.

My intent today is not to give you an overview of the available techniques – you can read about them by yourself, assuming you do not quickly get worn out when shown integrals (or even double ones, I dare you!). Since this blog is – at least for the time being – mostly about showing off stuff that is a far cry from anything visually pleasing 😉 and giving people a heads-up about the rather unobvious coding problems I have faced in my programming career, please: let’s have a closer look at the *but* word in the bullet-point list above.

Robin Green, who is the person behind that “but”-ted article, did an excellent job giving a quick walkthrough of various topics connected with working with Spherical Harmonics. Really, hats off. What I’m missing, however, is an errata, which is basically nowhere to be found on the net.
One thing that I have spotted so far in there which is not flawless is the last appendix, which contains the various formulas necessary to calculate rotation matrices for all SH bands. If you implemented the algorithm, you’ll have noticed that your implementation accesses illegal array elements during the very first iteration! It seems as if the assumption:

-l <= m’ <= l

does not hold at all! Depending on how you implement the algorithm, in the worst case you may not even notice it and spend countless hours trying to debug rotations, going mad, simultaneously observing your hair slowly receding into oblivion.

What seems to be a solution in this case (seems, as I have not yet had a chance to visually verify this) is to recursively calculate the rotation SH coefficients for all <l, m, m’> triple combinations that are.. well, illegal 🙂 What I know at this moment is that this approach does not cause a stack overflow, and it seems to be the way taken by other people as well (see this as an example).

Update:

Finally got the shaders right. From the looks of it, the right way to cope with the bug in the aforementioned paper is to use zeroes for coefficients that are outside the allowed domain. If you try to do the calculations anyway, you’ll end up with invalid floating-point values due to divide-by-zero operations getting in your way. Hope this saves somebody’s day 🙂
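In practice, that boils down to a guard in front of every coefficient fetch – the indexing scheme below is illustrative, not taken from my shaders:

/* Returns the band-l rotation coefficient R(m, m'), or zero whenever
 * an index falls outside the -l..l domain the recursion can produce. */
static float fetch_R(const float* R, int l, int m, int m_prime)
{
   if (m < -l || m > l || m_prime < -l || m_prime > l)
   {
      return 0.0f;
   }

   return R[(m + l) * (2 * l + 1) + (m_prime + l)];
}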

Spherical Harmonics: Fooling around (2)

Added SH-encoded light data preview – things finally start to look quite nice!

What I intend to work on next is modifying my existing LW exporter plug-in to use the LW raytracing API, so that I can encode visibility & ray-bounce data and join them with the regular scene data. By doing so, we would have a pretty nice toolchain for static background parts, which we could use for our upcoming demo!

But then, frankly: I wonder if we’ll actually need it in the end.. I’ve recently stumbled upon an ueber-interesting paper from Cyril Crassin et al. on dynamic energy baking using voxel cone tracing (here, have a read!), and it landed extremely high on my to-do list. It looks amazing; if you’ve seen the recent showreel of the latest UDK from Epic Games, that’s the approach they used for GI. Aw yiss!

I’m not sure if the technique can be used without compute shaders at hand (which NViDiA drivers appear to lack, at least for the GL3.3 core profile that I currently use in Emerald), but if I have to raise the engine’s requirements in the end – hell! It’s worth it!

For the time being, here’s a fresh screenshot from the poor man’s SH-based AO solution 😉

Spherical Harmonics: Fooling around (1)

I’m currently doing some experiments with Spherical Harmonics lighting, with all sample coefficients calculated (or rather: pre-calculated, as it – sadly! – turns out) on the GPU. I’ll be posting some screenshots in the next couple of days, till I get fed up with the technique and decide to move on 🙂
The screen-shots will be posted without any post-processing applied, so please forgive the lack of any proper anti-aliasing / color grading / tone mapping / what-not. You have been warned.

(..and if you ask, yes: I’m aware of various artifacts in the shots.. 🙂 )
