Implementing Variance Shadow Maps support in Emerald. Over 80% is already done. I’m planning to post a few screenshots after the feature is in.
Hey, been some time!
There’s quite a lot of stuff happening right now in my commercial life, so – while still on the wave – I’ve decided to open-source my rendering engine implementation to the community.
If you’re interested in having a look at what I’m currently working on, or simply need a reference base of some sort, feel free to have a look at https://github.com/kbiElude/Emerald. 🙂
My previous engine has not been made available (mostly because it would take a huge amount of time to clean it up, and – as you might’ve guessed already – free time is a pretty scarce resource for me at the moment). Same goes for CamMuxer and the three demos Elude released. I don’t think these will ever see the light of day, but hey – you never know!
The engine builds specifically on what we’ve learned as a team while working on the three demos. It’s definitely not aimed at game developers, and does not have a fancy UI (did I mention already that I despise writing those?). Emerald is basically a toolset that you can use to write tools, PoC applications or – you name it! – demos, and that’s the direction I think the development will take in the upcoming years.
You can create ES 3.1 & GL 4.2 contexts with Emerald with just a few lines of code, and it allows you to jump into writing prototypes in a matter of seconds. It’s got support for reading a number of image formats, has built-in OpenCL interop support, and can read COLLADA files and store them in an engine-friendly format. There are also a few other pieces of functionality that have been implemented along the way while I was porting the previous engine to the latest concept Emerald is being written around. So yeah, that’s basically it.
The engine has not been tested on non-NVIDIA platforms. I do believe that there will come a day when I’ll have to spend a few days fixing shaders-n-stuff, but for now there are more interesting things on the horizon I’d like to take care of first. Consider yourselves warned 🙂
Seems like I’m done with a proof-of-concept implementation of an OpenCL+OpenGL particle collision tester 🙂
For the curious, white line segments represent the direction in which a collision would have been considered for the frame (but not necessarily affecting the particle movement). Blue lines represent velocity vectors (unnormalized).
One of the peculiar things about graphics development is how tricky it is to hunt bugs down. You often have to resort to visual tracing in order to actually be able to experience the “ah-hah!” Archimedes moment.
I have recently spent a great deal of my free time tracking down the problem that was causing my particles to fall behind geometry. Were it not for the visual hints, I would probably not have been able to tell for the next month that it was the AABB/ray intersection test implementation in my OpenCL code that was returning negative values for cases which should have been reported as collisions.
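For reference, here is a minimal sketch of a slab-based ray/AABB test in plain C. It is not the OpenCL code mentioned above: the `vec3` and `aabb` types and the `ray_aabb_intersect` name are made up for this example. The design choice worth noting is that hit/miss is reported separately from the distance, so a negative distance can never be mistaken for "no collision", or the other way round.

```c
/* Hypothetical types, introduced only for this sketch. */
typedef struct { float x, y, z; } vec3;
typedef struct { vec3 min, max; } aabb;

/* Slab-based ray/AABB test.
 *
 * Returns 1 and stores the distance to the nearest hit in *out_t if the ray
 * (origin + t * direction, t >= 0) intersects the box; returns 0 otherwise.
 * If the origin lies inside the box, the reported distance is 0. */
int ray_aabb_intersect(vec3 origin, vec3 direction, aabb box, float* out_t)
{
    const float o[3]  = { origin.x,    origin.y,    origin.z    };
    const float d[3]  = { direction.x, direction.y, direction.z };
    const float lo[3] = { box.min.x,   box.min.y,   box.min.z   };
    const float hi[3] = { box.max.x,   box.max.y,   box.max.z   };
    float       t_near = 0.0f;   /* starting at 0 rejects hits behind the origin */
    float       t_far  = 3.0e38f; /* "far enough" for this sketch */

    for (int i = 0; i < 3; ++i)
    {
        if (d[i] > -1e-8f && d[i] < 1e-8f)
        {
            /* Ray is parallel to this slab: a miss unless the origin is inside it. */
            if (o[i] < lo[i] || o[i] > hi[i])
            {
                return 0;
            }
        }
        else
        {
            float t0 = (lo[i] - o[i]) / d[i];
            float t1 = (hi[i] - o[i]) / d[i];

            if (t0 > t1) { float tmp = t0; t0 = t1; t1 = tmp; }
            if (t0 > t_near) t_near = t0;
            if (t1 < t_far)  t_far  = t1;

            /* The per-axis intervals no longer overlap: no intersection. */
            if (t_near > t_far)
            {
                return 0;
            }
        }
    }

    *out_t = t_near;
    return 1;
}
```

A ray starting at (0, 0, -5) and heading along +Z towards a unit-sized box centred at the origin reports a hit; the same ray heading along -Z reports a miss instead of a negative distance.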
Oh well. Let’s move on.
Trivia for today:
Q: What happens if you dlopen() a library that has already landed in the process space thanks to being dynamically linked with the process in question?
A: Obvious, stupid! The library will load in, overwriting already initialized library-specific variables and making massive waves of doom and destruction. What’s worse, if your current life’s feature set comes with a really bad karma, the library that gets loaded may even come from a different location than the one you linked with, causing ultimate ragnarok and software demise.
Reference counters? lol!
Spawning child processes under Windows and worried about the open handle count going through the roof, even though they all gently quit and are no longer listed in the system process list?
Oh you should!
I’ve been working on a certain piece of software for over a year now and it wasn’t until today that we spotted and fixed a gruesome bug hiding in a location that is executed very, very often. It would probably have stayed there and lived happily ever after if it weren’t for the stress tests that we finally got to run in their full glory for the very first time. I don’t want to/am not allowed to go into too much detail, so let’s just say we were extremely surprised to find out that our lovely & shiny piece of software caused Windows 7 to come to a complete halt after 2 hours of non-stop tests (having first crashed Google Chrome and the Process Explorer I usually keep open). Worse, the use case we were stress-testing was expected to happen every single time somebody attempted to launch our tool. If they were to use the product for longer than an hour, it wouldn’t lead anywhere good.
Topping it all off, it’s probably not going to come as a surprise that we had only 6 working days left until the project deadline 🙂 That’s life.
Having panicked for an hour or two and having tested a rather hacky workaround that could backfire awfully and lead to ugly side effects, not to mention that it would require an off-site person’s effort, I launched the tests again and started sniffing around. Deferred crash, slow-down effect over time, overall system damage – why? What could be the reason?
At moments like these it’s very useful to have Process Explorer at hand (okay, a cup of coffee is also highly recommended), just to look at a variety of counters it shows you. They provide you with a lot of insight into how every single active process behaves. Recently they even implemented a new feature that allows you to check GPU activity and memory usage! If you happen to have a spare monitor that you don’t know what to do with (right 🙂 ), keep the tool full-screen. Trust me: I learned my lesson the hard way today 🙂
Any road, when I focused on the stress-test process that had been running for a few minutes, I instantly noticed that there were thousands and thousands of zombie process and thread handles. The processes they represented were long gone, not to mention the threads which should have died a long time ago, but no. How come! You don’t need to release a child process handle under Windows, you don’t get one when doing a CreateProcessA() call, do you?
Well, for Linux programmers it’s probably an easy spot – yes, you do. However, under Windows, the handle is not the function’s return value, so it’s a potential banana skin. Instead, you need to dive into the PROCESS_INFORMATION structure that you are expected to provide a pointer to as one of the arguments, and there you’ll find it.
Also, judging by what the MSDN page for the function says, it’s a good idea to release the main thread handle of the newly spawned process as well, because otherwise reference counting may hit you hard.
After we started giving the child process’ handles the care they deserved, the problem disappeared and we could call it a day 🙂
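To make the fix concrete, here is a minimal sketch of the pattern, assuming you want to wait for the child to finish before carrying on. The function name `run_child_and_release_handles` is made up for this example, and this is not the code from the project in question. The non-Windows branch is only a stub so that the snippet compiles everywhere.

```c
#ifdef _WIN32
#include <windows.h>

/* Spawns a child process, waits for it to exit and releases BOTH handles
 * that CreateProcessA() hands back via PROCESS_INFORMATION.
 *
 * Returns 0 on success, -1 if the process could not be created. */
int run_child_and_release_handles(char* command_line)
{
    STARTUPINFOA        startup_info;
    PROCESS_INFORMATION process_info;

    ZeroMemory(&startup_info, sizeof(startup_info));
    ZeroMemory(&process_info, sizeof(process_info));
    startup_info.cb = sizeof(startup_info);

    if (!CreateProcessA(NULL,          /* lpApplicationName: taken from the command line */
                        command_line,
                        NULL,          /* default process security attributes */
                        NULL,          /* default thread security attributes */
                        FALSE,         /* do not inherit handles */
                        0,             /* no creation flags */
                        NULL,          /* inherit environment */
                        NULL,          /* inherit current directory */
                        &startup_info,
                        &process_info))
    {
        return -1;
    }

    WaitForSingleObject(process_info.hProcess, INFINITE);

    /* The child having exited is not enough: the process and thread kernel
     * objects stay alive until every handle to them has been closed. */
    CloseHandle(process_info.hThread);
    CloseHandle(process_info.hProcess);

    return 0;
}
#else
/* Stub for non-Windows builds; the sketch above is Win32-specific. */
int run_child_and_release_handles(char* command_line)
{
    (void) command_line;

    return -1;
}
#endif
```

If you don’t need to wait for the child at all, you can close both handles right after CreateProcessA() returns; the child keeps running regardless.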
All in all, the lesson learned today is that stress tests are vital. And the easier you automate them, the better.
Passed an unsigned char* argument to your kernel, cast it to a different type inside the kernel, and now getting a CL_OUT_OF_RESOURCES whenever you try to map the buffer to user-space after the kernel has executed with (alleged) success? Ensure you’re doing a proper address space cast! For instance, instead of doing:
float3 albedo = vload3(0, (float*) input_data);
..you should be more precise, for instance as presented below:
float3 albedo = vload3(0, (__global float*) input_data);
Sadly, NViDiA drivers will not tell you where the real problem is but just drop dead with an error whenever you try to have a look at the result buffer data..
For the last two weeks I have been working with Zavie on bringing CamMuxer to the next level of perfection by making it slightly more compatible with various hardware configurations, not just with my box which – quite accidentally – played the role of the streaming server at WeCan 2012 🙂 When run on a configuration that meets all the weird requirements I enforced in the Emerald engine for various reasons (including, but not limited to: laziness, time management efficiency, NViDiA fanboyism – pick any or all options of your interest 😉 ), the tool is – how immodest of me! – quite stable, due to the fact that no client-side/video memory allocations are done within the rendering pipeline (excluding “camera plugged in” event handlers, but let’s not go too hardcore). The tool is also quite fast, since it has been written with multi-threading in mind from the very beginning – each camera gets its own rendering context, updates texture objects in its own time slice, and is completely separated from the main rendering thread (which refreshes at 60 FPS).
Zavie contacted me a couple of weeks ago and asked me if I could give him a hand with setting up a web-stream for his Tokyo Demo Fest 2013 demo-scene party, which is happening next week-end in Japan. We started off with the tool crashing hard when used with nearly all the cameras Zavie had at hand (and the time zone difference was not helping either, as Japan is 8 hours ahead of Poland!). After a few days it became clear that if you are ever to work with web-cameras and DirectShow, you *need* to have access to various models, as there’s next to no information as to what to expect from the cameras out there on the market. Here are a couple of sample assumptions I made.. which turned out to be absolutely *wrong*:
Axiom: All web cameras out there obviously expose RGB24 feeds;
Some of the web cameras we had a chance to work with only exposed YUY2 feeds. While YUV is not particularly difficult to tackle (as in: to convert to an RGB format), this assumption was probably the most deadly of them all, as it soon proved to have been but the tip of the iceberg. As we continued to melt the ice, it didn’t take much time to find out there were some pretty nasty bugs down there, which were related to switching between feed formats, and which took some time to debug remotely and fix.
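For the curious, here is roughly what the conversion itself looks like, as a CPU-side sketch in plain C. This is not CamMuxer code: the function names are made up, and the sketch assumes BT.601 limited-range video and an R, G, B output byte order (whatever byte order and row orientation your consumer expects is left out).

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t clamp_to_byte(int value)
{
    return (uint8_t) (value < 0 ? 0 : (value > 255 ? 255 : value));
}

/* Converts a single YUV sample (BT.601, limited range) to 3 bytes: R, G, B. */
static void yuv_to_rgb24(int y, int u, int v, uint8_t* rgb)
{
    const int c = y - 16;
    const int d = u - 128;
    const int e = v - 128;

    /* Fixed-point coefficients scaled by 256; +128 rounds to nearest. */
    rgb[0] = clamp_to_byte((298 * c           + 409 * e + 128) / 256);
    rgb[1] = clamp_to_byte((298 * c - 100 * d - 208 * e + 128) / 256);
    rgb[2] = clamp_to_byte((298 * c + 516 * d           + 128) / 256);
}

/* YUY2 packs two horizontally adjacent pixels into 4 bytes: Y0 U Y1 V.
 * Both pixels share the same U and V samples.
 *
 * src holds n_pixels * 2 bytes, dst receives n_pixels * 3 bytes.
 * n_pixels is expected to be even; a trailing odd pixel is ignored. */
void yuy2_to_rgb24(const uint8_t* src, uint8_t* dst, size_t n_pixels)
{
    for (size_t i = 0; i + 1 < n_pixels; i += 2)
    {
        const int y0 = src[0];
        const int u  = src[1];
        const int y1 = src[2];
        const int v  = src[3];

        yuv_to_rgb24(y0, u, v, dst);
        yuv_to_rgb24(y1, u, v, dst + 3);

        src += 4;
        dst += 6;
    }
}
```

In a GPU-based tool the same arithmetic would more likely live in a shader; the maths does not change.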
Axiom: All feeds that web cameras expose can be freely launched;
One of the web cameras we had the pleasure to work with happily advertised a feed at twice the Full-HD resolution, available at 30 FPS. Considering the bandwidth of the USB 2.0 bus and the amount of traffic the feed would generate per second, this seemed rather immoral. And, well, guess what – it wasn’t real at all. In fact, if you requested it, the driver instantly reported an error. Still, that feed must have made for quite a few good screen-shots (marketing-wise 😉 )
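A quick back-of-the-envelope check shows why that feed looked suspicious. The sketch below assumes an uncompressed feed at 16 bits per pixel (as with YUY2) and takes “twice Full-HD” to mean twice the pixel count of 1920x1080; neither detail is something the camera told us, and a compressed feed could well fit. The function name is made up for the example.

```c
#include <stdint.h>

/* USB 2.0 high-speed signalling rate. Real-world throughput is lower still. */
#define USB2_HIGH_SPEED_BITS_PER_SECOND 480000000ULL

/* Raw bit rate of an uncompressed video feed. */
uint64_t uncompressed_feed_bits_per_second(uint64_t width,
                                           uint64_t height,
                                           uint64_t bits_per_pixel,
                                           uint64_t frames_per_second)
{
    return width * height * bits_per_pixel * frames_per_second;
}
```

Calling `uncompressed_feed_bits_per_second(1920, 2160, 16, 30)` gives 1,990,656,000 bits per second, i.e. roughly 2 Gbit/s, or a bit over four times what the bus can signal.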
Axiom: Switching between media formats is a piece of cake;
Switching between different resolutions for the same format does work flawlessly. However, if you ask the Media Control interface to switch to a feed using a different data type, things take a different turn. What you *usually* get is an error – can’t do, nope. The only action you can take from this point is to re-instantiate the whole DirectShow graph.. and that’s slippery from the stability point of view 🙁
Axiom: No need to care about native feed’s data format – I can always ask DirectShow to create a graph that will feed the sample grabber with RGBxx data;
I agree that it is usually worth asking as many questions as you can (assuming they make the picture clearer, that is) but DirectShow does not have a magic wand at hand. Some of the cameras we had access to exposed an H.264 feed, which is quite unfunny to work with if you need to quickly create a GPU-specific solution. We could not get these feeds converted to RGB24 in software – perhaps we missed some filters. But since the rule of thumb is to always assume the worst-case scenario, I quickly dropped support for the feed format.
All in all, CamMuxer v2 is now finished and it’s very likely going to be used for streaming purposes at the party. I’m starting to seriously consider publishing the tool for all the party organizers out there who would like to see a multi-camera feed broadcast over the network but don’t have the necessary tools of the trade (or monies to buy the licenses) to do so. If you happen to be looking for a similar solution, drop me a line – the only reason I haven’t put it up on the web-site yet is that I’m afraid I wouldn’t be able to handle all the bug reports that might start to flow in the moment more than just one or two folks start to play with it. Looking at the number of problems we had with just a handful of cameras, it’s.. bound to happen 🙂
While trying to get the software to run, some new ideas on how to expand the tool so that it’s even more useful to party organisers came to my mind, so you can expect some more posts about CamMuxer in the next months 🙂