In my last update, I discussed using Socket.io to implement persistent server connections in our music visualizer app. This week, we inch closer to our ultimate goal: rendering thirty frames per second with Three.js and sending them to the server to be encoded into video. Great strides were made, but I also discovered performance issues with serious implications for our rendering approach.
No Time to Dally
First off, let’s break down our time budget. We get 1000ms every second, and if we want to produce 30 frames per second, then we can only spend a maximum of 33.33ms on each frame. That’s not much, and depending on scene complexity and the client hardware, it could take that long just to render the frame.
However, our client is currently achieving rates as high as 60fps, so my thinking was that with a mandate to produce only half that number of frames, we'd have somewhere in the neighborhood of 500ms per second to spare. During that time, we need to get each rendered frame off the canvas and send it to the server.
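For the record, here's that back-of-the-envelope math spelled out as a snippet (the numbers are just the ones discussed above):

var fpsTarget = 30;                       // frames we must produce per second
var frameBudgetMs = 1000 / fpsTarget;     // ~33.33ms total budget per frame
var observedRenderMs = 1000 / 60;         // ~16.67ms per frame at the client's observed 60fps
var headroomMs = 1000 - (fpsTarget * observedRenderMs); // ~500ms/sec left for extraction + send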
And why are we concerned about producing the video in realtime? Once the frames are turned into a movie, it doesn’t matter if it took an hour to produce each frame, right? Heck, back in 1995, Toy Story reportedly took anywhere from 2 to 15 hours per frame to render, but the movie still flowed across the screen like pure awesome sauce. So what gives?
Well, if it were only visuals, I wouldn’t care. But we’re creating a music visualizer, which samples the audio throughout playback, doing spectrum analysis to extract low, mid, high, and overall volume values. Certain attributes of the Three.js scene objects (such as size and position) can be modulated by these values each time a frame is rendered. That audio processing also counts against our per-frame budget of 33.33ms. And if our rendering gets out of sync with the audio, the resulting video will almost certainly not be what the user expects. Thus, our challenge is to pull this off in realtime.
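To make that concrete, the per-frame sampling looks roughly like this. It's a minimal sketch using the Web Audio API's AnalyserNode; the band boundaries are illustrative rather than our production values, and an audio source is assumed to be connected to the analyser elsewhere:

// Sketch: per-frame spectrum sampling with a Web Audio AnalyserNode.
// Assumes a source node (e.g. from audioContext.createMediaElementSource)
// has been connected to the analyser.
var audioContext = new AudioContext();
var analyser = audioContext.createAnalyser();
analyser.fftSize = 256; // yields 128 frequency bins
var bins = new Uint8Array(analyser.frequencyBinCount);

// Average a slice of bins and normalize to 0..1 (band edges are illustrative)
function band(from, to) {
  var sum = 0;
  for (var i = from; i < to; i++) sum += bins[i];
  return sum / (to - from) / 255;
}

// Called once per rendered frame; this cost counts against the 33.33ms budget
function sampleAudio() {
  analyser.getByteFrequencyData(bins);
  return {
    low: band(0, 16),
    mid: band(16, 64),
    high: band(64, 128),
    volume: band(0, 128)
  };
}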
Offloading Frame Transmission to a Web Worker
By choosing to use sockets, we make the per-frame transmission less chatty than a series of HTTP POST requests. But we don't want to do the actual transmission inside the render function, because even with sockets, we might not have time to extract the image data AND send it.
Sending the frames to the server is a perfect job for a web worker, which runs in its own thread and therefore (for the most part) doesn’t impact the performance of the JavaScript running in the page.
Also, until a frame has been sent across the connection, we don't want to start sending another one. It's true that HTTP POST would allow parallel transmission, but browsers cap concurrent requests to the same domain at six, and some allow only two or four. And since we'll be using a separate thread for this part, it doesn't matter how long it takes to send all the frames.
So the current strategy looks like this:
- The script in the page renders a frame, extracts the data from the canvas, and passes it to the web worker.
- The worker queues up frames coming from the page and keeps sending until the page signals that we're done and the queue has been drained.
Where the Rubber Hits the Road
In order to test this theory, we need a client page, a web worker, and a server. While researching, I found a kindred spirit who’d traveled this path before me, and I followed his example of using the Three.js ‘your first scene’ code, which renders a rotating cube.
In my case, since time is of the essence, I’m not rendering to the screen, but I’ve left the commented-out line of code which does. It adds approximately 30ms to the frame render time. Also, I’m specifying the 1920×1080 frame size that we need for 1080p video rather than the window size. And I’ve added worker communication, timing calculation, and progress display into the mix.
The Page
<!-- test-render-client.html -->
<script src="https://rawgithub.com/mrdoob/three.js/master/build/three.js"></script>
<script>
// Create scene and renderer, connect to canvas
var scene = new THREE.Scene();
var camera = new THREE.PerspectiveCamera(75, 1920/1080, 0.1, 1000);
var geometry = new THREE.CubeGeometry(1, 1, 1);
var material = new THREE.MeshBasicMaterial({color: 0x00ff00});
var cube = new THREE.Mesh(geometry, material);
var renderer = new THREE.WebGLRenderer();
scene.add(cube);
camera.position.z = 5;
renderer.setSize(1920, 1080); // HD / 1080p

// Progress div
var outputDiv = document.createElement("div");
document.body.appendChild(outputDiv);

// Add renderer's canvas to the DOM (it's faster not to, though)
var canvas = renderer.domElement;
// document.body.appendChild(canvas); // adds ~30ms / frame

// Create a web worker to transmit the frames on another thread
var worker;
if (typeof(Worker) !== 'undefined') {
  worker = new Worker('test-render-worker.js');
} else {
  throw new Error('No Web Worker support');
}

// Create a predetermined number of frames then disconnect
var message = null;
var ordinal = 0;
var cutoff = 30;
var done = false;
var endTime = null;
var startTime = null;
var totalTime = null;
var frameRenderTime = null;

var render = function () {
  if (!done) {
    if (ordinal === cutoff) {
      // Notify the worker that we're done
      endTime = Date.now();
      done = true;
      message = {type: 'done'};
      worker.postMessage(message);

      // Report total frames and render time on page
      totalTime = (endTime - startTime);
      frameRenderTime = totalTime / ordinal;
      outputDiv.innerHTML = "Total Frames: " + ordinal +
        "<br/>Total time: " + totalTime + "ms" +
        "<br/>ms per frame: " + frameRenderTime;
    } else {
      // Send the rendered frame to the web worker
      message = {
        type: 'frame',
        ordinal: ordinal++,
        data: canvas.toDataURL('image/png') // ~116 ms!!!
      };
      worker.postMessage(message); // ~2ms

      // Kick off the next frame render
      requestAnimationFrame(render);
      renderer.render(scene, camera); // ~20ms
      cube.rotation.x += 0.1;
      cube.rotation.y += 0.1;
      outputDiv.innerHTML = "Rendering frame " + ordinal; // ~4ms
    }
  }
};

// One, two, three, go!
startTime = Date.now();
render();
</script>
The Web Worker
// test-render-worker.js

// Connect to the socket server
self.importScripts('https://cdn.socket.io/socket.io-1.4.5.js');
var socket = io.connect('http://localhost:3000');

// Queue the images to be transmitted,
// servicing the queue by timer, and
// closing the socket and worker when
// the last image has been sent.
var frame,
    queue = [],
    done = false,
    sending = false,
    timer = setInterval(serviceQueue, 30);

function serviceQueue() {
  if (sending) return;
  if (queue.length > 0) {
    sending = true;
    frame = queue.shift();
    socket.emit('frame', frame, function () {
      console.log('[WORKER]: Send complete ' + frame.ordinal);
      sending = false;
    });
    console.log('[WORKER]: Sending frame ' + frame.ordinal);
  } else if (done && queue.length === 0) {
    clearInterval(timer);
    socket.close();
    close();
  }
}

// Handle messages from the web page
onmessage = function (e) {
  var message = e.data;
  switch (message.type) {
    // Add a frame to the queue
    case 'frame':
      delete message['type'];
      console.log('[WORKER]: Received frame ' + message.ordinal);
      queue.push(message);
      break;

    // That's all, folks
    case 'done':
      console.log('[WORKER]: Done. Closing socket and web worker.');
      done = true;
      break;
  }
};
The Server
// test-render-server.js

// Required modules
var fs = require('fs');
var mkdirp = require('mkdirp');

// Create the socket server
const PORT = 3000;
var socket = require('socket.io')(PORT);
console.log('Socket server listening on port: ' + PORT);

// Handle connections
socket.on('connection', function (client) {

  // Listen for frame and disconnect events
  client.on('frame', onFrame);
  client.on('disconnect', onDisconnect);

  // Create output folder for this client
  // (sync, so it exists before the first frame arrives)
  var output = "/var/tmp/test-render-server/" + client.id + "/";
  mkdirp.sync(output);

  // Handle a frame event from the client
  function onFrame(frame, callback) {
    console.log('Received frame: "' + frame.ordinal + '" from client: ' + client.id);

    // Assemble filename
    var zeroPadFrame = ("000" + frame.ordinal).slice(-3);
    var filename = output + "frame-" + zeroPadFrame + ".png";

    // Drop the 'data:image/png;base64,' header
    frame.data = frame.data.split(',')[1];

    // Create the file
    var file = new Buffer(frame.data, 'base64');
    fs.writeFile(filename, file.toString('binary'), 'binary', function (err) {
      if (err) console.error('Error writing ' + filename, err);
    });

    // Acknowledge receipt
    callback();
  }

  // Handle a disconnection from the client
  function onDisconnect() {
    console.log('Received: disconnect event from client: ' + client.id);
    client.removeListener('frame', onFrame);
    client.removeListener('disconnect', onDisconnect);
  }
});
You can pick up the code from this Gist on GitHub.
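If you'd like to run the test yourself: the server only depends on the socket.io and mkdirp modules (npm install socket.io mkdirp), and node test-render-server.js starts it listening on port 3000. Then open test-render-client.html in a browser and watch the frames pile up under /var/tmp/test-render-server/.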
And They’re Off!
Here are some results from my 2011 MacBook Air:
Rendering to the Screen
First, with the following line uncommented in the web page script, the frames are output to the screen:
document.body.appendChild(canvas); // adds ~30ms / frame
Total Frames: 30
Total time: 5630ms
ms per frame: 187.66666666666666
That’s not great, but it’s a quick visual verification of what’s being produced. The time per frame varies from run to run, but it’s never close to what we want. The web worker dutifully does its job, and the server happily snarfs in the data and dumps out a bunch of frame-xxx.png files in /var/tmp. So at least the plumbing is all in place. Now to optimize…
Rendering Offscreen
Here’s the output without drawing to the screen:
Total Frames: 30
Total time: 4484ms
ms per frame: 149.46666666666667
A little better, but again, way too slow for rock n’ roll.
Skip Extracting the Data For Now
What happens if we comment out the line that extracts the data from the offscreen canvas?
// data: canvas.toDataURL('image/png') // ~116 ms!!!
Total Frames: 30
Total time: 550ms
ms per frame: 18.333333333333332
Wowzers, now we’re talking! That’s totally acceptable. Except for the fact that we’re not sending any frames to the server. So, the gating factor here is extracting the data from the canvas inside the render function. What can we do about that?
Just Grab the Pixels?
We can’t do the extraction in the web worker, because it can’t access the DOM. Is there a faster way to get at the data? How about just grabbing the pixels from the context instead of asking for them in PNG format? I added this code to the render function:
// Get an array of pixels with readPixels
var context = canvas.getContext('webgl');
var pixels = new Uint8Array(context.drawingBufferWidth * context.drawingBufferHeight * 4);
context.readPixels(0, 0,
                   context.drawingBufferWidth, context.drawingBufferHeight,
                   context.RGBA, context.UNSIGNED_BYTE, pixels);

// Send the rendered frame to the web worker
message = {
  type: 'frame',
  ordinal: ordinal++,
  // data: canvas.toDataURL('image/png') // ~116 ms!!!
  data: pixels
};
worker.postMessage(message); // ~2ms
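As an aside, postMessage structured-clones that Uint8Array on its way to the worker. The clone isn't the bottleneck here (it's cheap next to readPixels), but the copy can be avoided entirely by transferring the underlying ArrayBuffer instead, at the cost of the page losing access to it:

// Transfer (rather than copy) the pixel buffer to the worker;
// after this call, 'pixels' is no longer usable in the page
worker.postMessage(message, [pixels.buffer]);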
Reading the pixels this way comes out about 20ms faster per frame than using canvas.toDataURL():
Total Frames: 30
Total time: 3813ms
ms per frame: 127.1
So, no. There’s just no hope of getting the data out of the canvas inside the render function.
Render to Prebuilt Frame Buffers?
What if I were to create an array of frame buffers, and just render each frame to the buffer at the proper index inside the render function?
I added this just prior to defining the render function:
// Prebuild frame buffers for every frame
var frameBuffers = [];
var i;
for (i = 0; i < cutoff; i++) {
  frameBuffers.push(new THREE.WebGLRenderTarget(1920, 1080));
}
And modified the render call thusly:
renderer.render(scene, camera, frameBuffers[ordinal-1]); // ~40ms
As you might guess, that does help, but not enough.
Total Frames: 30
Total time: 1531ms
ms per frame: 51.03333333333333
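The idea, of course, was that the prebuilt targets could be drained after the realtime loop ends, when time no longer matters. For completeness, here's a hedged sketch of that drain step; it assumes a Three.js build that provides renderer.readRenderTargetPixels, which not every version from this era does:

// Sketch: read the prebuilt render targets back after the loop finishes,
// outside the per-frame budget, and hand the pixels to the worker
function drainFrameBuffers() {
  var pixels = new Uint8Array(1920 * 1080 * 4);
  for (var i = 0; i < frameBuffers.length; i++) {
    renderer.readRenderTargetPixels(frameBuffers[i], 0, 0, 1920, 1080, pixels);
    worker.postMessage({type: 'frame', ordinal: i, data: pixels});
  }
  worker.postMessage({type: 'done'});
}

But as the numbers show, even just rendering into the targets blows the 33.33ms budget, so the drain step never got its moment.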
The Takeaway
It is not possible to produce thirty HD frames per second using Three.js AND copy them from the canvas, regardless of whether you try to send them to the server. On a faster machine, I might be able to get the time down into the range we need, but who knows what the capabilities of a user’s hardware might be?
To do this right every time, on any machine with a GPU capable of running the app, we must break free of the realtime jail. How do we do that? I’ll have to preprocess the audio file, extract the spectrum data, and use that when rendering each frame. This decouples rendering from audio processing, and it removes any constraints on scene complexity as well.
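In rough strokes, the decoupled render might look like the sketch below. The spectrumFrames array and its fields are hypothetical placeholders for the preprocessed data; its real shape is a topic for the next article:

// Spectrum data precomputed per frame during audio preprocessing
// (hypothetical structure, for illustration only)
var spectrumFrames = []; // e.g. [{low: 0.2, mid: 0.5, high: 0.1, volume: 0.4}, ...]

// Render at whatever pace the hardware allows; frame N always gets
// the audio values sampled for frame N, so sync is guaranteed
function renderFrame(ordinal) {
  var s = spectrumFrames[ordinal];
  var scale = 1 + s.volume; // modulate a scene attribute by overall volume
  cube.scale.set(scale, scale, scale);
  renderer.render(scene, camera);
}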
It’s worth noting that this is not a major setback, just a discovery that clears up a couple of points:
- This issue would’ve cropped up whether we’d chosen the server or the client for rendering, because it’s the copy operation, not the communication, that’s the problem.
- Because it was the easiest way to handle things in our early sprints, we inherited a mandate to “do this in realtime and just make a movie of it as it plays.” We never considered preprocessing the audio because it would be an extra step that adds complexity (to both the code and the user’s interactions) without a clear benefit. Now it’s clear that when audio is uploaded for a project, we should preprocess it, possibly on the server side.
Author’s Note: This article is part of a series, wherein my partner and I are developing our product ‘out in the open’.
This is diametrically opposed to my typical ‘skunk works’ approach: holing myself up for months on end in a pointless attempt to keep secret something that will eventually become public anyway. We’re not building anything earth-shattering, paradigm-shifting, or empire-building. Just something cool that serves a niche we know and are interested in helping. It’s a 3D music visualizer built in HTML5 / WebGL using Three.js, PureMVC, React, and Node.js. When we’re done, you’ll be able to create a cool video for your audio track and upload it to YouTube.
The benefit of blogging about it as we go is that we get a chance to pass on some of our thought processes as we navigate the hurdles and potholes strewn along our path. Getting those thoughts down while they’re still fresh in mind might guide someone else following a similar path. If we fail owing to these decisions, maybe it’ll help you avoid your own smoking crater. Either way, later on, we’ll be busy chasing different squirrels in some other park.
The previous article in this series is: Persistent Connections with Node.js and Socket.io
The next article is: Breaking Free of the Realtime Jail: Creating an Audio-modulated HD Video with Three.js
This article has been reblogged at the following sites:
DZone: http://bit.ly/can-i-render-in-realtime-and-send-frames-to-the-server
Comments

Reader: I tried using canvas.captureStream(), but that freezes on inactive tabs. When I tried your method, I ran into difficulty: it hangs after the first socket.emit(), though your example works fine. Any pointers on what may have gone wrong?

Author: I assume you have a Node instance up and running the server code. You might put some console.log() messages on the server side to show whether the client connected. Also, open your browser’s debugger, set a breakpoint before that socket.emit(), and inspect your vars to make sure everything’s going well. Does the devtools console show any errors or warnings?