Creating Video on the Server in Node.js

In the previous article, I detailed how we arrived at the idea of rendering our music visualizer’s WebGL frames in the web browser, and shipping them to the server for assembly into a final video. In this article I’ll be discussing the early progress of my exploration into that approach.

Setting up a Host

We’d originally planned to use Amazon Web Services for hosting, but decided on Google Cloud, since Google doesn’t charge for ingress bandwidth. We’ll be pushing a lot of frames down the socket to Node, so that charge would be a real issue for us on AWS. At a minimum, it would impact our pricing model.

So, I launched a Bitnami VM with a Node.js stack. If you choose the smallest disk (10GB) and the smallest machine (f1-micro: 1 vCPU, 0.6GB RAM), it’ll cost you a whopping US$4.95 to operate. I was already familiar with the console, having recently moved my company and personal sites from GoDaddy to Google Cloud on this same setup, and it compares favorably to AWS in nearly every detail. Those sites blaze compared to my previous host, and cost half as much to run. Plus I can take VM snapshots at any point along the way, so that if things go astray on the server (like a spurious sudo rm -rf executed in the wrong folder), I can get back to sanity with the click of a button. The Goog even has a TCO calculator that shows you how much your rig would cost if you deployed it on AWS. Most configurations come in at around a third of the AWS price.

Building the Microservice

Node.js is a natural choice for microservices when they’re developed hand-in-hand with an HTML5 client (particularly if the same developer is working on both in a ‘full-stack’ role). Writing JavaScript in both places reduces the cognitive friction of shifting your focus between client code and server code.

And since our client is built on PureMVC, it’s a further comfort for the microservices to employ the same architecture. PureMVC is prescriptive, so once you’ve seen one PureMVC app, you’ve seen them all. And setting one up requires very little effort. For that reason, I won’t describe the scaffolding of the microservice, just the important bits it will need to implement. The only additional PureMVC note is to point you to the npmvc package for Node.js, which works like a charm.

How Do I Create a Video, Anyhow?

I returned to the tip from the last post that had convinced me this could be done easily in Node, the upshot of which was:

To turn the frame sequences into video files, either use the sequence as a source in Adobe Media Encoder or use a command-line tool like ffmpeg or avconv.

So I researched those first. And honestly, I was too shagged out to look at any others once I finished.

The former has been around since 2000, and has generally implemented all the features of the latter. Confusingly, avconv is actually the tool shipped by Libav, a vicious fork of ffmpeg maintained by a cadre of developers who stormed out on the head of the ffmpeg project. Reportedly, rather than merge ffmpeg changes, even when there is no conflict, they have reimplemented on their own, sometimes at great cost, almost every feature subsequently introduced by ffmpeg (usually introducing new API language to do so). In turn, ffmpeg has added aliases so that users can use either the ffmpeg API or that of avconv/Libav. So. Much. Drama.

There might be a better choice than either of these two warring but apparently equivalent projects, but in the end I went with ffmpeg, because it is well supported by npm packages and the previous post’s tip had used it in its example. And I was pleased with the speedy progress I made once I moved forward with it.

Installing FFmpeg

As I mentioned, there are a number of npm packages that use ffmpeg, but the first one I suggest looking into is @ffmpeg-installer/ffmpeg.

As soon as you add this module to your project, it will automagically install the proper ffmpeg executable for your system, right in your project’s node_modules folder. This is a right and proper approach, since locally I need the Darwin executable for my MacBook Air, but on that Google Cloud VM I set up, I’m running Debian.

That was easy. Now how do I use it?

Glance at the documentation and you can see there is a positively insane number of options. Command lines can get pretty long, specifying a zillion parameters for video, images, audio, and complex filtering. Zowie, this thing is full-on. Paralyzingly so. Therefore, I recommend you don’t bury your head in the documentation right away. Instead, have a look at the fluent-ffmpeg package. It’s the easiest way I found to talk to ffmpeg from Node. Please drop a line in the comments if you have other favorites you’d like to share.

Fluent APIs are interfaces where every method call returns the object that is the owner of that method. In many languages, this lets you chain together a series of method calls in a single statement like a pipeline, rather than write as many separate statements, each making a method call on the object. This approach is well-suited for tasks such as configuring a bunch of options on an ffmpeg command.
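To make that concrete, here’s a toy illustration (not fluent-ffmpeg’s actual code) of an object whose methods return this, so calls can be chained:

// Toy example of a fluent API: every method returns 'this', so the
// configuration calls can be chained into one statement.
function Job() { this.options = {}; }
Job.prototype.input  = function (file) { this.options.input  = file; return this; };
Job.prototype.output = function (file) { this.options.output = file; return this; };
Job.prototype.run    = function ()     { console.log('Running with', this.options); return this; };

// Chained (fluent) style...
new Job().input('in.png').output('out.mp4').run();

// ...versus the equivalent non-fluent style:
var job = new Job();
job.input('in.png');
job.output('out.mp4');
job.run();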

Step 1

Get an ffmpeg command upon which to set all these options:

var ffmpegPath = require('@ffmpeg-installer/ffmpeg').path;
var ffmpeg = require('fluent-ffmpeg');
ffmpeg.setFfmpegPath(ffmpegPath);
var command = ffmpeg();

Step 2

Determine what options you need to supply. FFmpeg is pretty smart at figuring out how to ‘do the right thing’ with as little direction as possible. For instance, you could supply it with width and/or height and/or aspect ratio for both input and output. But if you provide it with input(s) sized the same as the intended output, you can skip sizing options altogether.
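If your sources don’t already match the intended output, fluent-ffmpeg does expose sizing helpers for this; here’s a minimal sketch (the values are just examples, not something this experiment needed):

// Optional sizing calls from fluent-ffmpeg; only needed when the input
// frames don't already match the output dimensions.
command
    .size('1920x1080')   // scale the output to 1920x1080
    .aspect('16:9')      // force a 16:9 display aspect ratio
    .autopad();          // pad (letterbox) rather than stretch to fit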

What size and aspect ratio should I use?

We’re targeting YouTube primarily, and our audience will want a full HD experience, so a quick look at Google’s recommendations tells us 1080p video should be sized 1920×1080. So, for my first experiment, I created a series of 8 labeled PNGs with different foreground and background colors, sized accordingly.

What about framerates?

Output Framerate: Google’s advice on this matter is to choose the same framerate at which the input was recorded. I have total control over that, both in this experiment and in the client that will soon be rendering the WebGL craziness and feeding it to this microservice. The industry standard for film is 24 (48 for slow motion), and in video you see 24, 25, and 30. I chose 30 FPS because it’s about half the effective rate I’ve been seeing in the browser (60), and because it’ll be the most fluid of the three.

Input Framerate: In this case I just want to take these 8 frames and show each of them for about 5 seconds, outputting a video. Since the rate is expressed in frames per second, holding each frame for 5 seconds means an input frame rate of one frame every 5 seconds, i.e. 1/5.

Input Files: Here, I used a file sequence expression where I can specify the number of digits to be replaced by a zero-padded integer. I chose to number the frames 001-008, so for that part of the filename I used ‘%03d’, following the standard printf format.
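Just to make the pattern concrete, here’s how matching zero-padded names could be generated when writing the frames out (this loop is my own illustration, not the client’s actual code):

// '%03d' matches a three-digit, zero-padded counter: 001, 002, ... 008.
for (var i = 1; i <= 8; i++) {
    var frameName = 'Sinewave3-1920x1080_' + ('00' + i).slice(-3) + '.png';
    console.log(frameName); // Sinewave3-1920x1080_001.png ... _008.png
}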

Output File: Again, I followed Google’s advice and chose the mp4 format, with no audio codec for this experiment. I didn’t bother looking into the video codec at this point, because I didn’t have to.

Step 3

Configure the command with the options and run it.

// Use FFMpeg to create a video.
// 8 consecutive frames, held for 5 seconds each, 30fps output, no audio
command
    .input('assets/demo1/Sinewave3-1920x1080_%03d.png')
    .inputFPS(1/5)
    .output('assets/demo1/Sinewave3-1920x1080.mp4')
    .outputFPS(30)
    .noAudio()
    .run();
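If you’re curious what command line all that chaining actually produces, fluent-ffmpeg will tell you: it emits a ‘start’ event just after spawning ffmpeg, with the full command string as its argument. Register the listener before calling run():

// Log the ffmpeg command line that fluent-ffmpeg builds from the chained options.
command.on('start', function (commandLine) {
    console.log('Spawned ffmpeg with: ' + commandLine);
});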

It created the file, I dropped it into the browser, and it played. Yay! I made a video. Pretty boring though, since it had no audio and was just different colored slides with the same word on them. I didn’t bother uploading it to YouTube and I’m certainly not going to bore you with it here.

Adding Audio, Watermarking, and Progress Reporting

Before uploading anything, I wanted to make sure that I could add audio. I also wanted to test watermarking the video, which we’ll do in support of a freemium business model. Finally, the process will take time and could fail, so I wanted to find out how to handle progress reporting and errors in this second experiment.

The first video had worked out to be 40 seconds long. Where could I find about 40 seconds of audio? As it happens, I have a fistful of noodles on Propellerhead’s Allihoopa site that all run about that length, because their mobile Figure app creates 8-bar ditties for upload and remixing by others. The site plays them through twice, and given reasonable BPMs, that usually works out to be 30 to 40 seconds long.

The track I chose was 35 seconds long, so I used 6 image frames, held onscreen for 6 seconds each, leaving me a second of silence at the end. I looked into crossfading them, but the complexity involved in creating such a command was beyond the scope of what I was trying to do. Just note the complexity of the video filter expression I had to use in order to do the watermarking in the code below. For the watermark, I placed my Sea of Arrows logo in the lower left-hand corner of the video.
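In plain terms, that filter expression loads the watermark PNG as its own stream, then overlays it on the main video 10 pixels in from the left edge and 10 pixels up from the bottom (main_h and overlay_h are ffmpeg’s variables for the heights of the main video and the overlay image). Here’s a hypothetical helper, purely to show how the string breaks down; the function and its parameters are my own, not part of the project’s code:

// Builds an overlay expression like the one used below. 'margin' is the
// distance in pixels from the left and bottom edges.
function watermarkFilter(watermarkPath, margin) {
    return 'movie=' + watermarkPath + ' [watermark]; ' +          // load the PNG as a separate stream
           '[in][watermark] overlay=' + margin +                  // x offset from the left edge
           ':main_h-overlay_h-' + margin + ' [out]';              // y = video height - logo height - margin
}

// watermarkFilter('assets/demo2/soa-watermark.png', 10) reproduces the expression used in the code below.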

var ffmpegPath = require('@ffmpeg-installer/ffmpeg').path;
var ffmpeg = require('fluent-ffmpeg');
ffmpeg.setFfmpegPath(ffmpegPath);
var command = ffmpeg();
var timemark = null;

// 6 consecutive 1920x1080 frames, held 6 seconds each, 30fps, with m4a audio, watermarked
command
    .on('end', onEnd)
    .on('progress', onProgress)
    .on('error', onError)
    .input('assets/demo2/folds-of-spacetime_%03d.png')
    .inputFPS(1/6)
    .videoFilter(["movie=assets/demo2/soa-watermark.png [watermark]; [in][watermark] overlay=10:main_h-overlay_h-10 [out]"])
    .input('assets/demo2/folds-of-spacetime.m4a')
    .output('assets/demo2/folds-of-spacetime.mp4')
    .outputFPS(30)
    .run();

function onProgress(progress) {
    if (progress.timemark != timemark) {
        timemark = progress.timemark;
        console.log('Time mark: ' + timemark + "...");
    }
}

function onError(err, stdout, stderr) {
    console.log('Cannot process video: ' + err.message);
}

function onEnd() {
    console.log('Finished processing');
}

Running this yields the following output:

Time mark: 00:00:04.26…
Time mark: 00:00:10.26…
Time mark: 00:00:10.28…
Time mark: 00:00:16.26…
Time mark: 00:00:22.26…
Time mark: 00:00:28.26…
Time mark: 00:00:34.20…
Time mark: 00:00:35.86…
Finished processing

Process finished with exit code 0

The output video uploaded to YouTube without a hitch, and plays back in full HD. Bada bing.

Next Steps

  1. I need to create a socket in the microservice, connect to it with telnet, and get a conversation going (a minimal sketch of the idea follows this list).
  2. Figure out what the protocol for uploading images and audio is going to be.
  3. Then I need to push some images across the socket from the browser. I’ll probably want to build a web worker for this, so that the render loop in the visualizer doesn’t get slowed down by the socket communications. I may do a POC instead of baking this into our client on the first go, so that I can test offscreen rendering. The client’s display may not have enough resolution to render the scene in 1080p, and anyway, I suspect it’ll be more efficient to render offscreen and just report progress in the GUI.
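For the first item, here’s a minimal sketch of the kind of TCP listener I have in mind, using Node’s built-in net module (the port and the greeting are placeholders, not the microservice’s actual protocol):

// Bare-bones TCP server you can poke at with telnet, e.g. `telnet localhost 3000`.
var net = require('net');

var server = net.createServer(function (socket) {
    socket.write('HELLO\r\n');                    // greet the telnet client
    socket.on('data', function (chunk) {
        console.log('Received: ' + chunk.toString().trim());
        socket.write('ACK\r\n');                  // acknowledge each line received
    });
});

server.listen(3000, function () {
    console.log('Listening on port 3000');
});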

I’ll report in again after the next major milestone.


Author’s Note: This article is part of a series, wherein my partner and I are developing our product ‘out in the open’.

This is diametrically opposed to my typical ‘skunk works’ approach, holing myself up for months on end in a pointless attempt to keep secret something that will eventually become public anyway. We’re not building anything earth-shattering, paradigm-shifting, or empire-building. Just something cool that serves a niche we know and are interested in helping. It’s a 3D music visualizer built in HTML5 / WebGL using Three.js, PureMVC, React, and Node.js. When we’re done, you’ll be able to create a cool video for your audio track and upload it to YouTube.

The benefit of blogging about it as we go is that we get a chance to pass on some of our thought processes as we navigate the hurdles and potholes strewn along our path. Getting those thoughts down while they’re still fresh in mind might guide someone else following a similar path. If we fail owing to these decisions, maybe it’ll help you avoid your own smoking crater. Either way, later on, we’ll be busy chasing different squirrels in some other park.

The previous article in this series is: Should I Render Three.js to Video on the Client or Server?

The next article in this series is: Persistent Connections with Node.js and Socket.io

This article has been reblogged at the following sites:

DZone: http://bit.ly/create-video-with-nodejs

