Creating Intelligent Music Applications in the Browser

Deep Learning | 24 December 2018

After the introduction of Google’s TensorFlow.js, it has become much easier to use the browser (client-side) for deep learning. There are handy approaches (as discussed here) to deploying deep learning models using Keras and TensorFlow.js.

Note: To learn more about TensorFlow.js and its applications, kindly visit this link.

Why Music and ML?

Music generation has already begun to catch the eye of machine learning developers, and numerous projects are pushed to GitHub every week. Although there exists a barrier between AI researchers and AI developers, such as the complex mathematics, derivations and jargon involved, an AI enthusiast can still use code and some musical knowledge to create exciting applications that were only a dream a few years back.

Leveraging the capabilities of TensorFlow.js, we now have Google’s Magenta.js, with which any developer with knowledge of JavaScript and music can create a music application that has intelligence built into it.

I loved the concept behind Google’s Magenta team.

When a painter creates a work of art, she first blends and explores color options on an artist’s palette before applying them to the canvas. This process is a creative act in its own right and has a profound effect on the final work. Musicians and composers have mostly lacked a similar device for exploring and mixing musical ideas, but we are hoping to change that - Read more

Why browser for ML?

Although one might feel that browsers are light-weight apps that won’t handle data-intensive algorithms such as deep neural networks, by leveraging the capabilities of WebGL in browsers such as Google Chrome, we can create, train and deploy deep neural network models right in the browser itself, without any server requests.

Another advantage of using the browser to create AI applications is that you can easily share your creations with your friends or family using nothing more than a simple link (as given below)!

Using Magenta's Pre-trained Models

Magenta.js is a JavaScript library built on top of TensorFlow.js that provides music-oriented abstractions for developers. Google’s Magenta team researches, creates and trains deep learning models such as Long Short-Term Memory networks (LSTMs) and Variational Autoencoders (VAEs) for music generation, and serves them as pre-trained models that an AI enthusiast like me can use for free.

By using a pre-trained Magenta model, we can build creative music applications in the browser with deep learning. Some of the note-based music models provided by Magenta are MusicVAE, MelodyRNN, DrumsRNN and ImprovRNN. With these pre-trained models and the magenta.js API, we can create cool music apps.

Note: The prerequisites for building an application with Magenta.js are knowledge of HTML, CSS and JavaScript.

Generating Drum Patterns using DrumsRNN

In this tutorial, I will show you how to create an intelligent music application that I call DeepDrum & DeepArp using JavaScript and Google’s magenta.js in the browser. First, we will focus on generating drum patterns using Magenta’s drums_rnn. A similar approach is used to create arpeggio patterns using Magenta’s improv_rnn.

DeepDrum & DeepArp using Google Magenta's DrumsRNN & ImprovRNN

The core algorithm behind these two models is a special type of Recurrent Neural Network called the Long Short-Term Memory (LSTM) network. You can read more about the inner workings of an LSTM network in this excellent blog post by Christopher Olah.

How to generate drum patterns

Step 1: To include Magenta.js in your music application, you simply need to add the following script inside your HTML head tag.

index.html
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/@magenta/music@1.4.2/dist/magentamusic.min.js"></script>
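If your project uses a bundler instead of the CDN script, the same library is also published on npm under the name that appears in the CDN path above. This is a minimal sketch of that alternative setup, not part of the original tutorial:

// Alternative setup (assumption, not used in this tutorial): install the
// library from npm and import it, instead of loading the CDN script.
//   npm install @magenta/music
import * as mm from "@magenta/music";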

Step 2: A pre-trained Magenta model can easily be loaded into a JavaScript environment using the js-checkpoints that the Magenta team has made publicly available, which load the model along with its config files in a single line of code.

app.js
let drums_rnn = new mm.MusicRNN("https://storage.googleapis.com/download.magenta.tensorflow.org/tfjs_checkpoints/music_rnn/drum_kit_rnn");

There is a tradeoff between model package size and accuracy, since inference with the pre-trained model happens live in the browser (client-side).

Step 3: Next, we need to initialize the model to make use of its methods and attributes.

app.js
drums_rnn.initialize();
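One thing to keep in mind (not shown in the snippet above): initialize() is asynchronous and returns a Promise, so in practice you wait for it to resolve before asking the model for predictions. A minimal sketch, assuming a small async helper of your own:

// Hypothetical helper: wait for the checkpoint weights to finish loading
// before the rest of the app uses drums_rnn.
async function setup_model() {
	await drums_rnn.initialize();
	console.log("drums_rnn is ready");
}
setup_model();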

Step 4: Our drum pattern generator works like this: you provide a random input seed pattern, and the deep neural network (DrumsRNN) generates (predicts) the next sequence of patterns.

Based on the General MIDI Level 1 Percussion Key Map, we restrict our focus to 9 individual drumkit instruments (as given below) to play the sequence generated by DrumsRNN.

  1. kick
  2. snare
  3. hihat closed
  4. hihat open
  5. tom low
  6. tom mid
  7. tom high
  8. clap
  9. ride

Hence, we define an array named seed_pattern that holds, for each instrument, the time steps at which that instrument is ON (as an array).

For example, I have initialized the seed_pattern as shown below. This means that, for a seed_limit of 4 time steps, we assign the input pattern like this:

  • kick should be ON at the 1st and 3rd time steps.
  • snare shouldn’t be turned ON within the seed_limit.
  • hihat closed should be ON only at the 3rd time step.
  • and so on.

Notice that in code we start the first time step at 0. Also notice the position of each drumkit instrument in the seed_pattern array: the 0th position corresponds to kick, the 1st position corresponds to snare, and so on.

app.js
var seed_pattern = [
	[0, 2],
	[],
	[2],
	[],
	[2],
	[],
	[0, 2],
	[],
	[1, 2],
	[]
];

With the input seed_pattern and seed_limit defined, we ask our drums_rnn to continue the sequence for us. Before doing that, we need to take care of the quantization of the input values as well as the generation limit, player_length.

Step 5: To quantize the note sequence, we convert the input seed_pattern into a note-sequence object and quantize it, as shown below.

app.js
let cur_seq = drum_to_note_sequence(seed_pattern);

//---------------------------------
// drum to note sequence formation
//---------------------------------
function drum_to_note_sequence(quantize_tensor) {
	var notes_array = [];
	var note_index = 0;
	for (var i = 0; i < quantize_tensor.length; i++) {
		var notes = quantize_tensor[i];
		if(notes.length > 0) {
			for (var j = 0; j < notes.length; j++) {
				notes_array[note_index] = {};
				notes_array[note_index]["pitch"] = midiDrums[notes[j]];
				notes_array[note_index]["startTime"] = i * 0.5;
				notes_array[note_index]["endTime"] = (i+1) * 0.5;
				note_index = note_index + 1;
			}
		}
	}

	return mm.sequences.quantizeNoteSequence(
	  {
	    ticksPerQuarter: 220,
	    totalTime: quantize_tensor.length / 2,
	    timeSignatures: [
	      {
	        time: 0,
	        numerator: ts_num,
	        denominator: ts_den
	      }
	    ],
	    tempos: [
	      {
	        time: 0,
	        qpm: tempo
	      }
	    ],
	    notes: notes_array
	   },
	  1  // stepsPerQuarter
	);
}

The way I figured out what goes inside mm.sequences.quantizeNoteSequence was through the browser’s console and by looking at the code of a few demos on Magenta’s website. Values like timeSignatures, tempos and totalTime need to be set according to your preferences. You could even assign these values dynamically.

The main thing you need to take care of here is the conversion of our input seed_pattern into the musical quantization format that Magenta accepts, which includes defining each drumkit note’s pitch, startTime and endTime.

The pitch of a drumkit note is the MIDI value of that note, which can be obtained from this mapping. Remember, we use the standard General MIDI Level 1 Percussion Key Map.

startTime and endTime are quantization values that define the start and end time of a single note.

For example, for our first time step, kick will have the following values.

  • pitch - 36
  • startTime - 0
  • endTime - 0.5
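For reference, here is a minimal sketch of what the midiDrums lookup used inside drum_to_note_sequence() could contain, assuming the instrument order listed earlier and the General MIDI Level 1 Percussion Key Map (the exact array in the app may differ):

// Hypothetical mapping from instrument index (0-8) to GM percussion pitch:
// kick, snare, hihat closed, hihat open, tom low, tom mid, tom high, clap, ride
var midiDrums = [36, 38, 42, 46, 45, 47, 50, 39, 51];

// With this mapping, the kick at the first time step becomes
// { pitch: 36, startTime: 0, endTime: 0.5 }, matching the example above.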

Step 6: Once you have encoded the input seed_pattern to Magenta’s quantization format, you can ask drums_rnn to continue the sequence as shown below.

app.js
const player_length = 32;
const temperature_drum = 1;

predicted_sequence = await drums_rnn
		.continueSequence(cur_seq, player_length, temperature_drum)
		.then(r => seed_pattern.concat(note_to_drum_sequence(r, player_length)));

//---------------------------------
// note to drum sequence formation
//---------------------------------
function note_to_drum_sequence(seq, pLength) {
	let res = [];
	// start with an empty list of instruments for every time step
	for (var i = 0; i < pLength; i++) {
		res.push([]);
	}
	// place each predicted note at its quantized time step, converting the
	// MIDI pitch back to our 0-8 instrument index
	for (let { pitch, quantizedStartStep } of seq.notes) {
		res[quantizedStartStep].push(reverseMidiMapping.get(pitch));
	}
	return res;
}

First, we use the continueSequence() function of drums_rnn to predict the next sequence values for all our drumkit instruments and store the result in a variable named predicted_sequence.

These predictions come back in the same Magenta quantization format, with MIDI-mapped pitch values, start times and end times. To convert the MIDI-mapped pitch values back to our drumkit instrument indices, we use the reverseMidiMapping that is available here. Notice how it maps values from the [35-81] range to [0-8].

We define an array named res and store the predicted sequence values based on their quantizedStartStep. We then concatenate the predicted sequence with the input seed_pattern to generate a beat!
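As a rough illustration (this is not the full mapping linked above, which folds every pitch in the [35-81] range onto one of the nine classes), a reverseMidiMapping-style lookup for just the nine canonical pitches could look like this:

// Hypothetical reverse lookup: GM percussion pitch -> instrument index (0-8).
// The real mapping also folds related pitches (other toms, cymbals, etc.)
// onto these same nine classes.
var reverseMidiMapping = new Map([
	[36, 0], [38, 1], [42, 2], [46, 3],
	[45, 4], [47, 5], [50, 6], [39, 7], [51, 8]
]);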

How to proceed from here?

These are the core steps involved in using a Google Magenta pre-trained model to generate sequences for music generation. You can follow the same steps to generate arpeggio patterns using the improv_rnn pre-trained model.
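For example, loading the Improv RNN works the same way as Step 2. The checkpoint name below is taken from Magenta's hosted checkpoint listing and may differ from what DeepArp actually uses, so treat this as a sketch:

// Assumed checkpoint URL for the chord-conditioned Improv RNN; verify it
// against Magenta's checkpoint listing before relying on it.
let improv_rnn = new mm.MusicRNN("https://storage.googleapis.com/download.magenta.tensorflow.org/tfjs_checkpoints/music_rnn/chord_pitches_improv");
improv_rnn.initialize();

// Unlike drums_rnn, the Improv RNN is conditioned on a chord progression,
// which continueSequence() accepts as an extra argument, e.g.:
// improv_rnn.continueSequence(cur_seq, player_length, temperature, ["C", "Am", "F", "G"]);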

You can check the entire code that I have used to build this music application here.

Note: If you still don't understand the steps mentioned here, I highly encourage you to add a console.log() at each step of the code and work through it until you understand it completely.

Cool Demos

The people working in this domain are musicians, artists, creative coders, programmers and researchers, and they have built amazing demos that you can find here and here.

Resources

If you have something useful to add to this article, found a bug in the code, or would like to improve some of the points mentioned, feel free to write it down in the comments. I hope you found something useful here.