Computerized Generative ART — Music, Part 1

A Sai Vinith
May 7, 2021
Photo by Zac Bromell on Unsplash

We all listen to music day to day, and we know that producing a single track can take days or sometimes even months. Music theory is a complicated subject in itself, and each instrument takes years to master.

Machine ART

Ever since computers became available, people have tried to figure out what more the machine was capable of. Eventually, scientists (lazy enough to want art without much brainpower or labor) began experimenting with math and code, producing work such as DeepMind's WaveNet, GANs, and VAEs. Thus the revolution in machine-generated art began. Today, we can train a network to generate a whole movie script or make abstract art that sells for millions.

So, today we'll go through a simple tutorial on how to generate drum sequences using pre-trained models.

Music can be represented in different ways, but we'll stick to the MIDI plot in this post.

MIDI (Musical Instrument Digital Interface) is a format that describes music as a notation of musical notes and their timing, but not the sound or timbre itself. The representation by itself doesn't have any sound; it needs to be played by instruments. You can learn more about MIDI here. An example of a MIDI plot can be seen below.

Time vs. pitch plot of a MIDI file

Google’s Magenta

As we speak, Google is making humongous strides in deep learning, and TensorFlow is one of them. TensorFlow has been important to the data science community as an open-source deep learning framework that is more usable than counterparts such as Theano. Magenta, which is built on TensorFlow, can be seen the same way: it lets anyone with even a little interest in music or image generation use state-of-the-art machine learning models. Musicians, or anyone with a computer, can install it and start generating music from the command line.

You can learn more about magenta here.


We'll be giving Magenta a simple drum sequence, and it'll generate a new sequence inspired by the one we provide. We'll be working with the underlying score rather than raw audio.

Magenta has many pre-trained models with different configurations. A model is a network built for one specific task. For example, the Drums RNN model is an LSTM network with an attention configuration for producing drum sequences, while Melody RNN is also an LSTM network but produces melody sequences instead of percussion patterns.

A configuration changes how the data is encoded for the network and how the neural nets are set up. Drums RNN has two configurations: 'one_drum' encodes the sequence into a single class, whereas 'drum_kit' maps the sequence to nine drum instruments and sets the attention length to 32. Each configuration comes with one or more pre-trained models. You can check out all the models and their code here.

Let’s get into action

First, we'll write the code to generate music in Python. All of the code can be run in Google Colab. Later on, we'll see how to call Drums RNN directly from the command line, since it's faster, easier to use, and requires no programming knowledge.

Python Code:

We'll use Magenta 1.1.7 and import the necessary libraries.
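A minimal sketch of the imports for this walkthrough, assuming Magenta 1.1.7 together with the pretty_midi and visual_midi packages:

# Core Magenta music utilities, the Drums RNN generator, the protobuf
# message used to configure generation, and the plotting helpers.
import os

import magenta.music as mm
from magenta.models.drums_rnn import drums_rnn_sequence_generator
from magenta.protobuf import generator_pb2
from pretty_midi import PrettyMIDI
from visual_midi import Plotter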

We'll then download the bundle using the 'magenta.music' package, which has many useful tools. Magenta calls its pre-trained models 'bundles'.
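A sketch of this step, assuming the 'drum_kit_rnn.mag' bundle name and a local 'bundles' directory as the download target:

# Download the pre-trained Drums RNN bundle (a .mag file) and read it.
mm.notebook_utils.download_bundle("drum_kit_rnn.mag", "bundles")
bundle = mm.sequence_generator_bundle.read_bundle_file(
    os.path.join("bundles", "drum_kit_rnn.mag"))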

We'll initialize a generator with the drum_kit configuration using the Drums RNN sequence generator. Magenta has all the models defined already; we just need to import what we need.
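Roughly, the initialization could look like this:

# Build the Drums RNN generator with the 'drum_kit' configuration
# and load the weights from the bundle.
generator_map = drums_rnn_sequence_generator.get_generator_map()
generator = generator_map["drum_kit"](checkpoint=None, bundle=bundle)
generator.initialize()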

We need to declare the tempo (given in QPM, quarters per minute, in the plot below). Tempo is the speed at which the score is played, and it's embedded directly into the MIDI file, so it doesn't change the number of notes or the generation length. Let's take an example to make this easier to understand.

A quarter note divides a bar into four: if we have 16 steps in a bar, a quarter will have 16/4 = 4 steps. If our tempo is set to 120 QPM, we have 120 quarters / 60 seconds = 2 quarters per second, which means we play 1 bar every 2 seconds. Magenta needs a generation start time and end time, so we calculate all of this up front.
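Expressed as code, the bookkeeping could look like this (the variable names are just illustrative):

# At 120 QPM with 4 steps per quarter, one 16-step bar lasts 2 seconds.
qpm = 120
num_steps_per_bar = 16
seconds_per_step = 60.0 / qpm / generator.steps_per_quarter  # 0.125 s
seconds_per_bar = num_steps_per_bar * seconds_per_step       # 2.0 s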

So, in the next code snippet, we'll declare a few numbers that are simply a translation of the jazz drum sequence plot shown below. Think of the primer as test data that we feed in so the model can predict the sequences that follow.

This plot can be decomposed into a list —
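As a sketch, a one-bar jazz pattern built from the four percussion pitches described next could be written like this (the exact hits are illustrative):

# Each frozenset holds the percussion pitches struck on that step;
# an empty tuple means silence on that step (16 steps = 1 bar).
primer_drums = mm.DrumTrack(
    [frozenset(pitches) for pitches in [
        (38, 51), (), (36,), (),
        (38, 44, 51), (), (36,), (),
        (), (), (38,), (),
        (38, 44), (), (36, 51), (),
    ]])
primer_sequence = primer_drums.to_sequence(qpm=qpm)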

If you're wondering what the numbers on the left-hand side depict, they are percussion classes: each number corresponds to one of the percussion instrument sounds specified by the MIDI standard. Here, 36 corresponds to Bass Drum 1, 38 to Acoustic Snare, 44 to Pedal Hi-Hat, and 51 to Ride Cymbal 1. You can refer to the full table here (it's the last table).

We can calculate the duration of the primer in seconds, since it's only 1 bar (the 16 steps we declared earlier correspond to 1 bar). We then calculate the generation start and end times. The generation start time should be right after the primer end time, while the generation end time depends on the number of bars we want. We'll set the number of bars to 3, so we're generating a 6-second sequence from the 2 seconds we provided.
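In code, the timing could be worked out like this:

# The primer occupies exactly one bar (2 s); generation starts right
# after it and runs for 3 more bars (6 s), ending at 8 s in total.
num_bars = 3
primer_start_time = 0
primer_end_time = primer_start_time + seconds_per_bar
generation_start_time = primer_end_time
generation_end_time = generation_start_time + (seconds_per_bar * num_bars)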

We can now configure our generator with the start and end times. It also takes a temperature parameter, which controls how much the generated sequence deviates from the given primer. We'll set the temperature to 1.1 for a bit of randomness. There are other parameters too, which you can explore on your own.
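A sketch of this step, using Magenta's GeneratorOptions protobuf:

# Ask for one generated section between the start and end times,
# with a temperature slightly above 1.0 for extra randomness.
generator_options = generator_pb2.GeneratorOptions()
generator_options.args['temperature'].float_value = 1.1
generator_options.generate_sections.add(
    start_time=generation_start_time,
    end_time=generation_end_time)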

We can now generate the drum sequence with the primer sequence as input, save it to a MIDI file, and plot that file using the Visual MIDI library.
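Putting it together, roughly (the file names here are just examples):

# Generate the continuation, write it out as MIDI, and render an
# interactive piano-roll plot with Visual MIDI.
sequence = generator.generate(primer_sequence, generator_options)
midi_file = "drums_output.mid"
mm.midi_io.note_sequence_to_midi_file(sequence, midi_file)

plotter = Plotter()
plotter.show(PrettyMIDI(midi_file), "drums_output.html")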

The generated MIDI plot, with 4 bars and 8 seconds, is as follows:

and you can access the MIDI file here. You'll need to download Audacity to listen to it.

Command-line:

For the command-line interface to work, you'll first need to install the magenta 1.1.7 and visual_midi libraries on your system using pip, or you can download and install Anaconda and create an environment with the aforementioned libraries.

We activate the environment where Magenta is installed and grab the drums bundle from the official Magenta site. We then call drums_rnn_generate to generate a sequence with the default primer, and it writes the generated sequence out as a MIDI file. Use Audacity to listen to it; generating music from the command-line interface is that simple.
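A rough command-line session could look like the following; the flags come from the Drums RNN documentation, and the download location and output directory are examples:

# Download the pre-trained drum_kit bundle.
curl -o drum_kit_rnn.mag http://download.magenta.tensorflow.org/models/drum_kit_rnn.mag

# Generate one 64-step sequence with the default primer and write the
# resulting MIDI file to the output directory.
drums_rnn_generate \
  --config="drum_kit" \
  --bundle_file="drum_kit_rnn.mag" \
  --output_dir="output" \
  --num_outputs=1 \
  --num_steps=64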

TL;DR: we learned about Magenta and wrote some simple code to generate music.

In the next parts, we'll discuss latent spaces and VAEs, how they're useful for generating music that's meaningful rather than random, and we'll see how to train on our own data to generate music in our own style.

You can learn more about RNNs here.

Thanks, and you can always follow me on LinkedIn and check out my GitHub.
