MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks


Authors

^ Corresponding author.

Abstract

Humans usually compose music by organizing elements according to a musical form to express musical ideas. However, this is difficult for neural network-based music generation due to the lack of labelled data on musical form. In this paper, we develop MeloForm, a system that generates melodies with musical form using expert systems and neural networks. Specifically, 1) we design an expert system that generates a melody by developing musical elements from motifs to phrases and then to sections, with repetitions and variations, according to a pre-given musical form; 2) because the generated melody lacks musical richness, we design a Transformer-based refinement model to improve the melody without changing its musical form. MeloForm combines the precise musical form control of expert systems with the musical richness learned by neural models. Both subjective and objective experimental evaluations demonstrate that MeloForm generates melodies with precise musical form control, reaching 97.79% accuracy, and outperforms baseline systems in subjective evaluation scores by 0.75, 0.50, 0.86 and 0.89 in structure, thematic, richness and overall quality, respectively, without any labelled musical form data. In addition, MeloForm supports various kinds of forms, such as verse and chorus form, rondo form, variational form and sonata form.

Baseline Samples

Melody from MeloForm Description

Section A: (phrase a1: 00:00-00:09; 00:09-00:17; 00:17-00:25);

Section B: (phrase b1: 00:25-00:33; 00:33-00:41; 00:41-00:49);

Section A: (phrase a1: 00:49-00:57; 00:57-01:05);

Section B: (phrase b1: 01:05-01:13; 01:13-01:21);

Section A: (phrase a1: 01:21-01:29; 01:29-01:39).

Section A: (phrase a1: 00:00-00:07; 00:07-00:15);

Section B: (phrase b1: 00:15-00:23; 00:23-00:31);

Section A: (phrase a1: 00:31-00:39; 00:39-00:47);

Section A: (phrase a1: 00:00-00:08; 00:08-00:16);

Section B: (phrase b1: 00:16-00:24; 00:24-00:32);

Section A: (phrase a1: 00:32-00:40; 00:40-00:48);

Section A: (phrase a1: 00:48-00:56; 00:56-01:04);

Melody from Music Transformer Description

Section A: (phrase a1: 00:00-00:09);

Section B: (phrase b1: 00:09-00:15);

And section B gets repeated until the end.

Section A: (phrase a1: 00:00-00:04; 00:04-00:08);

Section B: (phrase b1: 00:08-00:12);

Section C: (phrase c1: 00:12-00:15);

And section C gets repeated until the end.

Section A: (phrase a1: 00:00-00:03; 00:04-00:06);

Section B: (phrase b1: 00:06-00:18);

Section C: (phrase c1: 00:18-00:26);

And section C gets repeated until the end.

Conclusions

Although Music Transformer captures repetitive patterns, there are too many of them (the remaining parts are almost entirely repetitions). Inside the phrases, it is hard to identify the theme.

Melody from MELONS Description

prime

melody continuing the prime

Prime: 00:00-00:17;

At 00:36, only part of the prime is repeated. Another repeated pattern appears at 00:21, 00:49 and 01:09. Although there are some repeated segments, the theme is divergent and the arrangement is out of control.

prime

melody continuing the prime

Prime: 00:00-00:17;

This melody can be divided into 4 sections: 1) 00:00-00:32: the prime gets repeated right after the first 8 bars. 2) 00:32-01:04: there are some short-distance repeated patterns in this section. 3) 01:04-01:36: same as 2). 4) 01:36-end: same as 2) and 3). Although there are many repeated patterns, the connection with the prime is lost in the subsequently generated parts.

prime

melody continuing the prime

Prime: 00:00-00:17;

Although there are some short repetitive patterns, the prime is lost in the following generated parts and the theme is not obvious.

Conclusions

Although MELONS explicitly constructs a bar-level structure graph, the connection with the prime is lost when the distance is too long. Also, the arrangement of fragments is out of control, which may distract listeners from remembering the main theme. The prime is taken from 8 bars of human-composed music, which is also a limitation.

Melody from POP909_lm Description

Section A: (phrase a1: 00:00-00:08);

Section B: (phrase b1: 00:08-00:16; 00:16-00:24; 00:24-00:32; 00:32-00:40);

Section A: (phrase a1: 00:40-00:48);

Section B: (phrase b1: 00:48-00:56; 00:56-01:04);

Section A: (phrase a1: 00:00-00:08);

Section B: (phrase b1: 00:08-00:16; 00:16-00:24);

Section A: (phrase a1: 00:24-00:32);

Section B: (phrase b1: 00:32-00:40; 00:40-00:48; 00:48-00:56);

Section A: (phrase a1: 00:00-00:08; 00:08-00:16; 00:16-00:24; 00:24-00:32);

Section B: (phrase b1: 00:32-00:40; 00:40-00:48; 00:48-00:56);

Section B: (phrase b1: 00:56-01:04; 01:04-01:12; 01:12-01:20);

Conclusions

Although POP909_lm can control musical form to some degree, different sections are similar and the theme is not obvious.

MeloForm extended to various musical forms

Example 1:

Verse and Chorus Form: A(a1,a1,a1)B(b1,b1,b1)A(a1,a1)B(b1,b1)A(a1,a1)

Melody
Melody + Accompaniment
Piano Roll Representation
Here is the piano roll for the melody.

Musical Form Analysis:

This melody is in verse and chorus form, A(a1,a1,a1)B(b1,b1,b1)A(a1,a1)B(b1,b1)A(a1,a1), where A represents the verse and B the chorus. As seen in the piano roll representation, the average pitch in the chorus is much higher than in the verse, and the rhythm pattern in the chorus is more intense, with higher note density. The motifs for phrases a1 and b1 are marked with blue and orange boxes respectively; the following bars are derived from them.
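The form notation used on this page can be read mechanically. The following is a minimal sketch of a parser for it; the function name `parse_form` and the regular expression are assumptions made for illustration and are not part of MeloForm.

```python
import re

# Hypothetical helper: the notation "A(a1,a1)B(b1,b1)" comes from this page,
# but this parser is only an illustration, not MeloForm code.
SECTION_RE = re.compile(r"([A-Z]'*)\(([^()]*)\)")

def parse_form(form: str) -> list[tuple[str, list[str]]]:
    """Split a form string into (section label, phrase labels) pairs."""
    sections = []
    pos = 0
    for match in SECTION_RE.finditer(form):
        if match.start() != pos:
            raise ValueError(f"unexpected text at position {pos}")
        label, phrases = match.groups()
        sections.append((label, [p.strip() for p in phrases.split(",") if p.strip()]))
        pos = match.end()
    if pos != len(form):
        raise ValueError(f"unexpected text at position {pos}")
    return sections

form = "A(a1,a1,a1)B(b1,b1,b1)A(a1,a1)B(b1,b1)A(a1,a1)"
parsed = parse_form(form)
print([label for label, _ in parsed])            # ['A', 'B', 'A', 'B', 'A']
print(sum(len(phrases) for _, phrases in parsed))  # 12
```

The section labels give the verse and chorus order, and the phrase lists show how many times each phrase is stated within a section.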


Example 2:

Rondo Form: A(a1,a1)B(b1,b1)A(a1,a1)C(c1,c1)A(a1,a1)

Melody
Melody + Accompaniment
Piano Roll Representation
Here is the piano roll for the melody.

Musical Form Analysis:

This melody is in rondo form, A(a1,a1)B(b1,b1)A(a1,a1)C(c1,c1)A(a1,a1). The refrain section A alternates with the contrasting sections B and C.
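The alternation described here can be expressed as a simple check on the sequence of section labels. This is a sketch under simplifying assumptions: the refrain is the first section, it fills every even position, and at least two different episodes are required so that a verse and chorus sequence such as ABABA is not counted. The function name `is_rondo` is hypothetical.

```python
def is_rondo(sections: list[str]) -> bool:
    """Return True if a refrain alternates with contrasting episodes.

    Assumptions (illustrative only): the refrain is sections[0], it sits at
    positions 0, 2, 4, ..., the sequence ends on the refrain, and at least
    two distinct episode labels appear between the refrains.
    """
    if len(sections) < 5 or len(sections) % 2 == 0:
        return False
    refrain = sections[0]
    episodes = sections[1::2]
    refrains_ok = all(s == refrain for s in sections[0::2])
    episodes_ok = all(s != refrain for s in episodes)
    return refrains_ok and episodes_ok and len(set(episodes)) >= 2

print(is_rondo(["A", "B", "A", "C", "A"]))  # True
print(is_rondo(["A", "B", "A", "B", "A"]))  # False
```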


Example 3:

Variational Form: A(a1,a1,a1)A'(a1',a1',a1')A''(a1'',a1'',a1'')

Melody
Melody + Accompaniment
Piano Roll Representation
Here is the piano roll for the melody.

Musical Form Analysis:

This melody is in variational form, A(a1,a1,a1)A'(a1',a1',a1')A''(a1'',a1'',a1''). We first generate a melody with the musical form A(a1,a1,a1)A(a1,a1,a1)A(a1,a1,a1) using the expert system, then refine each section separately with the neural network. Because each occurrence of the section is refined in a different inference step, the occurrences can vary in pitch distribution while keeping the same rhythm patterns. This is only one way to create variations; we will release more methods for variational forms together with the datasets.
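The relationship between the base form and the variational form is only a relabeling: each later occurrence of a section gains one more prime. A minimal sketch of that bookkeeping follows; it does not model the refinement itself, and the names `label_variations` and `to_string` are hypothetical.

```python
def label_variations(sections):
    """Append one prime (') per earlier occurrence of the same section label."""
    seen = {}
    labelled = []
    for label, phrases in sections:
        primes = "'" * seen.get(label, 0)
        seen[label] = seen.get(label, 0) + 1
        labelled.append((label + primes, [p + primes for p in phrases]))
    return labelled

def to_string(sections):
    """Render (label, phrases) pairs in the notation used on this page."""
    return "".join(f"{label}({','.join(phrases)})" for label, phrases in sections)

# Base form produced before refinement: the same section three times.
base = [("A", ["a1", "a1", "a1"])] * 3
print(to_string(label_variations(base)))
# A(a1,a1,a1)A'(a1',a1',a1')A''(a1'',a1'',a1'')
```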


Example 4:

Sonata Form: A(a1,a1,a2,a2)B(b1,b1,b1,b1)A'(a1,a1,a1,a1)

Melody
Melody + Accompaniment
Piano Roll Representation
Here is the piano roll for the melody.

Musical Form Analysis:

This melody is in sonata form, A(a1,a1,a2,a2)B(b1,b1,b1,b1)A'(a1,a1,a1,a1), consisting of the exposition (A), development (B) and recapitulation (A') sections. In the exposition, phrase a1 is first introduced in C major. To modulate to A minor for the second phrase a2, we use a V-I chord progression to reach the tonic chord of A minor. Specifically, phrase a1 ends on the chord 'Em' and phrase a2 starts on the chord 'Am', which forms a 'V-I' harmonic progression in A minor and helps establish the A minor scale. We then create the melodies of the development section, still in A minor, with higher average pitch and greater rhythmic density to build more tension. Finally, we use the same strategy to return to phrase a1 in C major. Besides building this chord progression template, we also set the tonality token to "MAJ" for a1 and "MIN" for a2 and b1 so that the model generates each phrase in the intended scale.
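The interval relationships behind this key change can be checked with pitch-class arithmetic. The sketch below is illustrative only; the table `PITCH_CLASSES` and the function `interval_up` are assumptions and cover natural note names only. Note that 'Em' is the chord built on the fifth degree of the natural A minor scale, which is how the 'V-I' label above is read here.

```python
# Pitch classes of the natural notes, in semitones above C.
PITCH_CLASSES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def interval_up(from_root: str, to_root: str) -> int:
    """Semitones from one natural-note root up to another (0 to 11)."""
    return (PITCH_CLASSES[to_root] - PITCH_CLASSES[from_root]) % 12

# Em -> Am: the root E lies a perfect fifth (7 semitones) above A,
# so Em sits on the fifth degree of A minor.
print(interval_up("A", "E"))  # 7

# C major -> A minor: A lies 9 semitones above C (3 below),
# which makes A minor the relative minor of C major.
print(interval_up("C", "A"))  # 9
```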