MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks
Authors
- Peiling Lu (Microsoft Research Asia) peil@microsoft.com
- Xu Tan^ (Microsoft Research Asia) xuta@microsoft.com
- Botao Yu (Nanjing University) btyu@smail.nju.edu.cn
- Sheng Zhao (Microsoft Azure Speech) szhao@microsoft.com
- Tao Qin (Microsoft Research Asia) taoqin@microsoft.com
- Tie-Yan Liu (Microsoft Research Asia) tyliu@microsoft.com
^ Corresponding author.
Abstract
Humans usually compose music by organizing elements according to a musical form to express musical ideas. Neural network-based music generation struggles to do the same because labelled data on musical form is scarce. In this paper, we develop MeloForm, a system that generates melodies with musical form using expert systems and neural networks. Specifically, 1) we design an expert system that generates a melody by developing musical elements from motifs to phrases and then to sections, with repetitions and variations, according to a pre-given musical form; 2) because the melody generated this way lacks musical richness, we design a Transformer-based refinement model that improves the melody without changing its musical form. MeloForm thus combines the precise musical form control of expert systems with the musical richness learned by neural models. Both subjective and objective evaluations show that MeloForm generates melodies with precise musical form control, reaching 97.79% accuracy, and outperforms baseline systems in subjective evaluation scores by 0.75, 0.50, 0.86 and 0.89 in structure, thematic, richness and overall quality respectively, without any labelled musical form data. In addition, MeloForm supports various kinds of forms, such as verse and chorus form, rondo form, variational form and sonata form.
Baseline Samples
Melody from MeloForm | Description |
---|---|
Section A: (phrase a1: 00:00-00:09; 00:09-00:17; 00:17-00:25); Section B: (phrase b1: 00:25-00:33; 00:33-00:41; 00:41-00:49); Section A: (phrase a1: 00:49-00:57; 00:57-1:05); Section B: (phrase b1: 1:05-1:13; 1:13-1:21); Section A: (phrase a1: 1:21-1:29; 1:29-1:39). | |
Section A: (phrase a1: 00:00-00:07; 00:07-00:15); Section B: (phrase b1: 00:15-00:23; 00:23-00:31); Section A: (phrase a1: 00:31-00:39; 00:39-00:47); | |
Section A: (phrase a1: 00:00-00:08; 00:08-00:16); Section B: (phrase b1: 00:16-00:24; 00:24-00:32); Section A: (phrase a1: 00:32-00:40; 00:40-00:48); Section A: (phrase a1: 00:48-00:56; 00:56-01:04); |
Melody from Music Transformer | Description |
---|---|
Section A: (phrase a1: 00:00-00:09); Section B: (phrase b1: 00:09-00:15); And section B gets repeated until the end. | |
Section A: (phrase a1: 00:00-00:04; 00:04-00:08); Section B: (phrase b1: 00:08-00:12); Section C: (phrase c1: 00:12-00:15); And section C gets repeated until the end. | |
Section A: (phrase a1: 00:00-00:03; 00:04-00:06); Section B: (phrase b1: 00:06-00:18); Section C: (phrase c1: 00:18-00:26); And section C gets repeated until the end. | |
Conclusions |
Although Music Transformer captures repetitive patterns, it repeats too much (the remaining parts are almost entirely repetitions). Inside the phrases, it is also hard to identify the theme. |
Melody from MELONS | Description |
---|---|
prime melody continuing the prime |
Prime: 00:00-00:17; At 00:36, only part of the prime is repeated. Another repeated pattern appears at 00:21, 00:49 and 01:09. Although some segments are repeated, the theme diverges and the arrangement is out of control. |
prime melody continuing the prime |
Prime: 00:00-00:17; This melody can be divided into 4 sections: 1) 00:00-00:32: the prime gets repeated right after the first 8 bars. 2) 00:32-01:04: there are some short-distance repeated patterns in this section. 3) 01:04-01:36: same as 2). 4) 01:36-end: same as 2) and 3). Although there are many repeated patterns, the connection with the prime is lost in the generated parts that follow. |
prime melody continuing the prime |
Prime: 00:00-00:17; Although there are some short repetitive patterns, the prime is lost in the following generated parts and the theme is not obvious. |
Conclusions |
Although MELONS explicitly constructs a bar-level structure graph, the connection with the prime is lost when the distance is too long. In addition, the arrangement of fragments is out of control, which may distract listeners from remembering the main theme. The prime is taken from 8 bars of human-composed music, which is also a limitation. |
Melody from POP909_lm | Description |
---|---|
Section A: (phrase a1: 00:00-00:08); Section B: (phrase b1: 00:08-00:16; 00:16-00:24; 00:24-00:32; 00:32-00:40); Section A: (phrase a1: 00:40-00:48); Section B: (phrase b1: 00:48-00:56; 00:56-1:04); | |
Section A: (phrase a1: 00:00-00:08); Section B: (phrase b1: 00:08-00:16; 00:16-00:24); Section A: (phrase a1: 00:24-00:32); Section B: (phrase b1: 00:32-00:40; 00:40-00:48; 00:48-00:56); | |
Section A: (phrase a1: 00:00-00:08; 00:08-00:16; 00:16-00:24; 00:24-00:32); Section B: (phrase b1: 00:32-00:40; 00:40-00:48; 00:48-00:56); Section B: (phrase b1: 00:56-1:04; 1:04-1:12; 1:12-1:20); | |
Conclusions |
Although POP909_lm can control musical form to some degree, its different sections sound similar and the theme is not obvious. |
MeloForm extended to various musical forms
Example 1:
Verse and Chorus Form: A(a1,a1,a1)B(b1,b1,b1)A(a1,a1)B(b1,b1)A(a1,a1)
Melody
Melody + Accompaniment
Piano Roll Representation
Musical Form Analysis:
This is a melody in verse and chorus form, A(a1,a1,a1)B(b1,b1,b1)A(a1,a1)B(b1,b1)A(a1,a1), where A represents the verse and B represents the chorus. As the piano roll representation shows, the average pitch in the chorus is much higher than in the verse, and the rhythm in the chorus is more intense, with a higher note density. The motifs for phrases a1 and b1 are marked with blue and orange boxes respectively; the bars that follow are derived from them.
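The form notation and the two statistics mentioned above (average pitch and note density) can be illustrated with a minimal sketch. It is not MeloForm's code: the function names `parse_form`, `mean_pitch` and `note_density`, the `(MIDI pitch, onset in beats)` note representation, and the toy verse and chorus bars are all assumptions made for illustration.

```python
import re

def parse_form(form: str) -> list[tuple[str, list[str]]]:
    """Split a form string such as 'A(a1,a1)B(b1)' into (section, phrases) pairs."""
    # A section label is one capital letter plus optional primes, e.g. A, A', A''.
    pattern = re.compile(r"([A-Z]'*)\(([^)]*)\)")
    return [
        (section, [p.strip() for p in phrases.split(",")])
        for section, phrases in pattern.findall(form)
    ]

def mean_pitch(notes: list[tuple[int, float]]) -> float:
    """Average MIDI pitch of a list of (pitch, onset) notes."""
    return sum(pitch for pitch, _ in notes) / len(notes)

def note_density(notes: list[tuple[int, float]], n_bars: int) -> float:
    """Number of notes per bar."""
    return len(notes) / n_bars

form = "A(a1,a1,a1)B(b1,b1,b1)A(a1,a1)B(b1,b1)A(a1,a1)"
sections = parse_form(form)
for label, phrases in sections:
    print(label, phrases)

# Hypothetical one-bar excerpts, NOT taken from the demo audio.
verse_bar = [(60, 0.0), (62, 1.0), (64, 2.0), (62, 3.0)]
chorus_bar = [(72, 0.0), (74, 0.5), (76, 1.0), (77, 1.5), (79, 2.0), (76, 3.0)]

print("verse :", mean_pitch(verse_bar), note_density(verse_bar, 1))
print("chorus:", mean_pitch(chorus_bar), note_density(chorus_bar, 1))
```

With toy data like this, the chorus bar has both a higher mean pitch and more notes per bar than the verse bar, which is the contrast described for sections B and A.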
Example 2:
Rondo Form: A(a1,a1)B(b1,b1)A(a1,a1)C(c1,c1)A(a1,a1)
Melody
Melody + Accompaniment
Piano Roll Representation
Musical Form Analysis:
This is a melody in rondo form, A(a1,a1)B(b1,b1)A(a1,a1)C(c1,c1)A(a1,a1). The refrain section A alternates with the contrasting sections B and C.
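The alternation of a refrain with contrasting sections can be checked mechanically on a form string. The sketch below is only an illustration under stated assumptions: `section_labels` and `is_rondo` are hypothetical helpers, and the rule used here (an odd number of at least five sections, the refrain at every other position, and at least two different contrasting sections) is one simple reading of rondo form, not a definition taken from MeloForm.

```python
import re

def section_labels(form: str) -> list[str]:
    """Return the section labels of a form string, e.g. ['A', 'B', 'A', 'C', 'A']."""
    return re.findall(r"([A-Z]'*)\(", form)

def is_rondo(form: str) -> bool:
    """Check a simple rondo pattern: refrain, episode, refrain, episode, refrain, ..."""
    labels = section_labels(form)
    # Assumption: a rondo has an odd number of sections, at least five (ABACA).
    if len(labels) < 5 or len(labels) % 2 == 0:
        return False
    refrain = labels[0]
    refrains = labels[0::2]   # positions 0, 2, 4, ...
    episodes = labels[1::2]   # positions 1, 3, ...
    return (
        all(label == refrain for label in refrains)
        and all(label != refrain for label in episodes)
        and len(set(episodes)) >= 2  # assumption: at least two different episodes
    )

rondo = "A(a1,a1)B(b1,b1)A(a1,a1)C(c1,c1)A(a1,a1)"
verse_chorus = "A(a1,a1,a1)B(b1,b1,b1)A(a1,a1)B(b1,b1)A(a1,a1)"

print(is_rondo(rondo))
print(is_rondo(verse_chorus))
```

Under this rule the verse and chorus form from Example 1 is rejected because its only contrasting section is B, while the form of this example is accepted.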
Example 3:
Variational Form: A(a1,a1,a1)A'(a1',a1',a1')A''(a1'',a1'',a1'')
Melody
Melody + Accompaniment
Piano Roll Representation
Musical Form Analysis:
This is a melody in variational form, A(a1,a1,a1)A'(a1',a1',a1')A''(a1'',a1'',a1''). We first generate a melody with the form A(a1,a1,a1)A(a1,a1,a1)A(a1,a1,a1) using the expert system, then refine each section separately with the neural network. Because each section is refined in a separate inference step, copies of the same section can differ in pitch distribution while keeping their rhythm patterns. This is only one way to create variations; we will release more methods for variational forms together with the datasets.
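The idea of varying pitches while keeping the rhythm fixed can be pictured with a toy sketch. It is emphatically not the Transformer refinement model: a seeded random perturbation along the C major scale stands in for the neural network, and `vary_pitches`, the `(MIDI pitch, onset, duration)` note tuples and the example phrase are all hypothetical.

```python
import random

C_MAJOR_PCS = [0, 2, 4, 5, 7, 9, 11]  # pitch classes of the C major scale

# All MIDI pitches (0..127) that belong to C major, in ascending order.
C_MAJOR_PITCHES = [
    12 * octave + pc
    for octave in range(11)
    for pc in C_MAJOR_PCS
    if 12 * octave + pc <= 127
]

def vary_pitches(notes, rng, max_step=2):
    """Move each pitch by up to `max_step` scale steps; keep onset and duration.

    Stand-in for a learned refinement step. Assumes every input pitch is in C major.
    """
    varied = []
    for pitch, onset, duration in notes:
        index = C_MAJOR_PITCHES.index(pitch)
        shifted = index + rng.randint(-max_step, max_step)
        shifted = min(max(shifted, 0), len(C_MAJOR_PITCHES) - 1)
        varied.append((C_MAJOR_PITCHES[shifted], onset, duration))
    return varied

# Hypothetical phrase: (MIDI pitch, onset in beats, duration in beats).
section_a = [(60, 0.0, 1.0), (62, 1.0, 1.0), (64, 2.0, 0.5), (67, 2.5, 1.5)]

# A different seed per section plays the role of a separate inference step.
section_a_prime = vary_pitches(section_a, random.Random(1))
section_a_double_prime = vary_pitches(section_a, random.Random(2))

rhythm = [(onset, duration) for _, onset, duration in section_a]
print(rhythm == [(o, d) for _, o, d in section_a_prime])
print(section_a_prime)
print(section_a_double_prime)
```

Each call returns a melody with the same onsets and durations as the input, so the rhythm pattern is preserved by construction while the pitches may change.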
Example 4:
Sonata Form: A(a1,a1,a2,a2)B(b1,b1,b1,b1)A'(a1,a1,a1,a1)
Melody
Melody + Accompaniment
Piano Roll Representation
Musical Form Analysis:
This is a melody in sonata form, A(a1,a1,a2,a2)B(b1,b1,b1,b1)A'(a1,a1,a1,a1), consisting of exposition (A), development (B) and recapitulation (A') sections. In the exposition, phrase a1 is first introduced in C major. To move the key to A minor for the second phrase a2, we use a V-I chord progression that resolves to the tonic chord of A minor. Specifically, phrase a1 ends on an 'Em' chord and phrase a2 starts on an 'Am' chord, which together form a 'V-I' harmonic progression in A minor and help establish the A minor scale. The development section stays in A minor, with a higher average pitch and a denser rhythm to build more tension. Finally, we use the same strategy to return to phrase a1 in C major. Besides building this chord progression template, we also set the tonality token to "MAJ" for a1 and "MIN" for a2 and b1 so that the model generates each phrase in the intended scale.
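The role of the 'Em' chord in this key change can be verified with a little pitch-class arithmetic. The sketch below is an illustration only; `scale_pcs` and `degree` are hypothetical helpers, and the natural minor scale is assumed for A minor. It shows that C major and A natural minor contain the same pitch classes, and that the root E sits on the third degree of C major but on the fifth degree of A minor, which is why Em followed by Am reads as a fifth-to-first-degree motion in A minor (written 'V-I' in the text; since Em is a minor triad it is often notated 'v').

```python
NOTE_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]          # semitones above the tonic
NATURAL_MINOR_STEPS = [0, 2, 3, 5, 7, 8, 10]  # assumption: natural minor

def scale_pcs(tonic: str, steps: list[int]) -> list[int]:
    """Pitch classes of the scale built on `tonic`, in scale order."""
    return [(NOTE_TO_PC[tonic] + step) % 12 for step in steps]

def degree(root: str, tonic: str, steps: list[int]) -> int:
    """1-based scale degree of a chord root within a key."""
    return scale_pcs(tonic, steps).index(NOTE_TO_PC[root]) + 1

c_major = scale_pcs("C", MAJOR_STEPS)
a_minor = scale_pcs("A", NATURAL_MINOR_STEPS)

# Same seven pitch classes, different tonic: C major and A minor are relative keys.
print(sorted(c_major) == sorted(a_minor))

# Root of Em in each key, and root of Am in A minor.
print(degree("E", "C", MAJOR_STEPS))          # degree of E in C major
print(degree("E", "A", NATURAL_MINOR_STEPS))  # degree of E in A minor
print(degree("A", "A", NATURAL_MINOR_STEPS))  # degree of A in A minor
```

Because the two keys share all their notes, a chord such as Em belongs to both and can serve as a pivot between them.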