We used approximately 4 GB of MIDI files to train the so-called Variational Recurrent Autoencoder Supported by History (VRASH; see the detailed description in our scientific preprint). This architecture has several serious advantages. In particular, one can feed the author or style of a given track as an additional input to the generator and expect the automatically generated output to resemble that author or style.
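To make the conditioning idea concrete, here is a minimal PyTorch sketch of a decoder that takes an author/style label alongside the latent code and the history of previously generated notes. This is not the actual VRASH implementation; the module names, dimensions, and the way the embeddings are concatenated are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConditionedDecoder(nn.Module):
    """Sketch of a decoder conditioned on an author/style label.

    At every step, the latent code z, the author/style embedding, and the
    previously generated notes (the "history") are concatenated and fed to a
    recurrent cell, which predicts the next note.
    """
    def __init__(self, vocab_size=128, latent_dim=64, style_count=16,
                 style_dim=8, hidden_dim=256):
        super().__init__()
        self.note_embedding = nn.Embedding(vocab_size, hidden_dim)
        self.style_embedding = nn.Embedding(style_count, style_dim)
        self.rnn = nn.GRU(hidden_dim + latent_dim + style_dim,
                          hidden_dim, batch_first=True)
        self.to_logits = nn.Linear(hidden_dim, vocab_size)

    def forward(self, history, z, style_id):
        # history:  (batch, steps) ids of previously generated notes
        # z:        (batch, latent_dim) latent code sampled by the encoder
        # style_id: (batch,) index of the desired author/style
        steps = history.size(1)
        notes = self.note_embedding(history)                 # (B, T, H)
        style = self.style_embedding(style_id).unsqueeze(1)  # (B, 1, S)
        cond = torch.cat([z.unsqueeze(1), style], dim=-1)    # (B, 1, L+S)
        cond = cond.expand(-1, steps, -1)                    # repeat per step
        out, _ = self.rnn(torch.cat([notes, cond], dim=-1))
        return self.to_logits(out)                           # next-note logits

# Example: ask the decoder for music "in the style of" author number 3
# (all tensors here are random placeholders).
decoder = ConditionedDecoder()
history = torch.randint(0, 128, (1, 32))
z = torch.randn(1, 64)
logits = decoder(history, z, torch.tensor([3]))
```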
In tasks of this kind, the quality and size of the training data set are crucial for the subjective quality of the output. Once the network is trained, it can produce a huge number of tracks with different meta-parameters, but the quality of each track can vary, and overall performance depends heavily on the diversity of the melodies in the training set.
Check out our projects made with Pianola technology here: The Future of Jazz and NeuroScriabin Concert. You can also read our research preprint on arxiv.org.
Thanks to Kirill @innubis Anastasin for the logo: