Music Analysis
Josiah Boning
TJHSST Senior Research Project
Computer Systems Lab, 2007-2008
Purpose
● Apply machine learning algorithms to audio
data
– Neural Networks
● Autonomously identify what is music
Background
● Bigerelle and Iost (1999)
– Music genre can be identified by fractal dimension
● Basili et al. (2004)
– Music genre can be identified by machine learning
algorithms
– Used discrete MIDI data
Methods
● Data Processing
– Spectral Analysis: Fourier Transform
– Fractal Dimension: Variation and ANAM Methods
● Machine Learning
– Feed-Forward Neural Network
Fourier Transform
Fractal Dimension
● “a statistical quantity that gives an indication of how completely a fractal appears to fill space” (Wikipedia)
● Audio data is a set of discrete sample points, not a continuous function
● Therefore, the fractal dimension can only be estimated
Fractal Dimension
● Variation Method:

D = \lim_{\varepsilon \to 0} \left( 2 - \frac{\log\!\left( \frac{1}{b-a} \int_a^b \left| \max_{|x-t| \le \varepsilon} f(t) - \min_{|x-t| \le \varepsilon} f(t) \right| \, dx \right)}{\log \varepsilon} \right)

● ANAM Method:

D = \lim_{\varepsilon \to 0} \left( 2 - \frac{\log\!\left( \frac{1}{b-a} \int_{x=a}^{x=b} \left[ \frac{1}{\varepsilon^2} \int_{t_1=0}^{\varepsilon} \int_{t_2=0}^{\varepsilon} \left| f(x+t_1) - f(x-t_2) \right|^{\alpha} dt_1 \, dt_2 \right]^{1/\alpha} dx \right)}{\log \varepsilon} \right)
Fractal Dimension
● The Variation and ANAM methods are two ways of estimating the same quantity
● They should therefore yield similar results
● In practice, they don't...
Machine Learning
● Neural networks
● Feed-Forward
Neural Network Data Structures
typedef struct _neuron {
    double value;           /* activation of this neuron */
    struct _edge* weights;  /* array of incoming edges */
    double num_weights;     /* edge count (stored as a double in this code) */
} neuron;

typedef struct _edge {
    struct _neuron* source; /* neuron feeding this edge */
    double weight;
} edge;

// sizeof(neuron) == 20  (32-bit build)
// sizeof(edge) == 16
Neural Network Pseudo-Code
For each layer:
For each node:
value = 0
For each node in the previous layer:
value += weight * value of other node
value = sigmoid(value)
Neural Network Challenges:
Memory
● Audio data: 44100 samples/sec
● Processing 1 second of data
● 44100 input, 44100 hidden nodes, 1 output
node
– Memory: (44100 * 2 + 1) * 20 bytes = 1.7 MB
● 44100 ^ 2 + 44100 edges
– Memory: (44100 ^ 2 + 44100) * 16 bytes ≈ 29 GB