The post Birdsong.report B1 Model Classification Quality appeared first on Dmitry A. Grechka.
I made a per-class classification quality analysis (I don’t know why I did not do it before!) and the results are both impressive and distressing.
The metrics are calculated on 2174 validation samples.
The samples are not class-balanced, but since they are a random subset of all available data, they reflect the distribution of classes in the whole dataset.
First of all, some of the birds are classified really well.
Let’s look at the per-bird Precision (Positive Predictive Value) plot, which indicates how many of the predicted species are indeed that species.
12 species have precision values higher than 0.75.
Among 87 predicted nightingales, 84 were indeed nightingales (PPV = 0.9655, the highest among all birds).
Sensitivity is even more impressive for some species.
The skylark has the highest sensitivity value of 0.988 (84 out of 85 presented skylark songs were detected)!
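These per-class metrics can be computed directly from a confusion matrix. Below is a minimal numpy sketch; the 2×2 matrix is purely illustrative, chosen so that its first class reproduces the 84/87 precision and 84/85 sensitivity figures mentioned above (it is not the real 30-class matrix):

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision (PPV) and sensitivity (recall) from a confusion
    matrix whose rows are true classes and columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)    # TP / everything predicted as the class
    sensitivity = tp / cm.sum(axis=1)  # TP / every true member of the class
    return precision, sensitivity

# Illustrative 2-class matrix: 84 of 87 predictions of class 0 are correct,
# and 84 of 85 true class-0 samples are detected.
precision, sensitivity = per_class_metrics([[84, 1], [3, 96]])
```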
But here comes the distressing part: the network did not detect 7 of the presented species at all!
You may see them with a sensitivity value of 0 at the lower part of the plot.
But before proceeding to the discussion of these faults, let’s see the last two plots: the F1-scores for the different classes and a combined plot of all 3 metrics.
Not bad!
Interestingly, sensitivity is almost always higher than precision. Why is that? Did it happen by chance?
The network did not detect 7 known species in any of the 243 presented recordings.
For instance, we can explore the Woodpecker, Coal Tit and House Sparrow. The validation set contained 70 recordings of the Great Spotted Woodpecker, 47 of the House Sparrow and 44 of the Coal Tit. None of them were classified correctly.
How did the network classify them?
The network considered all of the Coal Tit samples to be the Eurasian Blue Tit or the Great Tit. They are all tits, so maybe it is natural for the network to be confused.
The situation with the House Sparrow is similar: it is classified as the Eurasian Tree Sparrow.
Well, maybe it is too early to expect the network to tell apart species from the same family.
The situation is different with the woodpecker: it is classified as a Swallow. That is really strange.
Judge for yourself:
To me they sound very different, so this is a direction for investigation.
It seems that the network is not ready to distinguish species from the same family. Maybe in the next release I’ll leave only one tit and one sparrow.
As for the other species, the network performs reasonably well, so I’ll try to add some new ones.
The post Anomaly detection system. Case study appeared first on Dmitry A. Grechka.
There is a coldproof box with electronics inside (e.g. a smart house control centre).
We need to detect any environmental anomaly inside the box, such as overheating, coldproofing failure, or anything else.
Infer the parameters of the model from the gathered measurements.
At this point we gain the ability to evaluate the likelihood of every single measurement by substituting the measurement vector into the probability density function of the modelled multivariate variable.
When the likelihood value falls below a threshold (the measurement is too unlikely), we can conclude that the system is in an anomalous state.
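The whole pipeline above fits in a few lines. This is a minimal illustration, not the original code (the original analysis, judging by the reports, was done in R); the training data here are synthetic and the threshold is a placeholder:

```python
import numpy as np

# Synthetic stand-in for the recorded measurements: columns are air
# temperature and relative humidity during normal operation.
rng = np.random.default_rng(0)
train = rng.multivariate_normal([22.0, 45.0], [[4.0, 3.4], [3.4, 9.0]], size=5000)

# Infer the model parameters: means vector and covariance matrix.
mu = train.mean(axis=0)
cov = np.cov(train, rowvar=False)
cov_inv = np.linalg.inv(cov)
norm_const = 1.0 / np.sqrt((2 * np.pi) ** 2 * np.linalg.det(cov))

def density(x):
    """Multivariate normal PDF evaluated at a measurement vector."""
    diff = np.asarray(x, dtype=float) - mu
    return norm_const * np.exp(-0.5 * diff @ cov_inv @ diff)

def is_anomaly(x, epsilon=1e-3):  # epsilon is a placeholder threshold
    return density(x) < epsilon
```

How to choose epsilon in a principled way is exactly the threshold question discussed next.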
How to choose the threshold?
I recorded per-minute measurements from the air temperature and relative humidity sensors inside the box for 40 consecutive days. This is about 60,000 data points.
Do all of the recorded values correspond to normal operation?
I set up an experiment simulating an accident (considered an anomaly): the coldproof cover is not locked, so cold air from outside can flow into the box.
This is what the drop in the temperature time series looks like.
The red points reflect the open-cover case, while the black ones reflect the closed cover.
If we plot the same set of red points along with all the other ~60,000 points on the Temperature/RelHum plane, we can see the following:
What we actually see is that in case of an accident, a drop in air temperature with the same amount of moisture in the air results in an increase in relative humidity.
This picture suggests the following hypothesis: the event of a box-case closure failure comes along with a negatively correlated set of points.
If we mark a series of negatively correlated points as anomalies (calculating the correlation inside a running window over the time series), we get the following:
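A minimal sketch of this labelling step (the window size and correlation threshold here are my own illustrative choices, not the values used for the hand-labelling):

```python
import numpy as np

def mark_negative_correlation(temp, hum, window=30, threshold=-0.5):
    """Flag every point covered by a running window in which the Pearson
    correlation between temperature and humidity is strongly negative."""
    flags = np.zeros(len(temp), dtype=bool)
    for i in range(len(temp) - window + 1):
        t, h = temp[i:i + window], hum[i:i + window]
        if np.std(t) > 0 and np.std(h) > 0:
            if np.corrcoef(t, h)[0, 1] < threshold:
                flags[i:i + window] = True
    return flags
```

During normal operation temperature and humidity move together, so the flag stays off; during the open-cover accident the correlation flips sign and the whole window lights up.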
The density of the observed measurements does not look like any well-known distribution.
Rather, it looks like a mixture of normal distributions (with several peaks).
However, the correlation between air temperature and relative air humidity is strong during normal operation: the Pearson correlation is 0.91!
That’s why ignoring the correlation would be too rough. We need to model the observed measurements as a multivariate variable.
The simplest variant is the multivariate normal distribution (even though the marginal distributions are not quite normal).
The multivariate normal distribution is parametrized by a means vector and a covariance matrix.
After calculating these parameters and plotting the density function we get the following.
The black curve here is the kernel density estimate (KDE), while the red one is the normal distribution model.
Although the separate plots above (temperature and relative humidity) show that the model does not fit the real density very well, the joint probability distribution is well shaped, just like the set of measurement points presented in the plots above!
There are 57,200 non-anomaly points and 147 hand-labeled anomaly points (see the plots with red and black points above) at our disposal in total. I split the data as follows.
All non-anomaly points are split into training, validation and test sets in a 60% / 20% / 20% ratio.
Anomaly points are split into validation and test sets in a 50% / 50% ratio.
The training set (containing only non-anomaly points) has already been used in the previous section, when we estimated the means vector and the covariance matrix of the multivariate distribution.
Now we use the validation set (11,440 non-anomaly and 74 anomaly points) to infer the threshold (epsilon) for flagging a measurement as an anomaly. As the anomaly point count is much smaller than the non-anomaly count (as it should be in every anomaly detection system :-) ), it is wise to use the F-measure to score the classification performance. We will use F1.
The optimisation run achieved an F1-score of 0.756, which corresponds to a density value of 0.00089. Everything is now ready for the operation of the anomaly detector: if the density of a measured temperature/humidity pair is less than 0.00089, we consider it an anomaly.
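The threshold search itself is straightforward; here is an illustrative sketch (the density values and labels below are toy data, not the real validation set):

```python
import numpy as np

def f1(y_true, y_pred):
    tp = int(np.sum(y_pred & y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def choose_epsilon(densities, labels, candidates):
    """Return the density threshold that maximises F1 on the validation set."""
    scored = [(f1(labels, densities < eps), eps) for eps in candidates]
    best_f1, best_eps = max(scored)
    return best_eps, best_f1

# Toy validation data: anomalies have much lower density than normal points.
densities = np.array([0.9, 0.8, 0.7, 0.0001, 0.0002])
labels = np.array([False, False, False, True, True])
eps, score = choose_epsilon(densities, labels, np.logspace(-5, 0, 100))
```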
Now it is time to use the third dataset, the test dataset.
We will evaluate how the model performs on the previously unseen test set. The results are:
## Confusion Matrix and Statistics
##
## Reference
## Prediction anom norm
## anom 47 0
## norm 26 11440
##
## Accuracy : 0.9977
## 95% CI : (0.9967, 0.9985)
## No Information Rate : 0.9937
## P-Value [Acc > NIR] : 1.912e-10
##
## Kappa : 0.7823
## Mcnemar's Test P-Value : 9.443e-07
##
## Sensitivity : 0.643836
## Specificity : 1.000000
## Pos Pred Value : 1.000000
## Neg Pred Value : 0.997732
## Prevalence : 0.006341
## Detection Rate : 0.004082
## Detection Prevalence : 0.004082
## Balanced Accuracy : 0.821918
##
## 'Positive' Class : anom
This corresponds to F1 = 0.783
The prediction looks like this:
Good!
The anomaly detection system is ready.
The post My bird song classifier appeared first on Dmitry A. Grechka.
When I first came up with the idea of building a bird song classifier, I started to google for a training dataset.
I found xeno-canto.org and the first thing that caught my attention was spectrograms.
(A spectrogram is a visual representation of how a spectrum evolves through time. The vertical axis reflects frequency, the horizontal axis represents time. Bright pixels on the spectrogram indicate that at that particular time there is a signal at that particular frequency.)
Well, spectrograms are ideal for visual pattern matching!
Why do I need to analyse sound when we have such expressive visual patterns of songs? Those were my thoughts.
I decided to train a neural net to classify spectrograms.
Almost immediately I found a corresponding paper on the Web: Convolutional Neural Networks for Large-Scale BirdSong Classification in Noisy Environment. That’s perfect! The idea should work if someone else has had a similar idea!
Spectrogram classification is rather different from real-world image (e.g. photo) classification.
First, the patterns in a spectrogram do not change their scale, while the objects in real-world photos can be at different distances from the camera and thus appear at different scales.
Second, the image dimensions have fixed meanings. In a spectrogram the vertical dimension is frequency, and the sound patterns of a bird song are usually localised in the frequency domain. This means that the patterns usually appear around the same vertical part of the image. We can exploit this: for instance, we do not need to care much about equivariance along the vertical dimension.
The horizontal dimension is time. We can exploit it too by applying time-specific classification techniques. For instance, LSTM cells perform really well in classifying time-evolving data. We can apply an LSTM to a spectrogram by “scanning” along the horizontal dimension, making the LSTM consume time frames.
As there are so many differences from the classification of real-world images, visual pattern recognition tricks like inception modules are not as useful here.
So I decided to build the network and train it from scratch.
Basically, it is a deep convolutional network with an LSTM cell instead of dense layers.
It consumes the spectrogram of a recording as input, split into one-second-long intervals, so every image has the same height and width.
The input frame is passed through convolutional layers. This transforms a 256×128 image into a 16×8×64 data cube (four convolutional layers with 2×2 max pooling halve the size of the image four times; the resulting data cube also has 64 channels). Then this data cube is consumed by the LSTM cell.
Finally, the output layer with softmax produces 30 probabilities, each corresponding to a separate bird.
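The dimension bookkeeping above can be checked with a few lines. This is not the training code, just arithmetic confirming that four 2×2 poolings turn a 256×128 frame into the 16×8×64 cube, which the LSTM can then consume as 8 time steps of 16×64 = 1024 features each (assuming the 128-pixel axis is time):

```python
def cube_shape(height=256, width=128, conv_blocks=4, channels=64):
    """Spatial shape after conv blocks that each end with 2x2 max pooling."""
    for _ in range(conv_blocks):
        height //= 2
        width //= 2
    return height, width, channels

h, w, c = cube_shape()
lstm_steps, lstm_features = w, h * c  # scan along the time (width) axis
```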
For the training set I use 8853 independent recordings, which gave 74,642 separate non-intersecting training samples (a ten-second recording yields ten spectrogram frames, one second long each).
For now the model has achieved an accuracy of 67% on unfamiliar data. I consider that pretty good, as a random guess would give 3.33%.
Right now I have an even more promising network architecture in mind, but first I’ll deploy the current model, so everyone will be able to try inferring a bird from its song.
The post Xeno Canto top 30 European birds appeared first on Dmitry A. Grechka.
I decided to start with the classification of the 30 most commonly recorded European birds.
Through the Xeno Canto API I downloaded all available top-quality recordings for these 30 birds (9875 separate mp3 files, 20 GB in total).
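For reference, fetching the metadata looks roughly like this. A hedged sketch: it assumes the public v2 JSON endpoint and the `q:A` quality tag of the search grammar, and it only builds query URLs and parses an already-downloaded response rather than hitting the network:

```python
import urllib.parse

API = "https://xeno-canto.org/api/2/recordings"  # assumed v2 endpoint

def build_query_url(species, quality="A"):
    """URL for all recordings of a species with the given quality grade."""
    query = urllib.parse.quote_plus(f"{species} q:{quality}")
    return f"{API}?query={query}"

def mp3_urls(response_json):
    """File URLs from a decoded API response (a dict with 'recordings')."""
    return [rec["file"] for rec in response_json.get("recordings", [])]
```

A real downloader would then page through the results and fetch every `file` URL, e.g. starting from `build_query_url("Parus major")` for the Great Tit.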
The most common species among the top-quality recordings is the Great Tit: 903 files with a total duration of more than 13 hours.
The Common Blackbird holds the longest total duration of recordings, with more than 27 hours.
The longest single recording (1 hour 18 minutes) is Marsh Warbler recorded by Volker Arnold in Heide, Germany.
The shortest single recording is 0.764 seconds long (Red Crossbill).
The average recording duration is 1 minute 35 seconds.
95% of the files are shorter than 4:10.
And 1% of the files are longer than 9:45.
All of the files are mp3.
They vary in sampling rate; the most common is 44.1 kHz.
Most of the files are stereo, while some are not.
The API does not provide the ability to extract the other species that may be present within each recording, although the site itself does. It’s a pity, because background species could bring serious impediments to the machine learning process. I hope I’ll find a way to cope with it.
The post Multilayer perceptron learning capacity appeared first on Dmitry A. Grechka.
The plot shows the minimum value of the loss function achieved across different training runs.
Each dot in the figure corresponds to a separate training run that finished by getting stuck in some minimum of the loss function.
You may see that the training procedure can get stuck in a local minimum regardless of the number of hidden units.
This means that one needs to carry out many training runs in order to figure out the real learning capacity of a network architecture.
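The restart protocol is simple to express. A sketch with a placeholder `train_once(seed)` standing in for one full training run that returns the final loss (the placeholder is my own illustration, not the original training code):

```python
import random

def learning_capacity(train_once, n_runs=50, seed=0):
    """Best (lowest) final loss over many independently seeded training
    runs: single runs often stall in poor local minima, so the minimum is
    a better proxy for what the architecture can actually learn."""
    rng = random.Random(seed)
    return min(train_once(rng.randrange(2**31)) for _ in range(n_runs))
```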
The lower boundary of the point cloud depicts the learning capacity. We can see that the learning capacity slowly rises as we increase the number of hidden layer units.
The learning capacity is also reflected in the achieved accuracy.
In this plot, as in the previous one, each dot is a separate finished training run.
But in this one the Y-axis depicts classification accuracy.
It is interesting that even a network with only 10 units can correctly classify more than half of the images.
It is also surprising to me that the best models cluster together, forming a clear gap between them and the others.
Can you see the empty space stride?
Is there some particular image feature that is either captured or not?
The post Solar wind computational model appeared first on Dmitry A. Grechka.
The model has the following parameters:
The overall average solar wind velocity at an arbitrary distance from the Sun (x) at an arbitrary time moment (t) is the following.
Where
Vel_i is the velocity of the i-th particle burst generated by the i-th coronal hole,
Den_i is the density of the i-th particle burst generated by the i-th coronal hole,
and the indicator function chooses only the particle bursts that are at the right location at the right time moment.
For this computational experiment I fitted second-order ascending polynomial functions for Vel(a) and Den(a).
The model time resolution is one minute. Thus 1 model time tick is 1 minute of real time.
The model space resolution is 1.0×10^{8}m. 1 model space step (bin) is 1.0×10^{8}m which is approximately 10 Earth diameters, and 1 AU is ~ 1500 model space steps.
The maximum coronal hole relative area can be 20% of the visible Sun area.
The maximum solar wind burst velocity is considered to be 1 bin per 1 tick, which is 1.0×10^{8}m per minute, i.e. 1666.(6) km/s.
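The unit conversion implied by the constants above is worth writing down explicitly; a small sketch:

```python
BIN_METERS = 1.0e8    # one model space step
TICK_SECONDS = 60.0   # one model time tick

def kms_to_model(v_kms):
    """Real velocity (km/s) -> model velocity (bins per tick)."""
    return v_kms * 1000.0 * TICK_SECONDS / BIN_METERS

def model_to_kms(v_model):
    """Model velocity (bins per tick) -> real velocity (km/s)."""
    return v_model * BIN_METERS / TICK_SECONDS / 1000.0

# The speed limit of 1 bin/tick is ~1666.7 km/s, comfortably above
# typical solar wind speeds of roughly 300-800 km/s.
```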
As the coronal hole area data used to emit bursts, I used a continuous subset of the observations of 2015 (between 40 and 8000 hours from the beginning of the year, which is approximately 2 January 2015 – 30 November 2015).
The reference observations of the solar wind velocity start a little later, to wait until the first bursts reach the Earth. Thus the reference observations are the hourly observations of 2015 between 150 and 8000 hours from the beginning of the year (approximately 7 January 2015 – 30 November 2015).
I consider 5 days after the first bursts are emitted to be enough to establish a continuous flow of bursts at the Earth’s distance.
I did several runs of model parameter fitting using the MCMC fitter developed at MSR Cambridge in collaboration with our lab.
I will evaluate the 5 parameter sets that gave the maximum likelihood of the observed values (MLE).
The following figure demonstrates the five fitted velocity functions that give the maximum likelihood.
The corresponding fitted burst density functions are:
If we draw simulations using the functions above, we get:
Crosses indicate real observations of the solar wind velocities.
Solid lines of different colours represent the estimated mean solar wind velocities. Shaded areas represent the mean +/- 1 noise sigma intervals.
As a result, even without an explicit evaluation of error rates, it can be seen that the solar wind computational model behaves poorly for now. The simulated solar wind varies significantly less in magnitude than the real wind measurements.
This may indicate that the model misses some important features, which prevents the fitter from reducing the noise and fitting the variability better.
I can foresee the following causes of poor fitting:
You can reproduce the experiment using docker: “docker run -d -e seed=1 dgrechka/solarwindmodelfitter:0.3”
In addition, the latest Docker image of the fitter is here.
R report for figure generation.
This is a part of a larger solar wind prediction experiment, which I describe in a separate post.
The post Solar wind simulation: particle bursts engine appeared first on Dmitry A. Grechka.
The Solar Wind Particle Burst Engine models the solar wind by simulating bursts of particles emitted by coronal holes. The idea is to simulate a continuous flow of particles using a discrete representation of the world: we represent the wind as a finite number of particle bursts. The world space is one-dimensional and is represented by a finite number of bins, enumerated by an index. The greater the index, the greater the distance of the bin from the Sun; the bin with index 0 corresponds to the Sun’s surface. Each bin at every particular time moment can “contain” zero or more particle bursts. At every world time tick (time is modelled discretely, in the form of integer ticks) the particle bursts move away from the Sun by leaving one bin and entering another.
Since we can characterise each burst by its emergence time (tick) and velocity, each simulation run can be entirely described as a set of bursts. The velocity of every burst is constant; we consider it a function of the coronal holes at the burst’s emergence time.
Here you can see a demo run of the burst engine:
Three standalone bursts are emitted at 0, 10 and 20 simulation ticks with corresponding velocities of 0.2, 0.5 and 1.0 bin/tick.
The Y axis shows the average velocity of all the bursts in the bin.
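A minimal sketch of the engine’s state at a given tick, under my reading that a burst emitted at tick t0 with velocity v sits in bin ⌊v·(t − t0)⌋ (the exact placement rule is an assumption, not taken from the engine’s source):

```python
import math
from collections import defaultdict

def average_velocity_per_bin(bursts, tick):
    """bursts: iterable of (emergence_tick, velocity in bins/tick).
    Returns {bin_index: average velocity of the bursts in that bin}."""
    bins = defaultdict(list)
    for t0, v in bursts:
        if tick >= t0:
            bins[math.floor(v * (tick - t0))].append(v)
    return {b: sum(vs) / len(vs) for b, vs in bins.items()}

# The demo run: bursts at ticks 0, 10, 20 with velocities 0.2, 0.5, 1.0.
demo = [(0, 0.2), (10, 0.5), (20, 1.0)]
```

With this rule, by tick 30 the 0.5 and 1.0 bursts share bin 10, so that bin’s average velocity is 0.75: the same catching-up effect visible in the trace plots.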
In addition, this simulation can also be represented as burst “traces”:
The colour of this colormap reflects the average velocity of all bursts at each space-time point.
Furthermore, if we emit a burst at every simulation tick, we can model a continuous flow of solar wind. The only requirement is that the velocity does not exceed 1.0 bin/tick. To emit a burst at every tick we can interpolate the properties of the closest bursts. Hence, for the example above we get the following “continuous” solar wind flow (20 bursts acquired by linear interpolation of the 3 standalone bursts).
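The interpolation step can be one line of numpy (assuming, as here, linear interpolation of velocity over emergence ticks):

```python
import numpy as np

# Emit one burst per tick, interpolating between the three standalone
# bursts at ticks 0, 10, 20 with velocities 0.2, 0.5, 1.0 bin/tick.
ticks = np.arange(21)
velocities = np.interp(ticks, [0, 10, 20], [0.2, 0.5, 1.0])
```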
The traces for this configuration now look like this:
We can see that the discrete nature of the model is exposed at the late steps (50 and greater). This happens because no more bursts are emitted after the final burst at tick 20.
Looking at the tick interval 0 to 20, we can see that the simulated wind is “continuous” (e.g. without sudden drops of the average velocity to zero).
As a result, we can construct a simulation run by advancing the model world driven by observations of real coronal holes. For example, for the year 2015 the wind can be modelled like this:
The trace view in this case transforms from “traces” into a velocity magnitude “field”:
You can notice that most of the time the flow indeed looks continuous. Sometimes the flow scrambles (e.g. see the time moment around tick 9150), which indicates that the way I designed the engine may not be well suited for modelling the real world.
I will discuss the burst velocity function of coronal holes, the space bin size and the model tick length, as well as converting velocity units from the real world to the model and vice versa, in an additional post where I perform inference of most of these parameters.
This is a part of a larger solar wind prediction experiment, which I describe in a separate post.
The post NCEP/NCAR Reanalysis 1 downloader appeared first on Dmitry A. Grechka.
The post MSU SINP Solar Wind Velocity Forecast appeared first on Dmitry A. Grechka.
We will compare the “out-of-sample” error of the MSU SINP model, the Wind Pulses simulator and a machine learning regression model. In this post we will assess the current SINP solar wind predictions.
We need to define an error measure for model comparison.
As we predict a quantitative value of solar wind velocity, a good error measure is RMSE. We will also calculate R^{2} for each of the predictions to analyse how much variance is described by a model.
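Both measures are a few lines each. A sketch (the original comparison was done in R; R^{2} here is the common 1 − SSres/SStot variant):

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean squared error between two equal-length series."""
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def r_squared(observed, predicted):
    """Fraction of variance explained: 1 - SS_residual / SS_total."""
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - np.mean(observed)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

With this definition a null model that always predicts the mean scores R^{2} = 0, which is why an R^{2} of 0.0025 means the model explains almost none of the variance.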
Our testing data set will be days 300 – 360 of the year 2014. We will use this interval to evaluate the prediction error rates for every model.
The SINP predictions contain lots of predicted values equal to 300. Their presence produces significant oscillations between the value of 300 and other, higher values (see the figure below). Such oscillations can’t happen in real life. The simplest way to eliminate these oscillations is to filter out the predictions equal to 300. These points will not be taken into account during interpolation of the predictions.
The RMSE of the SINP model is 93.3.
The RMSE of the null model is 93.4.
The R^{2} of the SINP model is 0.0025192.
(The null model here is the average observed wind velocity over the entire year 2014.)
Complete R report is available here.
The post Solar wind observations by EPAM ACE appeared first on Dmitry A. Grechka.
We have per-minute solar wind observations (density, temperature and velocity) recorded by the EPAM instrument of the Advanced Composition Explorer (ACE) spacecraft (see this link for the data files).
This data from ACE can be used for two purposes. First, we can use a particular archive of observations (e.g. the year 2015) to fit the prediction model parameters. Second, we can use the very recent measurements coming from ACE as predictors for forecasting the wind velocity in the near future.
Per-minute measurements are too fine-grained. Using solar wind observations in such a form brings noise into the machine learning algorithms, which is bad.
I want to aggregate the data hourly. For each hour I want to get a separate “trend” value and a noise level for each of the three data variables (density, temperature, velocity).
Thus we will get a trend value and a noise level for every hour.
We will estimate the trend by a running median over the data, and the noise can then be estimated by subtracting the trend from the original signal.
The data is then aggregated hourly.
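A sketch of this aggregation for one variable (the original processing was done in R; the window length here is illustrative):

```python
import numpy as np

def running_median(x, window=11):
    """Centred running median; near the edges the window shrinks."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    return np.array([np.median(x[max(0, i - half):i + half + 1])
                     for i in range(len(x))])

def hourly_trend_and_noise(per_minute):
    """Per-hour (trend, noise level) pairs for one per-minute variable:
    trend is the mean of the running median over the hour, noise level is
    the standard deviation of the residual (signal minus trend)."""
    trend = running_median(per_minute)
    residual = np.asarray(per_minute, dtype=float) - trend
    hours = len(per_minute) // 60
    return [(float(np.mean(trend[h * 60:(h + 1) * 60])),
             float(np.std(residual[h * 60:(h + 1) * 60])))
            for h in range(hours)]
```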
Here I present the plots for the first week and for the first month of 2015.
It is worth noticing that the temperature trend values sometimes go below zero. This implies that the data should be additionally coerced before use in machine learning.
The resulting data sets are published here.
Complete R report is available here.