Anomaly detection system. Case study

The problem

There is a coldproof box with electronics (e.g. a smart house control center).
We need to detect any environmental anomalies inside the box, such as overheating, a coldproof failure, or anything else unusual.

The plan

  1. The first thing to do is to gather measurement data during normal system operation.
  2. Define a probabilistic model of normal operation.
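As a rough illustration of the first two steps of the plan, here is a minimal Python sketch. It assumes a single measured quantity (temperature) and a Gaussian model of normal operation; the sample readings, the function names and the 3-sigma threshold are all hypothetical and only illustrate the idea.

```python
from statistics import mean, stdev

def fit_normal_model(samples):
    """Estimate the mean and standard deviation of normal-operation readings."""
    return mean(samples), stdev(samples)

def is_anomaly(value, model, threshold=3.0):
    """Flag a reading lying more than `threshold` standard deviations from the mean."""
    mu, sigma = model
    return abs(value - mu) > threshold * sigma

# Hypothetical temperature readings (degrees C) gathered during normal operation.
normal_readings = [21.8, 22.1, 22.0, 21.9, 22.3, 22.2, 21.7, 22.0]
model = fit_normal_model(normal_readings)

print(is_anomaly(22.1, model))  # a reading close to the training data
print(is_anomaly(35.0, model))  # a reading far outside the normal range
```

A real box would likely need several variables (temperature, humidity, pressure) and a multivariate model, but the fit-then-threshold structure stays the same.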

Extracting possible solar wind predictors

This is part of the study described in the dedicated post.

I am going to perform data cleanup and feature extraction for fitting a solar wind model. The characteristics of coronal holes are considered to be the major predictor of the solar wind (e.g. see this paper).

I’ve got two CSV data sets that contain quantitative features extracted from the Sun images with computer vision algorithms.
One file contains features derived from the “green” (193nm) portion of the spectrum, the other contains features derived from the “red” (211nm) portion.

Solar wind prediction

I’m going to run an experiment on predicting the solar wind speed near the Earth from a series of Sun images.

The Skobeltsyn Institute of Nuclear Physics of Moscow State University (SINP MSU) publishes observation and prediction data on space weather, including a solar wind prediction. My experiment is to try to build a more accurate prediction based on the same initial data from SINP MSU.

The experiment is the following:

  1. Take the data from SINP MSU
  2. Prepare features data to be used as predictors
  3. Prepare observational data to be used as reference values
  4. Calculate error rates for the current SINP MSU model
  5. Design a computational model for the solar wind
  6. Fit a pulse-based model of the solar wind, calculate error rates
  7. Fit a machine learning regression
  8. Compare the error rates of each of the models

For each of these experiment phases I will publish a separate post.
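The comparison in steps 4, 6 and 8 comes down to scoring each model's predictions against the observed values. The post does not say which error measures are used, so the sketch below assumes two common ones, RMSE and MAE; the speed values (km/s) are made up for illustration.

```python
import math

def rmse(predicted, observed):
    """Root mean squared error between two equally long sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def mae(predicted, observed):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)

# Hypothetical solar wind speeds in km/s.
observed = [410.0, 480.0, 520.0]
model_a = [400.0, 500.0, 515.0]
model_b = [430.0, 450.0, 560.0]

for name, predicted in [("model A", model_a), ("model B", model_b)]:
    print(name, round(rmse(predicted, observed), 2), round(mae(predicted, observed), 2))
```

Scoring every model with the same functions on the same reference series keeps the comparison fair.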

Toradex Oak sensors on FreeBSD

As the world moves toward the Internet of Things, there are lots of cheap environmental sensors available on the market.
When this trend was just starting several years ago, I came across Toradex, a company that sells embedded devices, and noticed its series of Toradex Oak sensors. Toradex supplied Microsoft Robotics Studio libraries for them, which was just right for my student project. So I ordered two.

Now I’m building a monitoring system for a summer house based on a Raspberry Pi, and these Toradex sensors suit the task of gathering environmental data well.
The official site provides a sample of using the sensors on Linux, but I run FreeBSD.
So I started to think about a simple solution for gathering the data on BSD.

The sensors identify themselves as HID devices. After a short investigation I found that FreeBSD provides the usbhidctl utility to communicate with HID devices. That looked promising, as it did not require Linux emulation. With a single command we are able to fetch all the current values from the sensor!

Another task was the data storage engine. My colleague Eugene suggested using collectd or statsd to organize the storage. Both of them can store the data and stream it to a remote host for further storage. I decided to use collectd as it is written in C, so my Raspberry Pi box will have a minimal package set.

I ended up with a script that is invoked by collectd. The script enumerates HID USB devices, finds the Toradex sensors, reads their values, applies the proper unit conversions and returns the data as a string compatible with collectd.
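To show the shape of such a script, here is a small Python sketch. It is not the script shared in this post. It assumes that the sensor values arrive as `name=value` text lines and that the raw pressure is reported in pascals; the sample text, the host and plugin names and the conversion are all hypothetical. The output line follows the `PUTVAL` plain-text format that collectd accepts from executed scripts, with `N` standing for the current time.

```python
def parse_hid_values(text):
    """Parse 'name=value' lines into a dict of floats, skipping anything else."""
    values = {}
    for line in text.splitlines():
        name, sep, raw = line.partition("=")
        if not sep:
            continue  # not a name=value line
        try:
            values[name.strip()] = float(raw)
        except ValueError:
            continue  # non-numeric value, ignore it
    return values

def putval_line(host, plugin, type_name, instance, value, interval=10):
    """Format one reading as a collectd PUTVAL line ('N' means the current time)."""
    identifier = f"{host}/{plugin}/{type_name}-{instance}"
    return f'PUTVAL "{identifier}" interval={interval} N:{value}'

# Hypothetical raw output captured from the HID utility.
sample_output = "Frame_Number=1732\nPressure=100830\nnot a value line"
readings = parse_hid_values(sample_output)

# Assumed unit conversion: pascals to hectopascals.
pressure_hpa = readings["Pressure"] / 100
print(putval_line("box", "oak", "pressure", "OakP", pressure_hpa))
```

In a real setup the host name and interval would normally come from the environment that collectd passes to the script rather than being hard-coded.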

I share it here, so you can download it, then modify and extend it for your needs.

Open the post to access downloads.

GHCN v3 in SQL

The Global Historical Climatology Network-Monthly (GHCN-M) dataset by NCDC is a particularly important data set if your research deals with climate data. It is widely accepted. Its major advantages are quality control and the variety of data sources combined together. I have used it several times as reference data for validating calculated climate surfaces. It is also great for uncertainty assessment of climate interpolation methods.

But it is distributed only as text files in a specific format, so you have to write a parser to fetch the data.
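To give an idea of what such a parser involves, here is a minimal Python sketch for one record of a GHCN-M v3 data file. It assumes the fixed-width layout as I understand it from the dataset's README: an 11-character station ID, a 4-digit year, a 4-character element name, then twelve monthly groups of a 5-character value (hundredths of a degree C, -9999 for missing) followed by three flag characters. The sample record is synthetic, and the layout should be checked against the README shipped with the data.

```python
def parse_ghcnm_line(line):
    """Split one fixed-width GHCN-M v3 record into its fields."""
    station_id = line[0:11]
    year = int(line[11:15])
    element = line[15:19]
    values = []
    for month in range(12):
        start = 19 + month * 8           # each month takes 5 value + 3 flag chars
        raw = int(line[start:start + 5])
        values.append(None if raw == -9999 else raw / 100.0)
    return station_id, year, element, values

# Synthetic record: hypothetical station, February missing, blank flags.
monthly = [890, -9999] + [1250] * 10
sample = "10160355000" + "1878" + "TAVG" + "".join(f"{v:5d}   " for v in monthly)

station_id, year, element, values = parse_ghcnm_line(sample)
print(station_id, year, element, values[:3])
```

The flags are ignored here; a fuller parser would keep them, since they carry the quality control information.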

This week I decided to load GHCN v3 into MySQL to make fetching more flexible. Now I can export different subsets of the data into CSV files just by composing a proper SELECT query. That significantly sped up my experiments with interpolation techniques.

I share these SQL scripts to enable other researchers to load GHCN v3 into their own SQL servers. You can restore GHCN on your server and run queries against it. Just download the script and execute it, and you are able to get the data you need. Fast :)

The scripts do not contain CREATE DATABASE statements, so create an empty database by hand and then execute the proper script.

Atmospheric pressure data archive

For those who want to practice data processing skills and time series model fitting, I publish the following archives:

2011 whole year atmospheric pressure archive
2012 whole year atmospheric pressure archive
2013 whole year atmospheric pressure archive

The files are compressed CSV. Each line of a file is a one-minute average of the sensor measurements; the sensor reports a value every 5 seconds.
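For readers who want to reproduce this kind of aggregation on their own sensor data, here is a minimal Python sketch of averaging raw readings into one-minute bins. It only illustrates the idea and is not the code that produced the archives; the input format (Unix timestamp in seconds plus a value) and the sample numbers are assumptions.

```python
from collections import defaultdict

def minute_averages(readings):
    """Average (unix_timestamp, value) readings per calendar minute.

    Returns a dict mapping the timestamp of each minute's start to the mean
    of the readings that fell into that minute. Minutes without readings are
    simply absent from the result.
    """
    buckets = defaultdict(list)
    for timestamp, value in readings:
        minute_start = int(timestamp) // 60 * 60
        buckets[minute_start].append(value)
    return {m: sum(v) / len(v) for m, v in sorted(buckets.items())}

# Hypothetical raw pressure readings (hPa); minute 60..119 has no data.
raw = [(0, 1000.0), (5, 1002.0), (120, 1004.0), (125, 1006.0)]
print(minute_averages(raw))
```

Because empty minutes are skipped rather than filled, gaps in the raw stream show up as missing lines in the output.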

The sensor is located at 55.73080°N 37.42206°E at an altitude of 221m.
I use a Toradex OakP v1.2a sensor.

The data is raw in the sense that it may contain gaps and slight time shifts.

Feel free to use it. Any acknowledgements are appreciated if you use the data in your research ;-)

Atmospheric pressure in Moscow

My first blog entry is about one of my small prototype web apps.

This is a real-time chart of atmospheric pressure measured in my flat.

Several years ago I was interested in atmospheric data analysis. I studied time series models and wanted to build some real predictions using real data.

A year ago I created the page with the chart. The data for the most recent week is here. I use a Toradex Oak pressure sensor to gather the data.

If you live in Moscow, you can tell whether there is a cyclone or an anticyclone over us :)