An intuitive introduction to Gaussian processes

February 1, 2019

Christopher Fonnesbeck gave a talk about Bayesian Non-parametric Models for Data Science using PyMC3 at PyCon 2018. In this talk, he briefly covered Bayesian modeling and the neat properties of Gaussian distributions, and then quickly turned to the application of Gaussian Processes, a distribution over functions. Wait, but what?! How does a Gaussian represent a function? I did not understand how, but the promise of Gaussian Processes representing a distribution over nonlinear and nonparametric functions really intrigued me, and it therefore turned into the subject of a new post. Read more

Algorithm breakdown: Why do we call it Gradient Boosting?

November 19, 2018

We were preparing a training at work about ensemble models. While discussing different techniques like bagging, boosting, and stacking, we also came to the subject of gradient boosting. Intuitively, gradient boosting, by training on the residuals, made sense. The name gradient boosting, however, did not make sense right away. In this post we explore the name of gradient boosting and, of course, the model itself! Intuition: single decision tree. Gradient boosting is often used as an optimization technique for decision trees. Read more

Build Facebook's Prophet in PyMC3; Bayesian time series analysis with Generalized Additive Models

October 9, 2018

In the last Algorithm Breakdown we built an ARIMA model from scratch and discussed the use cases of that kind of model. ARIMA models are great when you have stationary data and want to predict a few time steps into the future. A lot of business data, being generated by human processes, has weekly and yearly seasonalities (we, for instance, seem to work less on weekends and holidays) and shows peaks at certain events. Read more

Algorithm Breakdown: AR, MA and ARIMA models

September 26, 2018

Time series are quite a unique topic within machine learning. In a lot of problems the dependent variable $y$, i.e. the thing we want to predict, depends on very clear inputs, such as the pixels of an image, the words in a sentence, the properties of a person’s buying behavior, etc. In time series these independent variables are often not known. For instance, in stock markets, we don’t have a clear set of independent variables to fit a model on. Read more

Deploy any machine learning model serverless in AWS

September 16, 2018

When a machine learning model goes into production, it is very likely to be idle most of the time. There are a lot of use cases where a model only needs to run inference when new data is available. If we have such a use case and deploy the model on a server, it will eagerly be checking for new data, only to be disappointed for most of its lifetime, while you pay for the server's uptime all the same. Read more

Generative Adversarial Networks in PyTorch: The distribution of Art

July 16, 2018

Generative adversarial networks seem to be able to generate amazing stuff. I wanted to do a small project with GANs and, in the process, create something fancy to hang on the wall. Therefore I tried to train a GAN on a dataset of art paintings. In this post I’ll explore whether I succeed in getting a new full HD Picasso on the wall. The pictures above give you a glimpse of some of the results from the model. Read more

Clustering data with Dirichlet Mixtures in Edward and PyMC3

June 5, 2018

In the last post I described the Affinity Propagation algorithm. I wrote about this algorithm because I was interested in clustering data points without specifying k, i.e. the number of clusters present in the data. This post continues with the same fascination, but now we take a generative approach. In other words, we are going to examine which models could have generated the observed data. Through Bayesian inference we hope to find the hidden (latent) distributions that most likely generated the data points. Read more

Algorithm Breakdown: Affinity Propagation

May 18, 2018

On a project I worked on at the ANWB (a Dutch roadside assistance company) we mined driving behavior data. We wanted to know how many people were likely to drive a certain vehicle on a regular basis. Naturally, k-means clustering came to mind. The k-means algorithm finds the clusters with the least inertia for a given k. A drawback is that k is often not known. For the question about the number of people driving a car, this isn’t that big of a problem, as we have a good estimate of what k should be. Read more

Transfer learning with PyTorch: Assessing road safety with computer vision

April 12, 2018

For a project at Xomnia, I had the opportunity to do a cool computer vision assignment. We tried to predict the input of a road safety model. Eurorap is such a model. In short, it works something like this. You take some cars, mount cameras on them and drive around the road you’re interested in. The ‘Google Streetview’ like material you’ve collected is sent to a crowdsourced workforce (at Amazon they are called Mechanical Turks) to manually label the footage. Read more

Computer build me a bridge

January 14, 2018

In earlier posts I’ve analyzed simple structures with a Python FEM package called anaStruct. And in this post I’ve used anaStruct to analyze a highly nonlinear roof ponding problem. Modelling a structure in Python may seem cumbersome compared to programs that offer a graphical user interface. For simple structures this may well be the case. However, now that we’ve got a simple way to programmatically model 2D structures, I was wondering if we could let a computer model these structures for us. Read more

(c) 2018 Ritchie Vink.