Docker? It’s Easy If You Do It Smart

Context

We have all heard of Docker, but why and where exactly we need to use it is still confusing. The question also arises of why we should adopt this new technology if virtual machines already do the job.

Well, first of all we need to understand that Docker containers are not a replacement for virtual machines. Each has its own use cases. In reality, the two are complementary technologies: hardware virtualization and containerization each have distinct qualities and can be used in tandem for combined benefits.

Learning Machine Learning – Part 1

What is Machine Learning (ML)? As per a definition given by Tom Mitchell, Machine Learning is the ability of a computer program to improve its performance (P) at a given task (T) using prior experience (E).

ML problems can be broadly classified into Supervised and Unsupervised learning. These categories have further sub-categories.

  • Supervised – You are given a data set in which the relation between input and output is known, i.e. each input comes with its correct output. The computer program uses this training data to learn the relation and then uses it to predict the output for any given input.
    • Regression – In this set of problems, the output is a continuous function of the input, e.g. given a picture of a person, we have to predict their age.
    • Classification – Here, the output is discrete, e.g. given a picture of a person, we have to identify their race, gender, etc.
  • Unsupervised – The computer program is not fed labelled training instances. It first identifies the different groups/classes that the data can be ‘classified’ into, and then uses that knowledge to predict which group a particular data instance fits best.
    • Clustering
    • Non-clustering

Now that we are done with definitions, let’s take up a simple regression problem and dive into the mathematics involved to arrive at an algorithm (Gradient Descent).

Problem – Given the age (x) of a house, predict its price (y).

Let’s assume we are given a data set of 10,000 houses with their age and current market price. The training data for our ML program will then be of the form (xi, yi), where i ∈ [1, 10000]. We feed these data instances to our learning algorithm and come out with a predictor function, h(x) = y = θ0 + θ1x, where θ0 and θ1 are parameters that we need to find such that the predicted value of y is as close as possible to the actual y.

h(x) is known as the hypothesis function.
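As a minimal sketch, the hypothesis function can be written in a few lines of Python. The function name and the parameter values below are made up purely for illustration; the real values of θ0 and θ1 are what the learning algorithm has to find.

```python
def hypothesis(x, theta0, theta1):
    """Predict the price y of a house of age x with the line theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Hypothetical parameters, chosen only to illustrate the formula.
theta0 = 300000.0  # predicted price of a brand-new house (age 0)
theta1 = -2000.0   # change in predicted price per year of age

print(hypothesis(10, theta0, theta1))  # 300000 - 2000 * 10 = 280000.0
```

With these made-up numbers, a 10-year-old house is predicted to cost 280,000. Different choices of θ0 and θ1 give different lines and therefore different predictions.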

A diagram will make things easier…

[Figure: scatter plot of house price (y) against age (x)]

This is a plot of y against x for all the training instances. Our objective is to find a straight line such that the average distance of the data points from the line is minimized. That line can be represented by the equation y = θ0 + θ1x, where θ0 and θ1 are, respectively, the y-intercept and the slope.

[Figure: the same scatter plot with a straight line fitted through the points]

To find such a line, we will use the mean squared error method.

\operatorname{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2

where Ŷi (Y hat) is the predicted value for the ith instance, Yi is the actual value and n is the number of instances.

Since each prediction is Ŷi = h(xi) = θ0 + θ1xi, the error depends only on θ0 and θ1. Let’s call this function our cost function, J(θ0, θ1).
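To make the cost function concrete, here is a small Python sketch that computes the mean squared error of a candidate line over a data set. The function name and the three data points are invented for illustration only; they are not taken from the housing data described above.

```python
def cost(theta0, theta1, xs, ys):
    """Mean squared error J(theta0, theta1) of the line theta0 + theta1 * x."""
    n = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = theta0 + theta1 * x   # h(x) for this instance
        total += (predicted - y) ** 2     # squared error for this instance
    return total / n


# Made-up toy data lying exactly on the line y = 3 + 2x.
ages = [1.0, 2.0, 3.0]
prices = [5.0, 7.0, 9.0]

print(cost(3.0, 2.0, ages, prices))  # perfect fit, so the cost is 0.0
print(cost(0.0, 2.0, ages, prices))  # every prediction is off by 3, so the cost is 9.0
```

The closer the line is to the data, the smaller J becomes, which is why finding the best line amounts to finding the θ0 and θ1 that minimize J.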


WiFi direct – android local networking

In my earlier blog post I discussed data sharing between two Android devices on the same network using NSD. In this post we will see communication between two non-connected Android devices (they can be connected to the same network or to different networks, it doesn’t really matter) via WiFi Direct. The devices must be within WiFi range of each other. I will start with a bit of theory about WiFi Direct and then we will see how it can be implemented using Android APIs (a Git link to the sample app’s source code is at the end of the post).