Handling Categorical Features in Machine Learning

Introduction: Variables in a dataset are broadly of two types: continuous (numerical) and categorical. Regression-based algorithms use both continuous and categorical features to build models, but in most ML libraries you can't fit categorical variables into a regression equation in their raw form. Leaving them out of the model, on the other hand, can cost you accuracy, so it's crucial to learn the methods of dealing with such variables. Machine learning libraries handle categorical variables in various ways, and the right way to transform them and use them efficiently in model training depends on several conditions, including the algorithm being used and the relationship between the response variable and the categorical variable(s). Here I take the opportunity to demonstrate the prevalent methods for handling categorical variables that are incorporated in MLlib, Spark's popular machine learning library.
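To make the idea concrete, here is a minimal pure-Python sketch of a two-step pattern commonly used with MLlib for nominal features: first map each category string to an integer index (the job of `StringIndexer`, which by default orders categories by descending frequency), then expand that index into a one-hot vector (the job of `OneHotEncoder`, which by default drops the last category). This is not Spark code; the function names are hypothetical, ties are broken alphabetically here as an assumption, and plain dense lists stand in for Spark's sparse vectors.

```python
from collections import Counter


def fit_string_index(values):
    """Map each category to an integer index, most frequent first.

    Ties are broken alphabetically so the mapping is deterministic.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {category: index for index, category in enumerate(ordered)}


def one_hot(index, num_categories, drop_last=True):
    """Return a dense one-hot vector for a category index.

    With drop_last=True the last category is encoded as all zeros, which
    avoids a column that is fully determined by the other columns.
    """
    size = num_categories - 1 if drop_last else num_categories
    vector = [0.0] * size
    if index < size:
        vector[index] = 1.0
    return vector


def encode_column(values, drop_last=True):
    """Index a categorical column, then one-hot encode every row."""
    mapping = fit_string_index(values)
    vectors = [one_hot(mapping[v], len(mapping), drop_last) for v in values]
    return vectors, mapping


if __name__ == "__main__":
    colors = ["red", "blue", "red", "green", "blue", "red"]
    vectors, mapping = encode_column(colors)
    print(mapping)  # {'red': 0, 'blue': 1, 'green': 2}
    for color, vec in zip(colors, vectors):
        print(color, vec)
```

Dropping the last category matters for regression: with an intercept term, a full set of one-hot columns always sums to one, so one column is redundant and can make the design matrix singular.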

Using Docker Containers as Mininet Hosts

Talentica believes in continuous learning and innovation. We Talenticians have always been encouraged to learn and experiment with emerging technologies. With the same objective, we have set up an internal R&D group working on upcoming areas. Software Defined Networking (SDN) is one of the areas where we are developing proficiency.
As part of the SDN group, we are working on measuring the network transfer in the Hadoop MapReduce shuffle phase, and on possible traffic-engineering solutions for optimizing that transfer. While working on this, we encountered several challenges; this post highlights the solution to one of them.

SPOT-FLEET and Auto Scaling

What is Spot-Fleet?

An AWS Spot Fleet is a collection of EC2 Spot Instances, which are purchased at reduced rates by specifying the maximum price you are willing to pay for spare EC2 capacity. Spot Fleet requests are made using the Spot Fleet API or through the AWS command-line interface (CLI). Users request a Spot Fleet by specifying the maximum price per instance hour, the target capacity (for example, the number of instances desired), and launch specifications such as instance types and Availability Zones. Spot Instance prices change over time, but EC2 attempts to maintain the requested target capacity of the fleet: it terminates Spot Instances when prices rise above the user-defined maximum and launches replacements using the lowest-priced option then available among the other acceptable instance types. AWS automates the management of these instance collections, and a Spot Fleet request stays active until the user cancels it.
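As an illustration of the inputs described above, here is a sketch that builds the request configuration expected by EC2's `RequestSpotFleet` API (exposed in boto3 as `request_spot_fleet(SpotFleetRequestConfig=...)`). It only constructs the dictionary and does not contact AWS. The role ARN, AMI ID, instance types and zones are hypothetical placeholders, and the helper function name is made up for this example.

```python
import json


def build_spot_fleet_config(role_arn, ami_id, instance_types, zones,
                            target_capacity, max_price):
    """Build a SpotFleetRequestConfig dict for EC2's RequestSpotFleet API.

    One launch specification is created per (instance type, zone) pair, so
    the fleet can fall back to any acceptable combination.
    """
    launch_specs = [
        {
            "ImageId": ami_id,
            "InstanceType": instance_type,
            "Placement": {"AvailabilityZone": zone},
        }
        for instance_type in instance_types
        for zone in zones
    ]
    return {
        "IamFleetRole": role_arn,
        "AllocationStrategy": "lowestPrice",  # launch into the cheapest pools
        "TargetCapacity": target_capacity,    # instances to keep (default weight 1)
        "SpotPrice": str(max_price),          # max price per instance hour, as a string
        "Type": "maintain",                   # replace interrupted instances
        "LaunchSpecifications": launch_specs,
    }


if __name__ == "__main__":
    config = build_spot_fleet_config(
        role_arn="arn:aws:iam::123456789012:role/example-fleet-role",  # hypothetical
        ami_id="ami-0123456789abcdef0",                                 # hypothetical
        instance_types=["m5.large", "m4.large"],
        zones=["us-east-1a", "us-east-1b"],
        target_capacity=4,
        max_price=0.05,
    )
    print(json.dumps(config, indent=2))
    # With boto3 installed and credentials configured, the request could be sent as:
    #   boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)
```

Listing several instance types and zones gives the fleet more pools to choose from, which is what lets EC2 keep the target capacity when one pool's price rises above the maximum.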

SDN Captive Portal

A captive portal is a networking solution that authenticates users before granting them network access. It protects the network from unwanted and unauthorized access by providing a landing web page to which all browser requests from an unauthenticated user are redirected, and where authentication is performed. Traditionally, a captive portal is implemented on the networking device itself, which is configured to redirect browser requests from unauthenticated users. In a Software Defined Networking (SDN) environment, we can move this logic from the device to a centralized controller platform and write our own applications for networking solutions. In this post we explain a captive portal solution for SDN infrastructure.
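The redirect logic that moves from the device to the controller can be sketched as a per-packet decision. The following is a minimal, controller-agnostic sketch under several assumptions: users are identified by source MAC address, DNS is always allowed so browsers can resolve names, only plain HTTP on port 80 is redirected, and the portal address is hypothetical. It does not use the API of any real controller; in practice the returned action would be translated into flow rules.

```python
from dataclasses import dataclass, field

PORTAL_IP = "10.0.0.100"  # hypothetical address of the landing-page server


@dataclass(frozen=True)
class Packet:
    src_mac: str
    dst_ip: str
    protocol: str  # "tcp" or "udp"
    dst_port: int


@dataclass
class CaptivePortalApp:
    authenticated: set = field(default_factory=set)

    def login(self, mac):
        """Called once the portal has verified the user's credentials."""
        self.authenticated.add(mac)

    def decide(self, pkt):
        """Return the action the controller would apply to this packet."""
        if pkt.src_mac in self.authenticated:
            return ("forward", pkt.dst_ip)
        # Let unauthenticated hosts resolve names so the browser can start.
        if pkt.protocol == "udp" and pkt.dst_port == 53:
            return ("forward", pkt.dst_ip)
        # Traffic to the portal itself must pass, or the login page never loads.
        if pkt.dst_ip == PORTAL_IP:
            return ("forward", pkt.dst_ip)
        # Plain HTTP browser traffic is redirected to the landing page.
        if pkt.protocol == "tcp" and pkt.dst_port == 80:
            return ("redirect", PORTAL_IP)
        # Everything else from unauthenticated hosts is dropped.
        return ("drop", None)


if __name__ == "__main__":
    app = CaptivePortalApp()
    web = Packet("aa:bb:cc:dd:ee:01", "93.184.216.34", "tcp", 80)
    print(app.decide(web))  # ('redirect', '10.0.0.100')
    app.login("aa:bb:cc:dd:ee:01")
    print(app.decide(web))  # ('forward', '93.184.216.34')
```

Keeping the authenticated set in one central application is the point of the SDN approach: every switch consults the same state, instead of each device maintaining its own list.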

Best Ways to Select an OPD Partner for a Startup

Having spent considerable time on the other side of the table, being evaluated for the role of OPD partner by 50+ startup founders, I have learned some clever ways to hire the best-suited Outsourced Product Development (OPD) team for a startup.

This post is the outcome of the experience I have gained over decades of working with some of the best minds in the industry. So, here are the best ways to select an OPD partner.