Self-driving cars, voice recognition systems, Amazon product recommendations, Netflix movie recommendations, Facebook's friend recommender, Google language translation, cancer biopsy results mapped to survival rates, image recognition, network security algorithms, and trading signal analysis are just a handful of the many machine learning applications already in practice today.
Imagine a toddler picking up a toy apple and trying to take a bite out of it. After doing this a few times, the child learns that it is a toy, soon becomes able to distinguish a toy apple from a real apple, and learns to apply that knowledge to other toys. Machine learning has some similarities to the learning we just watched the child go through.
Machine learning is a technique where we carefully teach (train) the machine to recognize patterns and subtleties that come naturally to our subconscious brain but are hard to put into an equation. For example, it is hard to develop a mathematical equation that recommends movies to a person, yet a friend of that person could probably make a good recommendation because he knows his friend's tastes reasonably well. Machine learning can act as this friend, deliberately doing what the friend did intuitively: observing the person's likes and dislikes (data).
Machine learning is usually done in one of two ways:
1. Supervised Learning
2. Unsupervised Learning
In supervised learning, the training data includes the correct answers (labels), and the algorithm learns to map inputs to those answers. In unsupervised learning, no labels are given, and the algorithm must find structure in the data on its own.
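A rough illustration of the difference, as a minimal pure-NumPy sketch (the data, the pass/fail labels, and the two-cluster helper are all invented for illustration): the supervised fit is handed known answers, while the clustering step must discover the groups in the same numbers on its own.

```python
import numpy as np

# --- Supervised learning: the answers (labels) are given ---
# Toy data: hours studied (x) and a known pass/fail label (y).
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 1, 1, 1])  # the "supervision"
slope, intercept = np.polyfit(x, y, 1)  # least-squares line through (x, y)

def predict(hours):
    """Predict pass (1) or fail (0) from the fitted line."""
    return int(slope * hours + intercept > 0.5)

# --- Unsupervised learning: no labels ---
# The same x values without y: a tiny one-dimensional 2-means
# clustering finds the two groups by itself.
def two_means(points, iters=10):
    c = np.array([points.min(), points.max()])  # initial centroids
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recompute centroids.
        assign = np.abs(points[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(assign == k):
                c[k] = points[assign == k].mean()
    return assign, c

labels, centroids = two_means(x)
```

The supervised model can answer questions the labels defined ("will 7.5 hours pass?"), while the unsupervised step only reports which points belong together.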
Data comes in many formats. Structured data lives in RDBMS systems; unstructured data comes as images, video, audio, text, email, SMS, phone calls, blogs, and news articles.
All the data created from the beginning of civilization until 2003, an estimated 5 exabytes, is reportedly now created every 10 minutes. That is how much data we produce these days. How much data is sufficient for machine learning to work well?
According to Stanford professor Andrew Ng, when you have large amounts of data covering multiple dimensions (attributes), even a simple machine learning algorithm can do a great job of predicting outcomes.
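Ng's point can be sketched with a toy experiment (the synthetic data and the helper function are invented for illustration): the same simple least-squares model, given far more training data, recovers the underlying relationship far more accurately on held-out points.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_and_test(n_train):
    """Fit y = 2x + 1 + noise with least squares; return held-out MSE."""
    x = rng.uniform(0, 10, n_train)
    y = 2 * x + 1 + rng.normal(0, 1, n_train)
    slope, intercept = np.polyfit(x, y, 1)
    x_test = np.linspace(0, 10, 100)
    y_true = 2 * x_test + 1          # noise-free truth on unseen points
    return np.mean((slope * x_test + intercept - y_true) ** 2)

# Average several small-sample runs to smooth out luck, then compare
# against a single large-sample run of the very same simple model.
small = np.mean([fit_and_test(10) for _ in range(50)])
large = fit_and_test(10_000)
```

The model never changes; only the amount of data does, and the held-out error shrinks accordingly.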
Machine Learning Algorithms Pro/Con Analysis:
Bias–variance tradeoff
A low-bias model is complex, like a high-order polynomial, and may show high variance when generalized. A high-bias model, such as linear regression, fits the training data less closely but shows low variance when generalized to actual data.
The bias–variance tradeoff is a central problem in supervised learning. Ideally, one wants to choose a model that both accurately captures the regularities in its training data and generalizes well to unseen data. Unfortunately, it is typically impossible to do both simultaneously. High-variance learning methods may be able to represent their training set well, but are at risk of overfitting to noisy or unrepresentative training data. In contrast, algorithms with high bias typically produce simpler models that don't tend to overfit, but may underfit their training data, failing to capture important regularities.
Models with low bias are usually more complex (e.g. higher-order regression polynomials), enabling them to represent the training set more accurately. In the process, however, they may also represent a large noise component in the training set, making their predictions less accurate – despite their added complexity. In contrast, models with higher bias tend to be relatively simple (low-order or even linear regression polynomials), but may produce lower variance predictions when applied beyond the training set.
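A minimal sketch of this tradeoff using NumPy polynomial fits on invented noisy data: a degree-1 fit (high bias) leaves a large error even on the training set, while a degree-9 fit (low bias) drives the training error to nearly zero by also fitting the noise, and pays for it on unseen points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a smooth underlying function (made up for illustration).
x_train = np.linspace(0.0, 3.0, 10)
y_train = np.sin(x_train) + rng.normal(0.0, 0.2, size=x_train.shape)
x_test = np.linspace(0.1, 2.9, 50)   # unseen points
y_test = np.sin(x_test)              # noise-free targets

def poly_errors(degree):
    """Mean-squared error on the training and test sets for a polynomial fit."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

train_lo, test_lo = poly_errors(1)   # high bias: underfits even the training set
train_hi, test_hi = poly_errors(9)   # low bias: interpolates the training noise
```

The degree-9 polynomial "wins" on the training set only because it has memorized the noise; its advantage disappears on data it has not seen.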