This is the first in a series of introductions to Machine Learning, starting from the very foundations of the field and then moving into more specialized topics. I will cover datasets, principal component analysis, probability, linear regression, neural networks and much more.

What is Machine Learning and what is the foundation of it?

Machine Learning builds on the work of Alan Turing, who posed the famous question: 'Can we construct a machine that can do the same things a human can do?' Today we know that no machine has ever passed this test, though you could argue that a machine can do specific tasks that humans can do, just not all of them. In essence, Machine Learning builds on Turing's idea, in the sense that Machine Learning is an implementation of it. So what is Machine Learning?

Machine Learning is the use of algorithms to teach a machine to do specific things, and the whole concept is built on large amounts of data. This data is historic data, and from it we use machine learning to predict future situations of the same type.

An example of this would be car sales. Given the historic sales data for 2012-2018 in a country or state, could you predict car sales for the fiscal year 2019?
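As a minimal sketch of this idea, here is a straight-line fit to made-up yearly sales figures (the numbers are hypothetical, purely for illustration), extrapolated one year ahead:

```python
import numpy as np

# Hypothetical yearly car sales for 2012-2018 (made-up numbers).
years = np.array([2012, 2013, 2014, 2015, 2016, 2017, 2018], dtype=float)
sales = np.array([100, 110, 120, 130, 140, 150, 160], dtype=float)

# Fit a straight line to the historic data ...
slope, intercept = np.polyfit(years, sales, 1)

# ... and extrapolate to predict 2019.
prediction_2019 = slope * 2019 + intercept
print(prediction_2019)
```

A straight line is of course a strong assumption; real sales data would call for a more careful model, but the principle of learning from historic data to predict the future is the same.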

In Machine Learning there is a notion that 'a large dataset is good' and 'an enormous dataset is even better'. A machine is naturally better at interpreting such large amounts of data, which are far too complex for a human brain to take in.

What is supervised learning?

Predicting a quantity based on other quantities is what we call 'supervised machine learning'. It is useful to distinguish between the two types of supervised learning, namely regression and classification.

  • Regression: determine the output value from the input variables. An example would be advertising expenditure as the input variable and amount of sales as the output variable. We would call the output variable a continuous output variable. This means we are given some observed values x and are predicting a continuous response y.
  • Classification: determine which class a new data object belongs to. An example would be hand-written digits. We have a set of images of hand-written digits and have to determine what number is contained in a new image. This means we are given some observed values x and are predicting a discrete response y.
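The contrast between the two can be sketched in a few lines. Below, the regression half fits a line to hypothetical advertising-spend data and predicts a continuous value, while the classification half uses a minimal nearest-neighbour rule (a simple stand-in classifier, chosen here for brevity) to predict a discrete class for a new point:

```python
import numpy as np

# Regression: continuous response. Hypothetical advertising spend (x)
# versus sales (y); fit a line and predict a continuous value.
spend = np.array([1.0, 2.0, 3.0, 4.0])
sales = np.array([10.0, 20.0, 30.0, 40.0])
slope, intercept = np.polyfit(spend, sales, 1)
print(slope * 5.0 + intercept)  # continuous prediction for spend = 5.0

# Classification: discrete response. A minimal 1-nearest-neighbour rule
# on made-up 2-D points with class labels 0 and 1.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
new_point = np.array([0.95, 1.0])
nearest = np.argmin(np.linalg.norm(X - new_point, axis=1))
print(y[nearest])  # discrete prediction: class 1
```

Note how the first prediction can take any value on a continuum, while the second is forced to be one of the known class labels.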

What is unsupervised learning?

Trying to extract and label information from a dataset is what we call 'unsupervised machine learning'. An example would be 'which animal is in this picture?', since it is not something we can simply use supervised learning for without the tedious process of manually labeling thousands of pictures, and then maybe getting a result at the end. There are many unsupervised learning processes or models, but here are a few:

  • Clustering: Clustering refers to dividing observations into clusters, e.g. categories of the size of clothes that would fit a person. The sizes small, medium and large could be clusters. This refers to actually dividing observations into groups, not just classifying where a new observation fits into such a group (as in classification, supervised learning).
  • Anomaly detection: This is the process of finding observations that deviate significantly from the other observations in the dataset. Such an anomaly could be an outlier on a graph: a point so far away from all the other points that we call it an anomaly.
  • Association rule learning: Often you want to learn a rule based on previous actions; this is what association rule learning is for. On YouTube, you might recommend video Y to a person who has watched video X, because that video can be deemed of interest to them. This also works at large scale: think 'which items are people most likely to buy together in a supermarket?', which lets you discover other items the customers of that supermarket would probably be interested in.
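The clothing-size example can be sketched with a tiny 1-D k-means clustering. The heights below are made up, and the fixed initial centroids are an assumption chosen to keep the run deterministic:

```python
import numpy as np

# Hypothetical body heights in cm; the three clusters should roughly
# correspond to the sizes small, medium and large.
heights = np.array([150., 152., 155., 168., 170., 172., 188., 190., 192.])

# Minimal 1-D k-means sketch (fixed initial centroids for determinism).
centroids = np.array([150., 170., 190.])
for _ in range(10):
    # Assign each height to its nearest centroid.
    labels = np.argmin(np.abs(heights[:, None] - centroids[None, :]), axis=1)
    # Move each centroid to the mean of its assigned points.
    centroids = np.array([heights[labels == k].mean() for k in range(3)])

print(labels)     # cluster index per person
print(centroids)  # the "small", "medium" and "large" group means
```

Note that no labels were given in advance: the algorithm divides the observations into groups on its own, which is exactly what separates clustering from classification.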

What is a model in Machine Learning?

A model is simply something that learns a prediction rule from the data you feed it. An example is recognizing hand-written digits, where you have $10000$ examples of each digit $0,1,...,9$. Then we have $100000$ examples in total, which we denote by $N$.

We feed this digit data to a model, perhaps with the notation $A$ or $X$ for the matrix that holds the $100000$ examples of digits and $b$ or $y$ for the vector that holds the corresponding labels, $0,1,...,9$.

Then the output of the model is a function $f(x)$, which we call the prediction rule. We usually want to estimate how many errors this rule makes on new, unseen data, which is referred to as the generalization error. We can feed test data into our model to estimate it, and from there, if our model proves to be good enough, use it on new data.
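These pieces fit together in a few lines of code. The sketch below stands in for the digits setup with a toy one-dimensional problem (the linear relationship and the noise level are assumptions for illustration): a prediction rule $f(x)$ is learned from training data only, and its generalization error is then estimated on held-out test data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the digits setup: N examples x with responses y.
# Here y depends linearly on x plus noise (an assumption for illustration).
N = 200
x = rng.uniform(0, 10, N)
y = 3.0 * x + 1.0 + rng.normal(0, 0.5, N)

# Split into training data and held-out test data.
x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]

# Learn a prediction rule f(x) = a*x + b from the training data only.
a, b = np.polyfit(x_train, y_train, 1)
f = lambda v: a * v + b

# Estimate the generalization error on data the model has never seen.
test_mse = np.mean((f(x_test) - y_test) ** 2)
print(test_mse)
```

The key point is that the test examples play no part in fitting $f$, so the measured error is an honest estimate of how the rule will behave on new data.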

What does the process of doing Machine Learning look like?

The process of doing Machine Learning can look like this:

Finding the right dataset is something you may well spend a lot of time on, especially if the data has to come from outside your company and you need to find it on the internet.

You often want to get to know your dataset before you start modelling. This is usually done through an exploratory data analysis, where you seek to describe the data, manipulate (add, remove, tweak) the data and visualize the data to get a feel for what you are working with. Data visualization can also be helpful when interacting with customers, or when explaining something about the data to someone else in the company you are working for.
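The first steps of such an exploratory data analysis might look like the sketch below, here using pandas on a made-up sales column (the dataset and numbers are hypothetical):

```python
import pandas as pd

# Made-up dataset: monthly car sales (hypothetical numbers).
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "sales": [120, 135, 150, 90, 300, 160],
})

# Describe the data: count, mean, spread, quartiles.
print(df["sales"].describe())

# Manipulate the data: add a derived column.
df["above_average"] = df["sales"] > df["sales"].mean()
print(df)
```

Even a summary this small can reveal things worth investigating before modelling; here, for instance, the May figure of 300 stands well apart from the rest.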

When your knowledge of the data you are working with is sufficient, you start modelling it. This usually involves some analysis of how you should model the data: what makes a good model, and what makes a bad one?

Afterwards, you evaluate the model on held-out test data to find out whether it is good or bad; judging it on the data it was trained on would give a misleadingly optimistic picture. If the model is good, you would perhaps deem it ready for use by your customer or whoever the receiver is.