Fundamentals of Machine Learning

Unveiling the Magic of Machine Learning: A Beginner's Guide

Hare Krishna ☀ In this blog, I'll introduce the machine learning concepts that every machine learning prodigy should know. It is completely beginner-friendly and also covers applications of machine learning, real-life examples, and much more, so stick with me.

Let's get started!!!

Introduction to Machine Learning

What is machine learning?

[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.

—Arthur Samuel, 1959

In the traditional method, you write a program and provide data, and the computer generates the output for you. This is what we learned in school. Things get interesting when you instead give the computer the data and the expected output: the machine learns from these examples and generates a program that solves the problem.
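To make the contrast concrete, here is a minimal sketch. The temperature-conversion task and both function names are my own illustrative assumptions, not something from a library. In the first function a human writes the rule; in the second, we only supply example inputs and outputs and let a simple least-squares fit recover the rule.

```python
def celsius_to_fahrenheit(celsius):
    """Traditional programming: a human writes the rule."""
    return celsius * 9 / 5 + 32


def learn_line(inputs, outputs):
    """Machine learning: recover slope and intercept from examples
    using ordinary least squares for a single input variable."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    variance = sum((x - mean_x) ** 2 for x in inputs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Example data: inputs (Celsius) and the matching outputs (Fahrenheit).
examples_c = [0, 10, 20, 30, 40]
examples_f = [32, 50, 68, 86, 104]

slope, intercept = learn_line(examples_c, examples_f)
print(f"learned rule: f = {slope:.2f} * c + {intercept:.2f}")
```

The learned slope and intercept match the hand-written formula, even though nobody told the machine what the formula was.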

Why is machine learning required?

For example, suppose your inbox is full of spam and you want to filter it out, so you start doing it manually. While moving spam emails, you notice that they repeatedly mention words like "BIG", "HUGE", "LARGE", and "MEGA". Filtering them by hand soon becomes tedious, and this is where machine learning comes into the picture. You give the machine a set of these emails along with labels marking which ones are spam, and it learns to filter such emails automatically.
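Here is a toy sketch of that idea in plain Python. The function names, the example emails, and the word-counting rule are all assumptions made up for illustration; real spam filters are far more sophisticated. The point is only that the "spammy" words are learned from labeled examples instead of being typed in by hand.

```python
from collections import Counter


def train_spam_filter(emails, labels):
    """Count how often each word appears in spam and in normal emails."""
    spam_words, ham_words = Counter(), Counter()
    for text, label in zip(emails, labels):
        target = spam_words if label == "spam" else ham_words
        target.update(text.lower().split())
    return spam_words, ham_words


def is_spam(text, spam_words, ham_words):
    """Flag an email when its words were seen more often in spam."""
    words = text.lower().split()
    spam_score = sum(spam_words[w] for w in words)
    ham_score = sum(ham_words[w] for w in words)
    return spam_score > ham_score


# Made-up training emails with their labels.
emails = [
    "BIG sale HUGE discount",
    "MEGA offer LARGE prize",
    "meeting notes for monday",
    "lunch on friday",
]
labels = ["spam", "spam", "normal", "normal"]

spam_words, ham_words = train_spam_filter(emails, labels)
print(is_spam("HUGE MEGA prize inside", spam_words, ham_words))
print(is_spam("notes for friday lunch", spam_words, ham_words))
```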

Thus machine learning reduces manual effort and executes tasks swiftly. It is especially useful for:

  1. Problems for which existing solutions require a lot of hand-tuning

  2. Complex problems for which there is no good solution at all using a traditional approach

  3. Fluctuating environments

  4. Getting insights about complex problems and large amounts of data.

Types of Machine Learning

There are four types of machine learning:

  1. Supervised Learning

    It's like teaching a computer using examples. You show it a bunch of labeled pictures of cats and dogs, and it learns to tell them apart.

    This is the most commonly used type of machine learning. It's used in things like spam email filters, recommendation systems (like Netflix or Amazon suggesting movies or products), and image recognition systems.

  2. Unsupervised Learning

    This is like when the computer figures out patterns on its own. Imagine you give it a bunch of mixed-up colored beads, and it groups them based on their colors.

    While not as common as supervised learning, it's used in clustering similar items together, like grouping customers with similar buying habits for marketing.

  3. Semi-supervised Learning

    It's a bit like having a few answers in a big test. Instead of all the answers, you're given some, and you have to figure out the rest by using hints and patterns you've learned.

  4. Reinforcement Learning

    Imagine teaching a computer to play a video game. It learns by getting rewards when it does something right and punishments when it does something wrong, just like you learn by getting points in a game. It's like training a dog to do tricks, but with a computer and a game instead of a real dog.

    Reinforcement learning can be quite complex, as it involves an agent learning by trial and error while interacting with an environment. It's often used in advanced applications like training robots or self-driving cars.
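The difference between the first two types is easiest to see side by side. The sketch below is a simplified illustration under my own assumptions: the animal measurements and bead hue values are made up, and the two functions are bare-bones stand-ins (a nearest-neighbor lookup and a two-group k-means on single numbers), not production implementations. The supervised function needs labels; the unsupervised one gets none.

```python
def nearest_neighbor_predict(train_points, train_labels, query):
    """Supervised: copy the label of the closest labeled example."""
    distances = [
        sum((a - b) ** 2 for a, b in zip(point, query))
        for point in train_points
    ]
    return train_labels[distances.index(min(distances))]


def two_means(values, iterations=10):
    """Unsupervised: split numbers into two groups without any labels.
    Assumes the values contain at least two distinct numbers."""
    low, high = min(values), max(values)  # initial group centers
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high


# Supervised: labeled animals as (weight in kg, height in cm).
animals = [(4, 25), (5, 23), (20, 55), (30, 60)]
animal_labels = ["cat", "cat", "dog", "dog"]
print(nearest_neighbor_predict(animals, animal_labels, (6, 28)))

# Unsupervised: mixed-up beads described only by a hue number.
bead_hues = [10, 200, 12, 205, 11, 198]
print(two_means(bead_hues))
```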

Intel case study

Intel, a major chip manufacturer whose processors handle the core processing in many computers, approaches machine learning from the hardware side. Its Nervana chips, designed for data center servers, are built for heavy machine learning data processing tasks.

Key Machine Learning Concepts

Data, Features, and Labels

Data refers to information or facts that are collected, stored, and processed for various purposes.

The two main challenges you can face in ML are a bad algorithm and bad data. The good news is that good data can often make up for a weak algorithm: data >>>> algorithm.

Obviously, if your training data is full of errors, outliers, and noise (e.g., due to poor-quality measurements), it will make it harder for the system to detect the underlying patterns, so your system is less likely to perform well. It is often well worth the effort to spend time cleaning up your training data. The truth is, most data scientists spend a significant part of their time doing just that.

For example, if a customer doesn't fill in the age column, you end up with missing data. You can either ignore these missing values or fill them in with the median or average value for that customer's class.
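As a rough sketch of the second option, the snippet below fills each missing age with the median age of the same class. The customer records, the "class" values, and the function name are made up for illustration, and the code assumes every class has at least one known age.

```python
from statistics import median


def fill_missing_ages(customers):
    """Fill missing ages (None) with the median age of the customer's class."""
    ages_by_class = {}
    for customer in customers:
        if customer["age"] is not None:
            ages_by_class.setdefault(customer["class"], []).append(customer["age"])
    medians = {cls: median(ages) for cls, ages in ages_by_class.items()}
    return [
        {**c, "age": medians[c["class"]] if c["age"] is None else c["age"]}
        for c in customers
    ]


# Made-up customer records; None marks an age that was left blank.
customers = [
    {"class": "premium", "age": 40},
    {"class": "premium", "age": 50},
    {"class": "premium", "age": None},
    {"class": "basic", "age": 22},
    {"class": "basic", "age": None},
    {"class": "basic", "age": 30},
    {"class": "basic", "age": 26},
]
cleaned = fill_missing_ages(customers)
print([c["age"] for c in cleaned])
```

Using the median per class instead of one global average keeps a few unusually old or young customers from skewing the filled-in values.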

As the saying goes: garbage in, garbage out. Your system will only be capable of learning if the training data contains enough relevant features and not too many irrelevant ones. A critical part of the success of a Machine Learning project is coming up with a good set of features to train on. This process is called feature engineering.
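To give a feel for feature engineering, here is a small sketch that turns a raw email into a few numbers a model could learn from. The three features and the function name are my own illustrative choices; which features actually help depends on the problem.

```python
def extract_features(email_text):
    """Turn raw email text into a few numeric features."""
    words = email_text.split()
    # Words written entirely in capital letters, such as "HUGE".
    shouting = [w for w in words if w.isalpha() and w.isupper() and len(w) > 1]
    return {
        "word_count": len(words),
        "uppercase_ratio": len(shouting) / len(words) if words else 0.0,
        "exclamation_count": email_text.count("!"),
    }


features = extract_features("HUGE discount on MEGA deals today!!")
print(features)
```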

Models and Algorithms

An algorithm is like a set of instructions that we give to a computer to help it learn or perform a specific task: step-by-step instructions on how to learn from data. Just as we follow a recipe when cooking a dish from ingredients, in ML we write recipes that the machine follows to digest data and learn from it.

A model in machine learning is like a smart assistant that has learned from data. Previously you gave the machine instructions to follow; now it has learned for itself that this picture is a cat and that one is not. What the machine has learned is the model.

So, in simple terms, an algorithm is the recipe, and the model is the computer's learning from the recipe to do tasks like recognizing cats, predicting the weather, or playing games.
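In code, the split looks roughly like this. The sketch is a deliberately tiny example under my own assumptions: the "pointy ear score" feature and its values are invented, and the training rule (pick the cutoff that gets the most training examples right) is just one simple choice. The training function is the algorithm; the dictionary it returns is the model.

```python
def train_threshold(values, labels):
    """The algorithm: try each candidate threshold and keep the one
    that classifies the most training examples correctly."""
    best_threshold, best_correct = None, -1
    for candidate in sorted(set(values)):
        correct = sum(
            (value >= candidate) == label
            for value, label in zip(values, labels)
        )
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return {"threshold": best_threshold}  # the model: what was learned


def predict(model, value):
    """Use the learned model to label a new example."""
    return value >= model["threshold"]


# Hypothetical "pointy ear score" per photo; True means the photo is a cat.
scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
is_cat = [False, False, False, True, True, True]

model = train_threshold(scores, is_cat)
print(model)
print(predict(model, 0.75), predict(model, 0.4))
```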

Data Preparation

Collecting Data

I recently tweeted about some well-known open data repositories.

There are also four ways to gather data:

  1. CSV (comma-separated values)

  2. JSON/SQL

  3. Fetch from API

  4. Web scraping

In the real world, companies will usually provide you with datasets, so you don't have to worry about collecting data, and many ML practice projects use CSV files. How to fetch data with each of the above methods could be a separate blog. If you'd like one, please comment!!
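Just to give a taste of the first two formats, here is a minimal sketch using only Python's standard library. The names and ages are made-up sample data held in strings, so nothing has to be downloaded; with real files you would open the file instead. Fetching from an API and web scraping need network access, so they are left out here.

```python
import csv
import io
import json

# In-memory stand-ins for files, so the sketch runs without any downloads.
csv_text = "name,age\nAsha,31\nRavi,27\n"
json_text = '[{"name": "Asha", "age": 31}, {"name": "Ravi", "age": 27}]'

# CSV: each row becomes a dict keyed by the header line; values are strings.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: parsed directly into Python lists and dicts with native types.
json_rows = json.loads(json_text)

print(csv_rows[0]["age"], type(csv_rows[0]["age"]).__name__)
print(json_rows[0]["age"], type(json_rows[0]["age"]).__name__)
```

Notice that CSV values arrive as text and usually need converting to numbers, while JSON keeps numbers as numbers.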

Cleaning Data

Data cleaning is the process of removing or correcting inaccurate, corrupt, or improperly formatted data and removing duplication within a dataset.

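There are many ways to clean data using pandas. I'll show some of them, and then you can try them out!!

Step 1: Detecting missing values using pandas and seaborn

import pandas as pd
import seaborn as sns

# Tiny made-up DataFrame with gaps, only so the calls below have data.
df = pd.DataFrame({"age": [25.0, None, 35.0], "income": [40.0, 50.0, None]})

missing = df.isnull()  # True wherever a value is missing
sns.heatmap(missing)   # visualize missing values; helps with large data

# Step 2: Handling missing values
filled = df.interpolate()  # fills each gap from neighboring values (linear by default)
dropped = df.dropna()      # drops rows with missing values (axis=1 drops columns)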
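Cleaning also covers the badly formatted and duplicated records mentioned in the definition above. Here is a small plain-Python sketch of that part. The records and the function name are invented for illustration, and "same after trimming spaces and fixing capitalization" is just one possible rule for deciding that two rows are duplicates.

```python
def clean_records(records):
    """Normalize formatting, then drop exact duplicates while keeping order."""
    seen = set()
    cleaned = []
    for name, city in records:
        normalized = (name.strip().title(), city.strip().title())
        if normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned


# Made-up records with stray spaces, mixed capitalization, and repeats.
raw = [("  asha ", "pune"), ("Asha", "Pune "), ("RAVI", "delhi"), ("ravi", "Delhi")]
result = clean_records(raw)
print(result)
```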

Evaluating Machine Learning Models

How to Measure Model Performance

There are various techniques to measure model performance, but I'll show how to check it using a machine learning library (i.e., sklearn).

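from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
)

# y_test = true labels, y_pred = model predictions. These made-up values
# are placeholders so the snippet runs; use your own model's output.
y_test = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# ROC AUC is usually computed from predicted probabilities or scores;
# hard 0/1 predictions are passed here only to keep the example short.
roc_auc = roc_auc_score(y_test, y_pred)

print(f'Accuracy: {accuracy}')
print(f'Precision: {precision}')
print(f'Recall: {recall}')
print(f'F1-score: {f1}')
print(f'ROC AUC: {roc_auc}')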
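If you are curious what these numbers mean, the sketch below computes four of them by hand from the counts of true/false positives and negatives, for a two-class problem with labels 0 and 1. The labels are made up, the function names are my own, and the code assumes there is at least one positive label and one positive prediction so that no division by zero occurs.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from the four counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)  # of everything flagged positive, how much was right
    recall = tp / (tp + fn)     # of all real positives, how many were found
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Made-up true labels and predictions.
true_labels = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = [1, 0, 0, 1, 1, 1, 0, 0]
metrics = classification_metrics(true_labels, predictions)
print(metrics)
```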

Introduction to Deep Learning

Deep learning is a subset of machine learning that focuses on artificial neural networks, which are inspired by the human brain. It involves training complex models, called deep neural networks, to recognize patterns and make predictions from large amounts of data. These networks have multiple layers (hence "deep"), each learning different features from the data.
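To show what "multiple layers" means in practice, here is a bare-bones sketch of data flowing through a two-layer network. The weights are hand-picked and untrained, and the function names are my own; real deep learning libraries also learn the weights from data, which this sketch leaves out.

```python
import math


def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1 / (1 + math.exp(-x))


def dense_layer(inputs, weights, biases):
    """One layer: a weighted sum of the inputs per neuron, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the inputs through every layer in order."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hand-picked (untrained) weights, purely for illustration.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer: 2 inputs -> 2 neurons
    ([[1.0, -1.0]], [0.0]),                    # output layer: 2 neurons -> 1 output
]
output = forward([1.0, 1.0], layers)
print(output)
```

Each layer takes the previous layer's outputs as its inputs, which is how deeper layers can build on features found by earlier ones.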

Application of DL

One prominent real-life application of deep learning is in Autonomous Driving. Deep neural networks are used in self-driving cars to analyze data from sensors, cameras, and lidar to make real-time decisions, such as detecting pedestrians, recognizing road signs, and steering the vehicle safely. This technology aims to improve road safety and reduce accidents.

Interesting Fact
The concept of deep learning has been around for decades, but its recent resurgence is mainly due to advances in hardware (e.g., GPUs) and the availability of vast amounts of data.

Next Steps and Resources

In this blog, you will not find hardcore machine learning algorithms. Instead, I have explained the foundational knowledge that will serve as the basis of your machine learning career.

Here are some resources that I found useful:

YouTube

Complete 100 days of ML

Machine Learning Specialization by Andrew Ng

End-To-End ML Project

Books

Language - R

The Hundred-page Machine Learning Book ~ Andriy Burkov

An Introduction to Statistical Learning ~ Daniela Witten, Gareth M. James, Trevor Hastie, Robert Tibshirani

Language - Python

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow ~ Aurélien Géron

Roadmap

Roadmap by Campusx

Conclusion

In conclusion, we've embarked on a journey to understand the essence of machine learning, delving into its significance, types, key concepts, and the crucial role of data preparation. We've explored the art of measuring machine learning model performance, and we've even ventured into the fascinating world of deep learning and its real-world applications.

To all the budding machine learning enthusiasts and students out there: remember that machine learning is vast, ever-evolving, and full of exciting opportunities, and that challenges are the stepping stones to success. Embrace the journey, stay curious, and remember - persistence pays off. Your future in this dynamic field is bright, so keep learning and keep pushing your boundaries! 🚀🤖

Feel free to connect:

X (Twitter), LinkedIn

Hare Krishna 🙏