A plain-English guide to AI, Machine Learning, Deep Learning, and everything in between.
Every week, another headline drops about Artificial Intelligence (AI). ChatGPT. Gemini. Claude. Sora. Open Claw. The pace is relentless, and so is the jargon. Machine Learning, Deep Learning, Generative AI, LLMs, Foundational Models... they get tossed around as if everyone already knows what they mean.
This post is for anyone who wants to actually understand what these terms mean - not in a textbook way, but in a real, practical way. By the end, I hope you will know exactly what each of them is, how they relate to each other, why it matters, and what all the fuss is about.
Artificial Intelligence is the broadest term of the bunch. At its core, AI is simply about making computers mimic human intelligence: thinking, reasoning, perceiving, and learning. It is not just about robots or sci-fi supercomputers. Even a robot vacuum that maps your floor and decides where to clean next is running a form of AI. It is less about complexity and more about the intent: building software that can make decisions.
Under this umbrella, researchers have developed two major approaches to building intelligent systems: Machine Learning and Deep Learning.
✏️ AI is not a new term or technology. The Dartmouth Summer Research Project on Artificial Intelligence (DSRPAI), held in the summer of 1956, is widely recognised as the birthplace of artificial intelligence as a formal academic field. The conference assembled experts from various disciplines to discuss the potential for machines to simulate human intelligence (https://admissions.dartmouth.edu/follow/3d-magazine/dartmouth-and-dawn-ai).
Machine Learning (ML) is a subset of AI. Instead of programming a computer with explicit rules ("if this, then that"), you feed it data and let it figure out the patterns on its own.
Think of it like teaching someone to cook, not by writing out every instruction, but by having them cook hundreds of meals, taste the results, and adjust over time. They develop intuition from examples, not memorisation.
ML has three main styles of learning:
Supervised Learning: The model learns from labelled examples - like photos tagged 'cat' or 'dog'. You show it what to expect.
Unsupervised Learning: No labels. The model finds its own patterns, like sorting photos by visual similarity without being told what to look for.
Reinforcement Learning: The model learns by trial and error, earning rewards for good decisions and penalties for bad ones. This is how AI learns to beat humans at chess or Go.
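To make supervised learning concrete, here is a deliberately tiny sketch: a 1-nearest-neighbour classifier, one of the simplest possible "learn from labelled examples" algorithms. The animal measurements and labels are made up for illustration.

```python
# Toy supervised learning: a 1-nearest-neighbour classifier.
# Labelled examples: (weight_kg, ear_length_cm) -> species label.
training_data = [
    ((4.0, 6.5), "cat"),
    ((3.5, 7.0), "cat"),
    ((25.0, 12.0), "dog"),
    ((30.0, 11.0), "dog"),
]

def predict(features):
    """Label a new example with the label of its closest training example."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    closest = min(training_data, key=lambda pair: distance(pair[0], features))
    return closest[1]

print(predict((4.2, 6.8)))    # near the cat examples
print(predict((28.0, 11.5)))  # near the dog examples
```

Notice that no rules about cats or dogs were written anywhere; the "knowledge" is entirely in the labelled examples, which is the essence of supervised learning.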
Deep Learning is a specialised subset of Machine Learning, and it is the technology powering most of the AI breakthroughs you hear about today.
It is inspired by the structure of the human brain. Our brains are made up of billions of interconnected neurons that form and strengthen connections over time - that is how we learn. Deep Learning mimics this with Artificial Neural Networks (ANNs): several layers of mathematical 'neurons' that process data and adjust their internal connections until they get things right. By adjusting the weights (connection strengths) between these layers, the model learns to identify complex patterns within massive volumes of data.
✏️ "Learning" for a Deep Learning model is simply adjusting mathematical weights between layers until the path to the right answer is the strongest.
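That weight-adjustment idea can be shown with the smallest possible "network": a single neuron with one weight, nudged by gradient descent until it learns the relationship y = 2x. This is a sketch of the mechanism, not a real deep network.

```python
# A single 'neuron' with one weight, learning y = 2*x by gradient descent.
# 'Learning' here is literally nudging the weight to reduce the error.
data = [(1, 2), (2, 4), (3, 6), (4, 8)]
w = 0.0    # the weight ('connection strength'), starts untrained
lr = 0.01  # learning rate: how big each nudge is

for epoch in range(200):
    for x, y in data:
        pred = w * x
        error = pred - y
        w -= lr * error * x  # gradient of squared error with respect to w

print(round(w, 2))  # converges towards 2.0
```

A deep network does exactly this, but with millions or billions of weights adjusted simultaneously across many layers.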
Unlike traditional software with strict if-then rules, Deep Learning models identify patterns in enormously complex data (faces in photos, spoken words, translated sentences) without being told the rules.
Deep Learning requires massive amounts of data and significant computing power to train effectively, but the results have been transformative: real-time translation, facial recognition, voice assistants, and self-driving cars. Deep Learning also uses supervised, unsupervised, and reinforcement learning; what sets it apart is that it applies multi-layered artificial neural networks to each of them.
Think of deep learning as not just teaching someone to cook, but to understand complex flavour profiles on a deeper level. The 'chef' learns to innovate dishes by combining fundamental techniques learned layer-by-layer, moving beyond instructions into creative cooking.
For a long time, training AI required human beings to manually label data. Every photo, every sentence, every data point needed a tag. This was a bottleneck; models could only learn as fast as humans could label.
Then came the Transformer architecture, and everything changed.
Transformers introduced Self-Supervised Learning, where the data labels themselves. The model takes a sentence, hides a word, and tries to predict it:
"The cat sat on the ___", and the model guesses: 'mat'.
Because the answer is already in the original text, no human labelling is needed. This unlocked the ability to train on trillions of words scraped from the internet - and that massive scale is what gave us the 'emergent' intelligence we see in today's AI.
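The "data labels itself" trick is simple enough to sketch in a few lines: hide each word in turn, and the hidden word becomes the training label, with no human tagger involved.

```python
# Self-supervised labelling: the text itself provides the training pairs.
# Hide each word in turn; the hidden word becomes the label - no human tagging.
sentence = "the cat sat on the mat"
words = sentence.split()

pairs = []
for i in range(len(words)):
    context = words[:i] + ["[MASK]"] + words[i + 1:]
    pairs.append((" ".join(context), words[i]))

for context, target in pairs:
    print(f"{context!r} -> {target!r}")
```

One six-word sentence yields six training pairs for free, which is why trillions of words of internet text translate directly into trillions of training examples.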
Transformers also changed how text is processed. Before them, AI read text word by word, left to right. Transformers process the entire sentence at once, grasping context and relationships between words regardless of where they appear.
Note: Transformers are not a subset of Deep Learning, but rather a type of Deep Learning architecture - a specific way of implementing Deep Learning.
With Transformers enabling training at massive scale, a new type of model emerged: the Foundational Model.
Previously, AI was purpose-built. You would train one model to translate French, another to detect spam, and another to summarise legal documents. Each was a specialist, built from scratch.
Foundational Models flipped this. They are enormous, general-purpose models trained on vast, diverse datasets, covering language, code, science, literature, and more. Instead of being built for one job, they serve as a powerful base that can be fine-tuned into specialists without starting over.
Think of it like hiring someone with a doctorate in everything: medicine, law, engineering, and literature. You do not retrain them from scratch for each new role; you give them context, and they adapt.
Once a Foundational Model exists, there are two main ways to make it useful for a specific purpose, and understanding the difference matters.
Fine-tuning is like sending the model back to school for a specialist course. You take the general-purpose model and continue training it on a curated, domain-specific dataset (medical records, legal documents, customer service transcripts). The model's internal weights actually change. The result is a specialist model that performs significantly better in that narrow domain.
Prompting is how most of us interact with AI, no retraining involved. You simply give the model instructions, context, or examples in your message, and it adapts its response accordingly. It is the difference between teaching someone a new skill versus just giving them a detailed briefing before a meeting.
Fine-tuning is done by developers building AI-powered products. Prompting is something anyone can do, and doing it well is a skill worth developing.
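Prompting can be made concrete with a minimal sketch of a few-shot prompt. The instruction, examples, and review text here are invented for illustration, and the actual model call is omitted; the point is that the entire "specialisation" lives in a string of text, not in the model's weights.

```python
# Prompting: adapting a general model at inference time, no retraining.
# The 'specialisation' lives entirely in the text you send.
instruction = "Classify the sentiment of the review as positive or negative."
examples = [
    ("Loved it, would buy again.", "positive"),
    ("Broke after two days.", "negative"),
]
review = "Arrived late and the box was damaged."

prompt_lines = [instruction]
for text, label in examples:  # few-shot examples demonstrate the format
    prompt_lines.append(f"Review: {text}\nSentiment: {label}")
prompt_lines.append(f"Review: {review}\nSentiment:")
prompt = "\n\n".join(prompt_lines)

print(prompt)  # this string is all the model sees - no weights change
```

Swap the instruction and examples and the same model becomes a translator, a summariser, or a code reviewer, which is exactly why prompting well is such a leveraged skill.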
There is a common misconception worth clearing up: when you chat with an AI, you are not training it.
Training is the intensive, expensive, months-long process where the model learns from vast datasets. It happens once (or periodically), requires enormous computing infrastructure, and is done by the teams building the model.
Inference is what happens every time you send a message or request help with a task. The model's weights are fixed; it simply applies everything it already learned to generate a response to your input. Your conversation does not change the model for you or anyone else.
Why does this matter? Because it means the model is not getting smarter from your interactions in real time. It also means two people using the same model get the same underlying intelligence; the difference in output quality comes down to how well they prompt it.
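The training/inference split can be illustrated with a stand-in "model" whose weights are assumed to have come out of an earlier training run. Inference only reads the weights; it never writes to them.

```python
# Inference with frozen weights: the model applies what it learned, unchanged.
# Assume 'weights' came out of an earlier (expensive) training run.
weights = {"w": 2.0, "b": 1.0}  # fixed after training

def infer(x, weights):
    # A forward pass only reads the weights; it never modifies them.
    return weights["w"] * x + weights["b"]

before = dict(weights)
outputs = [infer(x, weights) for x in (1, 2, 3)]
print(outputs)            # [3.0, 5.0, 7.0]
print(weights == before)  # True - a million inferences would not change them
```

This is why your conversation does not make the model smarter: every request runs through the same frozen weights.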
Large Language Models (LLMs) are a type of Foundational Model trained specifically on text. Their job is to understand and generate human language.
During training, an LLM processes enormous volumes of text and becomes extraordinarily good at predicting the next word in a sequence. When you ask one to write a story, it is not 'thinking' in the human sense: it is calculating, at lightning speed, the most statistically likely sequence of words given everything it has learned.
The result is responses that feel remarkably coherent, contextual, and even creative. Because the training data is so vast and varied, LLMs have absorbed huge amounts of world knowledge, writing styles, logic patterns, and conversational norms.
ChatGPT, Claude, Gemini, and Llama are all LLMs.
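A toy version of "predict the next word" is easy to build: count which words follow which in a corpus, then pick the most frequent follower. Real LLMs are vastly richer, but the core task is the same statistical prediction.

```python
# A toy next-word predictor: count word pairs, pick the most likely follower.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish"
words = corpus.split()

followers = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    followers[current][nxt] += 1

def predict_next(word):
    """Return the word that most often followed this one in the corpus."""
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))  # 'cat' - it followed 'the' most often
```

Scale the corpus from eleven words to trillions, and the follower statistics from word pairs to deep contextual representations, and you have the intuition behind an LLM.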
You will often hear models described by their size, but what does "large" actually mean in "Large Language Model"? The answer is parameters.
Parameters are the individual mathematical weights inside a neural network, the millions (or billions, or trillions) of numerical dials that get adjusted during training until the model produces accurate outputs. The more parameters a model has, the more nuance and complexity it can capture.
GPT-3 had 175 billion parameters, and GPT-4 is unofficially reported to have around 1.76 trillion (OpenAI has never confirmed a figure). Modern frontier models are estimated to have significantly more. When researchers talk about "scaling up" AI, this is largely what they mean: more parameters, more data, more compute.
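Where do those parameter counts come from? For a simple fully-connected network they are just arithmetic: one weight per connection between layers, plus one bias per neuron. The layer sizes below (784 → 512 → 10, a classic digit-classifier shape) are chosen for illustration.

```python
# Counting parameters: every weight and bias in the network is one parameter.
# Sketch for a small fully-connected network with layers 784 -> 512 -> 10.
layer_sizes = [784, 512, 10]

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    weights = n_in * n_out  # one weight per connection between the layers
    biases = n_out          # one bias per neuron in the receiving layer
    total += weights + biases

print(f"{total:,}")  # 407,050 parameters for this small network
```

Even this small network has over 400,000 dials to tune; GPT-3's 175 billion is that same counting exercise applied to an architecture hundreds of thousands of times larger.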
Generative AI (GenAI) is the capability that gets all the headlines. GenAI describes what a model does (creates new content), and every meaningful modern GenAI system (ChatGPT, Midjourney, Sora, Claude) is built using Deep Learning. GenAI systems can be classified either by model architecture (how they are built) or by output (what they create).
GenAI models can generate:
Text - blog posts, code, essays, emails (think ChatGPT or Claude)
Images - realistic photos or illustrations from a text prompt (Midjourney, DALL-E)
Video - full video clips from a written description (Sora, Veo)
Audio - music, voice cloning, sound effects
The key shift: traditional AI analyses what already exists. Generative AI invents something that did not exist before.
For most of AI's history, models were single-purpose by design: one model for text, another for images, another for audio. That's changing fast.
Multimodal models can process and generate across multiple formats simultaneously. You can show one model a photo and ask it to describe what is in it, paste in a document and ask it to summarise and then rewrite it as a presentation, or speak to it and get a spoken response back.
Models like GPT-4o, Gemini, and Claude are multimodal. This is the direction the entire field is moving, away from single-purpose tools and towards general-purpose AI assistants that can see, read, hear, and respond in kind.
These are not competing technologies; they are 'nested' layers, each building on the last. Here is the full picture at a glance:
AI is the high-level concept and goal (mimic human intelligence).
Machine Learning is a subset of AI that uses supervised, unsupervised, and reinforcement learning to figure out patterns without explicit programming.
Deep Learning is a specialised subset of Machine Learning that uses several layers of artificial neural networks to identify (self-learn) patterns in massive datasets.
Foundational Models are general-purpose models built using Deep Learning (most commonly the Transformer architecture) and massive datasets.
Large Language Models are a type of specialised Foundational Model designed to understand and generate human language. LLMs can perform GenAI tasks, i.e., generate text or code, but not all LLM usage is generative.
GenAI is not a type of model - it is a capability. It describes what a model does when it produces something new that did not exist before (text, audio, image, or video). Any model that can create new content is performing GenAI. The model is the what and GenAI is the act (e.g., a car is the what; driving is the act).
✏️ Foundational Models and LLMs are the models. Depending on how they are used, they can classify, predict, summarise, or translate existing data. When those same models are used to create something new, that act of creation is what we call Generative AI. GenAI is not a separate system; it is simply what we call it when a model generates something original.
Imagine you are navigating a road trip. The goal, getting from A to B, is Artificial Intelligence. You want the machine to make intelligent decisions.
Machine Learning is the GPS learning from millions of past drivers: which routes are fast, which get congested, and when to reroute. It improves based on data.
Deep Learning is the GPS's core engine, the technology that processes live traffic feeds, satellite images, and real-time signals all at once. It handles complexity that simple rules never could.
Foundational Models are the GPS platform itself, a powerful, general-purpose navigation system that works globally across cities, terrains, and languages. From that base, it can be adapted: a logistics version for lorries, a hiking version for trails, a cycling version for bike lanes.
Generative AI is when the GPS stops just routing you and starts creating: suggesting a personalised road trip itinerary, writing turn-by-turn narration with local trivia, or generating a custom travel guide for your journey.
And an LLM? That's the natural language interface, the bit that understands when you say "avoid motorways and stop somewhere nice for lunch" and turns that into a coherent plan.
You do not need to be a data scientist to benefit from understanding these concepts. Whether you are evaluating AI tools for work, making sense of the news, or just trying to keep up with the conversations happening around you, knowing the basics helps.
These technologies are reshaping how we work, communicate, create, and make decisions. The people who understand even the fundamentals will be far better equipped to use them wisely, ask the right questions, and spot the hype from the substance.