
I'm excited to share that I've published my first AI book on Amazon! This book is designed to help beginners understand artificial intelligence concepts and applications in an accessible way.
Whether you're completely new to AI or looking to expand your knowledge, this book provides practical insights and real-world examples to guide your learning journey.

Authored by Claire Choi
Artificial Intelligence (AI) might sound like science fiction, but it's very real and all around us—from the recommendations Netflix gives you, to voice assistants like Siri or Alexa. Simply put, AI is when machines do tasks that normally need human intelligence. This means a computer can learn from experience, adapt to new information, and make decisions or predictions.
How is this possible? Through machine learning, a field of AI where we teach machines by example instead of giving them step-by-step rules. And within machine learning is a powerful approach called deep learning, which uses structures inspired by the human brain to learn from huge amounts of data. In this book, we'll explore how these networks work, how deep learning enables computers to learn from mistakes and improve, and how modern AI can even create new content (like text and images) using a technology called transformers.
Don't worry—we'll keep things simple and use easy examples. Whether you're a teenager imagining a future in tech or just curious about AI, this guide will help you understand the basics and even glimpse the kinds of AI careers out there waiting for you. So let's embark on this AI adventure together!

Before diving deeper, it's important to know how these terms relate to each other. Think of them like nesting dolls (one inside another): Deep learning sits inside machine learning, which in turn sits inside the broad field of AI.
Artificial Intelligence (AI) is the widest term; it means any technique that lets computers perform tasks that typically require human intelligence. This could be solving a puzzle, recognizing a face in a photo, or understanding spoken language. Early AI systems were often explicitly programmed with rules by humans. For example, a simple AI for playing tic-tac-toe might have all the game rules and strategies coded in.
Machine Learning (ML) is a subset of AI where the computer isn't given hard-coded rules; instead, it learns from data and experience. In ML, we provide examples (data) and the desired outcome, and the algorithm figures out patterns. It's like showing a child lots of pictures of cats and dogs, and eventually the child learns to tell them apart without you ever defining "pointy ears" or "fur patterns." In fact, traditional ML often required humans to hand-pick the important features (like "has whiskers" or "ear shape") for the algorithm to look at.
Deep Learning (DL) is a special type of machine learning that uses neural networks with many layers (hence "deep") to automatically learn features from the raw data. In deep learning, we don't need to hand-code "what to look for"; the neural network learns to discover the features by itself as it trains. This is inspired by how the human brain works, with layers of neurons processing information. Deep learning has proven extremely powerful for tasks like image recognition, speech translation, and more, especially as we feed these networks large amounts of data. In fact, deep learning algorithms often get better and better as you give them more data to learn from.
AI is the idea of a smart computer. ML is one way to make a computer smart—by learning from data. Deep learning is a way to do ML using layered neural networks. In the next sections, we'll focus on deep learning and neural networks, since they are behind many of the AI breakthroughs today.
At its core, the process of teaching an AI (especially via machine learning) is not magic—it's a lot like learning by example. Instead of programming a rigid set of rules, we provide data and let the system find patterns and adjust itself to get the right answers. Here's a simple way to break down a typical machine learning or deep learning workflow:
It starts with lots of data. We collect examples related to the task. For instance, if we want an AI to recognize handwritten numbers, we gather thousands of images of handwritten digits (0 through 9) along with labels telling which digit is in each image. Data is the fuel for AI learning. The more relevant, high-quality data we have, the better the AI can potentially learn. In our example, the images and their correct labels are the "experience" we'll give the AI model.
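To make "data plus labels" concrete, here is a minimal sketch in Python. It uses scikit-learn's built-in digits dataset, which contains small 8x8 pixel images rather than the 28x28 images discussed later; the library choice is purely an assumption for illustration, not the book's own code.

```python
# A quick look at "examples plus labels" using scikit-learn's built-in
# digits dataset (small 8x8 grayscale images of handwritten digits).
from sklearn.datasets import load_digits

digits = load_digits()
images = digits.images   # shape (1797, 8, 8): the raw pixel values
labels = digits.target   # shape (1797,): the correct digit for each image

print(images.shape, labels.shape)
print("The first image is labeled:", labels[0])
```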

Next, the system learns patterns using a model (often a neural network). We choose a model architecture suitable for the task—for deep learning, this is a neural network, which we'll explain soon. We then train the model by feeding it the data. The model makes an initial guess and checks how far off it was from the correct answer. It then adjusts its internal parameters (the "knobs" or weights in the network) to do better next time. This repeated process of adjusting based on errors is called training.
You can imagine a student learning how to throw a basketball. At first they miss, but each time they see how far off the throw was and correct their aim a bit. Over time, the throws get more accurate. Similarly, the neural network gradually improves at tasks like recognizing a "7" versus a "1". This improvement method is often referred to as "learning from mistakes," and in neural networks it's powered by an algorithm known as backpropagation.

Training is the process where an AI model learns to recognize patterns from data. During this stage, the model is fed a large set of labeled examples—data that includes both the input and the correct output. For example, in a handwriting recognition task, the model might be given thousands of images of handwritten digits, each labeled with the correct number they represent (0 through 9).
As it processes each example, the model makes a prediction and then checks how far off that prediction is from the correct answer. This difference is called the error or loss. The model then uses algorithms like backpropagation to adjust its internal parameters (called weights) in a way that reduces this error over time.
Importantly, no one tells the model exactly what features to look for — such as "a 5 usually has a line on the top" or "a tumor tends to be a bright, irregular region." Instead, the model discovers these patterns on its own by identifying which features consistently help it make accurate predictions across many examples. The more data and feedback the model receives, the better it becomes at recognizing the subtle patterns that matter.
This process involves many iterations over the data—often called epochs — during which the model continually updates itself to become more accurate. Training is computationally intensive and usually done using powerful hardware (like GPUs) and specialized software frameworks.
In short, training is about learning from the past and seeing enough examples to understand the underlying patterns and structures in data.
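As a toy illustration of "learning from mistakes" (not the book's code, just a sketch with made-up numbers), here a single adjustable weight learns the rule y = 2x: on each pass it measures its error and nudges itself in the direction that shrinks it. Real networks do the same thing with millions of weights.

```python
import numpy as np

# Toy example: learn the rule y = 2 * x purely from examples.
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs                        # the correct answers (labels)

w = 0.0                              # one adjustable weight, starting as a bad guess
learning_rate = 0.01

for epoch in range(200):             # many passes (epochs) over the data
    predictions = w * xs             # forward pass: make predictions
    errors = predictions - ys
    loss = np.mean(errors ** 2)      # how wrong we are on average
    gradient = np.mean(2 * errors * xs)  # which way to nudge w to reduce the loss
    w -= learning_rate * gradient    # nudge the weight (gradient descent)

print(round(w, 3))                   # close to 2.0 after training
```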

Once the model has been trained, meaning it has adjusted its internal parameters to minimize prediction errors, it's ready to be used in the real world. This stage is called inference.
During inference, we give the model new, unseen data, and it applies what it learned during training to make predictions. For example, a digit-recognition model trained on thousands of labeled images can be shown a handwritten number it has never seen and predict which digit it is.
What makes this phase powerful is that the model is generalizing—applying its knowledge to situations it hasn't encountered before. This is the true strength of machine learning: the ability to extract patterns during training that are robust enough to work in new, often slightly different scenarios.
Inference is typically much faster than training and can be done on less powerful devices like smartphones or embedded systems. It's also the phase that powers real-world applications like image recognition, language translation, recommendation systems, and medical diagnostics.
In essence, inference is about applying knowledge—using what the model has learned to solve new problems in real-time.
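Here is a rough sketch of the two phases side by side, again using scikit-learn's small digits dataset as a stand-in (the model type and settings are assumptions for illustration): fit() is the slow training step, and inference is simply asking the trained model to label images it has never seen.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()
X_train, X_new, y_train, y_new = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
model.fit(X_train, y_train)            # training: learn from labeled examples

predictions = model.predict(X_new)     # inference: label unseen images
print(predictions[:10])
print("Accuracy on unseen data:", model.score(X_new, y_new))
```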

To sum it up: AI learns a task by finding patterns in data, using those patterns to make predictions. The more it practices (with more data), generally the better it gets. This is fundamentally different from traditional programming, where a programmer writes specific instructions for every scenario. In AI, we let the computer write its own rules (model parameters) by learning from examples. It's as if instead of giving a student the answers, you give them lots of practice problems and let them figure out how to solve them.
One of the key tools that enable deep learning is the artificial neural network. These are mathematical models inspired by the neurons in our brains.

Imagine a single neuron in a neural network as a tiny decision-maker. It takes some inputs as numbers, does a calculation, and produces an output. In a real brain, a neuron might take signals from other neurons and decide whether to fire its own signal. In an artificial neural network, a "neuron" does something similar: it multiplies each input by a weight (a strength value), sums them up, adds a bias (another adjustable number), and then passes the result through an activation function (which is just a rule that decides the neuron's output, often introducing non-linearity).
If that sounds complex, think of it this way: each neuron looks at the inputs, and based on how important it thinks each input is (the weights), it gives out an output signal. The activation function can be seen as the neuron's "trigger"—for example, only firing strongly if the combined input is above a certain threshold.
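Here is a tiny Python sketch of a single artificial neuron; the input values, weights, and bias are made up purely for illustration. It multiplies each input by its weight, sums them, adds the bias, and passes the total through a sigmoid activation that squashes the result to a value between 0 and 1.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum plus bias, passed through an activation."""
    total = np.dot(inputs, weights) + bias   # multiply each input by its weight, sum, add bias
    return 1 / (1 + np.exp(-total))          # sigmoid activation: squashes the result to 0..1

# Three inputs; the weights say the second input matters most.
output = neuron(inputs=np.array([0.5, 0.9, 0.1]),
                weights=np.array([0.2, 1.5, -0.3]),
                bias=-0.5)
print(round(output, 3))   # the neuron's output signal, between 0 and 1
```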


A neural network is basically a whole bunch of these neurons connected together in layers. The input layer takes in the raw data (like all the pixel values of an image, or the measurements of a house for price prediction). Then we have one or more hidden layers of neurons that process the information. Finally, there's an output layer that produces the final prediction or classification. We call them "hidden" layers because we don't directly see their outputs; they're in-between the input and output. When a network has many layers (e.g., dozens), it's a deep neural network, hence its name, "deep learning".


Each connection between neurons has a weight, which is like a dial we can turn to adjust how much one neuron influences another. During training, the learning algorithm adjusts all these weights little by little to make the overall network better at its task. A network with the right weights can, for example, take the raw pixels of a photo and eventually spit out "This is a cat" or "This is a dog" at the output.
How do all these layers work together? It's useful to think of each layer as extracting something from the data and passing it on. In an image recognition network, the first layer of neurons might look for very simple patterns (like edges or blobs of color). The next layer takes those patterns and looks for combinations that form slightly more complex shapes (maybe corners or circles). Another layer up might combine shapes into familiar object parts (an eye, a wheel), and so on. By the final layers, the network has formed a high-level understanding ("this combination of features looks like an ear, whiskers, and fur—it's likely a cat!"). In essence, as data flows through layers, the network gradually builds an understanding by composing simpler features into complex ones.
Researchers often say a deep network "untangles" or "uncrumples" data; it transforms the raw input step by step into a more meaningful output, much like uncrumpling a balled-up paper to reveal a clear picture.
An example: Suppose we have a neural network to read handwritten digits. The input layer has 784 neurons (for a 28x28 pixel image, each neuron gets one pixel's grayscale value). Hidden Layer 1 might have, say, 128 neurons. Each of those might activate in response to certain simple patterns in the pixels (some neurons will specialize in detecting a vertical line, others a curve, etc.). Hidden Layer 2 might take signals from those 128 and combine them to detect bigger patterns (maybe an "O" shape or a loop). Finally, the output layer has 10 neurons (one for each digit 0-9) and whichever neuron fires the strongest essentially "votes" for that digit as the prediction. When properly trained, one output neuron will reliably fire for "this looks like a 5" while others stay low. That's the network's answer.
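As a sketch of how that 784-128-10 network could be declared in code, here it is written with PyTorch. The framework and the ReLU activation are assumptions for illustration, not the book's own implementation; the layer sizes match the example above.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),          # 28x28 pixels -> 784 input numbers
    nn.Linear(784, 128),   # Hidden Layer 1: 128 neurons
    nn.ReLU(),             # each neuron's activation ("trigger")
    nn.Linear(128, 10),    # output layer: one neuron per digit 0-9
)

fake_image = torch.rand(1, 28, 28)   # a stand-in for one grayscale image
print(model(fake_image).shape)       # torch.Size([1, 10]): one score per digit
```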
Neural networks are powerful because they can approximate very complex relationships in data. However, they are often a "black box," meaning even though we can measure how accurate they are, it's not always obvious why they made a specific decision on a new case. They basically encode knowledge in the weights and structure, which aren't simple rules a human can read off. Despite that, we know how they learn and how to train them, which is our next topic.
Training a neural network is where the magic happens. It's the process that turns a network with random guesses into one that actually works. Let's outline the steps in a simple way:
Gather training data: As mentioned, we need a lot of examples. For instance, to train a network to recognize cats vs. dogs, you'd gather thousands of cat photos and dog photos, and label them accordingly. The network's job will be to take a photo and output either "cat" or "dog". The collection of examples with known answers is called a training set.

Initialize the network: At first, the neural network's weights are set randomly. This means initially the network is like a student guessing answers on a test without any knowledge—its outputs will likely be wrong.
Forward pass: You feed a training example (say, a cat photo) into the network. The data flows through the network's layers, producing an output. Initially, this output might be very off; for example, the network might output 0.4 (40% confidence) for "cat" and 0.6 (60% confidence) for "dog". The network would guess "dog", since that score is higher.


Calculate error (loss): We then see how far off the prediction was from the truth. Since the correct answer for this image is "cat", we want the network's output for "cat" to be 1 and "dog" to be 0 ideally. The current output (0.4 cat vs 0.6 dog) is quite wrong. We compute a number called a loss or error to quantify this difference. There are various ways to calculate loss, but conceptually it's higher when the prediction is bad and lower when the prediction is good.

Backward pass (adjusting weights): Now the network tries to learn from its mistakes. Using a technique called backpropagation, the error is propagated backwards through the network. Essentially, the network figures out for each neuron and each weight: "If I had tuned this weight a little higher or lower, would the error have been smaller?" It's like figuring out which knobs to turn and in which direction to reduce the mistake. Each weight is then nudged slightly in the direction that reduces the error for this example. This is often done using an algorithm called gradient descent, which finds the steepest direction to move in weight-space to minimize the loss.

Repeat for many examples: One training example isn't enough to learn general patterns. We present the network with another example (say, a dog photo next) and do the same: forward pass to get prediction, calculate error, backward pass to adjust weights. The network will adjust some of the same weights again, maybe in the opposite direction if it made the opposite error on this one. By going through all the examples many times (many epochs of training), the network's weights gradually settle into values that make predictions more and more accurate. It's similar to practicing a skill—repetition on varied examples leads to improvement.

Validation and testing: We also check how the network does on data it hasn't seen during training, called a validation set or a test set. This is important to ensure the network isn't just memorizing answers (overfitting) but actually learning to generalize. If it performs well on new data, then we know it has truly learned the patterns.
Over time and after many adjustments, the network's outputs get closer to what we want. In our example, it will output something like 0.9 cat, 0.1 dog for a cat image, meaning it's quite sure it's a cat, and it will be similarly confident for dog images. At that point, we say the model has learned to classify cats vs dogs.
It's amazing that this procedure works, but it has proven effective in countless scenarios. The key is that each neuron's weight is like a little dial we adjust, and we have an objective (minimize the error). With enough data and a well-designed network, these dial adjustments lead to a network that picks up remarkably nuanced patterns.
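To tie the training steps above together, here is a compact PyTorch training-loop sketch. The random stand-in images, batch size, learning rate, and number of epochs are all assumptions chosen only so the example runs on its own; in practice you would load a real dataset such as MNIST.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data so the sketch is self-contained: 256 random "images" with random labels.
images = torch.rand(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()                          # measures how wrong the guesses are
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # turns the "dials" (weights)

for epoch in range(3):                                   # repeat over the data (epochs)
    for batch_images, batch_labels in train_loader:
        outputs = model(batch_images)                    # forward pass
        loss = loss_fn(outputs, batch_labels)            # calculate error (loss)
        optimizer.zero_grad()
        loss.backward()                                  # backward pass (backpropagation)
        optimizer.step()                                 # nudge the weights (gradient descent)
    print(f"epoch {epoch + 1}: loss {loss.item():.3f}")
```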
One important thing to mention is that deep learning models often require a lot of data and computational power to train. Training can take hours, days, or even weeks on powerful computers (often using GPUs, which are great for the parallel math operations needed). This is why major advances in deep learning in the 2010s were partly due to the availability of big datasets and stronger computing hardware.
So far, we've talked about AI that learns to recognize patterns and make predictions or classifications. But there's a very exciting, increasingly popular area of AI where it doesn't just recognize things, it creates new things. This is called generative AI.


A transformer is a neural network model that learns to understand context and relationships in sequential data (like sentences) using a mechanism called "attention." It then uses that understanding to generate new data that follows the patterns of the data it was trained on.

Transformers, the powerful architecture behind many modern AI models, come in three main types, each designed for different kinds of tasks.

When translating a sentence from one language to another, we can't just go word-by-word. That's where Encoder-Decoder Transformers come in.

Think about a task like checking the sentiment of a movie review—is it positive or negative? In this case, the model only needs to understand the input, not generate anything new. That's the job of Encoder-Only Transformers.

Finally, there are models that focus purely on generating text, one word at a time. These are Decoder-Only Transformers.

Let's break down a few concepts:
Sequential data and context: Language is sequential: the order of words matters to meaning. Transformers use an attention mechanism that allows them to look at the entire sequence at once.
Embeddings: Before a large language model can understand or generate text, the words must first be converted into a form the model can work with—numbers. This is where embeddings come in.
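Here is a toy sketch of an embedding lookup; the three-word vocabulary and the four-number vectors are made up for illustration (real models use vocabularies of tens of thousands of tokens and vectors with hundreds of numbers).

```python
import numpy as np

# Each word gets an ID, and each ID maps to a row of numbers in an embedding table.
vocab = {"the": 0, "cat": 1, "sat": 2}
embedding_table = np.random.rand(len(vocab), 4)   # 4 numbers per word, for illustration

sentence = ["the", "cat", "sat"]
token_ids = [vocab[word] for word in sentence]    # words -> IDs
vectors = embedding_table[token_ids]              # IDs -> embedding vectors
print(vectors.shape)                              # (3, 4): three words, four numbers each
```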

Attention mechanism: This is the heart of the Transformer. At a basic level, attention means the model can focus on different parts of the input when making a decision.
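Below is a minimal NumPy sketch of scaled dot-product attention, the kind of attention used inside Transformers; the tiny sizes are made up for illustration. Each word scores how relevant every other word is to it, turns those scores into weights, and then mixes the other words' vectors according to those weights.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: decide how much each position should
    focus on every other position, then mix their values accordingly."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # relevance of every word to every other word
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
    return weights @ V                           # weighted mix of the value vectors

# Three words, each represented by a 4-number embedding (as in the sketch above).
x = np.random.rand(3, 4)
print(attention(x, x, x).shape)   # (3, 4): a new, context-aware vector for each word
```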

Large language models (LLMs) are transformers trained on massive amounts of text—basically all of Wikipedia, books, web articles, etc. Models like GPT-3 and its successors (the family behind ChatGPT) have billions of parameters.
There are transformer-based models for images and other media too (like vision transformers for image recognition, or transformer-based components in image generators).
You might be wondering, beyond recognizing cats or writing poems, where is AI actually used in the real world? The answer: almost everywhere! Here are some areas and examples:
Phones organize your photos by faces using face recognition. Self-driving cars use AI to perceive the road (recognize pedestrians, other cars, traffic signs).
Natural language processing involves understanding and generating language. Email services use AI to filter spam. Chatbots and virtual assistants (like Google Assistant, Siri, Alexa) use AI to understand your voice commands and respond.
Netflix suggesting movies, Spotify suggesting songs, Amazon recommending products—these are driven by machine learning models analyzing what you and others like.
AI helps in diagnosing diseases from medical scans (like finding anomalies in X-rays or MRIs), personalizing treatment plans, and even in discovering new drugs by analyzing molecular data.
Banks and credit card companies use AI to detect fraudulent transactions (patterns that look "odd" compared to normal behavior). Stock trading firms use AI algorithms to decide trades in split seconds.
AI is being used by artists and musicians—for instance, generating background music, assisting in graphic design, or even writing screenplays.
In robotics, AI is the "brain" that helps a robot make decisions. For example, a home cleaning robot uses AI to navigate rooms and avoid obstacles.

AI is a booming field, and if this topic fascinates you, you might consider a future career in it. There are a variety of roles in the AI and machine learning industry, each with different focuses.
A machine learning engineer builds and deploys AI models in products. They often take the research that data scientists or researchers have done and implement it into scalable software.
Data scientists are a bit like detectives. They analyze data to find meaningful insights and build predictive models, often to help businesses make decisions.


AI researchers are the people pushing the boundaries of what AI can do. They work on developing new algorithms and models—the next transformer, the next breakthrough in reinforcement learning, etc.
Robotics engineers design and build robots that can perform tasks autonomously or semi-autonomously. They combine knowledge of mechanical engineering, electrical engineering, and AI.
Data engineers build and maintain the infrastructure that allows data scientists and ML engineers to work with large datasets efficiently.
NLP (Natural Language Processing) engineers specialize in building systems that can understand, interpret, and generate human language.
Computer vision engineers develop systems that can interpret and understand visual information from the world, such as images and videos.
AI ethics specialists are increasingly in demand: as AI becomes more prevalent, there's a growing need for experts who can help ensure AI systems are developed and deployed responsibly and ethically.
AI product managers bridge the gap between technical AI capabilities and business needs, helping to define and develop AI-powered products.
Congratulations! You've now taken your first steps into the fascinating world of artificial intelligence. We've covered the fundamental concepts that power modern AI systems, from basic machine learning to advanced neural networks and generative AI.
Remember, AI is not just about technology—it's about solving real-world problems and creating value for people. Whether you're interested in pursuing a career in AI or simply want to understand the technology that's shaping our world, the knowledge you've gained here provides a solid foundation.
The field of AI is constantly evolving, with new breakthroughs happening regularly. Stay curious, keep learning, and don't be afraid to experiment with AI tools and technologies. The future of AI is being written now, and you have the opportunity to be part of it.

Ready to dive deeper?
Get the complete book on Amazon for detailed explanations, more illustrations, and comprehensive coverage of all AI topics.
Get the Complete Book on Amazon
Watch my AI tutorials, demonstrations, and insights on various artificial intelligence topics and projects.
Tutorials • AI Demonstrations • Educational Content • Shorts
I'm constantly exploring new AI technologies and working on exciting projects. Check back soon for updates on my latest AI experiments and implementations.