A quick guide to Supervised Machine Learning

Scott Hutchins

In “Five key factors blocking widespread AI implementation in organizations,” we shared the highlights of our Service Management Unlocked (SMU) webinar. This article is part of a series outlining various terms and concepts associated with artificial intelligence (AI) that were referenced during the webinar and the summary blog post.

———

Everyone is talking about AI. 

New Chinese-originated models are moving the stock market by hundreds of billions of dollars in a day, novel hacking attempts are using the AI-generated voices of family members, and every software product seems to be weaving in AI to automate and improve experiences.

Even talking about AI can be overwhelming. It’s easy to get lost, and all too common to just throw your hands in the air and give up. So. Much. Noise.

At Xurrent, we don’t want to add more noise: our AI products are authentic, creating real impact. They are sophisticated yet easy to understand. 

Before we dive into the main guts of this article, Supervised Machine Learning, let’s level set on … Machine Learning.

What is Machine Learning (and why should I care)?

Machine Learning is a branch of artificial intelligence that focuses on building systems that can learn and improve from experience without being explicitly programmed for every scenario.

Said another way, machine learning teaches computers to recognize patterns in data and use those patterns to make decisions or predictions.

A key strength of machine learning is its power to handle complex patterns that would be impossible to program manually, as well as its capacity to adapt to new data and situations over time.

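Real-world analogy: Instead of giving children strict rules for every situation, parents and caregivers often help them learn general principles that can be applied to novel situations. Machine learning works in a similar way: rather than hard-coding a rule for each case, we let the system infer general patterns from examples.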

There are two main types of Machine Learning:

1. Supervised
2. Unsupervised

These are the building blocks for more sophisticated AI systems like generative AI (GenAI), which builds on large pre-trained models.

Let’s look a bit more closely at Supervised Machine Learning.

What is Supervised Machine Learning?

Supervised Machine Learning is a type of artificial intelligence (AI) — and a subset of Machine Learning — where we teach computers to learn from labeled examples.

The “supervised” part comes from the fact that we provide both the input data and the correct output during training.

In many ways, this is similar to how a student learns from solved problems: each worked example pairs a question (the input) with its correct answer (the output).

Here’s how it works, with a home price prediction example below each step (think Zillow’s Zestimate feature*):

First, start with a dataset where you know both the input features and their corresponding correct outputs (labels).

Gather data on recently sold houses, including features like square footage, number of bedrooms, location, and year built (inputs), along with their actual selling prices (labels).

Then, the machine learning algorithm studies these examples (like a student would) to find patterns between the inputs and their labels. The algorithm tries explicitly to minimize the difference between its predictions and the actual labels through a process called optimization.

The algorithm learns which features most strongly influence house prices.

Next, it creates a mathematical model based on the patterns found. This model is essentially a function that maps inputs to outputs, defined by parameters that are adjusted during training.

The algorithm creates a formula that weighs each feature’s importance.

Finally, it uses this model to make predictions on new, never-seen-before data.

The model estimates a reasonable price based on the house’s features, even though it’s never seen this particular house before.

*This is for example purposes only. We do not know precisely how Zillow’s Zestimate model works.
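
To make the four steps above concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the eight houses, their prices (generated from a made-up pricing rule), and the function names `train` and `predict`. It uses gradient descent, one common optimization method, and is in no way a description of how Zillow or any production system works.

```python
# Step 1: labeled examples. Inputs are (square_feet, bedrooms); all values are
# hypothetical. Labels come from a made-up pricing rule so the result is checkable.
features = [(1000, 2), (1500, 3), (2000, 3), (2500, 4),
            (3000, 4), (1200, 2), (1800, 3), (2200, 4)]
prices = [100 * sqft + 20_000 * beds + 50_000 for sqft, beds in features]


def column_stats(rows):
    """Return the mean and standard deviation of each feature column."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [(sum((v - m) ** 2 for v in col) / len(col)) ** 0.5
            for col, m in zip(columns, means)]
    return means, stds


def scale(row, means, stds):
    """Put features on a comparable scale so gradient descent behaves well."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def train(rows, labels, learning_rate=0.1, steps=5000):
    """Steps 2 and 3: adjust the parameters to shrink the mean squared error."""
    means, stds = column_stats(rows)
    scaled = [scale(row, means, stds) for row in rows]
    weights, bias, n = [0.0] * len(scaled[0]), 0.0, len(scaled)
    for _ in range(steps):
        # Difference between each current prediction and its label.
        errors = [bias + sum(w * x for w, x in zip(weights, xs)) - y
                  for xs, y in zip(scaled, labels)]
        # Gradient of the mean squared error for each weight and the bias.
        grad_w = [2 / n * sum(e * xs[j] for e, xs in zip(errors, scaled))
                  for j in range(len(weights))]
        grad_b = 2 / n * sum(errors)
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return {"weights": weights, "bias": bias, "means": means, "stds": stds}


def predict(model, row):
    """Step 4: apply the learned formula to a house the model has never seen."""
    xs = scale(row, model["means"], model["stds"])
    return model["bias"] + sum(w * x for w, x in zip(model["weights"], xs))


model = train(features, prices)
estimate = predict(model, (1600, 3))  # not in the training data
print(f"Estimated price: ${estimate:,.0f}")
```

Because the made-up prices follow an exact rule, the estimate should land very close to $270,000. Real sale prices are noisy, so real models are never this tidy.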

For a Supervised Machine Learning model to be effective, the input data must be high quality. Equally important, the algorithm has to be the “right one” to solve the specific problem. The model should be validated by testing it on a separate dataset to ensure it generalizes well and isn’t simply memorizing the training data.
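
The validation idea can be sketched in a few lines of Python. The data below is hypothetical, and the two "models" (`fit_line` and `fit_memorizer`) are illustrative names rather than anything from a real library. The sketch holds back a quarter of the examples, trains on the rest, and compares the error on both sets.

```python
import random

# Hypothetical labeled data: (square_feet, sale_price). The small "wiggle"
# stands in for real-world noise; all numbers are made up.
data = [(800 + 200 * i,
         100 * (800 + 200 * i) + 50_000 + ((i * 37) % 11 - 5) * 200)
        for i in range(12)]

random.Random(0).shuffle(data)      # fixed seed so the split is repeatable
train, test = data[:9], data[9:]    # hold out 25% that training never sees


def fit_line(rows):
    """Least-squares fit of price = slope * sqft + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(rows):
    """Memorizes training answers; falls back to the average price otherwise."""
    seen = dict(rows)
    average = sum(seen.values()) / len(seen)
    return lambda x: seen.get(x, average)


def mean_abs_error(model, rows):
    """Average size of the miss, in dollars."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


line, memorizer = fit_line(train), fit_memorizer(train)
for name, model in [("line", line), ("memorizer", memorizer)]:
    print(name,
          "train error:", round(mean_abs_error(model, train)),
          "test error:", round(mean_abs_error(model, test)))
```

The memorizer is perfect on the data it trained on and much worse on the held-out houses, which is exactly the failure a separate test set is meant to expose.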

With Supervised Machine Learning, you select an algorithm, tuning settings, and input data, and the artifact you get from training is a model. You then deploy that model to generate inferences in real time.
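
As a rough illustration of "train to get a model, then deploy it," the Python sketch below trains a tiny model, saves its parameters as a JSON string (the artifact), and loads that artifact on a separate "deployment side" to make a prediction. The data, the JSON format, and the names `train` and `load_model` are assumptions chosen for brevity; real deployments use a variety of formats and serving setups.

```python
import json

# --- Training side: hypothetical (square_feet, sale_price) examples ---
examples = [(1000, 150_000), (2000, 250_000), (3000, 350_000)]


def train(rows):
    """Fit price = slope * sqft + intercept by least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


# The trained model is just data (its parameters), so it can be stored or shipped.
artifact = json.dumps(train(examples))


# --- Deployment side: load the artifact once, then answer requests quickly ---
def load_model(serialized):
    params = json.loads(serialized)
    return lambda sqft: params["slope"] * sqft + params["intercept"]


model = load_model(artifact)
print(model(1500))  # an inference for a house that was not in the training data
```

Training is the expensive, occasional step. Inference is just evaluating a stored formula, which is why it can be fast.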

Important note about algorithms: If you find yourself in a conversation with a data scientist, it’s quite likely they’ll delve into the numerous different types of algorithms used in Supervised Machine Learning, each of which might be best for handling different use cases or data sets. You may even hear terms like genetic algorithms, random forests, support vector machines, or encoders.

But at the end of the day … the best models are those that most accurately predict outputs for all new, unseen data.

Real-world examples of Supervised Machine Learning include:

  • Home price predictions — see above
  • Email spam detection
  • Image recognition (identifying objects in photos)
  • Medical diagnosis (predicting diseases from patient data)
  • Credit card fraud detection
  • Weather forecasting
  • Voice recognition — see below

Going a bit deeper into voice recognition …

Input = Audio waveforms of spoken words

Output (Labels) = Text transcriptions of what was said

The best voice recognition models learn to map sound patterns to specific words and phrases, ideally seamlessly handling variations in accents, speaking speeds, and background noise. 

Use cases: Virtual assistants, transcription services, and accessibility tools
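
The same input-and-label framing applies to the other examples in the list. For email spam detection, the input is the email text and the label is "spam" or "ham" (not spam). The Python sketch below is a minimal illustration using a simple Naive Bayes word-count classifier; the six training emails and the function names are made up, and real spam filters are far more sophisticated.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled examples: (email text, label).
training = [
    ("win money now", "spam"),
    ("free prize win", "spam"),
    ("claim free money", "spam"),
    ("meeting at noon", "ham"),
    ("lunch with team", "ham"),
    ("project status meeting", "ham"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def classify(model, text):
    """Pick the label whose word statistics best explain the text."""
    word_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        score = math.log(count / total_examples)
        # Add-one smoothing keeps unseen words from zeroing out a label.
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training)
print(classify(model, "free money prize"))     # words seen mostly in spam
print(classify(model, "team meeting status"))  # words seen mostly in ham
```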

And that’s just the tip of the Supervised Machine Learning iceberg.

How Xurrent uses Supervised Machine Learning

Xurrent purposefully avoided using Supervised Machine Learning in its initial AI releases. Now that you’ve read the above, we hope you can see why.

If we had led our AI efforts with Supervised Machine Learning, that would have meant training models on customer data. This, in turn, would have required conversations about:

  • What data can be used in model training
  • What level of permission would data scientists have to see raw data
  • What anonymization schemes should be deployed

… and many other conversations.

Our goal was to prove the value of AI to our customers before having these often data-sensitive discussions. 

So, we took a different approach. 

We simplified our customers’ workload by using a technique called Retrieval Augmented Generation (RAG). RAG leverages a large language model (LLM) that was trained (at great expense) by Anthropic, a company that trains LLMs on massive public data sets but not on Xurrent customer data. The pre-trained Anthropic models have no knowledge of customer intricacies, so when we ask the LLM a question (aka prompt engineering), we retrieve the relevant details and share them in the prompt.
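
The general shape of RAG can be sketched in Python without calling any LLM. The snippet below is an illustration under stated assumptions, not a description of Xurrent’s implementation: the three knowledge snippets are invented, the function names are hypothetical, and simple word overlap stands in for the more capable retrieval methods (such as embedding-based search) that real systems typically use. It stops at building the prompt, which is the text that would then be sent to the model.

```python
# Hypothetical knowledge snippets that a pre-trained LLM would know nothing about.
documents = [
    "Password resets are handled through the self-service portal.",
    "Laptop replacement requests require manager approval.",
    "The VPN client must be updated every quarter.",
]


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace(".", "").replace("?", "").split())


def retrieve(question, docs, top_k=1):
    """Retrieval: rank snippets by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(docs,
                    key=lambda doc: len(question_words & tokenize(doc)),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(question, docs):
    """Augmentation: place the retrieved details in the prompt for the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, docs))
    return (f"Answer using only this context:\n{context}\n\n"
            f"Question: {question}")


question = "How do I request a laptop replacement?"
prompt = build_prompt(question, documents)
print(prompt)  # generation would happen when this prompt is sent to an LLM
```

No model is trained on the snippets here. The relevant details travel with each question, which is why this approach sidesteps the training-data conversations listed above.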

We’ll do this because a custom-trained Supervised Machine Learning model is lightning-fast and really cheap to run. This benefits Xurrent customers by keeping costs low and performance high. With a generic LLM like ChatGPT, you are “paying” for a lot of irrelevant knowledge and potential processing when using it to perform

Stay tuned for more on the when, where, and how of Supervised Machine Learning at Xurrent.