You know you need to use some kind of Machine Learning (ML) to solve one of your business challenges, but just what kind of ML model do you need? There are so many ML models and algorithms to choose from, and it’s vital you pick the right one if you want to get accurate results quickly enough for the business to make use of them.
Here are some of the factors you should consider when deciding which model to go with.
1. What’s your use case?
One factor that affects your choice of model is what kind of output you’re trying to get from your ML application.
- When you want to forecast the future by using the relationships between variables — such as predicting sales of ice-cream based on long-term weather forecasts — you need a regression model.
- When you want to identify rare or unusual data points — spotting odd behaviour that may indicate a credit card transaction is fraudulent, for example — you need a model that provides anomaly detection.
- When you’re trying to separate superficially similar data into groups that have common characteristics — if you want to segment customers so you can target your marketing more effectively, for instance — you need a clustering model.
- When you’re trying to place items into categories — identifying defective products coming off a production line, for example — you need a classification model. (The sketch after this list gives a common off-the-shelf example of each of these four model families.)
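As a rough illustration only, here’s how those four use cases might map onto estimators from a general-purpose library such as scikit-learn; the specific estimators below are just examples of each family, not recommendations for your particular data:

```python
# Illustrative only: one off-the-shelf scikit-learn estimator per use-case family.
from sklearn.linear_model import LinearRegression      # regression: forecast a numeric value
from sklearn.ensemble import IsolationForest           # anomaly detection: flag unusual points
from sklearn.cluster import KMeans                     # clustering: group similar records
from sklearn.ensemble import RandomForestClassifier    # classification: assign a category

models = {
    "regression": LinearRegression(),
    "anomaly_detection": IsolationForest(random_state=0),
    "clustering": KMeans(n_clusters=5, random_state=0),
    "classification": RandomForestClassifier(random_state=0),
}
```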
2. What does your data look like?
Does your data come with clear labels? For example, if you’re trying to identify defective products coming off a production line, you can probably create a library of images of products that would pass quality assurance and products that would fail. If that’s the case, then you can use models that rely on supervised learning, feeding the model during training with data that’s labelled with both inputs and outputs so it can identify how inputs and outputs are connected. Then, when your model is presented with an image of a product that it’s never seen before, it can look at the input and decide if it would pass or fail.
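A minimal sketch of that supervised workflow, using scikit-learn and synthetic tabular data as a stand-in for your own labelled library (the features and pass/fail labels here are generated purely for illustration):

```python
# Minimal supervised-learning sketch: labelled examples in, pass/fail predictions out.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for a labelled library of products: each row is a product's measurements,
# each label is 1 (would pass QA) or 0 (would fail).
X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X, y)                       # learn how inputs map to pass/fail labels

new_product = [[0.0] * 20]            # measurements for a product the model hasn't seen
print(model.predict(new_product))     # e.g. [1] -> predicted to pass
```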
If you don’t have clear output labels for any of your data, but are looking for intrinsic patterns and hidden structures, then you need a model that supports unsupervised learning. For example, a retailer may want to improve the promotions it offers by finding out what accessories customers tend to buy after they’ve made an initial purchase. The retailer can then create product bundles or suggest to new customers that they should add those specific accessories to their order.
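One very simple, label-free way to surface that kind of pattern is to count which products co-occur in the same orders; the baskets and product names below are invented for illustration, and a real solution would use a proper association-rule or clustering model:

```python
# Unlabelled order data: no "right answer" attached, we just look for structure.
from collections import Counter
from itertools import combinations

orders = [                                   # invented example baskets
    {"camera", "sd_card", "tripod"},
    {"camera", "sd_card", "camera_bag"},
    {"laptop", "laptop_sleeve", "mouse"},
    {"camera", "tripod", "camera_bag"},
]

pair_counts = Counter()
for basket in orders:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Most frequently co-purchased pairs -> candidates for bundles or suggestions.
print(pair_counts.most_common(3))
```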
3. How much data do you have?
The amount of data you’re working with can have an impact in several ways. The first issue to consider is how much data you have available to train (and test) your model. Some models can produce decent results after being trained with just a small number of samples, while others require a large volume of training data to provide an effective solution.
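One practical way to gauge how much training data a given model really needs is a learning curve: train on progressively larger slices of your data and watch where the held-out score stops improving. A sketch with scikit-learn, using synthetic data in place of yours:

```python
# Learning-curve sketch: how does held-out accuracy change as training data grows?
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

train_sizes, _, test_scores = learning_curve(
    LogisticRegression(max_iter=1_000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5,
)

for size, scores in zip(train_sizes, test_scores):
    print(f"{size:>5} training samples -> mean CV accuracy {scores.mean():.3f}")
```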
The second issue is how much data you’ll want to feed into the production version of your model. If you’re going to be processing a lot of data, you may be better off choosing a simpler algorithm and spending more time and data upfront to train it. With most models, you can also make choices about how the model will behave, such as how many iterations it will perform.
Finally, you also need to think about the number of variables — or features — associated with each item of data in your data set that you want the model to consider. Some models are better suited to handling data with many features, while others may provide answers eventually but will take much longer to train and will run slowly in production.
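If each record carries a large number of features, a quick feature-selection pass can show how much you might trim before training; the sketch below keeps only the most informative features, with the choice of ten being entirely arbitrary:

```python
# Feature-selection sketch: keep only the k most informative features before training.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 200 features, only a handful of which actually carry signal.
X, y = make_classification(n_samples=1_000, n_features=200, n_informative=10,
                           random_state=0)

selector = SelectKBest(score_func=f_classif, k=10)   # k=10 is an arbitrary choice
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)                # (1000, 200) -> (1000, 10)
```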
4. How accurate do you need the results to be?
The accuracy of a model is its ability to provide the correct output for any given input. Once your model has been trained with your training data, you use a separate set of test data to see if the model gives you the right answer when presented with data it hasn’t seen before.
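In code, that usually means holding back a test set and scoring the trained model on it; a minimal sketch with scikit-learn and synthetic data:

```python
# Hold back a test set, train on the rest, then measure accuracy on unseen data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("accuracy on unseen data:", accuracy_score(y_test, model.predict(X_test)))
```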
Sometimes you don’t need pinpoint accuracy. If you’re segmenting customers, you might not be worried if a few edge-case customers end up in the wrong group. But if you’re doing quality assurance on physical products that need to be machined to a specific engineering tolerance, then you need to be sure you’re finding all the items that don’t meet that standard.
So you should always consider whether a simpler algorithm can provide results that are good enough for your use case, especially if you can spend more time and data training the model initially. You should also be aware that more complex algorithms risk “overfitting” the data, finding patterns and relationships that don’t actually exist beyond the training set.
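A common warning sign of overfitting is a large gap between training and test scores. The sketch below exaggerates this by comparing a shallow decision tree with an unconstrained one on deliberately noisy synthetic data; the depth limit of three is arbitrary:

```python
# Overfitting sketch: a very flexible model memorises noise in the training data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("shallow tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
                    ("unconstrained tree", DecisionTreeClassifier(random_state=0))]:
    model.fit(X_train, y_train)
    print(f"{name}: train={model.score(X_train, y_train):.2f}, "
          f"test={model.score(X_test, y_test):.2f}")
```

Typically the unconstrained tree scores near-perfectly on the data it was trained on but noticeably worse on the held-out data, which is the gap you want to watch for.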
5. How quickly do you need to provide the results?
With most ML models, there’s a trade-off between speed and accuracy, so you need to decide which is most important for your specific use case. In general, simpler algorithms will be less accurate but execute more quickly, while complex algorithms will be slower to run but produce more accurate results.
However, the speed and accuracy of a model also depend on parameters that affect how the algorithm behaves, such as error tolerance or the number of iterations performed. By changing the parameters of the model, you can look for the right mix of accuracy and speed. You can also increase the accuracy of your model without compromising production run times by providing more training data and spending more time up front on training.
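Many libraries expose exactly those knobs. As a hedged illustration, the scikit-learn sketch below times the same model with a loose versus a tight error tolerance and iteration budget; the specific values are arbitrary:

```python
# Speed/accuracy knobs: error tolerance (tol) and iteration budget (max_iter).
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for tol, max_iter in [(1e-1, 50), (1e-4, 1_000)]:    # loose/fast vs tight/slow
    model = LogisticRegression(tol=tol, max_iter=max_iter)
    start = time.perf_counter()
    model.fit(X_train, y_train)
    print(f"tol={tol}, max_iter={max_iter}: "
          f"fit in {time.perf_counter() - start:.2f}s, "
          f"test accuracy {model.score(X_test, y_test):.3f}")
```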
The importance of choosing the right model is one reason why Google’s Cloud ML tools support a wide range of algorithms out of the box. BigQuery ML lets you pick from — and customise — nearly a dozen models that, between them, cover both supervised and unsupervised learning and can handle regression, clustering, classification and anomaly detection. You can also take advantage of custom ML models already trained for particular tasks by using the APIs for tools such as Google’s Vision AI, Video AI and Cloud Natural Language. And if you’re still unsure which model to use, Cloud AutoML provides an easy-to-use graphical interface that will automatically identify and leverage the best option for your problem from a number of common models.
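By way of illustration, training one of those BigQuery ML models comes down to a single CREATE MODEL statement; the sketch below submits it through the Python client library. The project, dataset, table, column names and model choice are all placeholders you would swap for your own:

```python
# Sketch: train a BigQuery ML classification model with a single CREATE MODEL statement.
# Assumes the google-cloud-bigquery client library is installed and authenticated;
# `my_project.my_dataset.customers` and the column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my_project")

query = """
CREATE OR REPLACE MODEL `my_dataset.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT tenure_months, monthly_spend, support_tickets, churned
FROM `my_project.my_dataset.customers`
"""

client.query(query).result()   # blocks until the training job completes
```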
Working with our data analytics and AI team
Our Data, Analytics and AI practice brings together a highly committed team of experienced data scientists, mathematicians and engineers. We pride ourselves on collaborating with and empowering client teams to deliver leading-edge data analytics and machine learning solutions on the Google Cloud Platform.
We operate at the edge of modern data warehousing, machine learning and AI, regularly participating in Google Cloud alpha programs to trial new products and features and to future-proof our client solutions.
We have support from an in-house, award-winning application development practice to deliver embedded analytics incorporating beautifully designed UIs. We are leaders in geospatial data and one of the first companies globally to achieve the Google Cloud Location-based Services specialisation.
If you'd like to find out more about how we can help you build your own modern data and analytics platform, why not take a look at some of our customer stories or browse our resources? And of course, please get in touch with our team if you'd like more practical support and guidance.