Syen Nik Thu, Mar 5, '20 9 min read

Using machine learning to unlock the potential of computer vision

Computer vision (CV) is the ability of a computer (or machine) to see and interpret real-world environments. While it is still very much in its infancy, more or less limited to object recognition, CV algorithms are now a commodity, and organisations across several industries are turning to them to help with operational improvement.

To understand how machine learning can improve the effectiveness and unlock the potential of computer vision, let’s take a quick look at the current landscape of computer vision.

The state of computer vision in 2020

Unlike most ubiquitous technology, computer vision hasn’t yet become part of everyday life because it has yet to reach the optimum balance between accuracy, speed, and computational requirements. Take speech recognition software, for example. Although it had been around for several years, its accuracy was terrible, it was slow, and it required too much processing power. It was only in the last five years that Siri, Google Home, and Alexa took off. Voice assistants went from headlining trade shows and featuring only in PowerPoint presentations to being in people’s pockets, cars, and stereos around the world.

The real value of computer vision is its ability to provide real-time insights. However, the algorithms that produce these insights currently require far too much computational power. A self-driving car using object avoidance software would be useless if it took one second (let alone one minute) to analyse a situation and respond accordingly. But it’s improving every day.
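To put the latency point in numbers, here is a minimal sketch of how far a vehicle travels while the system is still processing a frame. The speed and latency values are illustrative assumptions, not measurements from any real system, and `distance_travelled_m` is a hypothetical helper name.

```python
def distance_travelled_m(speed_kmh: float, latency_s: float) -> float:
    """Distance in metres covered while the system is still processing."""
    speed_ms = speed_kmh * 1000 / 3600  # convert km/h to m/s
    return speed_ms * latency_s


# Illustrative speed (100 km/h) and latencies only; these are assumptions.
for latency_s in (0.05, 1.0, 60.0):
    metres = distance_travelled_m(100, latency_s)
    print(f"{latency_s} s of processing -> {metres:.1f} m travelled")
```

At 100 km/h a car covers roughly 28 metres every second, which is why even a one-second response time is too slow for object avoidance.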

To start realising the benefits and seeing meaningful value from the technology, we use a two-pronged approach that applies machine learning to improve the effectiveness of computer vision.

Using machine learning to improve the functionality of computer vision

Google Lens does an excellent job of giving you relevant search results based on what it recognises. Popular instant messaging apps come with text recognition functionality to extract writing from photos. Apple Photos also does a great job of grouping photos and faces into categories and albums. But we’re still just scratching the surface.

Machine learning algorithms can lift the performance of computer vision apps to a totally different level, particularly when automated. The computer vision functions mentioned above are good at recognising objects but not at processing further details about them, such as telling the difference not just between shorts and skirts, but also what kind of shorts or skirts, the colour, the pattern, the brand, and so forth. Once this is achieved, businesses can, for example, use it to suggest the best personalised clothing choices for an occasion and have these garments ready for you to purchase online.

However, people have very high expectations when it comes to technology – chiefly, they don’t like waiting or being given something they didn’t want. Training computer vision systems through machine learning is therefore essential for improving both their speed and accuracy. It won’t be long before one of the big players pulls this off (possibly by buying a very successful start-up), and CV becomes ubiquitous.

So, this is the first way machine learning helps: it lets computer vision algorithms perform faster, more accurately, and see more detail. The second is applying it to the data generated as a result of computer vision.


Applying machine learning to the data generated from computer vision

Machine learning can unlock the potential of computer vision by discovering value and insights from the data created during the computer vision process. Significant ‘secondary’ data is generated as a result of processing and storing what is seen, and it typically takes the form of text-based, structured data held in traditional databases and tables.

Take a high-tech manufacturer, for example, where cameras monitor the safety of heavy machinery operators. Each time a machine gets stuck (picked up by motion detection) or an operator intervenes to fix something (picked up by people detection), structured data is stored that tells us, down to the sub-second, when each event happened.

Computer vision on its own enables you to detect a lack of motion on a machine and notify people. With machine learning, the same data allows you to identify bottlenecks and suggest improvements by retrospectively visualising how long each interruption lasted and analysing which part of the processing chain was the least efficient.
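As a rough illustration of the retrospective analysis described above, the sketch below totals the downtime per station from a log of stop and resume events. The event format, station names, and timestamps are all hypothetical, and the aggregation is a simple starting point for spotting bottlenecks rather than a machine learning model.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (station, event, ISO timestamp).
events = [
    ("press", "stopped", "2020-03-05T09:00:00.000"),
    ("press", "resumed", "2020-03-05T09:02:30.500"),
    ("conveyor", "stopped", "2020-03-05T09:10:00.000"),
    ("conveyor", "resumed", "2020-03-05T09:10:45.250"),
    ("press", "stopped", "2020-03-05T10:00:00.000"),
    ("press", "resumed", "2020-03-05T10:01:00.000"),
]


def downtime_by_station(events):
    """Sum the seconds between each 'stopped' and the next 'resumed' per station."""
    stopped_at = {}
    totals = defaultdict(float)
    # ISO timestamps in the same format sort chronologically as strings.
    for station, event, ts in sorted(events, key=lambda e: e[2]):
        moment = datetime.fromisoformat(ts)
        if event == "stopped":
            stopped_at[station] = moment
        elif event == "resumed" and station in stopped_at:
            totals[station] += (moment - stopped_at.pop(station)).total_seconds()
    return dict(totals)


totals = downtime_by_station(events)
bottleneck = max(totals, key=totals.get)  # station with the most downtime
print(totals, bottleneck)
```

The same per-station totals could feed a dashboard or become input features for a model that looks for patterns behind the interruptions.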

Another use of the data is forward-looking: avoiding a complete breakdown that would cause a major interruption to the business. This means looking at past patterns of usage, interruptions, and breakdowns to predict when maintenance should be scheduled. Airlines already do this for aircraft servicing through predictive maintenance.
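To make the scheduling idea concrete, here is a deliberately naive sketch that estimates when the next service is due from the average interval between past breakdowns. The function name, the `safety_factor` parameter, and the sample hours are assumptions for illustration; a real predictive maintenance model would use far richer usage and sensor data.

```python
from statistics import mean


def next_maintenance_due(breakdown_hours, safety_factor=0.8):
    """Suggest the operating hour at which to schedule the next service.

    breakdown_hours: cumulative operating hours at each past breakdown.
    safety_factor: fraction of the average interval to wait (an assumption).
    """
    if len(breakdown_hours) < 2:
        raise ValueError("need at least two breakdowns to estimate an interval")
    # Operating hours between consecutive breakdowns.
    intervals = [b - a for a, b in zip(breakdown_hours, breakdown_hours[1:])]
    return breakdown_hours[-1] + safety_factor * mean(intervals)


# Hypothetical history: breakdowns at 400, 900 and 1500 operating hours.
due = next_maintenance_due([400, 900, 1500])
print(f"Schedule the next service at about {due:.0f} operating hours")
```

Servicing before the average interval has elapsed trades a little extra maintenance for a lower chance of an unplanned stoppage.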

Another example is using the in-store CCTV in a tech retail shop (e.g. Noel Leeming). Because invasive face recognition raises privacy concerns, computer vision is better used to detect people’s behaviour rather than who they are. Data is generated on each step shoppers take, their dwell time around the store, which products they were interested in, and so forth, again down to the sub-second. All this data can be shown on a screen, so salespeople know when and how to provide shoppers with the best customer experience – including answering any questions about products they looked at but didn’t purchase (cross-sell opportunities). This could extend to serving shoppers recommendations or offers based on people who had the same in-store journey.
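The dwell-time data mentioned above might be derived along these lines. This is a minimal sketch that assumes the vision system emits anonymous track IDs with a zone label and a timestamp; the tuple layout, zone names, and `dwell_seconds` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical observations: (anonymous track id, zone, seconds since opening).
observations = [
    (1, "laptops", 0.0), (1, "laptops", 42.5), (1, "phones", 50.0),
    (1, "phones", 65.0), (2, "phones", 10.0), (2, "phones", 130.0),
]


def dwell_seconds(observations):
    """Total time each track spends in each zone, from consecutive sightings."""
    by_track = defaultdict(list)
    for track_id, zone, t in observations:
        by_track[track_id].append((t, zone))

    dwell = defaultdict(float)
    for track_id, sightings in by_track.items():
        sightings.sort()  # order each track's sightings by time
        for (t0, z0), (t1, z1) in zip(sightings, sightings[1:]):
            if z0 == z1:  # only count time when the zone did not change
                dwell[(track_id, z0)] += t1 - t0
    return dict(dwell)


dwell = dwell_seconds(observations)
print(dwell)
```

Because the tracks carry no identity, this kind of summary describes behaviour in the store without recording who the shopper is.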

And finally, one last example from the sporting world. Considerable data is generated in a tennis match, which includes the exact 3D location where the ball is hit (position on the court and height), the speed of the ball, where the opponent is, the number of shots in a rally, and so forth. Machine learning can come up with the best strategy or combination of shots to give a player the best chance of winning a rally and, therefore, a match. On top of this, other data sets such as temperature, humidity, and time of day can be combined with the computer vision analysis to give players and coaches an additional competitive advantage. Such analysis is already happening in team sports, so it’s a case of when, not if, and of whether people want to play catch-up or lead the field.

Interested in hearing how our machine learning approach to computer vision could benefit your business? We’d love to talk.

