The Ultimate Guide to Understanding Computer Vision

Introduction to Computer Vision

Computer vision is central to many of the recent advances in consumer, medical and military technology. Breakthroughs ranging from cars that can drive themselves to algorithms that can recognize skin cancer are largely a result of computer vision.

Simply put, using software to analyze the world's visual content is an enormous revolution in computing.

Computer vision algorithms have existed in various forms since the 1960s, but they have advanced to far more sophisticated levels in recent years. This progress has been enabled by advances in machine learning, along with improvements in computing capabilities, data storage and high-quality input devices. In particular, the integration of machine learning and computer vision has yielded some astonishing results, especially in facial and image recognition.

For instance, Facebook has combined computer vision, machine learning and its enormous pool of photos to achieve highly accurate facial recognition. That's how Facebook is able to suggest whom to tag in your photos.


In this guide:

  • Computer Vision Definition
  • Computer Vision History
  • Computer Vision Explained
  • Computer Vision Categories
  • How Does Computer Vision Work?
  • Computer Vision Components
  • Computer Vision Applications
  • Computer Vision Tools

Computer Vision Definition

What is the definition of Computer Vision?

Simply put, computer vision is the process of using a computer or other machine to understand and interpret imagery (both videos and photos). It is the science of giving computers and other machines the ability to see: to visualize, interpret and recognize.

Seeing is more than just the process of recording light in a form that can be played back (for example, recording something using a video camera). Vision is also a core component of intelligence. When computers and other machines are said “to see” or to have vision, it implies that they have artificially intelligent systems that can acquire information from images, videos or multi-dimensional data, analyze that information and then make decisions.

Would you like to learn more about commonly used terms in computer vision? Visit our Computer Vision Glossary to discover must-know industry terms.

Computer Vision Terminology

What terminology is associated with Computer Vision?

Machine Learning (ML): This is the branch of AI that equips a machine to learn and improve its performance automatically, without explicit human intervention.

Machine Perception: This is the science community’s term to refer to the capacity of a machine to take in visuals and process those images, much like humans perceive the world around us using our senses.

Image Processing: This is where a computer analyzes an image and, as a product of that analysis, produces a transformed image or a description of it.

Visual Computing: This is the over-arching, generic name for all computer science dealing with images and 3D models, including video and image processing, augmented reality, and more.

Convolutional Neural Networks (CNN): These are deep neural networks whose layers of artificial neurons are specialized for image processing.

Deep Learning: This is a subset of machine learning (which is itself a subset of AI) in which neural networks are stacked in many layers, harnessing large amounts of computing power to process data more effectively, including unstructured data.

Pixels: In the context of images and computing, pixels are the smallest unit of a digital picture.

Optical Character Recognition (OCR): This is the digital recognition of printed characters (symbols or language-based) using visual computing and, more specifically, image processing.

Computer Vision History

What is the history of Computer Vision?

The history of computer vision starts as far back as the 1950s. Key developments have taken place each decade since, but it is only in the last 15 years that the technology has gained a real footprint in our day-to-day lives, with new technologies and devices reaching the market.

1950s: Engineers and computer scientists developed two-dimensional imaging techniques for statistical pattern recognition.

1960s: Larry Roberts, a PhD student at MIT, began studying machine perception of three-dimensional solids. In his thesis, he discussed the possibility of extracting 3D geometric information from 2D views. Later studies of computer vision in artificial intelligence were based on his work.

1970s: Researchers realized that the growing technology of computer vision needed to tackle objects in the real world. In 1978, there was a major breakthrough when David Marr, at the MIT AI Lab, proposed a bottom-up approach to scene understanding, built on the idea of deriving a 3D model from a 2D sketch.

1980s: Optical Character Recognition (OCR) systems were developed and used in various industrial applications, such as reading and verifying symbols, letters and numbers. Smart cameras were developed in the late 1980s.

1990s: LED lighting for machine vision was developed, and sensor functions and control architectures advanced.

Today: Computer vision has advanced to the highly sophisticated levels seen in everyday technology. Computer vision systems can understand and analyze imagery, and they are incorporated into a growing number of software products, websites and devices.

Computer Vision News

Latest developments in Computer Vision

The field of computer vision is continually growing, with new technological advancements, software improvements and products. Staying up to date with the latest computer vision news is important in this rapidly growing industry. We cover the latest in artificial intelligence news, chatbot news, computer vision news, machine learning news, natural language processing news, speech recognition news and robotics news.

Computer Vision Explained

What is Computer Vision?

Teaching a computer or any other machine to “see” is no easy task. You may mount a camera on a computer, but that won’t make it see.

For computers to see the real world or even images we input, they rely on computer vision and image recognition.

Computer vision is what enables a barcode scanner to “see” the bunch of stripes in a UPC. And it's as a result of computer vision that Apple's Face ID is able to recognize a face staring at its camera. Basically, a computer or any machine uses computer vision to process and understand raw visual input; more holistically, computer vision means applying computational techniques to visual data.

Much of the latest advancement, most notably the heightened accuracy of computer vision, is due to a special type of algorithm called the convolutional neural network (CNN). A CNN combines layers of simpler operations that work together and has been shown to achieve astonishing accuracy on image-related tasks.
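
As a minimal illustration, here is a sketch of a tiny CNN in PyTorch (the framework choice, layer sizes and 32x32 input size are assumptions for illustration, not a reference architecture):

    # A minimal convolutional neural network sketch in PyTorch.
    # Layer sizes and the 32x32 input are illustrative assumptions.
    import torch
    import torch.nn as nn

    class TinyCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            # Convolutional layers learn local visual features (edges, textures).
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1),  # RGB in, 16 maps out
                nn.ReLU(),
                nn.MaxPool2d(2),                             # downsample 2x
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
            # A fully connected layer maps pooled features to class scores.
            self.classifier = nn.Linear(32 * 8 * 8, num_classes)

        def forward(self, x):
            x = self.features(x)
            return self.classifier(x.flatten(1))

    model = TinyCNN()
    scores = model(torch.randn(1, 3, 32, 32))  # one fake 32x32 RGB image
    print(scores.shape)                        # torch.Size([1, 10])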

Computer Vision Categories

What are the categories of Computer Vision?

Image Recognition and Facial Recognition

1. Facial Recognition

Facial recognition is the process of recognizing one or more people in videos or images by interpreting and comparing patterns. Facial recognition algorithms extract facial features and compare them against a database to find a match.

First, facial recognition uses computer vision to obtain discriminative features from facial images. It then uses machine learning or pattern recognition techniques to classify faces by modeling their appearance.
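
As a hedged sketch of the first step, here is minimal face detection in Python using OpenCV's bundled Haar cascade (the image path is a placeholder; detection locates faces, which a recognition system would then compare against a database):

    # Minimal face detection sketch with OpenCV's bundled Haar cascade.
    # "photo.jpg" is a placeholder path.
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    img = cv2.imread("photo.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)  # box each face
    cv2.imwrite("faces.jpg", img)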

2. Image recognition

Image recognition is the process of recognizing a feature or an object in a digital image or video. The concept of image recognition is widely used in many applications and devices, including security surveillance, toll booth monitoring and factory automation.

How Does Computer Vision Work?

How exactly does Computer Vision work?

Computer vision mimics how human eyes and brains work to both identify and process images. The processing components of computer vision are:

  • Image acquisition
  • Image processing
  • Image analysis and interpretation

Image Acquisition

This is the process of transforming the analog world into the digital world: the real world is translated into binary data and interpreted as digital images. There are different tools for creating these datasets, including digital compact cameras, DSLRs, webcams, embedded cameras and consumer 3D cameras. Usually, the data collected by these devices needs to be post-processed so that it can be exploited efficiently in the steps that follow.
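
For example, a minimal acquisition sketch in Python with OpenCV might grab a single frame from a webcam (device index 0 is an assumption):

    # Minimal image acquisition sketch: grab one frame from the default camera.
    import cv2

    cap = cv2.VideoCapture(0)            # open camera device 0
    ok, frame = cap.read()               # frame is a NumPy array of BGR pixels
    cap.release()
    if ok:
        cv2.imwrite("frame.jpg", frame)  # persist the digitized image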

Image Processing

Image processing involves the initial, low-level processing of images. Algorithms are used to deduce low-level information about parts of the image from the binary data obtained during image acquisition. This kind of information is captured by point features, segments or image edges.

Image processing involves advanced techniques and applied mathematics algorithms, including the following steps:

1. Edge Detection

In image processing, the edge detection technique is used to identify the boundaries of objects in an image. An edge is a curve that follows a path of rapid change in image intensity, and edges are usually associated with the boundaries of objects in a scene. Finding edges assists not only in detecting objects but also in correctly interpreting more complex situations where objects may be overlapping. Edge detection methods include the Canny, Roberts and fuzzy logic methods.
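
A minimal Canny edge detection sketch in Python with OpenCV (the file paths and thresholds are illustrative):

    # Canny edge detection sketch; "input.jpg" is a placeholder path.
    import cv2

    gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, threshold1=100, threshold2=200)  # hysteresis thresholds
    cv2.imwrite("edges.jpg", edges)      # white pixels mark detected edges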

2. Segmentation

This is the process of splitting an image into several parts. Segmentation is used to recognize objects or any other relevant information in digital images. An image is partitioned into distinct regions, each containing pixels with similar characteristics. Segmentation builds on low-level processing to transform the image into one or more high-level images that the computer can analyze further. Segmentation methods (a minimal thresholding sketch follows this list) include:

  • Thresholding methods
  • Color-based segmentation
  • Transform methods
  • Texture methods
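
As promised above, here is a minimal thresholding sketch in Python with OpenCV, using Otsu's method to pick the threshold automatically (the file paths are placeholders):

    # Thresholding segmentation sketch; Otsu's method chooses the threshold.
    import cv2

    gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cv2.imwrite("mask.jpg", mask)        # separates foreground from background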

3. Classification

Image classification is the process of assigning each pixel in a digital image to a class. There are two major techniques for classifying images:

  • Supervised method
  • Unsupervised method

a. Supervised image classification

In supervised image classification, the analyst selects representative samples of each information class from the image. These are referred to as training sites. The image processing software then uses the training sites to classify the whole image.

Supervised image classification uses the spectral signatures identified from the training sites to classify the image: an image is classified according to what it resembles most in the training set. Basically, supervised image classification involves the following three steps (sketched in code after the list):

i. Selecting training sites
ii. Generating a signature file
iii. Classifying the image
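
Here is a minimal sketch of those three steps using scikit-learn's k-nearest-neighbors classifier (the pixel values, class names and image are synthetic, purely for illustration):

    # Supervised pixel classification sketch with scikit-learn.
    # All data below is synthetic; real training sites come from labeled imagery.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Step i: training sites -> labeled pixel samples (RGB values, class ids).
    train_pixels = np.array([[200, 40, 30], [190, 50, 35],   # class 0, e.g. "roof"
                             [30, 160, 40], [25, 150, 50]])  # class 1, e.g. "grass"
    train_labels = np.array([0, 0, 1, 1])

    # Step ii: fit the classifier (this model stands in for the signature file).
    clf = KNeighborsClassifier(n_neighbors=1).fit(train_pixels, train_labels)

    # Step iii: classify every pixel in the image.
    image = np.random.randint(0, 256, size=(64, 64, 3))      # fake 64x64 RGB image
    classified = clf.predict(image.reshape(-1, 3)).reshape(64, 64)
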
b. Unsupervised Classification

Unsupervised image classification involves analyzing a group of pixels and categorizing them according to computed groupings in their image values. Compared with supervised image classification, the unsupervised method does not require analyst (i.e., user) intervention. The basic logic of unsupervised image classification is that pixel values within a specific cover type should share similar gray levels, whereas data in different classes should have distinct gray levels.

The unsupervised image classification steps are:
i. Generating clusters
ii. Assigning classes
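
A minimal sketch of those two steps in Python with scikit-learn's k-means (the image is synthetic and the cluster count k=3 is arbitrary):

    # Unsupervised classification sketch: k-means groups pixels by value alone.
    import numpy as np
    from sklearn.cluster import KMeans

    image = np.random.randint(0, 256, size=(64, 64, 3))    # fake RGB image
    pixels = image.reshape(-1, 3).astype(float)

    # Step i: generate clusters from the pixel values themselves.
    kmeans = KMeans(n_clusters=3, n_init=10).fit(pixels)

    # Step ii: the analyst assigns a meaning (e.g. "water") to each cluster id.
    cluster_map = kmeans.labels_.reshape(64, 64)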

Other image classification methods include:

  • Object-oriented image classification
  • Parallelepiped classification
  • Maximum likelihood classification
  • Minimum distance classification

4. Detection and Matching of Features

The process of detecting and matching features is divided into three steps:

i. Detection: Interesting or easily-matched feature points from each image are identified.

ii. Description: The local appearance of every feature point is described in a way that is invariant under changes in scale, translation, in-plane rotation and illumination. We end up with a descriptor vector for every feature point.

iii. Matching: To identify similar features, the descriptors are compared across all images.
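
A minimal sketch of all three steps in Python with OpenCV's ORB detector (the two image paths are placeholders):

    # Feature detection, description and matching with ORB.
    import cv2

    img1 = cv2.imread("a.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("b.jpg", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(img1, None)  # steps i + ii: detect, describe
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Step iii: compare descriptors across images to find matching features.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    print(len(matches), "matched features")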

Image Analysis and Interpretation

After all that processing, image analysis and interpretation is the last step in computer vision. It involves analyzing the data from the previous steps to make decisions, for example in new drone technology or in Facebook's suggestions of which friends you might want to tag in a newly uploaded image. This final step of image analysis and understanding applies high-level algorithms.

Computer Vision Companies

Discover innovative Computer Vision startups and companies

It takes bold visionaries and risk-takers to turn future technologies into realities, and in the field of computer vision there are many companies across the globe working on this mission. Our mega list of artificial intelligence, chatbot, machine learning, natural language processing, and computer vision companies covers the top companies and startups innovating in this space.

Computer Vision Key Components

What are the components of Computer Vision?

The success of computer vision depends on a few key components. These include:

  • Lighting
  • Lenses
  • An image sensor
  • Vision processing
  • Communications

Lighting

Lighting is fundamental to the success of a computer vision system. To create images, computer vision systems analyze the light reflected from an object rather than the object itself. Lighting illuminates the object to be analyzed, exposing its features to the camera.

Specific lighting techniques can enhance some features of an object while negating others, for example by silhouetting a part, which conceals surface details so that its edges can be measured.

Common lighting methods include:

1. Backlighting

Backlighting is used where edge or external measurements are required. It enhances the outline of an object, helping in detecting shapes and making dimensional measurements more accurate.

2. Axial diffuse lighting

Axial diffuse lighting couples light into the optical path from the side, illuminating the object along the camera's axis.

3. Structured lighting

Structured lighting involves projecting a light pattern, plane, grid or more complex shape onto an object at a known angle. Structured lighting assists in calculating volume, obtaining dimensional information, and giving contrast-independent surface inspections.

Other lighting methods include dark-field illumination, bright-field illumination, strobe lighting and diffuse dome lighting.

Lenses

The function of the lens is to capture the image and present it to the image sensor. The quality and resolution of the captured image depend on the optical quality of the lens used.

Image Sensor

The function of an image sensor in a computer vision system is to convert the captured light into a digital image.

Vision Processing

Vision processing is where a computer converts an image into a description in order to make a decision. This process may take place internally in a standalone computer vision system or externally in a PC-based vision system.

Communications

In computer vision, “communications” refers to the pathways along which information travels from one component of the machine or computer to another, allowing the vision system to coordinate with other equipment and incorporate new data.

Generally, computer vision systems use “off-the-shelf” communication components, which must link easily and quickly to other machine elements so that images can be processed on a near-instantaneous basis. The linking of these components can be done either through discrete I/O signals or through data sent over a serial connection to a device that logs the information. Some computer vision systems instead use a higher-level protocol such as Ethernet/IP.

Computer Vision Applications

What are the applications of Computer Vision?

Medical industry

One of the major applications of computer vision is in the medical industry—i.e., medical computer vision or medical imaging. This is used to obtain information from the human body for the purposes of diagnosing, treating or monitoring medical conditions.

Image data from a patient is obtained in the form of x-ray, ultrasonic, microscopy, angiography and tomography images. And data gathered from these images can be used in detecting arteriosclerosis and tumors, among many other ailments and abnormalities. Computer vision also aids in the field of medical research.

Military

There are a number of computer vision applications used today in military operations. The first is enemy detection. A second application is missile guidance: sophisticated guidance systems can send a missile to an area rather than to a specific target, with the target located by the computer vision system once the missile arrives.

Autonomous vehicles

This is one of the latest applications of computer vision. Autonomous vehicles are more than just self-driving cars; they include other land-based vehicles (from trucks to small robots with wheels) as well as aerial vehicles. Vehicles may be fully autonomous (with no driver) or partly autonomous (where the computer vision system supports the pilot or driver in certain situations).

These are just a few of the many applications of computer vision in today's technology.

Computer Vision Tools

What are Computer Vision tools?

Computer vision tools are the platforms, libraries and resources that assist in, or simplify, programming for computer vision.

Here are several of the major tools in computer vision:

OpenCV

OpenCV is an open source library for computer vision. It has Java, C++ and Python interfaces. It also supports Android, Windows, Mac OS, Linux, and iOS.
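
A minimal OpenCV usage sketch in Python (the image path is a placeholder):

    # Load an image, convert it to grayscale, and save the result.
    import cv2

    img = cv2.imread("photo.jpg")                 # BGR pixel array
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cv2.imwrite("gray.jpg", gray)
    print(img.shape, gray.shape)                  # (H, W, 3) vs. (H, W)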

SimpleCV

This is also an open source framework for creating computer vision applications. Through SimpleCV, you can access high-powered libraries such as OpenCV without having to first learn about file formats, buffer management, color spaces or bit depths.

SciPy

SciPy is an open source Python library used for scientific and technical computing.
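
For image work, the scipy.ndimage module is the usual entry point; here is a minimal smoothing sketch (the input array is synthetic noise):

    # Gaussian smoothing with scipy.ndimage.
    import numpy as np
    from scipy import ndimage

    noisy = np.random.rand(128, 128)                  # fake grayscale image
    smooth = ndimage.gaussian_filter(noisy, sigma=2)  # blur to suppress noise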

MeshLab

MeshLab is an open source system that is helpful for processing and editing 3D triangular meshes.

DLib

DLib is a modern C++ toolkit containing tools and machine learning algorithms for building complex software that solves real-world problems.
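
DLib also ships Python bindings; here is a minimal face detection sketch (the image path is a placeholder):

    # Detect frontal faces with dlib's built-in HOG-based detector.
    import dlib

    detector = dlib.get_frontal_face_detector()
    img = dlib.load_rgb_image("photo.jpg")
    faces = detector(img)
    print(len(faces), "face(s) found")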

For the past few years, computer vision has caught the attention of players across many industries due to its multiple applications.

By 2022, the computer vision software and hardware market is projected to reach $48.6 billion. Device manufacturers, software developers, and component and semiconductor companies can consider investing in computer vision to capitalize on future opportunities.

Numerous research questions in computer vision also remain unaddressed, leaving a great deal of room for researchers, programmers and software developers to conduct further studies.

This technology will only become more relevant in coming years.