How Computer Vision Works | Artificial Intelligence and Machine Learning

Name: How Computer Vision Works | Artificial Intelligence and Machine Learning
Uploaded: Feb 2, 2022
Duration: 366 s

Channel: Acadaimy (13.9K subscribers)

Have you ever wondered how self-driving cars such as Teslas are able to navigate all kinds of roads with such ease, precision, and accuracy, essentially allowing individuals to sit back and relax in a car that has no human operating it? How is a car able to see the road, make sense of the entities that it's seeing, and maneuver accordingly? A car obviously doesn't have eyes like a human does, so how can it see things like pedestrians, stop-and-go lights, stop signs, road lines, other road signs, etc.? Well, it turns out that cars do have eyes, but not in the form that we're familiar with.
Computer vision is the field of artificial intelligence that gives machines the ability to see the environment around them. It trains computers to understand and interpret the visual world. Utilizing digital images from cameras, videos, and machine learning models, machines are given the power to identify and classify objects, giving them the ability to react to all that they "see."
Now, before we get into the nuances of how this fascinating field works, let's travel back in time and observe computer vision in its early days. In the 1950s, the first experiments in computer vision were conducted: early neural networks were used to detect object edges and, in later decades, to interpret simple typed and handwritten text. As big data surged with the growth of the Internet in the 1990s, large sets of images of people and things emerged online, and machines could learn to identify specific objects in photos and videos. Computer vision has now flourished thanks to the abundance of photos and videos in our society, a form of big data, as well as advances in hardware, software, and algorithms.
So how exactly does computer vision work? Well, think of it like a puzzle. You have with you several different kinds of pieces that all fit together in some way to form a complete image.
You look at the edges and the individual elements of each puzzle piece to perceive which components fit together and approximately where they should be placed to create a cohesive whole. This is analogous to the process that a machine, specifically a neural network, goes through when trying to understand visual images. It identifies edges and borders and tries to model subcomponents. Instead of being given a complete image, as humans usually are on the top of the puzzle box, neural networks are trained using hundreds of thousands of similar images, and these images are available thanks to the big data in society that we touched on at the beginning of this video.
Now if you'd like to learn more about big data, I have a video that covers the basic components of this concept. And if you'd like to know more about how the machine actually trains its algorithm on the hundreds of thousands of images it receives, I'd recommend you check out my video on machine learning, the subfield of artificial intelligence that covers this.
Right now, let's take a look at how a specific architecture of neural networks, called convolutional neural networks, powers computer vision. So we all know that images are made up of big grids of pixels. Each pixel has a designated color on the red-green-blue (RGB) scale, where the three primary colors are combined in various amounts to represent diverse colors. To identify features in images, computer vision considers small patches of pixels through a small grid of numbers called a kernel or filter, which contains values for pixel-wise multiplication: each kernel value is multiplied by the pixel beneath it, and the products are summed into a single output value.
So if you've watched my video on neural networks, you might recall that an artificial neuron, the basic component of neural nets, takes in a series of inputs, multiplies the inputs by specified weights, and adds a bias, and that the network uses backpropagation to learn from its mistakes. These input weights are analogous to kernel values: neural nets learn useful kernels that are able to recognize distinctive features in images.
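To make the kernel idea concrete, here is a minimal sketch of pixel-wise multiplication over small patches of an image. It is not taken from the video: the tiny grayscale image, the hand-written vertical-edge kernel, and the function name `apply_kernel` are all assumptions chosen for illustration, and a real network would learn its kernel values rather than have them typed in. Grayscale is used instead of RGB only to keep the grid small.

```python
# Illustrative sketch: slide a 3x3 kernel over a tiny grayscale image.
# The image, the kernel values, and the function name are made-up examples.

def apply_kernel(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1). At each position,
    multiply overlapping values pixel-wise and sum the products."""
    k = len(kernel)  # assumes a square k x k kernel
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# 5x5 grayscale image: dark (0) on the left, bright (255) on the right.
image = [[0, 0, 255, 255, 255] for _ in range(5)]

# Hand-written kernel that responds where brightness rises from left to right.
vertical_edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

edges = apply_kernel(image, vertical_edge_kernel)
for row in edges:
    print(row)
```

The output is large wherever the patch straddles the dark-to-bright boundary and zero where the patch is uniformly bright, which is the sense in which a kernel "detects" an edge.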
Convolutional neural nets utilize a range of previously trained neurons to process each new image. With each layer, the prior image is digested and manipulated by different learned kernels, and a new image is output. This output is then processed by the next layer of neurons, resulting in repeated convolutions. Connecting this to our puzzle example, the first convolution might discover edges. Then the next layer might convolve on the edge features to detect simple shapes made up of edges, such as corners. Then the next layer might convolve on the corner features and utilize neurons that can detect simple entities such as noses and mouths. This process repeats and grows in complexity with every passing layer until the machine reaches a layer that can recognize all parts of the image, for example the eyes, nose, ears, and mouth, and deem the image a face.
So in summary, computer vision works by acquiring an image, processing the image, and understanding the image. For more videos on your journey towards mastering AI, be sure to subscribe to youtube.com/acadaimy. Thanks for watching!
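The layer-by-layer idea from the transcript (edges first, then shapes built from edges) can be sketched with two stacked convolutions. This is a toy under stated assumptions, not the video's method: the 6x6 image, the hand-picked kernels, the bias of -4, and the helper names `conv2d` and `relu` are all hypothetical, whereas a real convolutional network learns its kernels and biases from training images.

```python
# Illustrative sketch only: kernels and bias are hand-picked here,
# whereas a real convolutional network learns them during training.

def conv2d(feature_map, kernel):
    """Slide `kernel` over `feature_map` (no padding, stride 1) and return
    the sum of pixel-wise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    return [
        [
            sum(
                feature_map[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and zero out negative ones."""
    return [[max(0, v) for v in row] for row in feature_map]


# 6x6 image: a bright square (1) in the bottom-right on a dark background (0).
# The square's top-left corner sits at row 2, column 2.
image = [[1 if r >= 2 and c >= 2 else 0 for c in range(6)] for r in range(6)]

# Layer 1: two edge kernels produce two 5x5 feature maps.
vertical_edge = [[-1, 1], [-1, 1]]    # dark-to-bright going left to right
horizontal_edge = [[-1, -1], [1, 1]]  # dark-to-bright going top to bottom
v_map = relu(conv2d(image, vertical_edge))
h_map = relu(conv2d(image, horizontal_edge))

# Layer 2: convolve on the edge maps, add the results plus a bias, apply ReLU.
# With this toy image, a position fires only where both kinds of edge
# appear in the same patch, i.e. at a corner.
ones = [[1, 1], [1, 1]]
bias = -4  # hypothetical threshold chosen by hand for this toy image
v_sum = conv2d(v_map, ones)
h_sum = conv2d(h_map, ones)
corner_map = relu(
    [
        [v + h + bias for v, h in zip(v_row, h_row)]
        for v_row, h_row in zip(v_sum, h_sum)
    ]
)

for row in corner_map:
    print(row)
```

Only one position in the final 4x4 map is non-zero, the one whose patch covers the corner of the square. Each layer's output is the next layer's input, and the map shrinks from 6x6 to 5x5 to 4x4 because no padding is used.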
