Artificial Intelligence through Neural Networks
In order to define AI, we must first define the concept of intelligence in general. A paraphrased definition based on Wikipedia is:
Intelligence can be generally described as the ability to perceive information, and retain it as knowledge to be applied towards adaptive behaviors within an environment or context.
AI is used for recognition tasks (pattern, text, audio, image, video, facial, …), autonomous vehicles, medical diagnoses, gaming, search engines, spam filtering, crime fighting, marketing, robotics, remote sensing, computer vision, transportation, music recognition, classification, and so on.
Biological Neural Networks Overview
The human brain is exceptionally complex and is often described as the most powerful computing machine known.
The inner workings of the human brain are often modeled around the concept of neurons and the networks of neurons known as biological neural networks. According to Wikipedia, it’s estimated that the human brain contains roughly 100 billion neurons, which are connected along pathways throughout these networks.
In plain English, a neuron passes a message to another neuron across the connection between them (a synapse) if the weighted sum of the input signals it receives from one or more neurons (summation) is large enough to exceed a threshold. Exceeding the threshold and passing the message along to the next neuron is called activation.
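The summation-and-threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of a real neuron: the function name neuron_fires and all of the signal, weight, and threshold values are made up for the example.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True if the weighted sum of the inputs exceeds the threshold."""
    # Summation: each incoming signal is scaled by the weight of its connection.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Activation: the message is passed on only if the threshold is exceeded.
    return weighted_sum > threshold


# Hypothetical values chosen for illustration only.
signals = [1.0, 0.0, 1.0]
weights = [0.5, 0.9, 0.4]

# The weighted sum is about 0.9 (0.5 + 0.0 + 0.4).
print(neuron_fires(signals, weights, threshold=0.8))  # True
print(neuron_fires(signals, weights, threshold=1.0))  # False
```

With the same inputs, raising the threshold from 0.8 to 1.0 is enough to stop the neuron from activating.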
Artificial neural networks (ANNs) are statistical models directly inspired by, and partially modeled on, biological neural networks. They are capable of modeling nonlinear relationships between inputs and outputs, and of processing them in parallel. The related algorithms are part of the broader field of machine learning, and can be used in many applications, as discussed above.
Artificial neural networks are characterized by adaptive weights along the paths between neurons. A learning algorithm tunes these weights based on observed data in order to improve the model. In addition to the learning algorithm itself, one must choose an appropriate cost function.
The cost function measures how far the network’s output is from the desired output, and minimizing it is how the optimal solution to the problem is learned. This involves determining the best values for all of the tunable model parameters, with the adaptive weights along neuron paths being the primary target. Algorithm tuning parameters such as the learning rate must also be chosen. The minimization is usually done through optimization techniques such as gradient descent or stochastic gradient descent.
These optimization techniques try to bring the ANN’s solution as close as possible to the optimal solution. When they succeed, the ANN is able to solve the intended problem with high performance.
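To make the role of the cost function and gradient descent concrete, here is a minimal sketch that tunes a single weight. It assumes a one-weight model y = w * x and a mean squared error cost. Neither choice comes from the text above, and the data, learning rate, and step count are hypothetical values picked for the example.

```python
def mse_cost(w, xs, ys):
    """Mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w, xs, ys):
    """Derivative of mse_cost with respect to the weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, steps=200):
    """Repeatedly nudge w against the gradient to reduce the cost."""
    w = 0.0  # arbitrary starting weight
    for _ in range(steps):
        w -= learning_rate * mse_gradient(w, xs, ys)
    return w


# Hypothetical training data generated from y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = gradient_descent(xs, ys)
print(w, mse_cost(w, xs, ys))  # w ends up very close to 2, cost close to 0
```

The learning rate controls the size of each update. If it is too large for the data, the updates can overshoot and the weight diverges instead of settling. Stochastic gradient descent follows the same loop but estimates the gradient from one example, or a small batch, at a time.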
Architecturally, an artificial neural network is modeled using layers of artificial neurons: computational units that receive input and apply an activation function, along with a threshold, to determine whether messages are passed along.
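The layered structure can be sketched as a forward pass in plain Python. This is an illustrative sketch under several assumptions: the sigmoid activation, the use of a bias term in place of an explicit threshold, the 2-3-1 layer sizes, and all weight values are hypothetical choices, and the function names are invented for the example.

```python
import math


def sigmoid(z):
    """Smooth activation function that maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute one layer's outputs; weights[j] holds neuron j's incoming weights."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, neuron_weights)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Feed the inputs through each (weights, biases) layer in order."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden = ([[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]], [0.0, -0.1, 0.2])
output = ([[0.6, -0.3, 0.8]], [0.05])

print(network_forward([1.0, 0.5], [hidden, output]))
```

Here the bias acts as a negated threshold: a neuron with a strongly negative bias needs a larger weighted sum before its output rises toward 1. Adding more entries to the layers list is all it takes to make the network deeper.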
Deep learning, while sounding flashy, is really just a term to describe certain types of neural networks and related algorithms that often consume very raw input data. They process this data through many layers of nonlinear transformations in order to calculate a target output.