Artificial Intelligence/sigmoid function
Expert: Saurabh Kudesia - 12/10/2005
Question: I have read that the sigmoid function is used in neural networks because with this function we can obtain non-linearity.
My question has two legs.
1. Why do we need non-linearity in neural networks?
and
2. How does the sigmoid function provide us with non-linearity? Give an example if you can.
Thank you.
Answer: Hi Dimitris,
A sigmoid function is a mathematical function that produces a sigmoid curve — a curve having an "S" shape. Often, sigmoid function refers to the special case of the logistic function. Besides the logistic function, sigmoid functions include the ordinary arc-tangent, the hyperbolic tangent, and the error function. The integral of any smooth, positive, "bump-shaped" function will be sigmoidal, thus the cumulative distribution functions for many common probability distributions are sigmoidal.
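As a small illustration of the family described above, the sketch below evaluates four of the named sigmoid functions at a few points using only Python's standard `math` module. The helper name `logistic`, the dictionary `SIGMOIDS`, and the sample points are chosen here for illustration and are not from any particular library. Each function rises monotonically and flattens out at both ends, which is the "S" shape.

```python
import math

def logistic(t):
    """Logistic function, the usual meaning of 'sigmoid' in neural networks."""
    return 1.0 / (1.0 + math.exp(-t))

# Illustrative collection of sigmoid-shaped functions and their output ranges.
SIGMOIDS = {
    "logistic": logistic,   # values in (0, 1)
    "arctan": math.atan,    # values in (-pi/2, pi/2)
    "tanh": math.tanh,      # values in (-1, 1)
    "erf": math.erf,        # values in (-1, 1)
}

points = [-4.0, -1.0, 0.0, 1.0, 4.0]  # arbitrary sample inputs
for name, f in SIGMOIDS.items():
    # Every row increases from left to right and levels off at the ends.
    print(name, [round(f(t), 3) for t in points])
```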
Sigmoid functions are often used in neural networks to introduce nonlinearity into the model and/or to make sure that certain signals remain within a specified range. A popular neural network element computes a linear combination of its input signals and applies a bounded sigmoid function to the result; this model can be seen as a "smoothed" variant of the classical threshold neuron. In simple terms, as the gain increases, the slope of the neuron's activation function increases, and the sigmoid approaches the hard step of the threshold neuron.
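A minimal sketch of such a neuron element is shown here, assuming the logistic function as the sigmoid. The function names (`sigmoid`, `neuron`, `threshold_neuron`), the `gain` and `bias` parameters, and the sample weights are hypothetical and chosen only for illustration. With a large gain the curve becomes steep around zero, so the output moves close to that of a hard threshold unit.

```python
import math

def sigmoid(t, gain=1.0):
    """Logistic sigmoid; 'gain' scales the input and so controls the steepness."""
    return 1.0 / (1.0 + math.exp(-gain * t))

def weighted_sum(inputs, weights, bias=0.0):
    """Linear combination of the input signals."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def neuron(inputs, weights, bias=0.0, gain=1.0):
    """Sigmoid neuron: linear combination followed by a bounded sigmoid."""
    return sigmoid(weighted_sum(inputs, weights, bias), gain)

def threshold_neuron(inputs, weights, bias=0.0):
    """Classical threshold neuron: fires (1.0) when the weighted sum is >= 0."""
    return 1.0 if weighted_sum(inputs, weights, bias) >= 0 else 0.0

inputs, weights = [1.0, 2.0], [0.5, -0.1]  # illustrative values, weighted sum = 0.3
for gain in (1.0, 5.0, 50.0):
    # The output stays inside (0, 1) and approaches the threshold output as gain grows.
    print(gain, neuron(inputs, weights, gain=gain))
print("threshold:", threshold_neuron(inputs, weights))
```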
One reason for its popularity in neural networks is that the logistic sigmoid function satisfies this property:
d/dt sig(t) = sig(t) (1 - sig(t))
This simple polynomial relationship between the function and its derivative makes the derivative computationally cheap: once sig(t) is known, no further exponential has to be evaluated.
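The identity can be checked numerically. The sketch below compares the closed form sig(t)(1 - sig(t)) with a central finite-difference estimate of the derivative. The helper names `sig_prime` and `numeric_derivative` and the step size `h` are choices made here for illustration.

```python
import math

def sig(t):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-t))

def sig_prime(t):
    """Derivative via the identity sig'(t) = sig(t) * (1 - sig(t))."""
    s = sig(t)
    return s * (1.0 - s)

def numeric_derivative(f, t, h=1e-6):
    """Central finite-difference estimate of f'(t); h is an illustrative step size."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

for t in (-2.0, 0.0, 1.5):
    # The two columns should agree to many decimal places.
    print(t, sig_prime(t), numeric_derivative(sig, t))
```

In training, sig(t) is normally already available from the forward pass, so the derivative costs one subtraction and one multiplication.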
Inputs entering a neuron are not only multiplied by weights; their weighted sum is also passed through the neuron's characteristic equation, or transfer function. The sigmoid function is a typical non-linear neuronal transfer function that helps make outputs reachable. The non-linearity is significant. If the transfer function were linear, each of the neuronal inputs would be scaled by the same proportion during training. This could cause the entire system to "drift" during training runs; that is, the system may lose outputs it has already tracked while attempting to track new outputs. A non-linearity in the system helps to isolate specific input pathways. (Anderson, 1995, p. 413) (Nelson, 1990, p. 108)
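As a concrete example of what the non-linearity changes, consider two units stacked one after the other. This sketch is an illustration added here, not taken from the cited books, and the weights `w1`, `w2` and the function `two_layer` are hypothetical. With a linear (identity) transfer function the two layers collapse into a single multiplication by w1*w2, so the network obeys superposition and the extra layer adds nothing. With a sigmoid in between, superposition no longer holds.

```python
import math

def sigmoid(t):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-t))

def identity(t):
    """Linear transfer function: passes its input through unchanged."""
    return t

def two_layer(x, w1, w2, activation):
    """Hidden unit activation(w1 * x) feeding an output unit with weight w2."""
    return w2 * activation(w1 * x)

w1, w2 = 2.0, 3.0  # arbitrary illustrative weights
a, b = 1.0, 2.0    # arbitrary illustrative inputs

def linear_net(x):
    return two_layer(x, w1, w2, identity)

def sigmoid_net(x):
    return two_layer(x, w1, w2, sigmoid)

# Linear case: identical to one unit with weight w1 * w2.
print(linear_net(a + b), (w1 * w2) * (a + b))

# Linear case: response to a sum equals the sum of the responses.
print(linear_net(a + b), linear_net(a) + linear_net(b))

# Sigmoid case: the two values differ, so the mapping is non-linear.
print(sigmoid_net(a + b), sigmoid_net(a) + sigmoid_net(b))
```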
Regards
Saurabh