Backpropagation is a method used in artificial neural networks to calculate the gradient of the loss function with respect to the weights of the network. It is commonly used to train deep neural networks, a term referring to neural networks with more than one hidden layer.
Backpropagation is a special case of an older and more general technique called automatic differentiation. In the context of learning, backpropagation is commonly used by gradient descent optimization algorithms to adjust the weights of neurons by calculating the gradient of the loss function. This technique is also sometimes called backward propagation of errors, because the error is computed at the output and distributed backwards through the network's layers.
The backpropagation algorithm has been repeatedly rediscovered and is equivalent to automatic differentiation in reverse accumulation mode. Backpropagation requires the derivative of the loss function with respect to the network output to be known, which usually (but not necessarily) means that the desired target value is known. For this reason it is considered a supervised learning method, although it is also used in some unsupervised networks such as autoencoders. Backpropagation is also a generalization of the delta rule to multi-layered feedforward networks, made possible by using the chain rule to iteratively compute the gradient for each layer. It is closely related to the Gauss-Newton algorithm, and is part of ongoing research in neural backpropagation. Backpropagation can be used with any gradient-based optimizer, such as L-BFGS or truncated Newton.
Motivation
The goal of any supervised learning algorithm is to find a function that best maps a set of inputs to their correct output. An example is a classification task, where the input is an image of an animal and the correct output is the name of the animal.
The motivation for backpropagation is to train a multi-layered neural network such that it can learn the appropriate internal representations to allow it to learn any arbitrary mapping of input to output.
Loss function
Sometimes referred to as the cost function or error function (not to be confused with the Gauss error function), the loss function is a function that maps the values of one or more variables onto a real number intuitively representing some "cost" associated with those values. For backpropagation, the loss function calculates the difference between the network output and its expected output, after a training example has propagated through the network.
Assumptions
Two assumptions must be made about the form of the error function. The first is that it can be written as an average E = (1/n) Σ_x E_x over error functions E_x for n individual training examples x. The reason for this assumption is that the backpropagation algorithm calculates the gradient of the error function for a single training example, which then needs to be generalized to the overall error function. The second assumption is that it can be written as a function of the outputs of the neural network.
Example loss function
Let y, y′ be vectors in ℝ^n.
Select an error function E(y, y′) measuring the difference between the two outputs. The standard choice is the square of the Euclidean distance between the vectors y and y′:
E(y, y′) = ½ ‖y − y′‖²
Note that the factor of ½ conveniently cancels the exponent when the error function is subsequently differentiated.
The error function over n training examples can then be written as an average of losses over the individual examples:
E = 1/(2n) Σ_x ‖y(x) − y′(x)‖²
and therefore, the partial derivative with respect to the outputs:
∂E/∂y′ = y′ − y
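As a quick sketch, this loss and its derivative can be computed directly in code; the function names below (squared_error, squared_error_grad) are illustrative, not from the original:

import numpy as np

def squared_error(y, y_target):
    # Half the squared Euclidean distance between output and target
    return 0.5 * np.sum((y - y_target) ** 2)

def squared_error_grad(y, y_target):
    # Partial derivative of the loss with respect to the output y
    return y - y_target

y = np.array([0.2, 0.9])         # network output
y_target = np.array([0.0, 1.0])  # desired output
print(squared_error(y, y_target))       # 0.025
print(squared_error_grad(y, y_target))  # [ 0.2 -0.1]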
Optimization
The optimization algorithm repeats a two-phase cycle, propagation and weight update. When an input vector is presented to the network, it is propagated forward through the network, layer by layer, until it reaches the output layer. The output of the network is then compared to the desired output, using the loss function. The resulting error value is calculated for each of the neurons in the output layer. The error values are then propagated from the output back through the network, until each neuron has an associated error value that reflects its contribution to the original output.
Backpropagation uses these error values to calculate the gradient of the loss function. In the second phase, this gradient is fed to the optimization method, which in turn uses it to update the weights, in an attempt to minimize the loss function.
Algorithm
Let N be a neural network with e connections, m inputs, and n outputs.
Below, x_1, x_2, … will denote vectors in ℝ^m, y_1, y_2, … vectors in ℝ^n, and w_0, w_1, w_2, … vectors in ℝ^e. These are called inputs, outputs and weights respectively.
The neural network corresponds to a function y = f_N(w, x) which, given a weight w, maps an input x to an output y.
The optimization takes as input a sequence of training examples (x_1, y_1), …, (x_p, y_p) and produces a sequence of weights w_0, w_1, …, w_p starting from some initial weight w_0, usually chosen at random.
These weights are computed in turn: first compute w_i using only (x_i, y_i, w_(i−1)) for i = 1, …, p. The output of the algorithm is then w_p, giving us a new function x ↦ f_N(w_p, x). The computation is the same in each step, hence only the case i = 1 is described.
Calculating w_1 from (x_1, y_1, w_0) is done by considering a variable weight w and applying gradient descent to the function w ↦ E(f_N(w, x_1), y_1) to find a local minimum, starting at w = w_0.
This makes w_1 the minimizing weight found by gradient descent.
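A minimal sketch of this optimization loop in Python, assuming a differentiable model and a supplied gradient function; the names sgd_weight_sequence and grad_E, the learning rate eta, and the fixed inner step count are illustrative assumptions:

import numpy as np

def sgd_weight_sequence(examples, w0, grad_E, eta=0.1, steps=50):
    # Produce the sequence w_0, w_1, ..., w_p, where w_i is obtained from
    # w_(i-1) by gradient descent on the error of example (x_i, y_i).
    weights = [w0]
    w = w0
    for x, y in examples:
        for _ in range(steps):          # descend toward a local minimum
            w = w - eta * grad_E(w, x, y)
        weights.append(w)
    return weights

# Example: a linear model f(w, x) = w . x with squared error
# E(w) = (w . x - y)^2, so grad_E = 2 * (w . x - y) * x.
grad_E = lambda w, x, y: 2.0 * (np.dot(w, x) - y) * x
examples = [(np.array([1.0, 1.0]), 0.0), (np.array([1.0, -1.0]), 1.0)]
ws = sgd_weight_sequence(examples, w0=np.random.randn(2), grad_E=grad_E)
print(ws[-1])  # final weight w_p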
Algorithm in code
To implement the algorithm above, explicit formulas are required for the gradient of the function w ↦ E(f_N(w, x), y) where the error function is E(y, y′) = ½ ‖y − y′‖².
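For a single linear unit f(w, x) = w · x with E = (f(w, x) − y)², such an explicit formula is ∂E/∂w = 2(f(w, x) − y)x. Below is a sketch that checks this analytic formula against a numerical finite-difference gradient; all names are illustrative:

import numpy as np

def f(w, x):                 # a single linear unit
    return np.dot(w, x)

def E(w, x, y):              # squared error for one example
    return (f(w, x) - y) ** 2

def grad_analytic(w, x, y):  # explicit formula: 2 (w.x - y) x
    return 2.0 * (f(w, x) - y) * x

def grad_numeric(w, x, y, h=1e-6):
    g = np.zeros_like(w)
    for i in range(len(w)):
        dw = np.zeros_like(w)
        dw[i] = h
        g[i] = (E(w + dw, x, y) - E(w - dw, x, y)) / (2 * h)
    return g

w, x, y = np.array([0.5, -0.3]), np.array([1.0, 2.0]), 0.7
print(grad_analytic(w, x, y))  # [-1.6 -3.2]
print(grad_numeric(w, x, y))   # should agree to ~6 decimal places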
The learning algorithm can be divided into two phases: propagation and weight update.
Phase 1: propagation
Each propagation involves the following steps:
- Forward propagation of a training input through the network in order to generate the output value(s)
- Calculation of the cost (error term)
- Propagation of the output activations back through the network, using the training pattern target, in order to generate the deltas (the difference between the targeted and actual output values) of all output and hidden neurons
Phase 2: weight update
For each weight, the following steps should be followed:
- The weight's output delta and input activation are multiplied to find the gradient of the weight.
- A ratio (percentage) of the weight's gradient is subtracted from the weight.
This ratio (percentage) influences the speed and quality of learning; it is called the learning rate. The greater the ratio, the faster the neuron trains, but the lower the ratio, the more accurate the training is. The sign of the gradient of a weight indicates whether the error varies directly with, or inversely to, the weight. Therefore, the weight must be updated in the opposite direction, "descending" the gradient, as sketched below.
Learning is repeated (on new batches) until the network performs adequately.
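In code, the update described in this phase reduces to a single line per weight; a minimal sketch (eta, weight and grad are illustrative names):

eta = 0.5                       # learning rate: larger trains faster, smaller is more accurate
weight, grad = 0.8, 0.3         # current weight and its gradient
weight = weight - eta * grad    # step "down" the gradient, opposite to its sign
print(weight)                   # 0.65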
Pseudocode
Here is pseudocode for stochastic gradient descent for training a three-layer network (only one hidden layer):

initialize network weights (often small random values)
do
    forEach training example named ex
        prediction = neural-net-output(network, ex)  // forward pass
        actual = teacher-output(ex)
        compute error (prediction - actual) at the output units
        compute Δw_h for all weights from hidden layer to output layer  // backward pass
        compute Δw_i for all weights from input layer to hidden layer   // backward pass continued
        update network weights  // input layer not modified by error estimate
until all examples classified correctly or another termination criterion is met
return the network
The lines labeled "backward pass" can be implemented using the backpropagation algorithm, which calculates the gradient of the network's error with respect to the network's modifiable weights.
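Below is a runnable Python sketch of this pseudocode for a network with one hidden layer of sigmoid units, trained on XOR; the architecture, activation function, learning rate, random seed and all names are illustrative choices, not mandated by the pseudocode:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# initialize network weights (often small random values);
# last column of each matrix is a bias weight
W1 = rng.normal(0.0, 0.5, (3, 3))  # input (2 + bias) -> hidden (3)
W2 = rng.normal(0.0, 0.5, (1, 4))  # hidden (3 + bias) -> output (1)

# training examples: XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0.0], [1.0], [1.0], [0.0]])

def forward(x):
    xb = np.append(x, 1.0)    # input plus bias unit
    h = sigmoid(W1 @ xb)      # hidden activations
    hb = np.append(h, 1.0)    # hidden plus bias unit
    y = sigmoid(W2 @ hb)      # network prediction
    return xb, hb, y

eta = 0.5
for epoch in range(10000):
    for x, t in zip(X, T):
        xb, hb, y = forward(x)                       # forward pass
        delta_out = (y - t) * y * (1.0 - y)          # output deltas
        delta_hid = (W2[:, :3].T @ delta_out) * hb[:3] * (1.0 - hb[:3])
        W2 -= eta * np.outer(delta_out, hb)          # backward pass:
        W1 -= eta * np.outer(delta_hid, xb)          # step down the gradient

for x in X:
    print(x, forward(x)[2])  # should approach [0, 1, 1, 0] (seed-dependent)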
Intuition
Learning as an optimization problem
To understand the mathematical derivation of the backpropagation algorithm, it helps to first develop some intuition about the relationship between the actual output of a neuron and the correct output for a particular training case. Consider a simple neural network with two input units, one output unit and no hidden units. Each neuron uses a linear output, that is, the weighted sum of its inputs.
Initially, before training, the weights will be set randomly. Then the neuron learns from training examples, which in this case consist of a set of tuples (x_1, x_2, t) where x_1 and x_2 are the inputs to the network and t is the correct output (the output the network should eventually produce given those inputs). The initial network, given x_1 and x_2, will compute an output y that likely differs from t (given random weights). A common method for measuring the discrepancy between the expected output t and the actual output y is the squared error measure:
E = (t − y)²
where E is the discrepancy or error.
As an example, consider the network on a single training case: (1, 1, 0), so the inputs x_1 and x_2 are each 1 and the correct output t is 0. Now if the actual output y is plotted on the horizontal axis against the error E on the vertical axis, the result is a parabola. The minimum of the parabola corresponds to the output y which minimizes the error E. For a single training case, the minimum also touches the horizontal axis, which means the error will be zero and the network can produce an output y that exactly matches the expected output t. Therefore, the problem of mapping inputs to outputs can be reduced to an optimization problem of finding a function that will produce the minimal error.
However, the output of a neuron depends on the weighted sum of all its inputs:
y = x_1 w_1 + x_2 w_2
where w_1 and w_2 are the weights on the connections from the input units to the output unit.
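A minimal sketch of gradient descent on this single neuron, using the training case (1, 1, 0) from above; the initial weights and learning rate are illustrative:

# One linear neuron: y = x1*w1 + x2*w2, squared error E = (t - y)^2.
# The gradients are dE/dw1 = -2*(t - y)*x1 and dE/dw2 = -2*(t - y)*x2.
w1, w2 = 0.8, -0.3           # random initial weights
x1, x2, t = 1.0, 1.0, 0.0    # the training case (1, 1, 0)
eta = 0.1
for step in range(100):
    y = x1 * w1 + x2 * w2
    w1 -= eta * (-2.0 * (t - y) * x1)  # step opposite to the gradient
    w2 -= eta * (-2.0 * (t - y) * x2)
print(w1 + w2)  # approaches 0, so y = w1 + w2 approaches t = 0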