Monday, June 11, 2018



In probability theory, the output of the softmax function can be used to represent a categorical distribution - that is, a probability distribution over K different possible outcomes. In fact, it is the gradient-log-normalizer of the categorical probability distribution. The softmax function is also the gradient of the LogSumExp function.
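For reference, the softmax function σ maps a K-dimensional vector z of arbitrary real values to a K-dimensional vector of values in the range (0, 1) that sum to 1:

    \sigma(\mathbf{z})_{j} = \frac{e^{z_{j}}}{\sum_{k=1}^{K} e^{z_{k}}}, \quad j = 1, \dots, K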

Softmax functions are used in various multiclass classification methods, such as multinomial logistic regression (also known as softmax regression) [1], multiclass linear discriminant analysis, naive Bayes classifiers, and artificial neural networks. Specifically, in multinomial logistic regression and linear discriminant analysis, the input to the function is the result of K distinct linear functions, and the predicted probability for the j'th class given a sample vector x and a weight vector w_j is:

    P(y = j \mid \mathbf{x}) = \frac{e^{\mathbf{x}^{\mathsf{T}}\mathbf{w}_{j}}}{\sum_{k=1}^{K} e^{\mathbf{x}^{\mathsf{T}}\mathbf{w}_{k}}}
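As a concrete illustration, here is a minimal NumPy sketch of this formula (the function name and the column-per-class layout of W are our own choices):

    import numpy as np

    def class_probabilities(x, W):
        # x: feature vector of shape (d,); W: weight matrix of shape (d, K),
        # holding one weight vector w_j per class in its columns.
        scores = x @ W                     # the K linear functions x^T w_j
        e = np.exp(scores - scores.max())  # shifted for numerical stability
        return e / e.sum()                 # P(y = j | x) for j = 1, ..., K

    x = np.array([1.0, 2.0])
    W = np.array([[0.5, -0.2, 0.1],
                  [0.3, 0.8, -0.5]])
    print(class_probabilities(x, W))  # three probabilities summing to 1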


Example

If we take the input [1, 2, 3, 4, 1, 2, 3], the softmax of that is [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]. The output has most of its weight where the '4' was in the original input. This is what the function is normally used for: to highlight the largest values and to suppress values that are significantly below the maximum. But note: softmax is not scale invariant, so if the input were [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3] (which sums to 1.6) the softmax would be [0.125, 0.138, 0.153, 0.169, 0.125, 0.138, 0.153]. This shows that for values between 0 and 1, softmax in fact de-emphasizes the maximum value (note that 0.169 is not only less than 0.475, it is also less than the initial proportion of 0.4).

This example can be computed with simple Python code; a minimal sketch (outputs rounded to three decimals):
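    import math

    def softmax(z):
        # Shift by the maximum for numerical stability; softmax is
        # unchanged by adding a constant to every input.
        m = max(z)
        exps = [math.exp(v - m) for v in z]
        total = sum(exps)
        return [e / total for e in exps]

    print([round(p, 3) for p in softmax([1, 2, 3, 4, 1, 2, 3])])
    # -> [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]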

Another sketch in Python using NumPy, which vectorizes the same computation:
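    import numpy as np

    def softmax(z):
        # Vectorized version of the same computation.
        e = np.exp(z - np.max(z))
        return e / e.sum()

    print(softmax(np.array([1, 2, 3, 4, 1, 2, 3])))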



Neural Network

The softmax function is often used in the final layer of a neural network-based classifier. Such networks are commonly trained under a log loss (or cross-entropy) regime, giving a non-linear variant of multinomial logistic regression.
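To make the pairing concrete, a minimal NumPy sketch (function names are ours) of the log loss for a single example under a softmax output layer:

    import numpy as np

    def log_softmax(z):
        # log(softmax(z)) computed stably via the log-sum-exp trick.
        z = z - np.max(z)
        return z - np.log(np.sum(np.exp(z)))

    def cross_entropy_loss(logits, target):
        # Negative log-likelihood of the correct class under softmax(logits).
        return -log_softmax(logits)[target]

    print(cross_entropy_loss(np.array([2.0, 1.0, 0.1]), target=0))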

Since the function maps a vector and a specific index i to a real value, the derivative needs to take the index into account:

    \frac{\partial}{\partial q_{k}} \sigma(\mathbf{q}, i) = \cdots = \sigma(\mathbf{q}, i) (\delta_{ik} - \sigma(\mathbf{q}, k))

Here, the Kronecker delta is used for simplicity (compare the derivative of the sigmoid function, which is also expressed through the function itself).
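In matrix form, these partial derivatives assemble into the Jacobian diag(s) - s s^T with s the softmax output, which a short NumPy sketch (names ours) makes explicit:

    import numpy as np

    def softmax(q):
        e = np.exp(q - np.max(q))
        return e / e.sum()

    def softmax_jacobian(q):
        # J[i, k] = s_i * (delta_ik - s_k), with s = softmax(q).
        s = softmax(q)
        return np.diag(s) - np.outer(s, s)

    print(softmax_jacobian(np.array([1.0, 2.0, 3.0])))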

See Multinomial logit for a probability model that uses the softmax activation function.


Reinforcement learning

In the field of reinforcement learning, a softmax function can be used to convert values into action probabilities. The function commonly used is:

    P_{t}(a) = \frac{\exp(q_{t}(a)/\tau)}{\sum_{i=1}^{n} \exp(q_{t}(i)/\tau)}

where the action value q_t(a) corresponds to the expected reward of taking action a, and τ is called a temperature parameter (in reference to statistical mechanics). For high temperatures (τ → ∞), all actions have nearly the same probability; the lower the temperature, the more the expected rewards affect the probabilities. For a low temperature (τ → 0⁺), the probability of the action with the highest expected reward tends to 1.
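A small NumPy sketch (the function name and sample values are ours) shows the effect of the temperature:

    import numpy as np

    def softmax_action_probs(q, tau):
        # Convert action values q into selection probabilities at temperature tau.
        z = np.asarray(q) / tau
        e = np.exp(z - np.max(z))  # shifted for numerical stability
        return e / e.sum()

    q = [1.0, 2.0, 0.5]
    for tau in (10.0, 1.0, 0.1):
        print(tau, softmax_action_probs(q, tau))
    # High tau: nearly uniform; low tau: mass concentrates on the best action.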


Softmax normalization

Sigmoidal or softmax normalization is a way of reducing the influence of extreme values or outliers in the data without removing them from the data set. It is useful when we want to include outlier data in the data set while still preserving the significance of data within a standard deviation of the mean. The data are non-linearly transformed using one of the sigmoidal functions.

The standard logistic sigmoid function:

    x_{i}' \equiv \frac{1}{1 + e^{-(x_{i} - \mu_{i})/\sigma_{i}}}

The hyperbolic tangent function, tanh:

    x_{i}' \equiv \frac{1 - e^{-(x_{i} - \mu_{i})/\sigma_{i}}}{1 + e^{-(x_{i} - \mu_{i})/\sigma_{i}}}

The sigmoid function limits the range of the normalized data to values between 0 and 1. The sigmoid function is almost linear near the mean and has smooth nonlinearity at both extremes, ensuring that all data points fall within a limited range. This retains resolution for most values within a standard deviation of the mean.

The hyperbolic tangent function, tanh, limits the range of the normalized data to values between -1 and 1. The hyperbolic tangent function is almost linear near the mean, but has half the slope of the sigmoid function. Like the sigmoid, it has smooth, monotonic nonlinearity at both extremes. Also, like the sigmoid function, it remains differentiable everywhere, and the sign of the derivative (slope) is unaffected by the normalization. This ensures that optimization and numerical integration algorithms can continue to rely on the derivative to estimate changes in the output (normalized value) produced by changes to the input in the region near any linearization point.
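Both transforms are easy to sketch in NumPy (function names are ours; the sample mean and standard deviation stand in for μ_i and σ_i):

    import numpy as np

    def sigmoid_normalize(x):
        # Map data into (0, 1); outliers are squashed toward the extremes.
        x = np.asarray(x, dtype=float)
        z = (x - x.mean()) / x.std()
        return 1.0 / (1.0 + np.exp(-z))

    def tanh_normalize(x):
        # Map data into (-1, 1); tanh(z/2) equals (1 - e^{-z}) / (1 + e^{-z}),
        # matching the formula above.
        x = np.asarray(x, dtype=float)
        z = (x - x.mean()) / x.std()
        return np.tanh(z / 2.0)

    data = np.array([1.0, 2.0, 3.0, 100.0])  # 100.0 is an outlier
    print(sigmoid_normalize(data))
    print(tanh_normalize(data))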


Relationship with Boltzmann distribution

The softmax function also gives the probability that an atom is found in a quantum state of energy ε_i when the atom is part of an ensemble that has reached thermal equilibrium at temperature T. This is known as the Boltzmann distribution. The expected relative occupancy of each state is e^{-ε_i/(k_B T)}, and this is normalized so that the sum over all energy levels is 1. In this analogy, the input to the softmax function is the negative energy of each quantum state divided by k_B T.
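Written out, the occupancy probability of state i is exactly a softmax of the scaled negative energies:

    P_{i} = \frac{e^{-\varepsilon_{i}/k_{B}T}}{\sum_{j} e^{-\varepsilon_{j}/k_{B}T}}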


See also

  • LogSumExp
  • Softplus
  • Multinomial logistic regression
  • Dirichlet distribution - an alternative way to sample categorical distributions
  • Smooth maximum


References

Source of the article: Wikipedia
