[2004.09280v3] Towards a theory of machine learning

We define a neural network as a septuple consisting of (1) a state vector, (2) an input projection, (3) an output projection, (4) a weight matrix, (5) a bias vector, (6) an activation map and (7) a loss function. We argue that the loss function can be imposed either on the boundary (i.e. input and/or output neurons) or in the bulk (i.e. hidden neurons) for both supervised and unsupervised systems. We apply the principle of maximum entropy to derive a canonical ensemble of state vectors subject to a constraint imposed on the bulk loss function by a Lagrange multiplier (or an inverse temperature parameter). We show that in equilibrium the canonical partition function must be a product of two factors: a function of the temperature and a function of the bias vector and weight matrix. Consequently, the total Shannon entropy consists of two terms which represent, respectively, a thermodynamic entropy and the complexity of the neural network. We derive the first and second laws of learning.
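To make the abstract's construction concrete, here is the standard maximum-entropy step it invokes, reconstructed from the description above rather than copied from the paper: the bulk loss H(x) plays the role of an energy, and the Lagrange multiplier β enforcing a fixed mean loss is the inverse temperature.

```latex
% Maximum-entropy derivation of the canonical ensemble (standard result,
% reconstructed from the abstract, not taken from the paper itself).
% Maximize the Shannon entropy of p(x) subject to normalization and a
% fixed mean bulk loss U:
\[
\begin{aligned}
  &\max_{p}\ S[p] = -\sum_{x} p(x)\,\log p(x)
  \quad \text{subject to} \quad
  \sum_{x} p(x) = 1, \qquad \sum_{x} p(x)\,H(x) = U, \\[2pt]
  &\Longrightarrow \quad
  p(x) = \frac{e^{-\beta H(x)}}{Z(\beta)},
  \qquad Z(\beta) = \sum_{x} e^{-\beta H(x)}.
\end{aligned}
\]
```

Below is a minimal, self-contained Python sketch of the septuple and of this canonical ensemble over state vectors. It is not the paper's implementation; every concrete choice here (the class name `NeuralSeptuple`, the quadratic bulk loss |x - f(Wx + b)|², the finite set of sampled candidate states) is an illustrative assumption.

```python
# Minimal sketch (not the paper's code) of the septuple and of the
# maximum-entropy canonical ensemble over state vectors. All concrete
# names and the specific form of the bulk loss are assumptions.
import numpy as np
from dataclasses import dataclass
from typing import Callable

@dataclass
class NeuralSeptuple:
    x: np.ndarray                            # (1) state vector of all neurons
    P_in: np.ndarray                         # (2) input projection (boundary)
    P_out: np.ndarray                        # (3) output projection (boundary)
    W: np.ndarray                            # (4) weight matrix
    b: np.ndarray                            # (5) bias vector
    f: Callable[[np.ndarray], np.ndarray]    # (6) activation map
    loss: Callable[[np.ndarray], float]      # (7) loss function

def bulk_loss(net: NeuralSeptuple, x: np.ndarray) -> float:
    """Illustrative bulk loss on hidden neurons: mismatch between the
    state and one activation update, H(x) = |x - f(W x + b)|^2."""
    return float(np.sum((x - net.f(net.W @ x + net.b)) ** 2))

def canonical_ensemble(net: NeuralSeptuple, states: list, beta: float):
    """Maximum-entropy distribution over candidate state vectors, with
    the mean bulk loss constrained by the Lagrange multiplier beta:
    p(x) = exp(-beta * H(x)) / Z."""
    H = np.array([bulk_loss(net, x) for x in states])
    w = np.exp(-beta * (H - H.min()))   # shift H for numerical stability
    Z = w.sum()                         # partition function (up to exp(-beta*H_min))
    p = w / Z
    S = -np.sum(p * np.log(p))          # Shannon entropy of the ensemble
    return p, Z, S

# Usage on a toy network with random weights and tanh activations.
rng = np.random.default_rng(0)
n = 4
net = NeuralSeptuple(x=np.zeros(n), P_in=np.eye(n)[:2], P_out=np.eye(n)[2:],
                     W=rng.normal(size=(n, n)) / np.sqrt(n), b=np.zeros(n),
                     f=np.tanh, loss=lambda x: 0.0)
states = [rng.normal(size=n) for _ in range(2000)]
p, Z, S = canonical_ensemble(net, states, beta=1.0)
print(f"Z = {Z:.3f}, Shannon entropy S = {S:.3f}")
```

If the abstract's equilibrium claim holds, the partition function computed this way factorizes as Z(β, b, W) = N(β) G(b, W), which is exactly what splits the total Shannon entropy into a thermodynamic term plus a complexity term for the network.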

1 mention: @q9ac
Date: 2020/07/01 03:51

Referring Tweets

@q9ac A paper that looks at machine learning through a statistical-mechanics and thermodynamics lens and asks what is actually going on during the learning process. t.co/MVMML8Is39

Related Entries

Read more [1911.02705v1] Quantum optical levitation of a mirror
0 users, 1 mention 2020/01/22 11:21
Read more [1108.3896] Localized qubits in curved spacetimes
0 users, 1 mention 2020/01/27 17:21
Read more [2003.01612v1] Tunable THz generation and enhanced nonlinear effects with active and passive graphen...
0 users, 1 mention 2020/03/04 11:21
Read more [2005.05087v1] General relation between spatial coherence and absorption
0 users, 1 mention 2020/06/01 02:21
Read more [2006.00712v1] Neural ODE and Holographic QCD
0 users, 1 mention 2020/06/29 02:21