$$ \def\bm#1{\boldsymbol{#1}} %%%%% NEW MATH DEFINITIONS %%%%% % % Mark sections of captions for referring to divisions of figures % \newcommand{\figleft}{{\em (Left)}} % \newcommand{\figcenter}{{\em (Center)}} % \newcommand{\figright}{{\em (Right)}} % \newcommand{\figtop}{{\em (Top)}} % \newcommand{\figbottom}{{\em (Bottom)}} % \newcommand{\captiona}{{\em (a)}} % \newcommand{\captionb}{{\em (b)}} % \newcommand{\captionc}{{\em (c)}} % \newcommand{\captiond}{{\em (d)}} % Highlight a newly defined term \newcommand{\newterm}[1]{{\bf #1}} % % Figure reference, lower-case. % \def\figref#1{figure~\ref{#1}} % % Figure reference, capital. For start of sentence % \def\Figref#1{Figure~\ref{#1}} % \def\twofigref#1#2{figures \ref{#1} and \ref{#2}} % \def\quadfigref#1#2#3#4{figures \ref{#1}, \ref{#2}, \ref{#3} and \ref{#4}} % % Section reference, lower-case. % \def\secref#1{section~\ref{#1}} % % Section reference, capital. % \def\Secref#1{Section~\ref{#1}} % % Reference to two sections. % \def\twosecrefs#1#2{sections \ref{#1} and \ref{#2}} % % Reference to three sections. % \def\secrefs#1#2#3{sections \ref{#1}, \ref{#2} and \ref{#3}} % % Reference to an equation, lower-case. % \def\eqref#1{equation~\ref{#1}} % % Reference to an equation, upper case % \def\Eqref#1{Equation~\ref{#1}} % % A raw reference to an equation---avoid using if possible % \def\plaineqref#1{\ref{#1}} % % Reference to a chapter, lower-case. % \def\chapref#1{chapter~\ref{#1}} % % Reference to an equation, upper case. % \def\Chapref#1{Chapter~\ref{#1}} % % Reference to a range of chapters % \def\rangechapref#1#2{chapters\ref{#1}--\ref{#2}} % % Reference to an algorithm, lower-case. % \def\algref#1{algorithm~\ref{#1}} % % Reference to an algorithm, upper case. 
% \def\Algref#1{Algorithm~\ref{#1}} % \def\twoalgref#1#2{algorithms \ref{#1} and \ref{#2}} % \def\Twoalgref#1#2{Algorithms \ref{#1} and \ref{#2}} % % Reference to a part, lower case % \def\partref#1{part~\ref{#1}} % % Reference to a part, upper case % \def\Partref#1{Part~\ref{#1}} % \def\twopartref#1#2{parts \ref{#1} and \ref{#2}} \def\ceil#1{\lceil #1 \rceil} \def\floor#1{\lfloor #1 \rfloor} \def\1{\bm{1}} \newcommand{\train}{\mathcal{D}} \newcommand{\valid}{\mathcal{D_{\mathrm{valid}}}} \newcommand{\test}{\mathcal{D_{\mathrm{test}}}} \def\eps{{\epsilon}} % Random variables \def\reta{{\textnormal{$\eta$}}} \def\ra{{\textnormal{a}}} \def\rb{{\textnormal{b}}} \def\rc{{\textnormal{c}}} \def\rd{{\textnormal{d}}} \def\re{{\textnormal{e}}} \def\rf{{\textnormal{f}}} \def\rg{{\textnormal{g}}} \def\rh{{\textnormal{h}}} \def\ri{{\textnormal{i}}} \def\rj{{\textnormal{j}}} \def\rk{{\textnormal{k}}} \def\rl{{\textnormal{l}}} % rm is already a command, just don't name any random variables m \def\rn{{\textnormal{n}}} \def\ro{{\textnormal{o}}} \def\rp{{\textnormal{p}}} \def\rq{{\textnormal{q}}} \def\rr{{\textnormal{r}}} \def\rs{{\textnormal{s}}} \def\rt{{\textnormal{t}}} \def\ru{{\textnormal{u}}} \def\rv{{\textnormal{v}}} \def\rw{{\textnormal{w}}} \def\rx{{\textnormal{x}}} \def\ry{{\textnormal{y}}} \def\rz{{\textnormal{z}}} % Random vectors \def\rvepsilon{{\mathbf{\epsilon}}} \def\rvtheta{{\mathbf{\theta}}} \def\rva{{\mathbf{a}}} \def\rvb{{\mathbf{b}}} \def\rvc{{\mathbf{c}}} \def\rvd{{\mathbf{d}}} \def\rve{{\mathbf{e}}} \def\rvf{{\mathbf{f}}} \def\rvg{{\mathbf{g}}} \def\rvh{{\mathbf{h}}} \def\rvi{{\mathbf{i}}} \def\rvj{{\mathbf{j}}} \def\rvk{{\mathbf{k}}} \def\rvl{{\mathbf{l}}} \def\rvm{{\mathbf{m}}} \def\rvn{{\mathbf{n}}} \def\rvo{{\mathbf{o}}} \def\rvp{{\mathbf{p}}} \def\rvq{{\mathbf{q}}} \def\rvr{{\mathbf{r}}} \def\rvs{{\mathbf{s}}} \def\rvt{{\mathbf{t}}} \def\rvu{{\mathbf{u}}} \def\rvv{{\mathbf{v}}} \def\rvw{{\mathbf{w}}} \def\rvx{{\mathbf{x}}} \def\rvy{{\mathbf{y}}} 
\def\rvz{{\mathbf{z}}} % Elements of random vectors \def\erva{{\textnormal{a}}} \def\ervb{{\textnormal{b}}} \def\ervc{{\textnormal{c}}} \def\ervd{{\textnormal{d}}} \def\erve{{\textnormal{e}}} \def\ervf{{\textnormal{f}}} \def\ervg{{\textnormal{g}}} \def\ervh{{\textnormal{h}}} \def\ervi{{\textnormal{i}}} \def\ervj{{\textnormal{j}}} \def\ervk{{\textnormal{k}}} \def\ervl{{\textnormal{l}}} \def\ervm{{\textnormal{m}}} \def\ervn{{\textnormal{n}}} \def\ervo{{\textnormal{o}}} \def\ervp{{\textnormal{p}}} \def\ervq{{\textnormal{q}}} \def\ervr{{\textnormal{r}}} \def\ervs{{\textnormal{s}}} \def\ervt{{\textnormal{t}}} \def\ervu{{\textnormal{u}}} \def\ervv{{\textnormal{v}}} \def\ervw{{\textnormal{w}}} \def\ervx{{\textnormal{x}}} \def\ervy{{\textnormal{y}}} \def\ervz{{\textnormal{z}}} % Random matrices \def\rmA{{\mathbf{A}}} \def\rmB{{\mathbf{B}}} \def\rmC{{\mathbf{C}}} \def\rmD{{\mathbf{D}}} \def\rmE{{\mathbf{E}}} \def\rmF{{\mathbf{F}}} \def\rmG{{\mathbf{G}}} \def\rmH{{\mathbf{H}}} \def\rmI{{\mathbf{I}}} \def\rmJ{{\mathbf{J}}} \def\rmK{{\mathbf{K}}} \def\rmL{{\mathbf{L}}} \def\rmM{{\mathbf{M}}} \def\rmN{{\mathbf{N}}} \def\rmO{{\mathbf{O}}} \def\rmP{{\mathbf{P}}} \def\rmQ{{\mathbf{Q}}} \def\rmR{{\mathbf{R}}} \def\rmS{{\mathbf{S}}} \def\rmT{{\mathbf{T}}} \def\rmU{{\mathbf{U}}} \def\rmV{{\mathbf{V}}} \def\rmW{{\mathbf{W}}} \def\rmX{{\mathbf{X}}} \def\rmY{{\mathbf{Y}}} \def\rmZ{{\mathbf{Z}}} % Elements of random matrices \def\ermA{{\textnormal{A}}} \def\ermB{{\textnormal{B}}} \def\ermC{{\textnormal{C}}} \def\ermD{{\textnormal{D}}} \def\ermE{{\textnormal{E}}} \def\ermF{{\textnormal{F}}} \def\ermG{{\textnormal{G}}} \def\ermH{{\textnormal{H}}} \def\ermI{{\textnormal{I}}} \def\ermJ{{\textnormal{J}}} \def\ermK{{\textnormal{K}}} \def\ermL{{\textnormal{L}}} \def\ermM{{\textnormal{M}}} \def\ermN{{\textnormal{N}}} \def\ermO{{\textnormal{O}}} \def\ermP{{\textnormal{P}}} \def\ermQ{{\textnormal{Q}}} \def\ermR{{\textnormal{R}}} \def\ermS{{\textnormal{S}}} \def\ermT{{\textnormal{T}}} 
\def\ermU{{\textnormal{U}}} \def\ermV{{\textnormal{V}}} \def\ermW{{\textnormal{W}}} \def\ermX{{\textnormal{X}}} \def\ermY{{\textnormal{Y}}} \def\ermZ{{\textnormal{Z}}} % Vectors \def\vzero{{\bm{0}}} \def\vone{{\bm{1}}} \def\vmu{{\bm{\mu}}} \def\vtheta{{\bm{\theta}}} \def\va{{\bm{a}}} \def\vb{{\bm{b}}} \def\vc{{\bm{c}}} \def\vd{{\bm{d}}} \def\ve{{\bm{e}}} \def\vf{{\bm{f}}} \def\vg{{\bm{g}}} \def\vh{{\bm{h}}} \def\vi{{\bm{i}}} \def\vj{{\bm{j}}} \def\vk{{\bm{k}}} \def\vl{{\bm{l}}} \def\vm{{\bm{m}}} \def\vn{{\bm{n}}} \def\vo{{\bm{o}}} \def\vp{{\bm{p}}} \def\vq{{\bm{q}}} \def\vr{{\bm{r}}} \def\vs{{\bm{s}}} \def\vt{{\bm{t}}} \def\vu{{\bm{u}}} \def\vv{{\bm{v}}} \def\vw{{\bm{w}}} \def\vx{{\bm{x}}} \def\vy{{\bm{y}}} \def\vz{{\bm{z}}} % Elements of vectors \def\evalpha{{\alpha}} \def\evbeta{{\beta}} \def\evepsilon{{\epsilon}} \def\evlambda{{\lambda}} \def\evomega{{\omega}} \def\evmu{{\mu}} \def\evpsi{{\psi}} \def\evsigma{{\sigma}} \def\evtheta{{\theta}} \def\eva{{a}} \def\evb{{b}} \def\evc{{c}} \def\evd{{d}} \def\eve{{e}} \def\evf{{f}} \def\evg{{g}} \def\evh{{h}} \def\evi{{i}} \def\evj{{j}} \def\evk{{k}} \def\evl{{l}} \def\evm{{m}} \def\evn{{n}} \def\evo{{o}} \def\evp{{p}} \def\evq{{q}} \def\evr{{r}} \def\evs{{s}} \def\evt{{t}} \def\evu{{u}} \def\evv{{v}} \def\evw{{w}} \def\evx{{x}} \def\evy{{y}} \def\evz{{z}} % Matrix \def\mA{{\bm{A}}} \def\mB{{\bm{B}}} \def\mC{{\bm{C}}} \def\mD{{\bm{D}}} \def\mE{{\bm{E}}} \def\mF{{\bm{F}}} \def\mG{{\bm{G}}} \def\mH{{\bm{H}}} \def\mI{{\bm{I}}} \def\mJ{{\bm{J}}} \def\mK{{\bm{K}}} \def\mL{{\bm{L}}} \def\mM{{\bm{M}}} \def\mN{{\bm{N}}} \def\mO{{\bm{O}}} \def\mP{{\bm{P}}} \def\mQ{{\bm{Q}}} \def\mR{{\bm{R}}} \def\mS{{\bm{S}}} \def\mT{{\bm{T}}} \def\mU{{\bm{U}}} \def\mV{{\bm{V}}} \def\mW{{\bm{W}}} \def\mX{{\bm{X}}} \def\mY{{\bm{Y}}} \def\mZ{{\bm{Z}}} \def\mBeta{{\bm{\beta}}} \def\mPhi{{\bm{\Phi}}} \def\mLambda{{\bm{\Lambda}}} \def\mSigma{{\bm{\Sigma}}} % Tensor \newcommand{\tens}[1]{\mathsf{#1}} \def\tA{{\tens{A}}} \def\tB{{\tens{B}}} 
\def\tC{{\tens{C}}} \def\tD{{\tens{D}}} \def\tE{{\tens{E}}} \def\tF{{\tens{F}}} \def\tG{{\tens{G}}} \def\tH{{\tens{H}}} \def\tI{{\tens{I}}} \def\tJ{{\tens{J}}} \def\tK{{\tens{K}}} \def\tL{{\tens{L}}} \def\tM{{\tens{M}}} \def\tN{{\tens{N}}} \def\tO{{\tens{O}}} \def\tP{{\tens{P}}} \def\tQ{{\tens{Q}}} \def\tR{{\tens{R}}} \def\tS{{\tens{S}}} \def\tT{{\tens{T}}} \def\tU{{\tens{U}}} \def\tV{{\tens{V}}} \def\tW{{\tens{W}}} \def\tX{{\tens{X}}} \def\tY{{\tens{Y}}} \def\tZ{{\tens{Z}}} % Graph \def\gA{{\mathcal{A}}} \def\gB{{\mathcal{B}}} \def\gC{{\mathcal{C}}} \def\gD{{\mathcal{D}}} \def\gE{{\mathcal{E}}} \def\gF{{\mathcal{F}}} \def\gG{{\mathcal{G}}} \def\gH{{\mathcal{H}}} \def\gI{{\mathcal{I}}} \def\gJ{{\mathcal{J}}} \def\gK{{\mathcal{K}}} \def\gL{{\mathcal{L}}} \def\gM{{\mathcal{M}}} \def\gN{{\mathcal{N}}} \def\gO{{\mathcal{O}}} \def\gP{{\mathcal{P}}} \def\gQ{{\mathcal{Q}}} \def\gR{{\mathcal{R}}} \def\gS{{\mathcal{S}}} \def\gT{{\mathcal{T}}} \def\gU{{\mathcal{U}}} \def\gV{{\mathcal{V}}} \def\gW{{\mathcal{W}}} \def\gX{{\mathcal{X}}} \def\gY{{\mathcal{Y}}} \def\gZ{{\mathcal{Z}}} % Sets \def\sA{{\mathbb{A}}} \def\sB{{\mathbb{B}}} \def\sC{{\mathbb{C}}} \def\sD{{\mathbb{D}}} % Don't use a set called E, because this would be the same as our symbol % for expectation. 
\def\sF{{\mathbb{F}}} \def\sG{{\mathbb{G}}} \def\sH{{\mathbb{H}}} \def\sI{{\mathbb{I}}} \def\sJ{{\mathbb{J}}} \def\sK{{\mathbb{K}}} \def\sL{{\mathbb{L}}} \def\sM{{\mathbb{M}}} \def\sN{{\mathbb{N}}} \def\sO{{\mathbb{O}}} \def\sP{{\mathbb{P}}} \def\sQ{{\mathbb{Q}}} \def\sR{{\mathbb{R}}} \def\sS{{\mathbb{S}}} \def\sT{{\mathbb{T}}} \def\sU{{\mathbb{U}}} \def\sV{{\mathbb{V}}} \def\sW{{\mathbb{W}}} \def\sX{{\mathbb{X}}} \def\sY{{\mathbb{Y}}} \def\sZ{{\mathbb{Z}}} % Entries of a matrix \def\emLambda{{\Lambda}} \def\emA{{A}} \def\emB{{B}} \def\emC{{C}} \def\emD{{D}} \def\emE{{E}} \def\emF{{F}} \def\emG{{G}} \def\emH{{H}} \def\emI{{I}} \def\emJ{{J}} \def\emK{{K}} \def\emL{{L}} \def\emM{{M}} \def\emN{{N}} \def\emO{{O}} \def\emP{{P}} \def\emQ{{Q}} \def\emR{{R}} \def\emS{{S}} \def\emT{{T}} \def\emU{{U}} \def\emV{{V}} \def\emW{{W}} \def\emX{{X}} \def\emY{{Y}} \def\emZ{{Z}} \def\emSigma{{\Sigma}} % entries of a tensor % Same font as tensor, without \bm wrapper \newcommand{\etens}[1]{\mathsfit{#1}} \def\etLambda{{\etens{\Lambda}}} \def\etA{{\etens{A}}} \def\etB{{\etens{B}}} \def\etC{{\etens{C}}} \def\etD{{\etens{D}}} \def\etE{{\etens{E}}} \def\etF{{\etens{F}}} \def\etG{{\etens{G}}} \def\etH{{\etens{H}}} \def\etI{{\etens{I}}} \def\etJ{{\etens{J}}} \def\etK{{\etens{K}}} \def\etL{{\etens{L}}} \def\etM{{\etens{M}}} \def\etN{{\etens{N}}} \def\etO{{\etens{O}}} \def\etP{{\etens{P}}} \def\etQ{{\etens{Q}}} \def\etR{{\etens{R}}} \def\etS{{\etens{S}}} \def\etT{{\etens{T}}} \def\etU{{\etens{U}}} \def\etV{{\etens{V}}} \def\etW{{\etens{W}}} \def\etX{{\etens{X}}} \def\etY{{\etens{Y}}} \def\etZ{{\etens{Z}}} % The true underlying data generating distribution \newcommand{\pdata}{p_{\rm{data}}} % The empirical distribution defined by the training set \newcommand{\ptrain}{\hat{p}_{\rm{data}}} \newcommand{\Ptrain}{\hat{P}_{\rm{data}}} % The model distribution \newcommand{\pmodel}{p_{\rm{model}}} \newcommand{\Pmodel}{P_{\rm{model}}} \newcommand{\ptildemodel}{\tilde{p}_{\rm{model}}} % Stochastic 
autoencoder distributions \newcommand{\pencode}{p_{\rm{encoder}}} \newcommand{\pdecode}{p_{\rm{decoder}}} \newcommand{\precons}{p_{\rm{reconstruct}}} \newcommand{\laplace}{\mathrm{Laplace}} % Laplace distribution \newcommand{\E}{\mathbb{E}} \newcommand{\Ls}{\mathcal{L}} \newcommand{\R}{\mathbb{R}} \newcommand{\emp}{\tilde{p}} \newcommand{\lr}{\alpha} \newcommand{\reg}{\lambda} \newcommand{\rect}{\mathrm{rectifier}} \newcommand{\softmax}{\mathrm{softmax}} \newcommand{\sigmoid}{\sigma} \newcommand{\softplus}{\zeta} \newcommand{\KL}{D_{\mathrm{KL}}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\standarderror}{\mathrm{SE}} \newcommand{\Cov}{\mathrm{Cov}} % Wolfram Mathworld says $L^2$ is for function spaces and $\ell^2$ is for vectors % But then they seem to use $L^2$ for vectors throughout the site, and so does % wikipedia. \newcommand{\normlzero}{L^0} \newcommand{\normlone}{L^1} \newcommand{\normltwo}{L^2} \newcommand{\normlp}{L^p} \newcommand{\normmax}{L^\infty} \newcommand{\parents}{Pa} % See usage in notation.tex. Chosen to match Daphne's book. \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator{\sign}{sign} \DeclareMathOperator{\Tr}{Tr} \let\ab\allowbreak $$

Notation

This document defines the formal notation used on this site. The main goal is to resolve notational clashes between different areas of mathematics. For example, \(X\) in linear algebra is a matrix, while \(X\) in probability theory is a random variable.

We mostly follow the notation of the Deep Learning Book, which is based on this LaTeX file. Differences between our notation and theirs are highlighted in \({\color{blue}\text{blue}}\) for clarity.

This notation distinguishes a scalar \(x\) from a scalar random variable \(\rx\). The conventional random-variable notation \(X\) is used only with subscripts, as in \(\emX_{i,j}\), to denote an element of the matrix \(\mX\).

Numbers and Arrays

Notation Description
\(\displaystyle a\) A scalar (integer or real)
\(\displaystyle \va\) A vector
\(\displaystyle \mA\) A matrix
\(\displaystyle \tA\) A tensor
\(\displaystyle \mI_n\) Identity matrix with \(n\) rows and \(n\) columns
\(\displaystyle \mI\) Identity matrix with dimensionality implied by context
\(\displaystyle \ve^{(i)}\) Standard basis vector \([0,\dots,0,1,0,\dots,0]\) with a 1 at position \(i\)
\(\displaystyle \text{diag}(\va)\) A square, diagonal matrix with diagonal entries given by \(\va\)
\(\displaystyle \ra\) A scalar random variable
\(\displaystyle \rva\) A vector-valued random variable
\(\displaystyle \rmA\) A matrix-valued random variable
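
As a small illustration of the entries above, take \(n = 3\) and a vector \(\va\) with elements \(\eva_1, \eva_2, \eva_3\) (both chosen here only for the example):

\[
\ve^{(2)} = [0, 1, 0], \qquad
\text{diag}(\va) = \begin{bmatrix} \eva_1 & 0 & 0 \\ 0 & \eva_2 & 0 \\ 0 & 0 & \eva_3 \end{bmatrix}, \qquad
\mI_3 = \text{diag}([1, 1, 1]).
\]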

Sets and Graphs

Notation Description
\(\displaystyle \sA\) A set
\(\displaystyle \R\) The set of real numbers
\(\displaystyle \{0, 1\}\) The set containing 0 and 1
\(\displaystyle \{0, 1, \dots, n \}\) The set of all integers between \(0\) and \(n\), inclusive
\(\displaystyle [a, b]\) The real interval including \(a\) and \(b\)
\(\displaystyle (a, b]\) The real interval excluding \(a\) but including \(b\)
\(\displaystyle \sA \backslash \sB\) Set subtraction, i.e., the set containing the elements of \(\sA\) that are not in \(\sB\)
\(\displaystyle \gG\) A graph
\(\displaystyle \parents_\gG(\ervx_i)\) The parents of \(\ervx_i\) in \(\gG\)
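
For a concrete reading of these entries, consider the following small cases. The directed graph \(\gG\) with edges \(\ervx_1 \rightarrow \ervx_3\) and \(\ervx_2 \rightarrow \ervx_3\) is a hypothetical example, not one used elsewhere on this site:

\[
\{0, 1, 2, 3\} \backslash \{1, 3\} = \{0, 2\}, \qquad
1 \in (0, 1], \qquad
0 \notin (0, 1], \qquad
\parents_\gG(\ervx_3) = \{\ervx_1, \ervx_2\}.
\]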

Indexing

Notation Description
\(\displaystyle \eva_i\) Element \(i\) of vector \(\va\), with indexing starting at 1
\(\displaystyle \eva_{-i}\) All elements of vector \(\va\) except for element \(i\)
\(\displaystyle \emA_{i,j}\) Element \(i, j\) of matrix \(\mA\)
\(\displaystyle \mA_{i, :}\) Row \(i\) of matrix \(\mA\)
\(\displaystyle \mA_{:, i}\) Column \(i\) of matrix \(\mA\)
\(\displaystyle \etA_{i, j, k}\) Element \((i, j, k)\) of a 3-D tensor \(\tA\)
\(\displaystyle \tA_{:, :, i}\) 2-D slice of a 3-D tensor
\(\displaystyle \erva_i\) Element \(i\) of the random vector \(\rva\)
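
A minimal worked example of the indexing conventions, using a \(2 \times 3\) matrix and a 3-element vector picked only for illustration (indices start at 1, as stated above):

\[
\mA = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \qquad
\va = [7, 8, 9]
\]

\[
\emA_{2,3} = 6, \qquad
\mA_{1,:} = [1, 2, 3], \qquad
\mA_{:,2} = [2, 5]^\top, \qquad
\eva_2 = 8, \qquad
\eva_{-2} = [7, 9].
\]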

Linear Algebra Operations

Notation Description
\(\displaystyle \mA^\top\) Transpose of matrix \(\mA\)
\(\displaystyle \mA^+\) Moore-Penrose pseudoinverse of \(\mA\)
\(\displaystyle \mA \odot \mB\) Element-wise (Hadamard) product of \(\mA\) and \(\mB\)
\(\displaystyle \mathrm{det}(\mA)\) Determinant of \(\mA\)
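
To make these operations concrete, here is a sketch with two \(2 \times 2\) matrices chosen only for illustration. Because this particular \(\mA\) is invertible, its Moore-Penrose pseudoinverse coincides with the ordinary inverse:

\[
\mA = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad
\mB = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\]

\[
\mA^\top = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}, \qquad
\mA \odot \mB = \begin{bmatrix} 0 & 2 \\ 3 & 0 \end{bmatrix}, \qquad
\mathrm{det}(\mA) = 1 \cdot 4 - 2 \cdot 3 = -2, \qquad
\mA^+ = \mA^{-1} = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}.
\]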

Calculus

Notation Description
\(\displaystyle\frac{d y} {d x}\) Derivative of \(y\) with respect to \(x\)
\(\displaystyle \frac{\partial y} {\partial x}\) Partial derivative of \(y\) with respect to \(x\)
\(\displaystyle \nabla_\vx y\) Gradient of \(y\) with respect to \(\vx\)
\(\displaystyle \nabla_\mX y\) Matrix derivatives of \(y\) with respect to \(\mX\)
\(\displaystyle \nabla_\tX y\) Tensor containing derivatives of \(y\) with respect to \(\tX\)
\(\displaystyle \frac{\partial f}{\partial \vx}{\color{blue}\text{ or }\mJ(f)(\vx)}\) Jacobian matrix \(\mJ \in \R^{m\times n}\) of \(f: \R^n \rightarrow \R^m\)
\(\displaystyle \nabla_\vx^2 f(\vx)\text{ or }\mH( f)(\vx)\) The Hessian matrix \({\color{blue}\mH \in \R^{n\times n}}\) of \(f{\color{blue}: \R^n \rightarrow \R}\) at input point \(\vx\)
\(\displaystyle \int f(\vx) d\vx\) Definite integral over the entire domain of \(\vx\). This clashes with the notation for indefinite integrals, so we will strive to avoid it
\(\displaystyle \int_\sS f(\vx) d\vx\) Definite integral with respect to \(\vx\) over the set \(\sS\)
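
As a worked example of the Jacobian and Hessian conventions, consider two functions introduced here only for illustration: \(f: \R^2 \rightarrow \R^2\) and \(g: \R^2 \rightarrow \R\). Rows of the Jacobian index outputs and columns index inputs, matching \(\mJ \in \R^{m\times n}\) above:

\[
f(\vx) = \begin{bmatrix} \evx_1 \evx_2 \\ \evx_1 + \evx_2^2 \end{bmatrix}, \qquad
\frac{\partial f}{\partial \vx} = \begin{bmatrix} \evx_2 & \evx_1 \\ 1 & 2 \evx_2 \end{bmatrix}
\]

\[
g(\vx) = \evx_1^2 \evx_2, \qquad
\nabla_\vx g = \begin{bmatrix} 2 \evx_1 \evx_2 \\ \evx_1^2 \end{bmatrix}, \qquad
\nabla_\vx^2 g(\vx) = \begin{bmatrix} 2 \evx_2 & 2 \evx_1 \\ 2 \evx_1 & 0 \end{bmatrix}.
\]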

Probability and Information Theory

Notation Description
\(\displaystyle \ra \bot \rb\) The random variables \(\ra\) and \(\rb\) are independent
\(\displaystyle \ra \bot \rb \mid \rc\) The random variables \(\ra\) and \(\rb\) are conditionally independent given \(\rc\)
\(\displaystyle P(\ra)\) A probability distribution over a discrete variable
\(\displaystyle p(\ra)\) A probability distribution over a continuous variable, or over a variable whose type has not been specified
\(\displaystyle \ra \sim P\) Random variable \(\ra\) has distribution \(P\)
\(\displaystyle \E_{\rx\sim P} [ f(x) ]\text{ or } \E f(x)\) Expectation of \(f(x)\) with respect to \(P(\rx)\); abuse of notation, will strive to use \(\E_{\rx\sim P} [ f(\rx) ]\) instead
\(\displaystyle \Var(f(x))\) Variance of \(f(x)\) under \(P(\rx)\); abuse of notation, will strive to use \(\Var(f(\rx))\) instead
\(\displaystyle \Cov(f(x),g(x))\) Covariance of \(f(x)\) and \(g(x)\) under \(P(\rx)\); abuse of notation, will strive to use \(\Cov(f(\rx),g(\rx))\) instead
\(\displaystyle H(\rx)\) Shannon entropy of the random variable \(\rx\)
\(\displaystyle \KL ( P \Vert Q )\) Kullback-Leibler divergence of \(P\) and \(Q\)
\(\displaystyle \mathcal{N} ( \vx ; \vmu , \mSigma)\) Gaussian distribution over \(\vx\) with mean \(\vmu\) and covariance \(\mSigma\)
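
For reference, several of the quantities above can be written out for a discrete random variable \(\rx\) with distribution \(P\). These are the standard textbook definitions rather than anything specific to this site, written with the random variable \(\rx\) inside the brackets; for a continuous variable the sum becomes an integral over the density \(p\):

\[
\begin{aligned}
\E_{\rx\sim P} [ f(\rx) ] &= \sum_x P(\rx = x) f(x) \\
\Var(f(\rx)) &= \E\big[ (f(\rx) - \E[f(\rx)])^2 \big] \\
H(\rx) &= -\E_{\rx\sim P} [ \log P(\rx) ] \\
\KL ( P \Vert Q ) &= \E_{\rx\sim P} [ \log P(\rx) - \log Q(\rx) ]
\end{aligned}
\]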

Functions

Notation Description
\(\displaystyle f: \sA \rightarrow \sB\) The function \(f\) with domain \(\sA\) and range \(\sB\)
\(\displaystyle f \circ g\) Composition of the functions \(f\) and \(g\)
\(\displaystyle f(\vx ; \vtheta)\) A function of \(\vx\) parametrized by \(\vtheta\). (Sometimes we write \(f(\vx)\) and omit the argument \(\vtheta\) to lighten notation)
\(\displaystyle \log x\) Natural logarithm of \(x\)
\(\displaystyle \sigma(x)\) Logistic sigmoid, \(\displaystyle \frac{1} {1 + \exp(-x)}\)
\(\displaystyle \zeta(x)\) Softplus, \(\log(1 + \exp(x))\)
\(\displaystyle \vert\vert \vx \vert\vert_p\) \(\normlp\) norm of \(\vx\)
\(\displaystyle \vert\vert \vx \vert\vert\) \(\normltwo\) norm of \(\vx\)
\(\displaystyle x^+\) Positive part of \(x\), i.e., \(\max(0,x)\)
\(\displaystyle \1_\mathrm{condition}{\color{blue}\text{ or }[\mathrm{condition}]}\) is 1 if the condition is true, 0 otherwise
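
A few values worked out by hand may help anchor these definitions. The vector \(\vx = [3, -4]\) is chosen here only for illustration, and the \(\normlp\) norm is written in its usual form for \(p \geq 1\):

\[
\sigma(0) = \frac{1}{2}, \qquad
\zeta(0) = \log 2, \qquad
3^+ = 3, \qquad
(-4)^+ = 0, \qquad
\1_{3 > 0} = 1, \qquad
\1_{-4 > 0} = 0
\]

\[
\vert\vert \vx \vert\vert_p = \Big( \sum_i \vert \evx_i \vert^p \Big)^{1/p}, \qquad
\vert\vert \vx \vert\vert_1 = 3 + 4 = 7, \qquad
\vert\vert \vx \vert\vert = \sqrt{9 + 16} = 5.
\]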

Sometimes we use a function \(f\) whose argument is a scalar but apply it to a vector, matrix, or tensor: \(f(\vx)\), \(f(\mX)\), or \(f(\tX)\). This denotes the application of \(f\) to the array element-wise. For example, if \(\tC = \sigma(\tX)\), then \(\etC_{i,j,k} = \sigma(\etX_{i,j,k})\) for all valid values of \(i\), \(j\) and \(k\).
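
For instance, applying the logistic sigmoid to a 3-element vector (with values picked here only so that the results are simple fractions) gives:

\[
\sigma([-\log 3, 0, \log 3]) = [\sigma(-\log 3), \sigma(0), \sigma(\log 3)] = \left[\frac{1}{4}, \frac{1}{2}, \frac{3}{4}\right].
\]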

Datasets and Distributions

Notation Description
\(\displaystyle \pdata\) The data generating distribution
\(\displaystyle \ptrain\) The empirical distribution defined by the training set
\(\displaystyle \sX\) A set of training examples
\(\displaystyle \vx^{(i)}\) The \(i\)-th example (input) from a dataset
\(\displaystyle y^{(i)}\text{ or }\vy^{(i)}\) The target associated with \(\vx^{(i)}\) for supervised learning
\(\displaystyle \mX\) The \(m \times n\) matrix with input example \(\vx^{(i)}\) in row \(\mX_{i,:}\)
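
Written out, the matrix \(\mX\) stacks the examples as rows. The second display uses a hypothetical dataset with \(m = 2\) examples of dimension \(n = 3\), chosen only for illustration:

\[
\mX = \begin{bmatrix} \vx^{(1)\top} \\ \vdots \\ \vx^{(m)\top} \end{bmatrix} \in \R^{m \times n}, \qquad
\emX_{i,j} = \evx^{(i)}_j
\]

\[
\vx^{(1)} = [1, 2, 3], \quad
\vx^{(2)} = [4, 5, 6]
\quad \Rightarrow \quad
\mX = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}.
\]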
