
Bishop PRML - Ch2. Probability Distributions (2)

Posted on

Multinomial Variables

  • one hot encoding

    • $\mathbf{x} = (0,0,1,0,0,0)^\mathsf{T}$
    • $\sum_{k=1}^K x_k = 1$
    • a way to represent a label that takes one of $K$ mutually exclusive values (a multinomial label)
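As a small illustration of the encoding above, here is a minimal sketch in plain Python. The helper name `one_hot` and the zero-based index convention are assumptions made for this example; they are not from the book.

```python
def one_hot(k, K):
    """Return a length-K list with a 1 at zero-based index k and 0 elsewhere."""
    if not 0 <= k < K:
        raise ValueError("k must satisfy 0 <= k < K")
    return [1 if i == k else 0 for i in range(K)]


# Third of six states active, matching x = (0,0,1,0,0,0)^T in the notes.
x = one_hot(2, 6)
print(x)       # [0, 0, 1, 0, 0, 0]
print(sum(x))  # 1, i.e. the constraint sum_k x_k = 1
```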
  • categorical distribution

    • $p(\mathbf{x}\mid\boldsymbol{\mu}) = \prod_{k=1}^K \mu_k^{x_k}$

      • $p(x_k=1) = \mu_k$
      • $\boldsymbol{\mu} = (\mu_1,\cdots,\mu_K)^\mathsf{T}$
    • $\sum_{\mathbf{x}} p(\mathbf{x}\mid\boldsymbol{\mu}) = \sum_{k=1}^K \mu_k = 1$
    • $\mathbb{E}[\mathbf{x}\mid\boldsymbol{\mu}] = \sum_{\mathbf{x}} p(\mathbf{x}\mid\boldsymbol{\mu})\,\mathbf{x} = \boldsymbol{\mu}$
    • likelihood

      • $p(\mathcal{D}\mid\boldsymbol{\mu}) = \prod_{n=1}^N\prod_{k=1}^K \mu_k^{x_{nk}} = \prod_{k=1}^K \mu_k^{\sum_n x_{nk}} = \prod_{k=1}^K \mu_k^{m_k}$

        • $m_k = \sum_n x_{nk}$: sufficient statistic (the number of observations with $x_k = 1$)
    • MLE

      • Lagrange multiplier

        • $\mathcal{L} = \sum_{k=1}^K m_k\ln\mu_k + \lambda\left(\sum_{k=1}^K \mu_k - 1\right)$
      • find extrema

        • setting $\partial\mathcal{L}/\partial\mu_k = m_k/\mu_k + \lambda = 0$ gives $\mu_k = -m_k/\lambda$
      • substituting into constraint

        • $\sum_{k=1}^K \mu_k = -N/\lambda = 1$, so $\lambda = -N$
      • $\mu_k^{ML} = \frac{m_k}{N}$ (equal to the fraction of the $N$ observations for which $x_k = 1$)
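The maximum-likelihood result above can be checked numerically. The following is a minimal sketch in plain Python; the function names `categorical_mle` and `log_likelihood` and the toy data set are assumptions made for illustration.

```python
from math import log


def categorical_mle(X):
    """X is a list of N one-hot rows of length K.

    Returns the counts m_k = sum_n x_nk and the estimate mu_k = m_k / N.
    """
    N = len(X)
    K = len(X[0])
    m = [sum(row[k] for row in X) for k in range(K)]
    mu_ml = [m_k / N for m_k in m]
    return m, mu_ml


def log_likelihood(m, mu):
    """ln p(D | mu) = sum_k m_k ln mu_k (terms with m_k = 0 contribute 0)."""
    return sum(m_k * log(mu_k) for m_k, mu_k in zip(m, mu) if m_k > 0)


# Toy data: N = 4 observations over K = 3 states.
X = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
m, mu_ml = categorical_mle(X)
print(m)      # [1, 2, 1]
print(mu_ml)  # [0.25, 0.5, 0.25]

# The MLE should score at least as well as any other point on the simplex,
# for example the uniform distribution.
uniform = [1 / 3] * 3
print(log_likelihood(m, mu_ml) > log_likelihood(m, uniform))  # True
```

Note that the likelihood depends on the data only through the counts `m`, which is what it means for $m_k$ to be a sufficient statistic.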
  • multinomial distribution

    • $\operatorname{Mult}(m_1,m_2,\cdots,m_K\mid\boldsymbol{\mu},N) = \binom{N}{m_1,\cdots,m_K}\prod_{k=1}^K \mu_k^{m_k}$

      • $\binom{N}{m_1,\cdots,m_K} = \frac{N!}{m_1! \cdots m_K!}$
      • $\sum_{k=1}^K m_k = N$
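To make the multinomial formula concrete, here is a minimal sketch that evaluates it directly with the standard library. The function name `multinomial_pmf` is an assumption for this example, and `math.prod` requires Python 3.8 or later.

```python
from math import factorial, prod


def multinomial_pmf(m, mu):
    """Mult(m_1..m_K | mu, N) with N = sum(m), evaluated term by term."""
    N = sum(m)
    # Multinomial coefficient N! / (m_1! ... m_K!), kept as an exact integer.
    coef = factorial(N)
    for m_k in m:
        coef //= factorial(m_k)
    return coef * prod(mu_k ** m_k for mu_k, m_k in zip(mu, m))


# N = 4 draws over K = 3 states: coefficient 12 times 0.25 * 0.5**2 * 0.25.
print(multinomial_pmf([1, 2, 1], [0.25, 0.5, 0.25]))  # 0.1875
```

For large $N$ the factorials grow quickly, so a log-space version based on `math.lgamma` would likely be the safer choice; the direct form is used here only because it mirrors the formula.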
  • The Dirichlet distribution

    • $\operatorname{Dir}(\boldsymbol{\mu}\mid\boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)}\prod_{k=1}^K \mu_k^{\alpha_k-1}$

      • $\alpha_0 = \sum_{k=1}^K \alpha_k$
      • $0\leq\mu_k\leq 1$
      • $\sum_k \mu_k = 1$
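      • Note that the Dirichlet has the same functional form in $\boldsymbol{\mu}$ as the multinomial, $\prod_k \mu_k^{\alpha_k-1}$, with the integer counts $m_k$ replaced by the real-valued exponents $\alpha_k - 1$; in that sense it extends the multinomial form to real values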
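The Dirichlet density above can be evaluated in log space with `math.lgamma`. The sketch below also draws samples by normalising independent Gamma($\alpha_k$, 1) variates, which is a standard construction and not something taken from these notes. The function names are assumptions for this example, and `dirichlet_logpdf` assumes every $\mu_k$ is strictly positive.

```python
import random
from math import exp, lgamma, log


def dirichlet_logpdf(mu, alpha):
    """ln Dir(mu | alpha); assumes mu lies on the simplex with all mu_k > 0."""
    log_norm = lgamma(sum(alpha)) - sum(lgamma(a) for a in alpha)
    return log_norm + sum((a - 1) * log(m) for m, a in zip(mu, alpha))


def dirichlet_sample(alpha, rng=random):
    """Draw one mu ~ Dir(alpha) by normalising Gamma(alpha_k, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


# With alpha = (1, 1, 1) the density is flat on the simplex and equals
# Gamma(3) = 2, so this prints a value very close to 2.
print(exp(dirichlet_logpdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])))

# A sample is a valid probability vector: non-negative, summing to 1.
mu = dirichlet_sample([2.0, 3.0, 5.0], random.Random(0))
print(mu, sum(mu))
```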
  • posterior of multinomial likelihood & Dirichlet prior

    • $p(\boldsymbol{\mu}\mid\mathcal{D},\boldsymbol{\alpha}) \propto p(\mathcal{D}\mid\boldsymbol{\mu})\,p(\boldsymbol{\mu}\mid\boldsymbol{\alpha}) \propto \prod_{k=1}^K \mu_k^{\alpha_k+m_k-1}$
    • $p(\boldsymbol{\mu}\mid\mathcal{D},\boldsymbol{\alpha}) = \operatorname{Dir}(\boldsymbol{\mu}\mid\boldsymbol{\alpha}+\mathbf{m}) = \frac{\Gamma(\alpha_0+N)}{\Gamma(\alpha_1+m_1)\cdots\Gamma(\alpha_K+m_K)}\prod_{k=1}^K \mu_k^{\alpha_k+m_k-1}$

      • $\mathbf{m} = (m_1,\cdots,m_K)^\mathsf{T}$
    • The posterior is again a Dirichlet, which confirms that the Dirichlet prior is the conjugate prior for the multinomial likelihood
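Because the posterior is $\operatorname{Dir}(\boldsymbol{\mu}\mid\boldsymbol{\alpha}+\mathbf{m})$, the Bayesian update reduces to adding the observed counts to the prior parameters. A minimal sketch follows; the function names, prior values, and counts are assumptions for illustration. The posterior mean uses the standard Dirichlet result $\mathbb{E}[\mu_k] = \alpha_k/\alpha_0$, which is not derived in these notes.

```python
def dirichlet_posterior(alpha, m):
    """Conjugate update: the posterior parameters are alpha_k + m_k."""
    return [a + m_k for a, m_k in zip(alpha, m)]


def dirichlet_mean(alpha):
    """Mean of Dir(alpha): E[mu_k] = alpha_k / alpha_0."""
    alpha_0 = sum(alpha)
    return [a / alpha_0 for a in alpha]


alpha = [2.0, 2.0, 2.0]  # hypothetical symmetric prior
m = [1, 2, 1]            # hypothetical counts from N = 4 observations

post = dirichlet_posterior(alpha, m)
print(post)                  # [3.0, 4.0, 3.0]
print(dirichlet_mean(post))  # [0.3, 0.4, 0.3]
```

In this example the posterior mean sits between the prior mean (uniform) and the maximum-likelihood estimate $m_k/N = (0.25, 0.5, 0.25)$, so the $\alpha_k$ act like pseudo-counts added to the data.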
Machine Learning, ML, Book, Bishop PRML