Naive Bayes-based Context Extension

Paper link:

Overall idea and computation

This method was proposed by Su Jianlin and is based on naive Bayes. The task is to generate $T$ given the contexts $S_1, \ldots, S_n$, that is, to estimate $p\left(T | S_1, S_2, \cdots, S_n\right)$. By Bayes' theorem:

$$p\left(T | S_1, S_2, \cdots, S_n\right) \propto p\left(S_1, S_2, \cdots, S_n | T\right) p(T)$$

Under the naive Bayes assumption, the $S_i$ are conditionally independent given $T$, so:

$$p\left(S_1, S_2, \cdots, S_n | T\right) = \prod_{i=1}^n p\left(S_i | T\right) = \prod_{i=1}^n \frac{p(T | S_i)\, p(S_i)}{p(T)}$$

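Substituting this into the Bayes relation and dropping the factor $\prod_{i=1}^n p(S_i)$, which does not depend on $T$, gives: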

$$p\left(T | S_1, S_2, \cdots, S_n\right) \propto \frac{\prod_{i=1}^n p\left(T | S_i\right)}{p^{n-1}(T)}$$
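As a sanity check on the relation above, here is a minimal numerical sketch. It builds a toy discrete model in which the naive Bayes assumption holds by construction, then compares the exact posterior with the normalized right-hand side. The sizes, the random distributions, and names such as `posterior_single` are illustrative assumptions, not part of the original method.

```python
import numpy as np

rng = np.random.default_rng(0)
num_t, num_s, n = 4, 5, 3  # hypothetical sizes: values of T, values of each S_i, number of contexts

# Prior p(T) and conditionals p(S_i | T); conditional independence holds by construction.
p_t = rng.dirichlet(np.ones(num_t))                           # shape (num_t,)
p_s_given_t = rng.dirichlet(np.ones(num_s), size=(n, num_t))  # shape (n, num_t, num_s)

observed = [1, 3, 0]  # one observed value per context S_i (arbitrary choice)

# Exact posterior: p(T | S_1..S_n) ∝ p(T) * prod_i p(S_i | T)
likelihood = np.prod([p_s_given_t[i, :, s] for i, s in enumerate(observed)], axis=0)
exact = p_t * likelihood
exact /= exact.sum()

def posterior_single(i, s):
    """p(T | S_i = s), obtained from p(S_i | T) p(T) by normalizing over T."""
    joint = p_s_given_t[i, :, s] * p_t
    return joint / joint.sum()

per_context = np.array([posterior_single(i, s) for i, s in enumerate(observed)])

# Right-hand side: prod_i p(T | S_i) / p(T)^(n-1), normalized over T
nbce = per_context.prod(axis=0) / p_t ** (n - 1)
nbce /= nbce.sum()

print(np.allclose(exact, nbce))
```

The two distributions agree only because the toy model satisfies the independence assumption exactly; for a real language model the assumption is an approximation.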

We can therefore sample according to the right-hand side. Taking logarithms, with $C$ a constant that does not depend on $T$:

$$\log p\left(T | S_1, S_2, \cdots, S_n\right) = \sum_{i=1}^n \log p\left(T | S_i\right) - (n-1) \log p(T) + C = n\, \overline{\log p\left(T | S\right)} - (n-1) \log p(T) + C$$

$$\overline{\log p\left(T | S\right)} = \frac{1}{n} \sum_{i=1}^n \log p\left(T | S_i\right)$$

Su then introduces a hyperparameter $\beta$ in place of $n$, and the expression becomes:

$$\log p\left(T | S_1, S_2, \cdots, S_n\right) = \beta\, \overline{\log p\left(T | S\right)} - (\beta-1) \log p(T) + C$$
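The $\beta$-weighted combination can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the per-context log-probabilities and the context-free log-probabilities are already available as arrays; the function names `log_softmax` and `nbce_combine` are hypothetical and not taken from any released code.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def nbce_combine(logp_ctx, logp_prior, beta):
    """Combine next-token log-probabilities.

    logp_ctx:   array of shape (n, vocab), row i holds log p(T | S_i)
    logp_prior: array of shape (vocab,), holds log p(T)
    beta:       hyperparameter; beta = n recovers the plain naive Bayes form
    """
    avg = logp_ctx.mean(axis=0)                       # average of log p(T | S_i)
    combined = beta * avg - (beta - 1.0) * logp_prior
    return log_softmax(combined)                      # renormalizing absorbs the constant C

# Example with random stand-in distributions (not real model outputs).
rng = np.random.default_rng(1)
ctx = log_softmax(rng.normal(size=(3, 6)))
prior = log_softmax(rng.normal(size=6))
out = nbce_combine(ctx, prior, beta=3.0)
print(np.exp(out).sum())
```

With `beta` equal to the number of contexts, the result matches the sum form of the log-probability equation; with `beta = 1` the prior term vanishes and only the averaged log-probabilities remain.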

Code

Details

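In the implementation, $\varnothing, S_1, S_2, \ldots, S_n$ are each fed to the model as separate inputs, giving $n+1$ results. The empty input $\varnothing$ yields $p(T)$ and each $S_i$ yields $p(T | S_i)$. Sampling then follows the method above.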
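The procedure described here can be sketched as a token-by-token sampling loop. The sketch below is a toy illustration only: `toy_logits` is a hypothetical stand-in for a real language model forward pass, `VOCAB` is an assumed vocabulary size, and `nbce_generate` is an illustrative name. At each step the same generated prefix is appended to every input, the $n+1$ distributions are combined, and one token is sampled.

```python
import numpy as np

VOCAB = 8  # hypothetical toy vocabulary size

def toy_logits(context, generated):
    """Hypothetical stand-in for a model forward pass: deterministic pseudo-random logits."""
    seed = [int(t) for t in context] + [VOCAB] + [int(t) for t in generated]
    return np.random.default_rng(seed).normal(size=VOCAB)

def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def nbce_generate(contexts, beta, max_new_tokens, rng):
    # The empty context comes first, followed by S_1..S_n, so there are n + 1 inputs.
    inputs = [[]] + [list(c) for c in contexts]
    generated = []
    for _ in range(max_new_tokens):
        # One forward pass per input; result has shape (n + 1, VOCAB).
        logp = log_softmax(np.stack([toy_logits(c, generated) for c in inputs]))
        logp_prior, logp_ctx = logp[0], logp[1:]
        combined = beta * logp_ctx.mean(axis=0) - (beta - 1.0) * logp_prior
        probs = np.exp(log_softmax(combined))  # renormalize before sampling
        generated.append(int(rng.choice(VOCAB, p=probs)))
    return generated

tokens = nbce_generate([[1, 2, 3], [4, 5], [6, 7, 0]], beta=3.0,
                       max_new_tokens=5, rng=np.random.default_rng(0))
print(tokens)
```

In a real setting the $n+1$ inputs would typically be batched into a single forward pass, and the choice of `beta` would be tuned rather than fixed to the number of contexts.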
