
# KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation

Paper:

* <https://arxiv.org/abs/2205.09921>

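## Overall idea and computation

This paper uses positive definite (PD) kernels, obtained by shifting conditionally positive definite (CPD) kernels, to construct relative positional encodings, and achieves very good extrapolation (training length 512, inference length 1024). The definitions are not repeated here; the paper's line of reasoning is:

* Relative positional encoding has the form $$k(m,n)=f(m-n)$$;
* A CPD kernel can describe distances in a high-dimensional space, which closely resembles relative positional encoding; however, because it cannot be expressed as an inner product, it is not directly compatible with Attention;
* A CPD kernel can be turned into a PD kernel by a shift: for a CPD kernel $$\tilde k$$ there exists a constant $$c$$ such that $$c+\tilde k$$ is a PD kernel. Although $$c$$ cannot be given explicitly, Softmax is shift-invariant, so $$c$$ cancels and $$\tilde k$$ can be used directly in the computation;
* Common CPD kernels:
  * $$\tilde{k}\left(\mathbf x, \mathbf x^{\prime}\right)=-a\left|\mathbf x-\mathbf x^{\prime}\right|^{p} \text { with } 0 < p \leq 2 \text { and } a>0$$
  * $$\tilde{k}\left(\mathbf x, \mathbf x^{\prime}\right)=-b \cdot \log \left(1+a\left|\mathbf x-\mathbf x^{\prime}\right|^{p}\right) \text { with } 0 < p \leq 2 \text { and } a, b>0$$
* Formula actually used for the attention score:
  * $$s_{m,n}=\mathbf q_m^{\top} \mathbf k_n + \tilde k(m, n)$$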
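The shift-invariance argument can be spelled out in one line. This is a sketch in the notation of these notes; the sum index $$j$$ over the attended positions is introduced here for illustration. Adding the same constant $$c$$ to every score in a row leaves the Softmax output unchanged:

$$
\frac{\exp\left(s_{m,n}+c\right)}{\sum_{j}\exp\left(s_{m,j}+c\right)}
=\frac{e^{c}\exp\left(s_{m,n}\right)}{e^{c}\sum_{j}\exp\left(s_{m,j}\right)}
=\frac{\exp\left(s_{m,n}\right)}{\sum_{j}\exp\left(s_{m,j}\right)}
$$

So attention computed with the PD kernel $$c+\tilde k$$ equals attention computed with the CPD kernel $$\tilde k$$ alone, and the value of $$c$$ is never needed.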

## Time complexity

Unchanged.

## Training and loss

Unchanged.

## Code

None available yet, but it is straightforward to implement.
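As a rough illustration of how simple it is, here is a minimal pure-Python sketch of the score formula $$\mathbf q_m^{\top} \mathbf k_n + \tilde k(m, n)$$ with the two CPD kernels listed above. It is not the authors' implementation: the function names (`power_bias`, `log_bias`, `kerple_attention`), the fixed scalar values of `a`, `b`, `p`, the causal mask, and the `shift` argument (used only to check Softmax shift invariance) are all assumptions made for this example.

```python
import math


def power_bias(m, n, a=1.0, p=1.0):
    """Power-variant CPD kernel: -a * |m - n|^p, with a > 0 and 0 < p <= 2."""
    return -a * abs(m - n) ** p


def log_bias(m, n, a=1.0, b=1.0, p=1.0):
    """Log-variant CPD kernel: -b * log(1 + a * |m - n|^p), with a, b > 0."""
    return -b * math.log(1.0 + a * abs(m - n) ** p)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kerple_attention(q, k, bias_fn, shift=0.0):
    """Causal attention weights with s[m][n] = q_m . k_n + bias_fn(m, n) + shift.

    `shift` plays the role of the unknown constant c; it is an illustrative
    extra argument and should not change the result.
    """
    weights = []
    for m, q_m in enumerate(q):
        scores = [
            sum(qi * ki for qi, ki in zip(q_m, k[n])) + bias_fn(m, n) + shift
            for n in range(m + 1)  # causal mask (assumed): attend to n <= m
        ]
        weights.append(softmax(scores))
    return weights


# Toy queries and keys: 3 positions, head dimension 2 (made-up values).
q = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
k = [[0.2, 0.1], [-0.3, 0.4], [0.5, 0.5]]

w = kerple_attention(q, k, log_bias)
w_shifted = kerple_attention(q, k, log_bias, shift=7.0)
```

Up to floating-point error, `w` and `w_shifted` agree, which is the shift invariance that lets the constant $$c$$ be dropped. The bias depends only on $$m-n$$, so in a real model it could be precomputed once per head and added to the score matrix.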

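## Experiments and applicable scenarios

Applicable to all scenarios. The paper evaluates on language modeling (LM), where the extrapolation results are very good.

## Details

None.

## Brief comments

A very good idea that combines theory with practice. One small issue:

* No theoretical or intuitive explanation is given for why the extrapolation is so good.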

