
LLM model quantization question: why not quantize lm_head?

daweii · 207 days ago · 568 views

I've been watching a course on model quantization recently.

When quantizing the model below, the instructor recommends not quantizing the final lm_head (see my sketch of the replacement step after the listing).

    CodeGenForCausalLM(
      (transformer): CodeGenModel(
        (wte): Embedding(51200, 1024)
        (drop): Dropout(p=0.0, inplace=False)
        (h): ModuleList(
          (0-19): 20 x CodeGenBlock(
            (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            (attn): CodeGenAttention(
              (attn_dropout): Dropout(p=0.0, inplace=False)
              (resid_dropout): Dropout(p=0.0, inplace=False)
              (qkv_proj): W8A16LinearLayer()
              (out_proj): W8A16LinearLayer()
            )
            (mlp): CodeGenMLP(
              (fc_in): W8A16LinearLayer()
              (fc_out): W8A16LinearLayer()
              (act): NewGELUActivation()
              (dropout): Dropout(p=0.0, inplace=False)
            )
          )
        )
        (ln_f): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (lm_head): Linear(in_features=1024, out_features=51200, bias=True)
    )
    

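As far as I understand it, the course swaps each nn.Linear for W8A16LinearLayer while skipping lm_head. This is only my paraphrase from memory; the helper name, its signature, and the constructor arguments I pass to W8A16LinearLayer are my own assumptions and may differ from the actual course code:

    import torch.nn as nn

    def replace_linear_with_quantized(module, target_cls, exclude_names=("lm_head",)):
        # Recursively swap every nn.Linear for the quantized layer class,
        # except submodules whose attribute name is in exclude_names.
        for name, child in module.named_children():
            if isinstance(child, nn.Linear) and name not in exclude_names:
                # Assumed constructor (in_features, out_features, bias); the real
                # W8A16LinearLayer in the course may take different arguments.
                setattr(module, name, target_cls(child.in_features,
                                                 child.out_features,
                                                 bias=child.bias is not None))
            else:
                replace_linear_with_quantized(child, target_cls, exclude_names)

    # e.g. replace_linear_with_quantized(model, W8A16LinearLayer)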
His exact words were:

    
2:14 And as I said, we're not going to quantize the language model head,
2:18 because since the model is an autoregressive model, it uses
2:22 the output from the previous iteration to get the output of the next iteration.
2:27 If you quantize the language model head, a lot of errors might
2:31 be accumulating over the generation steps.
2:34 And you will most likely end up having some gibberish after some tokens.
    
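My mental model of the autoregressive part is the greedy-decoding loop below (assuming a HuggingFace-style CausalLM; the helper name is mine): the token picked from lm_head's logits at each step is fed straight back in as input for the next step.

    import torch

    @torch.no_grad()
    def greedy_generate(model, input_ids, max_new_tokens=20):
        # Toy greedy decoding: the token chosen from lm_head's logits at step t
        # is appended and becomes part of the input at step t+1 (the feedback loop).
        for _ in range(max_new_tokens):
            logits = model(input_ids).logits                  # [batch, seq, vocab]
            next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            input_ids = torch.cat([input_ids, next_token], dim=-1)
        return input_ids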

Even so, I don't follow his reasoning: why would quantizing lm_head cause errors to accumulate over the generation steps? Could someone explain it in simple terms?

The course page: https://learn.deeplearning.ai/courses/quantization-in-depth/lesson/12/quantize-any-open-source-pytorch-model

No replies yet.