A New Algorithm for Large Language Models in Multitasking Scenarios
Author: WANG Bowen, WU Zhiyou, ZHENG Xianda, GAO Huan
Affiliation:

Author biography:

Corresponding author:

Fund Project:

General Program of the National Natural Science Foundation of China (No. 12371258); Innovation and Development Joint Fund Project of the Chongqing Natural Science Foundation (No. CSTB2923NSCQ-LZX0056)



Abstract:

Large language models (LLMs) have shown excellent performance on most natural language processing tasks. However, directly applying a general-purpose LLM often fails to meet the requirements of a specific application domain. To address this, the model is usually customized either by training from scratch or by fine-tuning a general-purpose model. Training from scratch allows a high degree of customization, ensures a close match with the requirements, and protects data privacy, but it is prohibitively expensive and technically demanding. Existing approaches therefore mostly improve performance by fine-tuning a general-purpose model. Full-parameter fine-tuning, however, is constrained by GPU memory, and although existing parameter-efficient fine-tuning techniques alleviate this constraint, they struggle to maintain performance across multiple tasks simultaneously and may suffer catastrophic forgetting during continual fine-tuning. To solve this problem, a new method is proposed that both maintains performance across multiple domains and mitigates catastrophic forgetting: a layer-wise adaptive and efficient merging method based on black-box optimization (LAEM). The method operates by merging LoRA modules. First, the LoRA modules fine-tuned on various specific tasks are stripped of redundancy; then, by introducing a shared LoRA module and applying layer-wise adaptive weighted averaging, the de-redundanced task-specific LoRA modules are merged with the shared module. LAEM can flexibly set the weights according to each layer's behavior and its contribution to the final result, thereby fusing the strengths of multiple models more precisely, releasing the potential of every layer, and achieving better overall performance. Experimental results show that LAEM not only equips the model with multiple capabilities, but also alleviates catastrophic forgetting to a certain extent, while avoiding the traditional merging methods' neglect of feature differences between layers.
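The abstract describes LAEM only at a high level. The sketch below illustrates the general idea of layer-wise weighted merging of task-specific LoRA modules with a shared module, with the per-layer weights chosen by a black-box search; it is a minimal illustration under assumptions made here, not the authors' implementation. The function names (lora_delta, merge_layerwise, random_search_weights), the softmax parameterization of the layer weights, the random-search optimizer, and the placeholder scoring function are all hypothetical.

# Minimal Python/PyTorch sketch (not the authors' code): layer-wise weighted
# merging of task-specific LoRA modules with a shared module, with per-layer
# weights picked by a simple black-box (random) search.
import torch

def lora_delta(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Dense update implied by a LoRA pair: delta_W = B @ A (rank r)."""
    return B @ A

def merge_layerwise(task_loras, shared_lora, weight_logits):
    """
    task_loras:    list over tasks of dicts {layer_name: (A, B)}
    shared_lora:   dict {layer_name: (A, B)} for the shared module
    weight_logits: dict {layer_name: tensor of shape (num_tasks + 1,)}
    Returns {layer_name: merged dense delta_W}, using a softmax over the
    logits so each layer's weights form a convex combination.
    """
    merged = {}
    for layer in shared_lora:
        deltas = [lora_delta(*t[layer]) for t in task_loras]
        deltas.append(lora_delta(*shared_lora[layer]))       # shared module last
        w = torch.softmax(weight_logits[layer], dim=0)        # per-layer adaptive weights
        merged[layer] = sum(wi * d for wi, d in zip(w, deltas))
    return merged

def random_search_weights(task_loras, shared_lora, score_fn, iters=50):
    """Black-box search: sample per-layer weight logits and keep the best
    candidate according to score_fn (a user-supplied validation metric)."""
    layers = list(shared_lora)
    n = len(task_loras) + 1
    best_w, best_s = None, float("-inf")
    for _ in range(iters):
        cand = {l: torch.randn(n) for l in layers}            # random logits per layer
        s = score_fn(merge_layerwise(task_loras, shared_lora, cand))
        if s > best_s:
            best_w, best_s = cand, s
    return best_w, best_s

if __name__ == "__main__":
    # Toy example: 2 tasks, 2 layers, hidden size 8, LoRA rank 2.
    torch.manual_seed(0)
    mk = lambda: (torch.randn(2, 8), torch.randn(8, 2))       # (A, B), rank 2
    layers = ["layer0.q_proj", "layer1.q_proj"]
    task_loras = [{l: mk() for l in layers} for _ in range(2)]
    shared_lora = {l: mk() for l in layers}
    # Placeholder score: prefer merged updates with small norm (a real run
    # would score validation performance on the target tasks instead).
    score = lambda m: -sum(d.norm() for d in m.values()).item()
    best, s = random_search_weights(task_loras, shared_lora, score, iters=20)
    print({l: torch.softmax(best[l], 0).tolist() for l in layers}, s)

In LAEM itself the per-layer weights would be tuned by the paper's black-box optimization procedure against validation performance on the target tasks; the random search and the norm-based score above only stand in for that step.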

Cite this article:

WANG Bowen, WU Zhiyou, ZHENG Xianda, GAO Huan. A New Algorithm for Large Language Models in Multitasking Scenarios[J]. 重庆师范大学学报自然科学版, 2025, 42(1): 26-35

History
  • Received:
  • Revised:
  • Accepted:
  • Published online: 2025-04-07