Abstract: Large language models (LLMs) have shown excellent performance on most natural language processing tasks. However, directly applying a general-purpose LLM often fails to meet the needs of a specific domain. Addressing this typically requires customizing the model, either by training it from scratch or by fine-tuning a generic model. Training from scratch allows a high degree of customization, ensures a close match with requirements, and protects data privacy, but it is costly and technically demanding, so existing approaches mostly improve performance by fine-tuning a general model. Full-parameter fine-tuning, however, is constrained by GPU memory; parameter-efficient fine-tuning techniques alleviate this constraint, but they struggle to maintain performance across multiple tasks simultaneously, and continual fine-tuning can still lead to catastrophic forgetting. To address these problems, we propose a new method that maintains performance across multiple domains and mitigates catastrophic forgetting: the layer-wise adaptive weighted merging (LAEM) algorithm. The method operates by merging LoRA modules. First, the LoRA modules fine-tuned on a variety of specific tasks are de-redundified. Second, by introducing a shared LoRA module and applying layer-wise adaptive weighted averaging, the de-redundified task-specific LoRA modules are merged with the shared module. LAEM can flexibly set the merging weights according to the behavior of each layer within the model and its contribution to the final result, thereby integrating the strengths of multiple models more accurately, fully releasing the potential of each layer, and achieving better overall performance. Experimental results show that LAEM not only equips the model with multiple capabilities but also alleviates catastrophic forgetting to a certain extent, and avoids the tendency of traditional merging methods to ignore differences between layers when merging models.
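To make the merging step concrete, the following is a minimal sketch of layer-wise adaptive weighted merging of LoRA modules. The abstract does not specify how the per-layer weights, the de-redundification, or the shared module are obtained, so the function name, the toy weights, and the layer names below are illustrative assumptions rather than the paper's exact procedure.

```python
# Minimal sketch: merge task-specific LoRA deltas with a shared module
# using per-layer weights. Weight values and names are assumptions.
import torch

def merge_lora_layerwise(task_modules, shared_module, layer_weights):
    """task_modules : list of dicts, layer name -> LoRA delta (B @ A)
    shared_module: dict, layer name -> shared LoRA delta
    layer_weights: dict, layer name -> list of per-task weights; the shared
                   module takes the remaining mass so each layer sums to 1."""
    merged = {}
    for layer, shared_delta in shared_module.items():
        w = torch.tensor(layer_weights[layer], dtype=shared_delta.dtype)
        shared_w = 1.0 - w.sum()              # remaining weight for the shared module
        merged[layer] = shared_w * shared_delta
        for weight, module in zip(w, task_modules):
            merged[layer] = merged[layer] + weight * module[layer]
    return merged

# Toy usage: two task-specific LoRA modules and one shared module over two layers.
layers = ["layers.0.attn.q_proj", "layers.1.attn.q_proj"]
task_a = {name: torch.randn(8, 8) for name in layers}
task_b = {name: torch.randn(8, 8) for name in layers}
shared = {name: torch.randn(8, 8) for name in layers}
weights = {layers[0]: [0.4, 0.3],   # this layer leans on the task modules
           layers[1]: [0.2, 0.2]}   # this layer leans on the shared module
merged = merge_lora_layerwise([task_a, task_b], shared, weights)
print({k: v.shape for k, v in merged.items()})
```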