Unsloth Demystified: How to Make Model Fine-Tuning 2-5x More Efficient
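I. Introduction to Unsloth
Unsloth is a framework built specifically for fine-tuning (large language) models. It targets the two most common pain points of fine-tuning: slow training and high GPU memory (VRAM) usage. Through a series of techniques and optimizations, Unsloth significantly improves fine-tuning efficiency, so developers can get a fine-tuned model in much less time.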
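II. Key Advantages of Unsloth
1. Fast training
When fine-tuning mainstream models such as Llama 3, Qwen2 and Mistral, Unsloth delivers a substantial speedup: according to the project, training runs 2 to 5 times faster than with conventional fine-tuning setups. This considerably shortens the development cycle. For example, on a large text dataset each training step finishes sooner, so the same number of steps takes less wall-clock time and developers can evaluate the model earlier.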
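2. Low memory usage
VRAM usage is a key constraint in fine-tuning, especially on resource-limited machines. Unsloth reduces VRAM usage by up to about 70%. This makes fine-tuning feasible even on GPUs with limited memory, such as mid- to low-end cards. As a result, more developers can fine-tune models on whatever hardware they have, without being blocked by resource limits.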
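To see where savings of this kind come from, here is a rough back-of-the-envelope estimate of the memory needed to store the weights alone at different precisions. This is only a sketch. The function name is hypothetical, and the estimate ignores activations, optimizer state, LoRA weights and quantization overhead. It is therefore not how Unsloth's 70% figure is measured.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumption: ignores activations, optimizer state, LoRA weights, the KV cache
# and quantization overhead (scales / zero-points), so real usage is higher.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

if __name__ == "__main__":
    n = 8e9  # e.g. an 8B-parameter model such as Llama-3.1-8B
    for bits in (16, 4):
        print(f"{bits}-bit weights: ~{weight_memory_gb(n, bits):.1f} GB")
```

For an 8B model this gives about 16 GB for 16-bit weights versus about 4 GB for 4-bit weights. That gap is why the 4-bit pre-quantized models used later in this article fit on much smaller GPUs.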
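III. Technical Features of Unsloth
1. Broad compatibility
Unsloth supports a wide range of hardware, covering NVIDIA GPUs from the Tesla T4 up to the H100, and its compatibility has also been extended to AMD and Intel GPUs. This is very convenient for developers working on different hardware: whichever GPU you have, it is worth trying Unsloth for fine-tuning. This broad compatibility lets Unsloth deliver its advantages across different hardware platforms.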
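On NVIDIA GPUs, the precision you can train in depends on the GPU's CUDA compute capability. The comment in the model-loading step notes "Float16 for Tesla T4, V100, Bfloat16 for Ampere+". The hypothetical helper below encodes that rule. The minimum capability of 7.0 is an assumption based on Unsloth's README at the time of writing, so check the current documentation. The helper also says nothing about AMD or Intel GPUs.

```python
# Hypothetical helper: map an NVIDIA CUDA compute capability (major, minor)
# to a training dtype. Assumptions: compute capability 7.0 (V100, T4, RTX 20xx, ...)
# is the minimum Unsloth supports, and bfloat16 needs Ampere (8.0) or newer.
from typing import Optional, Tuple

MIN_CAPABILITY = (7, 0)   # assumption, check the current Unsloth README
BF16_CAPABILITY = (8, 0)  # Ampere and newer support bfloat16

def pick_dtype(capability: Tuple[int, int]) -> Optional[str]:
    """Return "bfloat16", "float16", or None if below the assumed minimum."""
    if capability < MIN_CAPABILITY:   # tuples compare (major, minor) lexicographically
        return None
    if capability >= BF16_CAPABILITY:
        return "bfloat16"
    return "float16"

# On a real machine the tuple would come from torch.cuda.get_device_capability().
print(pick_dtype((7, 5)))  # Tesla T4
print(pick_dtype((9, 0)))  # H100
```

In practice, passing `dtype = None` to `FastLanguageModel.from_pretrained` lets Unsloth make this choice automatically. The sketch only makes the rule explicit.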
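2. Optimized memory usage
Unsloth uses techniques such as intelligent weight upcasting, which reduces the need to upcast weights during QLoRA and thereby cuts memory usage. This lets it make better use of hardware resources and train more efficiently. Unsloth can also take advantage of BFloat16 where the GPU supports it. This improves the stability of 16-bit training and further speeds up QLoRA fine-tuning. This fine-grained management of memory and compute is why Unsloth performs well on large models and datasets.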
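One reason BFloat16 helps 16-bit training stability is its numeric range. The worked calculation below (plain Python, not Unsloth code) computes the largest finite value of float16 and bfloat16 from their exponent and mantissa widths. bfloat16 keeps float32's 8 exponent bits, so values that would overflow float16 during training typically stay representable. The cost is fewer mantissa bits (7 instead of 10).

```python
# Largest finite value of an IEEE-style binary float with `exp_bits` exponent
# bits and `mantissa_bits` explicit mantissa bits (all-ones exponent reserved
# for inf/NaN): (2 - 2**-m) * 2**bias, where bias = 2**(e-1) - 1.
def max_finite(exp_bits: int, mantissa_bits: int) -> float:
    bias = 2 ** (exp_bits - 1) - 1
    return (2 - 2 ** -mantissa_bits) * 2.0 ** bias

fp16_max = max_finite(5, 10)  # float16: 5 exponent bits, 10 mantissa bits
bf16_max = max_finite(8, 7)   # bfloat16: 8 exponent bits (as in float32), 7 mantissa bits
print(fp16_max)               # 65504.0
print(f"{bf16_max:.4e}")      # 3.3895e+38
```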
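IV. Hands-on with Unsloth
1. Installing Unsloth
Installation is fairly simple. For example, you can install a build that matches your CUDA and PyTorch versions with:
```bash
pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"
```
The exact command depends on your environment (CUDA version, PyTorch version, Colab or local machine), so check the official documentation before installing. On Google Colab, for example:
```bash
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
```
Installation output: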
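To confirm from Python that the packages are installed, and to see which versions you got, a small check with the standard library's `importlib.metadata` is enough. The helper name is hypothetical, and the package list simply names the libraries this tutorial relies on.

```python
# Check whether a distribution is installed (and its version) without importing it.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Libraries used in this tutorial.
for pkg in ("unsloth", "torch", "transformers", "trl", "peft", "bitsandbytes", "datasets"):
    print(f"{pkg:>12}: {installed_version(pkg) or 'not installed'}")
```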
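2. Mirror configuration
Depending on your network, Hugging Face may not be directly reachable (for example, from mainland China). In that case you can use a mirror such as https://hf-mirror.com.
1) Install the dependency
```bash
pip install -U huggingface_hub
```
2) Set the environment variable. Do this before importing huggingface_hub or unsloth, because huggingface_hub reads HF_ENDPOINT when it is first imported:
```python
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
```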
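A mirror works because Hub file URLs differ only in the host. The sketch below builds a download URL using the `{endpoint}/{repo_id}/resolve/{revision}/{filename}` layout and honors `HF_ENDPOINT`. It is a hypothetical illustration. In real code, let huggingface_hub (for example `hf_hub_url`) construct URLs.

```python
import os

DEFAULT_ENDPOINT = "https://huggingface.co"

def resolve_url(repo_id: str, filename: str, revision: str = "main",
                repo_type: str = "model") -> str:
    """Build a Hub file URL, honoring HF_ENDPOINT (hypothetical helper)."""
    endpoint = os.environ.get("HF_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")
    # Dataset repos live under a "datasets/" prefix; model repos have none.
    prefix = "datasets/" if repo_type == "dataset" else ""
    return f"{endpoint}/{prefix}{repo_id}/resolve/{revision}/{filename}"

os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
print(resolve_url("unsloth/Llama-3.2-3B-Instruct", "config.json"))
# https://hf-mirror.com/unsloth/Llama-3.2-3B-Instruct/resolve/main/config.json
```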
3. Loading the model
```python
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409",     # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
    "unsloth/Llama-3.2-1B-bnb-4bit",           # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct", # or choose "unsloth/Llama-3.2-1B-Instruct"
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
```
Model loading output:
4. LoRA configuration
```python
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
```
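To see how small the trainable part is, the sketch below counts the parameters LoRA adds for this configuration. Each adapted linear layer mapping in_features to out_features gains r × (in_features + out_features) parameters. The layer sizes are assumptions taken from Llama-3.2-3B's published config: hidden size 3072, MLP size 8192, 28 layers, and 8 key/value heads of dimension 128. Check your model's `config.json` before relying on them. The result is roughly 24 million parameters, well under 1% of the roughly 3 billion in the base model. The sketch also shows how the update is scaled: by `lora_alpha / r` for standard LoRA, and by `lora_alpha / sqrt(r)` when `use_rslora = True`.

```python
# Count the trainable LoRA parameters for the get_peft_model(...) call above.
# LoRA on a Linear(in -> out) adds A (r x in) and B (out x r): r * (in + out) params.
# Assumption: shapes are Llama-3.2-3B's config.json values.
HIDDEN, MLP, LAYERS, KV = 3072, 8192, 28, 8 * 128

MODULE_SHAPES = {  # target module -> (in_features, out_features)
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_params(r: int, shapes=MODULE_SHAPES, layers: int = LAYERS) -> int:
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * layers

def lora_scaling(alpha: float, r: int, use_rslora: bool = False) -> float:
    # Standard LoRA scales the low-rank update by alpha / r; rsLoRA by alpha / sqrt(r).
    return alpha / (r ** 0.5) if use_rslora else alpha / r

print(f"{lora_params(16):,}")           # 24,313,856
print(lora_scaling(16, 16))             # 1.0
print(lora_scaling(16, 16, True))       # 4.0
```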
5. Preparing the dataset
We use Maxime Labonne's ShareGPT-style FineTome-100k dataset:
https://huggingface.co/datasets/mlabonne/FineTome-100k
Its conversations use ("from", "value") keys, which we convert to the ("role", "content") format expected by chat templates.
```python
from unsloth.chat_templates import get_chat_template

# Attach the Llama 3.1 chat template to the tokenizer.
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

def formatting_prompts_func(examples):
    # Render each conversation into one training string (no tokenization yet).
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }

from datasets import load_dataset
dataset = load_dataset("mlabonne/FineTome-100k", split = "train")
```
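Dataset loading output:
We now use `standardize_sharegpt` to convert the ShareGPT-style dataset into Hugging Face's generic role/content format. This converts entries like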
```
{"from": "system", "value": "You are an assistant"}
{"from": "human", "value": "What is 2+2?"}
{"from": "gpt", "value": "It's 4."}
```
to
```
{"role": "system", "content": "You are an assistant"}
{"role": "user", "content": "What is 2+2?"}
{"role": "assistant", "content": "It's 4."}
```
```python
from unsloth.chat_templates import standardize_sharegpt

dataset = standardize_sharegpt(dataset)                          # "from"/"value" -> "role"/"content"
dataset = dataset.map(formatting_prompts_func, batched = True,)  # adds the "text" column
```
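Conceptually, the conversion is a key rename plus a role mapping. The sketch below is a simplified, hypothetical stand-in for `standardize_sharegpt` that handles only the roles in the example above. Use the Unsloth function on real data.

```python
# Simplified, hypothetical sketch of the ShareGPT -> role/content conversion.
ROLE_MAP = {"system": "system", "human": "user", "user": "user",
            "gpt": "assistant", "assistant": "assistant"}

def to_role_content(conversation):
    """Convert a list of {"from", "value"} turns into {"role", "content"} turns."""
    return [{"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
            for turn in conversation]

sharegpt = [
    {"from": "system", "value": "You are an assistant"},
    {"from": "human", "value": "What is 2+2?"},
    {"from": "gpt", "value": "It's 4."},
]
for turn in to_role_content(sharegpt):
    print(turn)
```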
Spot-check the format of the record at index 5:
```python
dataset[5]["conversations"]
```
Output:
[{'content': 'How do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?',
'role': 'user'},
{'content': 'Astronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.',
'role': 'assistant'}]
View the same record after the chat template has been applied:
```python
dataset[5]["text"]
```
Output:
'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|eot_id|>'
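The formatted string follows a simple layout. Each turn is wrapped in header tokens and closed with `<|eot_id|>`. The sketch below reproduces that layout for illustration. It is a hypothetical helper that omits the default system block ("Cutting Knowledge Date ...") that the real template inserts, and it does not trim content the way the real template does. In practice, always use `tokenizer.apply_chat_template`. The sketch also shows what `add_generation_prompt = True` (used for inference in step 7) adds: an open assistant header for the model to continue.

```python
# Hypothetical, simplified sketch of the Llama 3.1 chat layout seen above.
def render_llama31(messages, add_generation_prompt=False):
    text = "<|begin_of_text|>"
    for m in messages:
        text += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    if add_generation_prompt:
        # Leave an open assistant header so the model writes the reply next.
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text

print(render_llama31([{"role": "user", "content": "What is 2+2?"}],
                     add_generation_prompt=True))
```

Training uses `add_generation_prompt = False` because each conversation already contains the assistant's reply.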
6. Training the model
Configure the training arguments:
```python
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

# This matches the trl API used by the Unsloth notebooks at the time of writing.
# Newer trl releases expect dataset_text_field / max_seq_length / packing via
# SFTConfig and rename tokenizer to processing_class; pin trl if this errors.
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
```
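A few numbers follow directly from these arguments. On a single GPU, each optimizer step sees `per_device_train_batch_size × gradient_accumulation_steps` examples, so `max_steps = 60` covers only a small slice of FineTome-100k. `lr_scheduler_type = "linear"` with `warmup_steps = 5` ramps the learning rate up and then decays it linearly to zero. The sketch below works these out in plain Python. The function name is hypothetical, and it reproduces the shape of the linear-with-warmup schedule rather than calling transformers.

```python
import math

# Values copied from the TrainingArguments above; single GPU assumed.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps, warmup_steps, base_lr = 60, 5, 2e-4
dataset_rows = 100_000  # assumption: FineTome-100k has 100k rows, as its name suggests

effective_batch = per_device_train_batch_size * gradient_accumulation_steps
examples_seen = effective_batch * max_steps
steps_per_epoch = math.ceil(dataset_rows / effective_batch)  # what num_train_epochs = 1 would need
print(effective_batch, examples_seen, steps_per_epoch)      # 8 480 12500

def linear_lr(step: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at max_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))

for step in (0, 5, 30, 60):
    print(f"step {step:2d}: lr = {linear_lr(step):.2e}")
```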
Use Unsloth's `train_on_responses_only` so that training uses only the assistant's outputs and ignores the user's inputs:
```python
from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)
```
Check `input_ids` after masking:
```python
tokenizer.decode(trainer.train_dataset[5]["input_ids"])
```
Output:
'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|eot_id|>'
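Check `labels` after masking:
```python
# Masked positions hold -100, which is not a valid token id; replace them
# with a space token so the label sequence can be decoded.
space = tokenizer(" ", add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[5]["labels"]])
```
Output: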
' \n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|eot_id|>'
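We can see that the system and instruction prompts have been masked successfully: only the assistant's reply remains in the labels.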
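The masking itself can be sketched on plain lists of token ids. Every label outside an assistant reply is set to -100, the index that PyTorch's cross-entropy loss ignores by default. This hypothetical helper and its toy token ids are for illustration only. `train_on_responses_only` works on the tokenized dataset, using the two marker strings passed above.

```python
# Hypothetical sketch of "train on responses only" over plain token-id lists.
IGNORE_INDEX = -100  # ignored by torch.nn.CrossEntropyLoss by default

def find(seq, pattern, start):
    """Index of the first occurrence of `pattern` in `seq` at or after `start`, else -1."""
    for i in range(start, len(seq) - len(pattern) + 1):
        if seq[i:i + len(pattern)] == pattern:
            return i
    return -1

def mask_non_responses(input_ids, instruction_ids, response_ids):
    """Keep labels only for tokens after a response marker, up to the next instruction marker."""
    labels = [IGNORE_INDEX] * len(input_ids)
    pos = 0
    while True:
        r = find(input_ids, response_ids, pos)
        if r == -1:
            break
        begin = r + len(response_ids)                   # first token of the reply
        end = find(input_ids, instruction_ids, begin)   # next user turn, if any
        end = len(input_ids) if end == -1 else end
        labels[begin:end] = input_ids[begin:end]
        pos = end
    return labels

# Toy ids: 1 = user header, 2 = assistant header, 9 = end-of-turn, 0 = begin-of-text.
ids = [0, 1, 10, 11, 9, 2, 20, 21, 9, 1, 12, 9, 2, 22, 9]
print(mask_non_responses(ids, [1], [2]))
```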
Start training:
```python
trainer_stats = trainer.train()
```
Training results:
7. Model inference
```python
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(input_ids = inputs, max_new_tokens = 64, use_cache = True,
                         temperature = 1.5, min_p = 0.1)
tokenizer.batch_decode(outputs)
```
Output:
['<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nContinue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThe next two terms would be 13 and 21.\n\nFibonacci Sequence: 1, 1, 2, 3, 5, 8, 13, 21.<|eot_id|>']
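The sampling settings `temperature = 1.5, min_p = 0.1` work together. The high temperature flattens the distribution, and min-p sampling then discards any token whose probability is below 10% of the most likely token's, which keeps the output coherent. The sketch below illustrates this on a toy logit vector. It is a simplified, hypothetical helper that applies temperature before the min-p cutoff, not transformers' exact implementation.

```python
import math

def min_p_filter(logits, temperature=1.5, min_p=0.1):
    """Return renormalized probabilities after temperature scaling and min-p filtering."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]       # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)                 # cutoff scales with the top token
    kept = [p if p >= threshold else 0.0 for p in probs]
    norm = sum(kept)
    return [p / norm for p in kept]

print([round(p, 3) for p in min_p_filter([4.0, 3.0, 1.0, -2.0])])
```

Because the cutoff is relative to the top token, min-p prunes aggressively when the model is confident and keeps more options when several tokens are plausible.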
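8. Saving the fine-tuned model
```python
# Local saving. This saves only the LoRA adapter weights (plus the tokenizer),
# not a merged full model.
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")
```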
9. Loading the fine-tuned model for inference
```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Same settings as in step 3, redefined so this also works in a fresh session.
max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Describe a tall tower in the capital of France."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

# Print tokens as they are generated, without echoing the prompt.
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)
```
Inference result:
The Eiffel Tower is a famous tall structure located in Paris, the capital of France. It was built for the 1889 World's Fair and stands at a height of 324 meters (1,063 feet) high. The Eiffel Tower has become a symbol of Paris and is often referred to as the Iron Lady. Its construction was designed by Gustave Eiffel, a French engineer, and it was intended to be a temporary structure. However, it has remained standing for over a century and has become an iconic landmark in the city.<|eot_id|>
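V. Applying Unsloth in Real Projects
Unsloth's efficiency and flexibility make it useful in many domains.
For NLP tasks such as text classification, sentiment analysis and machine translation, Unsloth helps developers quickly fine-tune pre-trained models for different datasets and task requirements. Shorter training times and lower VRAM usage let developers run more experiments and iterate faster on model quality.
In dialogue systems, Unsloth lets developers quickly train customized conversational models. Fine-tuning on large amounts of dialogue data helps the model better understand user input and produce more natural, accurate replies. This matters for applications such as intelligent customer service and chatbots.
In content generation, such as article writing and story creation, developers can likewise use Unsloth to fine-tune a language model that produces high-quality text from a given topic or prompt.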
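VI. Summary and Outlook
As a framework for fine-tuning pre-trained models, Unsloth gives developers an efficient, convenient way to fine-tune. Its fast training, low VRAM usage and broad hardware compatibility make it an important tool in AI development. Used well, Unsloth makes it much easier to bring pre-trained models into real projects.
Unsloth is still evolving, and we can expect further innovations in fine-tuning from it. We also hope more developers will follow and try Unsloth and explore what it makes possible.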
