Fine-tuning open-source models lets businesses apply AI to complex, domain-specific use cases without depending on closed-source enterprise models. It gives engineers a way to adapt the knowledge already captured in pre-trained models to their own tasks.
In this blog, we'll go through the detailed process of fine-tuning open-source Large Language Models using publicly available tools and frameworks.
There are many open-source models available, such as LLaMA, Mistral, and Gemma; choose the one that best suits your needs. For this blog, we'll use LLaMA by Meta. LLaMA 3.1 is, at the time of writing, the latest version and comes in three parameter sizes (8B, 70B, and 405B). This guide uses the LLaMA 3.1 8B model.
The easiest way to run an open-source LLM on your local machine or server is by using Ollama.
Ollama is available for Mac, Windows, and Linux operating systems. After downloading and installing Ollama on your local machine or server, you can access it using your operating system's command line.
Run the Ollama server
ollama serve
Download a model using Ollama
ollama pull llama3.1:8b
To access the downloaded models using the Ollama API, you simply need to send the model name and the prompt in the request body.
curl http://localhost:11434/api/generate -d '{ "model": "llama3.1", "prompt":"Why is the sky blue?" }'
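If you'd rather call the API from code, here's a minimal Python sketch of the same request using the requests library. Note that /api/generate streams newline-delimited JSON by default, so the example sets "stream": False to receive a single response object.

import requests

# Ask Ollama for a completion; "stream": False returns one JSON object
# instead of a stream of newline-delimited chunks.
payload = {
    "model": "llama3.1",
    "prompt": "Why is the sky blue?",
    "stream": False,
}
response = requests.post("http://localhost:11434/api/generate", json=payload)
print(response.json()["response"])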
What is Unsloth?
Unsloth accelerates fine-tuning of LLMs like LLaMA 3, Mistral, Phi-3, and Gemma. According to its maintainers, it is roughly twice as fast and uses about 70% less memory, without compromising accuracy. You can use Unsloth's Google Colab notebooks or write a Python script to run the fine-tuning on your own server.
First, install Unsloth and its dependencies:
pip install unsloth
pip install -U transformers accelerate
Next, create a dataset of instruction and output pairs. Here's an example of how to structure your data:
dataset = [
    {"instruction": "Translate to French: Hello, how are you?", "output": "Bonjour, comment allez-vous?"},
    {"instruction": "Summarize this text: [Your long text here]", "output": "[Your summary here]"},
    # Add more examples...
]
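Supervised fine-tuning trainers generally expect a Hugging Face Dataset with a single text field rather than a raw Python list. Here's a minimal sketch of one way to do that conversion; the prompt template is an assumption, so adapt it to your own data (in practice you would also append the tokenizer's EOS token so the model learns where to stop).

from datasets import Dataset

def format_example(example):
    # Combine instruction and output into one training string.
    return {
        "text": f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}"
    }

train_dataset = Dataset.from_list(dataset).map(format_example)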
Load the pre-trained LLaMA model using Unsloth:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",  # Unsloth's mirror of the Llama 3.1 8B base model
    max_seq_length=2048,                     # adjust to the context length you need
    load_in_4bit=True,                       # 4-bit loading keeps memory usage low
)
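With the base model loaded in 4-bit, you typically train lightweight LoRA adapters on top of it rather than updating all of the weights. Below is a minimal sketch using Unsloth's get_peft_model helper; the rank and target modules are common defaults, not values prescribed by this guide.

# Attach LoRA adapters; only these small adapter weights are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)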
Set up your training configuration:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
)
Now fine-tune the model on your dataset. Unsloth doesn't ship a trainer of its own; its official notebooks use SFTTrainer from the TRL library, which works directly with the Unsloth-loaded model:

from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=training_args,
)
trainer.train()
After training, save your fine-tuned model locally or push it to the Hugging Face Hub:
# Local saving
model.save_pretrained("./fine_tuned_model")
tokenizer.save_pretrained("./fine_tuned_model")

# Push the fine-tuned model to the Hugging Face Hub (optional):
# model.push_to_hub("your_name/fine_tuned_model", token="...")
# tokenizer.push_to_hub("your_name/fine_tuned_model", token="...")
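Keep in mind that for a LoRA-based run, save_pretrained stores only the small adapter weights, not a full standalone model. If you want a complete 16-bit checkpoint, Unsloth offers a merged save; the output path below is just an example:

# Merge the LoRA adapters into the base weights and save a full 16-bit checkpoint.
model.save_pretrained_merged("./fine_tuned_model_merged", tokenizer, save_method="merged_16bit")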
To use your fine-tuned model with Ollama, you'll need to convert it to GGUF, the format Ollama's runtime expects.
# Save to 16-bit GGUF
model.save_pretrained_gguf("model", tokenizer, quantization_method="f16")
After converting your model to GGUF format, you can create a new Ollama model using the following command:
ollama create model_name -f ./model/Modelfile

This command creates a new Ollama model named 'model_name' using the Modelfile in the ./model directory. The Modelfile contains the information Ollama needs to load and prompt your fine-tuned model.
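Recent Unsloth versions can write a Modelfile alongside the GGUF export; if yours didn't, a minimal hand-written one looks roughly like this. The GGUF filename is an assumption, so match it to the file actually produced in the ./model directory:

# Minimal Modelfile sketch
FROM ./unsloth.F16.gguf

# Optional generation settings
PARAMETER temperature 0.7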
Finally, test your fine-tuned model with some example inputs:
from transformers import pipeline

generator = pipeline("text-generation", model="./fine_tuned_model")
result = generator("Translate to French: Good morning!")
print(result[0]["generated_text"])
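You can also exercise the Ollama model you created directly from the command line; ollama run accepts a one-off prompt as an argument:

ollama run model_name "Translate to French: Good morning!"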
This step-by-step guide provides a basic framework for fine-tuning with Unsloth. In a real project you will likely want to go deeper on each step, adding data preprocessing, model evaluation, and more advanced fine-tuning techniques.
You can also push your newly created fine-tuned Ollama model to your Ollama account, allowing you to download and use it anywhere.