Setting Up a Raspberry Pi 5 to Use an Offline LLM in Survival Situations

Ever since OpenAI released ChatGPT in late 2022, Large Language Models (LLMs) have captured the world’s imagination by demonstrating remarkable capabilities, from writing essays to answering complex questions. However, you don’t need to rely on companies like OpenAI or Google and share potentially personal data to take advantage of the power of LLMs. With just an affordable Raspberry Pi, you can set up your own local AI chat-based assistant. This guide will show you how.

What You'll Need

Essential Components

To set up your own LLM on a Raspberry Pi, there are a few essential components you’ll need:

  • Raspberry Pi: Since LLMs are resource-intensive, it is recommended to use the most powerful Raspberry Pi available for optimal performance. As of this writing, the recommended choice is the 8 GB Raspberry Pi 5.
  • microSD Card with Raspberry Pi OS: For maximum performance, consider using the lite version of Raspberry Pi OS, as a graphical user interface isn’t necessary to run an LLM (you can interact with it remotely using a terminal and SSH). However, if you’re using your Raspberry Pi for other tasks or as your primary computer, you can use the regular version of Raspberry Pi OS.
  • Additional Components: Apart from the Raspberry Pi and a fast microSD card, you’ll need a reliable power supply (the official one is recommended), a keyboard, mouse, and monitor for initial setup (optional if you’re using SSH), and an internet connection for downloading necessary software and models.

With these components on hand, you are ready to begin setting up your LLM on your Raspberry Pi.

Setting Up Raspberry Pi OS on Your Raspberry Pi

Before you can install Ollama or any other software, you need to set up your Raspberry Pi with Raspberry Pi OS (formerly known as Raspbian). Follow these steps to get started:

  1. Download Raspberry Pi OS: Visit the official Raspberry Pi website and download the latest version of Raspberry Pi OS. The simplest route is to download Raspberry Pi Imager, the official tool that can fetch the OS image for you.
  2. Flash the microSD Card: Use Raspberry Pi Imager to flash the Raspberry Pi OS image onto your microSD card. Insert the microSD card into your computer, select the Raspberry Pi OS image, choose the microSD card as the target, and click "Write."
  3. Initial Setup:
    • Insert the flashed microSD card into your Raspberry Pi.
    • Connect the keyboard, mouse, and monitor to your Raspberry Pi.
    • Plug in the power supply to boot up the Raspberry Pi.
  4. Configure Raspberry Pi OS:
    • Follow the on-screen instructions to configure your Raspberry Pi OS installation, including setting up the language, time zone, and Wi-Fi network.
    • Once the setup is complete, you should see the Raspberry Pi OS desktop (or a command-line login prompt, if you installed the Lite version).
  5. Update the System:
sudo apt update
sudo apt upgrade -y

With Raspberry Pi OS installed and configured, your Raspberry Pi is ready for the next steps in setting up your own local AI chat-based assistant.

Install Ollama

The first step in setting up your own LLM on a Raspberry Pi is to install the necessary software. Currently, the two most popular choices for running LLMs locally are llama.cpp and Ollama.

Why Ollama?

  • llama.cpp is a lightweight C/C++ implementation of inference for Meta’s LLaMA (Large Language Model Meta AI) family of models that can run on a wide range of hardware, including the Raspberry Pi. It was developed by Georgi Gerganov and released in March 2023.
  • Ollama, on the other hand, is built around llama.cpp, offering several user-friendly features. It automatically handles templating chat requests to the format each model expects, and it loads and unloads models on demand based on the client’s request. Ollama also manages downloading and caching models, including quantized models, so you can request them by name.

For this guide, we’ll be using Ollama due to its ease of use and extra features.

Installing Ollama

To install Ollama on your Raspberry Pi, open a terminal window. If you’re using SSH, connect to your Raspberry Pi using your preferred SSH client. Then, enter the following command in the terminal:

curl -fsSL https://ollama.com/install.sh | sh

This command downloads and executes the installation script from the official Ollama website. The script will automatically install the required dependencies and set up Ollama on your Raspberry Pi.
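
Once the script finishes, you can confirm the installation and check that the background service is running (the install script registers Ollama as a systemd service named ollama):

ollama --version
systemctl status ollama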

Download and Run an LLM

After installing Ollama, it's time to download a large language model. If you're using a Raspberry Pi with 8 GB of RAM, you can run quantized models with up to roughly 7 billion parameters (the learned weights the model uses to determine its outputs).

Choosing a Model

Some popular choices include Mistral (7B), Gemma (7B or 2B), Llama 2 uncensored (7B), or Microsoft’s Phi-3 (3.8B). You can view all supported models on the Ollama library page.

For this guide, we’ll be using Microsoft’s Phi-3 model. Despite its small size and efficiency, Phi-3 is an extremely capable model. To download and start it, run the following command in the terminal (the first run downloads the model; later runs start immediately):

ollama run phi3

Using a Local LLM on Your Raspberry Pi

After downloading and installing the Phi-3 model, you’ll see a prompt in the terminal that looks like this:

>>> Send a message (/? for help)

This means that the LLM is running and waiting for your input. To start interacting with the model, type your message and press Enter.

Effective Prompt Crafting

Here are some tips for crafting effective prompts:

  • Be Specific: Provide clear and detailed instructions or questions to help the LLM understand what you’re looking for.
  • Set the Context: Give the LLM some background information or a scenario to help it generate more relevant responses.
  • Define Roles: Specify the role the LLM should assume in its response, such as a storyteller, a teacher, or a technical expert.
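
For example, a single prompt that applies all three tips might look like this (an illustrative prompt, not a required format):

>>> You are a wilderness first-aid instructor. A hiker has a minor burn and no access to clean running water. In three short steps, explain how to treat it.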

To end the LLM session, press Ctrl + d or enter the /bye command. If you wish to start another session later, just open a new terminal and run the ollama run phi3 command. Since the model is already downloaded, it will start up quickly without needing to download again.

Performance Considerations

Keep in mind that the Raspberry Pi 5’s performance has its limits, and it can only output a few tokens per second. For better performance, consider running Ollama on a more powerful computer with a dedicated graphics card.
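
To measure what your setup actually achieves, you can start a session with the --verbose flag, which makes Ollama print timing statistics, including the token generation rate, after each response:

ollama run phi3 --verbose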

Maximizing Your LLM Experience

Customizing Model Settings

To get the most out of your LLM, you might want to tweak some settings:

  • Token Limit: Adjust the maximum number of tokens generated in a single response.
  • Temperature: Control the randomness of the model’s responses. A higher temperature makes the output more random, while a lower temperature makes it more deterministic.
  • Top-k Sampling: Limit the number of tokens to consider at each step during generation, which can help with generating more focused responses.
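
In Ollama, you can adjust all three of these from inside an interactive session with the /set command. A minimal example (the values shown are illustrative starting points, not recommendations):

>>> /set parameter num_predict 256
>>> /set parameter temperature 0.7
>>> /set parameter top_k 40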

Running Multiple Models

You can also download and run multiple models on your Raspberry Pi. Simply use the ollama run [model_name] command with the name of the model you want to use. This flexibility allows you to switch between different models depending on your needs.
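
For example, you could pull a second model in advance, check what's installed, and remove a model you no longer need to free up storage:

ollama pull gemma:2b
ollama list
ollama rm gemma:2b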

Troubleshooting Common Issues

Installation Problems

If you encounter issues during the installation of Ollama, make sure your Raspberry Pi OS is up to date. Run the following commands to update and upgrade your system:

sudo apt update
sudo apt upgrade -y

Performance Bottlenecks

If you find that the performance of your Raspberry Pi is not meeting your expectations, consider the following:

  • Cooling: Ensure your Raspberry Pi is adequately cooled, as overheating can throttle performance.
  • Power Supply: Use a high-quality power supply to prevent power-related performance issues.
  • Resource Management: Close unnecessary applications and processes to free up system resources.
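
On Raspberry Pi OS, the built-in vcgencmd tool shows the current CPU temperature and whether the firmware has throttled the system:

vcgencmd measure_temp
vcgencmd get_throttled

A get_throttled value of 0x0 means no throttling has occurred.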

Problems running Ollama

If you happen to run into the error:

bind: address already in use

This usually means something is already listening on Ollama's default port (11434). The install script starts Ollama as a background service, so attempting to launch the server again manually (for example, with ollama serve) will fail with this error. Here's how you can resolve it:

Identify the Ollama Service Port:

Run the following command to see which process is listening on port 11434:

sudo lsof -i :11434

Stop and Restart the Ollama Service:

After identifying the service, stop and restart it by running:

sudo systemctl stop ollama.service
sudo systemctl start ollama.service

These steps should resolve the port conflict and allow Ollama to run smoothly on your Raspberry Pi.

Model Loading Issues

If a model fails to load, verify that you have enough free memory and storage space. You can check the available memory with the free -h command and available storage with the df -h command.

Advanced Usage Scenarios

Integrating with Other Applications

You can enhance your LLM by integrating it with other applications. For example, you can use your LLM as a backend for a chatbot on a website or integrate it with home automation systems to provide intelligent responses to voice commands.
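
These integrations are possible because Ollama exposes a simple HTTP API on port 11434. As a minimal sketch, you can send a one-off generation request with curl (using the phi3 model from earlier):

curl http://localhost:11434/api/generate -d '{
  "model": "phi3",
  "prompt": "List three ways to signal for help in the wilderness.",
  "stream": false
}'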

Creating Custom Models

If you have specific requirements, you can create your own custom models by fine-tuning existing models on your own dataset. This process involves training the model on a specialized dataset to improve its performance on specific tasks or domains.
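
Full fine-tuning is beyond what a Raspberry Pi can realistically handle, but Ollama offers a lighter-weight form of customization: a Modelfile lets you bake a system prompt and default parameters into a named variant of an existing model. A minimal sketch (the name survival-assistant is just an example):

cat > Modelfile <<'EOF'
FROM phi3
SYSTEM You are a calm, practical survival assistant. Keep answers short and actionable.
PARAMETER temperature 0.5
EOF
ollama create survival-assistant -f Modelfile
ollama run survival-assistant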

Deploying in a Networked Environment

Deploying your Large Language Model (LLM) in a networked environment can enhance the setup, enabling multiple devices to connect to the model via a local network. This configuration is particularly beneficial in educational or research contexts where several users require access to the LLM.
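
By default, Ollama only listens on localhost. One common way to make it reachable from other devices on your network is to override the service's OLLAMA_HOST environment variable (adjust this to your own network and security requirements):

sudo systemctl edit ollama.service

In the editor that opens, add the following, then save and restart the service:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"

sudo systemctl restart ollama

Other devices can then reach the API at http://<your-pi-ip>:11434.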

Optimizing for Survival Situations

Power Management

  • Use a portable battery pack.
  • Consider solar chargers for extended use.

Data Backup

  • Regularly back up your MicroSD card.
  • Keep a spare card with a pre-installed setup.
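
From a Linux machine, one straightforward way to back up the card is to image it with dd. Replace /dev/sdX with your card's actual device (check with lsblk first; writing to the wrong device can destroy data):

sudo dd if=/dev/sdX of=pi-backup.img bs=4M status=progress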

Weatherproofing

  • Use a waterproof case for your Raspberry Pi.
  • Store in a durable container to protect against physical damage.

Conclusion

Setting up your own local AI chat-based assistant using a Raspberry Pi is not only feasible but also a rewarding experience. By following this guide, you can harness the power of large language models without relying on third-party services and ensure your data remains private. Whether you're using it for personal projects, educational purposes, or just for fun, the possibilities are endless.

FAQs

1. Can I use an older Raspberry Pi model for this setup?

While it’s possible to use older Raspberry Pi models, their performance may be significantly limited. For the best experience, it’s recommended to use the Raspberry Pi 5 with 8 GB of RAM. I did get this running on a Pi 3B, but it was painfully slow and not very useful, to be honest.

2. How do I update Ollama or the installed models?

To update Ollama, simply rerun the installation script. To update a model, pull it again by name (for example, ollama pull phi3), and Ollama will fetch the latest version from its library.

3. Is it possible to run multiple LLMs simultaneously on a Raspberry Pi?

Yes, you can run multiple LLMs, but keep in mind that the Raspberry Pi’s resources are limited. Running multiple models simultaneously may impact performance.

4. Can I use the LLM for real-time applications?

While it’s possible to use the LLM for real-time applications, the performance constraints of the Raspberry Pi might make it challenging. For better real-time performance, consider using a more powerful machine.

5. How can I contribute to the development of Ollama or similar projects?

You can contribute by providing feedback, reporting issues, or even contributing code if you have the skills. Check the official Ollama repository for contribution guidelines and ways to get involved.