Ollama Setup Guide

Ollama is a tool for running large language models (LLMs) locally. Follow these steps to install and run Ollama on your system.

Installation

  1. Install Ollama. On Linux, run the official installation script (macOS and Windows installers are available at https://ollama.com/download):

    curl -fsSL https://ollama.com/install.sh | sh
  2. Pull the model you want to run:

    ollama pull llama3:8b

Other models can be found in the Ollama Library (https://ollama.com/library). A quick post-install check is shown below.
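
Assuming the default Linux setup, where the install script starts Ollama as a background service, you can confirm that the server is up and see which models are available locally:

    ollama --version
    curl http://localhost:11434    # should respond with "Ollama is running"
    ollama list                    # lists downloaded models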

Running an LLM

  1. To start an interactive session with llama3:8b, use the following command:

    ollama run llama3:8b
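
You can also pass a prompt as a command-line argument to get a single response and return to your shell (the prompt text here is just an example):

    ollama run llama3:8b "Explain in one sentence why running LLMs locally is useful."

To leave an interactive session, type /bye.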

Notes

  • These instructions can be adapted to run any other LLM listed in the MODELS environment variable: replace llama3:8b with the name of another model from that list (see Customizing Your Model Selection below).

  • The speed and performance of a model depend on your hardware's capabilities; see Rough Hardware Guidelines below.
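
The MODELS variable is defined by the surrounding project's .env file, so its exact format depends on that project. A typical comma-separated form might look like this (the variable layout and model names are illustrative, not prescribed by Ollama):

    MODELS=llama3:8b,mistral:7b,phi3:3.8b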

Customizing Your Model Selection

To run a different model, use the format:

ollama run [model_name]

Replace [model_name] with your chosen model from the MODELS list in your .env file.
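
For example, to switch to Mistral-7B, pull it once and then run it by its library tag:

    ollama pull mistral:7b
    ollama run mistral:7b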

Rough Hardware Guidelines

  1. General Requirements:

    • CPU: Modern multi-core processor (6+ cores recommended)

    • RAM: Minimum 16GB, 32GB or more recommended

    • Storage: SSD with at least 20GB free space

    • Operating System: Linux, macOS, or Windows with WSL2

  2. Model-Specific Guidelines:

    a) StableLM-2-Zephyr-1.6B:

    • Minimum: 8GB RAM, 4-core CPU

    • Recommended: 16GB RAM, 6-core CPU

    • GPU: Not strictly necessary, but a mid-range GPU can improve performance

    b) Llama-3-8B:

    • Minimum: 16GB RAM, 6-core CPU

    • Recommended: 32GB RAM, 8-core CPU

    • GPU: Mid-range GPU (e.g., NVIDIA GTX 1660 or better) recommended

    c) Qwen-4B:

    • Minimum: 12GB RAM, 4-core CPU

    • Recommended: 24GB RAM, 6-core CPU

    • GPU: Entry-level GPU can help, but not essential

    d) Gemma2-9B:

    • Minimum: 32GB RAM, 6-core CPU

    • Recommended: 64GB RAM, 8-core CPU

    • GPU: Mid-range GPU recommended (e.g., NVIDIA RTX 3060 or better)

    e) Mistral-7B:

    • Minimum: 16GB RAM, 6-core CPU

    • Recommended: 32GB RAM, 8-core CPU

    • GPU: Mid-range to high-end GPU recommended (e.g., NVIDIA RTX 3070 or better)

    f) Phi-3-3.8B:

    • Minimum: 12GB RAM, 4-core CPU

    • Recommended: 24GB RAM, 6-core CPU

    • GPU: Entry-level to mid-range GPU can improve performance
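
To compare your machine against these numbers on Linux, a few standard commands suffice (this assumes the default model directory, ~/.ollama):

    nproc              # CPU core count
    free -h            # installed RAM
    df -h ~/.ollama    # free space on the drive that stores your models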

Notes

  1. These are rough estimates and actual performance may vary.

  2. GPU acceleration can significantly improve inference speed for all models; a quick way to verify GPU use follows these notes.

  3. For optimal performance, especially with larger models like 7B and 8B, a dedicated GPU with at least 8GB VRAM is recommended.

  4. Users can run these models on CPUs, but inference times will be slower compared to GPU-accelerated setups.

  5. SSD storage is strongly recommended for faster model loading times.
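
To confirm that a model is actually running on the GPU (assuming an NVIDIA card and a recent Ollama release that includes the ps command):

    nvidia-smi                    # confirm the driver sees the GPU
    ollama run llama3:8b "hi"     # load the model
    ollama ps                     # the PROCESSOR column shows GPU vs. CPU placement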

Always check the official Ollama documentation and specific model requirements for the most up-to-date information. Performance can be optimized by adjusting model parameters and quantization levels.
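
As a sketch of what parameter tuning can look like, you can derive a customized model from an existing one with a Modelfile; the name llama3-tuned and the parameter values below are illustrative:

    cat > Modelfile <<'EOF'
    FROM llama3:8b
    PARAMETER temperature 0.7
    PARAMETER num_ctx 4096
    EOF
    ollama create llama3-tuned -f Modelfile
    ollama run llama3-tuned

For lower memory use, many models in the Ollama Library also publish quantized tags (for example, q4_0 variants); check each model's page for the tags it offers.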
