Ollama Setup Guide
Ollama is a tool for running large language models (LLMs) locally. Follow these steps to install and run Ollama on your system.
Installation
Use the official Ollama installation script:
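On Linux, the script can be run with:
curl -fsSL https://ollama.com/install.sh | sh
On macOS and Windows, you can instead download the installer from https://ollama.com/download.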
Pull the models you want to run by using the following command:
ollama pull llama3:8b
Different models can be found in the Ollama Library at https://ollama.com/library.
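To confirm which models have been downloaded locally, you can list them:
ollama list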
Running an LLM
To run an instance of llama3:8b, use the following command:
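ollama run llama3:8b
This pulls the model if it is not already present and opens an interactive prompt; type /bye to exit the session.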
Notes
These instructions can be modified to run other LLMs listed in the MODELS environment variable.
The speed and performance of the model depend on your hardware's capabilities.
You can run different models by replacing llama3:8b with the name of another model from the MODELS list.
Customizing Your Model Selection
To run a different model, use the format:
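ollama run [model_name]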
Replace [model_name] with your chosen model from the MODELS list in your .env file.
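The structure of the MODELS list depends on how your project reads it; a hypothetical sketch, assuming a simple comma-separated .env entry, might look like:
# hypothetical .env entry; use the model tags your project actually expects
MODELS=llama3:8b,mistral:7b,gemma2:9b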
Rough Hardware Guidelines
General Requirements:
CPU: Modern multi-core processor (6+ cores recommended)
RAM: Minimum 16GB, 32GB or more recommended
Storage: SSD with at least 20GB free space
Operating System: Linux, macOS, or Windows with WSL2
Model-Specific Guidelines:
a) StableLM-2-Zephyr-1.6B:
Minimum: 8GB RAM, 4-core CPU
Recommended: 16GB RAM, 6-core CPU
GPU: Not strictly necessary, but a mid-range GPU can improve performance
b) Llama-3-8B:
Minimum: 16GB RAM, 6-core CPU
Recommended: 32GB RAM, 8-core CPU
GPU: Mid-range GPU (e.g., NVIDIA GTX 1660 or better) recommended
c) Qwen-4B:
Minimum: 12GB RAM, 4-core CPU
Recommended: 24GB RAM, 6-core CPU
GPU: Entry-level GPU can help, but not essential
d) Gemma2-9B:
Minimum: 32GB RAM, 6-core CPU
Recommended: 64GB RAM, 8-core CPU
GPU: Mid-range GPU recommended (e.g., NVIDIA RTX 3060 or better)
e) Mistral-7B:
Minimum: 16GB RAM, 6-core CPU
Recommended: 32GB RAM, 8-core CPU
GPU: Mid-range to high-end GPU recommended (e.g., NVIDIA RTX 3070 or better)
f) Phi-3-3.8B:
Minimum: 12GB RAM, 4-core CPU
Recommended: 24GB RAM, 6-core CPU
GPU: Entry-level to mid-range GPU can improve performance
Notes:
These are rough estimates and actual performance may vary.
GPU acceleration can significantly improve inference speed for all models.
For optimal performance, especially with larger models like 7B and 8B, a dedicated GPU with at least 8GB VRAM is recommended.
Users can run these models on CPUs, but inference times will be slower compared to GPU-accelerated setups.
SSD storage is strongly recommended for faster model loading times.
Always check the official Ollama documentation and specific model requirements for the most up-to-date information. Performance can also be tuned by adjusting model parameters and quantization levels, as sketched below.
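One way to adjust model parameters is to build a variant from a Modelfile. The sketch below lowers the sampling temperature; the name llama3-tuned is just an illustrative choice:
# write a minimal Modelfile that lowers the sampling temperature
cat <<'EOF' > Modelfile
FROM llama3:8b
PARAMETER temperature 0.5
EOF
# build and run the customized model
ollama create llama3-tuned -f Modelfile
ollama run llama3-tuned
Quantized variants are typically published as separate tags in the Ollama Library, so changing the quantization level is usually a matter of pulling a different tag.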