# Troubleshooting

Common issues and solutions for mistral.rs.
## Debug Mode

Enable debug mode for more information:

```bash
MISTRALRS_DEBUG=1 mistralrs run -m <model>
```
Debug mode causes:

- If loading a GGUF or GGML model, outputs a file containing the names, shapes, and types of each tensor: `mistralrs_gguf_tensors.txt` or `mistralrs_ggml_tensors.txt`
- Increased logging verbosity
## System Diagnostics

Run the built-in diagnostics tool:

```bash
mistralrs doctor
```

This checks your system configuration and reports any issues.
## Common Issues

### CUDA Issues
**Setting the CUDA compiler path:**

- Set the `NVCC_CCBIN` environment variable during build
**Error: `recompile with -fPIE`:**

- Some Linux distributions require compiling with `-fPIE`
- Set during build:

```bash
CUDA_NVCC_FLAGS=-fPIE cargo build --release --features cuda
```
**Error: `CUDA_ERROR_NOT_FOUND` or symbol not found:**

- For non-quantized models, specify the data type to load and run in
- Use one of `f32`, `f16`, `bf16`, or `auto` (`auto` chooses based on device)
- Example:

```bash
mistralrs run -m <model> -d auto
```
**Minimum CUDA compute capability:**

- The minimum supported CUDA compute cap is 5.3
- Set a specific compute cap with:

```bash
CUDA_COMPUTE_CAP=80 cargo build --release --features cuda
```
### Metal Issues (macOS)
**Metal not found (`error: unable to find utility "metal"`):**

- Install the Xcode command line tools:

```bash
xcode-select --install
```

- Set the active developer directory:

```bash
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
```
**Error: `cannot execute tool 'metal' due to missing Metal toolchain`:**

- Install the Metal toolchain:

```bash
xcodebuild -downloadComponent MetalToolchain
```
**Disabling Metal kernel precompilation:**

- By default, Metal kernels are precompiled at build time for better performance
- To skip precompilation (useful for CI or when Metal is not needed):

```bash
MISTRALRS_METAL_PRECOMPILE=0 cargo build --release --features metal
```
### Memory Issues
**Disabling mmap loading:**

- Set `MISTRALRS_NO_MMAP=1` to disable memory-mapped file loading
- Forces all tensor data into memory
- Useful if you're seeing mmap-related errors
**Out of memory errors:**

- Try using quantization: `--isq q4k` or `--isq q8_0`
- Use device mapping to offload layers: `-n 0:16;cpu:16`
- Reduce context length with PagedAttention: `--pa-context-len 4096`
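To gauge how much quantization can help before trying the options above, a rough back-of-the-envelope estimate of weight memory from parameter count and bits per weight is useful. This sketch covers weights only; real memory use also includes the KV cache and activations, and the ~4.5 bits/weight figure for 4-bit schemes is an approximation that accounts for quantization scales:

```python
def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in gigabytes."""
    # bytes = parameters * (bits per weight / 8)
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 7B-parameter model: f16 (16 bits) vs. a ~4-bit scheme (~4.5 bits
# per weight once quantization scales are included).
print(approx_weight_gb(7, 16))   # 14.0 GB
print(approx_weight_gb(7, 4.5))  # 3.9375 GB
```

If even the quantized estimate exceeds your VRAM, combine quantization with device mapping so the remaining layers run on the CPU.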
### Model Loading Issues
**Model type not auto-detected:**

- If auto-detection fails, please raise an issue
- You can manually specify the architecture if needed
**Chat template issues:**

- Templates are usually auto-detected
- Override with: `-c /path/to/template.jinja`
- See Chat Templates for details
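For orientation, a chat template is a Jinja file that renders the message list into the prompt format the model expects. The fragment below is purely illustrative: the `<|...|>` tokens are made up, and you should use the exact special tokens your model was trained with.

```jinja
{%- for message in messages -%}
<|{{ message['role'] }}|>{{ message['content'] }}
{%- endfor -%}
<|assistant|>
```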
## Getting Help

If you're still stuck:

- Discord - Community support
- Matrix - Alternative chat
- GitHub Issues - Bug reports and feature requests
When reporting issues, please include:

- Output of `mistralrs doctor`
- Full error message
- Command you ran
- Hardware (GPU model, OS)