Self-Hosted LLMs¶
Medium Converter can integrate with self-hosted LLMs, giving you privacy, cost savings, and control over the enhancement process.
Local LLM Setup¶
Installation¶
This installs the llama-cpp-python package which allows running various local LLMs.
Using Local Models¶
Command Line¶
# Use local LLM for enhancement
medium convert https://medium.com/example-article --enhance --llm-provider local \
--option llm.model_path="path/to/model.gguf"
Python API¶
from medium_converter import convert_article
from medium_converter.llm.config import LLMConfig, LLMProvider
llm_config = LLMConfig(
provider=LLMProvider.LOCAL,
extra_params={
"model_path": "path/to/model.gguf",
"n_ctx": 4096,
"n_batch": 512
}
)
await convert_article(
url="https://medium.com/example-article",
enhance=True,
llm_config=llm_config
)
Recommended Models¶
The following models work well with Medium Converter:
- Mistral-7B-Instruct - Good balance of quality and performance
- TheBloke/Mistral-7B-Instruct-v0.2-GGUF
-
Recommended quantization: Q4_K_M
-
Llama-2-7b-chat - Good text enhancement capabilities
- TheBloke/Llama-2-7B-Chat-GGUF
-
Recommended quantization: Q4_K_M
-
Phi-2 - Lightweight model for basic enhancements
- TheBloke/phi-2-GGUF
- Recommended quantization: Q4_0
Configuration Options¶
| Option | Description | Default |
|---|---|---|
model_path |
Path to the model file (.gguf) | Required |
n_ctx |
Context window size | 2048 |
n_batch |
Batch size for prompt processing | 512 |
n_gpu_layers |
Number of layers to offload to GPU | 0 |
temperature |
Randomness of generations | 0.7 |
repeat_penalty |
Penalty for repeating tokens | 1.1 |
top_p |
Nucleus sampling parameter | 0.9 |
top_k |
Limit vocabulary to top K options | 40 |
Server Integration¶
Medium Converter can use local LLMs running on a server with compatible APIs.
Using LM Studio¶
- Download and install LM Studio
- Load a model of your choice
- Start the local server (OpenAI API compatible)
- Configure Medium Converter:
from medium_converter import convert_article
from medium_converter.llm.config import LLMConfig, LLMProvider
llm_config = LLMConfig(
provider=LLMProvider.OPENAI, # Uses OpenAI-compatible API
model="local-model", # Model name doesn't matter
api_base="http://localhost:1234/v1", # LM Studio server URL
api_key="lm-studio", # Any string works
)
await convert_article(
url="https://medium.com/example-article",
enhance=True,
llm_config=llm_config
)
Using Ollama¶
- Install Ollama
- Pull a model:
ollama pull mistral - Start Ollama
- Configure Medium Converter:
from medium_converter import convert_article
from medium_converter.llm.config import LLMConfig
llm_config = LLMConfig(
model="ollama/mistral", # Must prefix with ollama/
extra_params={
"api_base": "http://localhost:11434"
}
)
await convert_article(
url="https://medium.com/example-article",
enhance=True,
llm_config=llm_config
)
Performance Considerations¶
- Hardware Requirements: Most 7B models require at least 8GB of RAM
- GPU Acceleration: Setting
n_gpu_layers> 0 can dramatically improve performance if you have a GPU - Quantization: Lower precision (e.g., Q4_K_M) reduces memory requirements but may affect quality
- Context Length: Reducing
n_ctxcan save memory but limits the text length that can be processed at once
Troubleshooting¶
Common Issues¶
- Out of Memory
- Try a smaller model or lower quantization (Q4 instead of Q8)
- Reduce the context length (
n_ctx) -
Process the article in smaller chunks
-
Slow Performance
- Enable GPU acceleration if available
- Increase batch size (
n_batch) if you have sufficient memory -
Use a smaller model
-
Poor Quality Output
- Try a larger or more recent model
- Adjust temperature (lower for more deterministic output)
- Fine-tune prompt templates (see the docs on customizing prompts)