### 2.3.33 Satellite: OptiLLM

- Handle: `optillm`
- URL: http://localhost:34301

optillm is an OpenAI API compatible optimizing inference proxy which implements several state-of-the-art techniques that can improve the accuracy and performance of LLMs.
```bash
# [Optional] Pre-build the image
harbor build optillm

# Start the service [--tail is optional, to show logs]
harbor up optillm --tail
```

See the troubleshooting guide if you encounter any issues.
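Once the service is up, it exposes the standard OpenAI-compatible endpoints on the port above. A minimal request sketch — the model slug `llama3.1:8b` is an assumption, substitute whatever your configured backend actually serves; optillm can also select an approach per-request by prefixing it to the model name (e.g. `moa-`):

```shell
# Send a chat completion through the optillm proxy.
# The "moa-" prefix asks optillm to apply the mixture-of-agents
# approach to this request (model slug below is hypothetical).
curl http://localhost:34301/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moa-llama3.1:8b",
    "messages": [{"role": "user", "content": "What is 2 + 2?"}]
  }'
```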
optillm serves the underlying models without modification, so upstream services (for example, webui) cannot distinguish them from the original models when connected to both optillm and the original inference backend. Additionally, its streaming implementation is not compatible with all frontends; see the compatibility section.
optillm is pre-configured (but not tested) to connect to the following services: airllm, aphrodite, boost (yes, you can chain the optimizing proxies), dify, ktransformers, litellm, llamacpp, mistralrs, nexa, ollama, omnichain, pipelines, sglang, tabbyapi, vllm.
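Because these pairings are pre-wired, connecting optillm to a backend is just a matter of starting both services together; a sketch, assuming the default configuration from the list above works out of the box:

```shell
# Start ollama as the inference backend with optillm proxying it
harbor up ollama optillm
```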
See the official parameters reference. To set them, see Harbor's environment configuration guide.
```bash
# Example: see OPTILLM_APPROACH env variable value
harbor env optillm OPTILLM_APPROACH

# Example: set OPTILLM_APPROACH env variable
harbor env optillm OPTILLM_APPROACH mcts
```

Additionally, the following options can be set via `harbor config`:
```bash
# The port on the host where the OptiLLM endpoint will be available
OPTILLM_HOST_PORT 34301

# The path to the workspace directory on the host machine
# (relative to $(harbor home), but can be global as well)
OPTILLM_WORKSPACE ./optillm/data
```
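These keys can be read and updated with Harbor's config CLI; a sketch, assuming the dotted lowercase key form Harbor uses for such options:

```shell
# Read the current host port for the optillm endpoint
harbor config get optillm.host_port

# Move the endpoint to a different host port (example value)
harbor config set optillm.host_port 34302
```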