Releases · VectorInstitute/vector-inference
v0.5.0
What's Changed
- Decoupled model config from the repo; config priority, in descending order: user-defined config, cached config, default config
- Refactored the code base; command logic moved to helper classes
- Retired the server launch bash script; moved server launching logic to Python
- Updated `metrics` command to use the `/metrics` API endpoint (see the sketch after this list)
- Exposed additional launch parameters in the CLI
- Automated the Docker image build and push to Docker Hub
- Added unit tests for `_cli.py`, `_utils.py`, and imports; improved test coverage
- Added server launch info JSON to logging
- Miscellaneous housekeeping fixes
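As a hedged illustration of the updated metrics path: the underlying vLLM server exposes a Prometheus-style `/metrics` endpoint, which the `metrics` command now reads from and which can also be queried directly. Host and port below are placeholders; the real base URL comes from the server launch info logged by vec-inf.

```bash
# Placeholders: substitute the real base URL from the server launch info JSON.
curl http://<server-host>:<port>/metrics
```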
New Models
- VLM:
  - Phi-3.5-vision-instruct
  - Molmo-7B-D-0924
  - InternVL2_5-8B
  - glm-4v-9b
  - deepseek-vl2
  - deepseek-vl2-small
- Reward Modelling:
  - Qwen2.5-Math-PRM-7B
Other Contributors
@amrit110, @XkunW, @fcogidi, @xeon27, @kohankhaki
Full Changelog: v0.4.1...v0.5.0
v0.4.1
What's Changed
- Add pre-commit CI config by @amrit110 in #20
- Add dependabot.yml file by @amrit110 in #23
- Update jinja2 and fix poetry.lock file by @amrit110 in #25
- Bump up vllm to 0.7.0 by @amrit110 in #27
- Add CI check for PRs submitted to develop branch by @amrit110 in #26
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #22
- Migrate to uv, add lint fixes, improvements and some unit tests by @amrit110 in #28
- onboard BGE-Base-EN-v1.5 and All-MiniLM-L6-v2 by @kohankhaki in #29
- Add docs for vector-inference by @amrit110 in #30
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #31
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #32
- Update docs_build.yml by @amrit110 in #33
- Update dependencies by @fcogidi in #34
New Contributors
- @pre-commit-ci made their first contribution in #22
Full Changelog: v0.4.0...v0.4.1
v0.4.0.post1
- Fixed an incorrect dependency
- Updated README files
v0.4.0
- Onboarded various new models and new model types: text embedding models and reward reasoning models.
- Added `metrics` command that streams performance metrics for the inference server.
- Enabled more launch command options: `--max-num-seqs`, `--model-weights-parent-dir`, `--pipeline-parallelism`, `--enforce-eager` (example after this list).
- Improved support for launching custom models.
- Improved command response time.
- Improved visuals for the `list` command.
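A hedged sketch of a launch using the new options; the model name, path, and values are illustrative placeholders, not tested defaults.

```bash
# Illustrative only: model name, path, and values are placeholders.
vec-inf launch Meta-Llama-3.1-8B-Instruct \
  --max-num-seqs 256 \
  --model-weights-parent-dir /path/to/model-weights
# --pipeline-parallelism and --enforce-eager are also available;
# their exact value forms are not shown in these notes.
```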
v0.3.3
v0.3.2
v0.3.1
v0.3.0
- Added `vec-inf` CLI (usage sketch after this section):
  - Install `vec-inf` via `pip`
  - `launch` command to launch models
  - `status` command to check inference server status
  - `shutdown` command to stop inference server
  - `list` command to see all available models
- Upgraded `vllm` to `0.5.4`
- Added support for new model families:
  - Llama 3.1 (including 405B)
  - Gemma 2
  - Phi 3
  - Mistral Large
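A minimal end-to-end sketch of the CLI flow introduced in this release. The model name is a placeholder, and `<job-id>` stands for the identifier returned by `launch`; the exact command signatures are assumed here, since the notes don't show them.

```bash
pip install vec-inf                         # install the CLI
vec-inf list                                # see all available models
vec-inf launch Meta-Llama-3.1-8B-Instruct   # launch a model (placeholder name)
vec-inf status <job-id>                     # check inference server status
vec-inf shutdown <job-id>                   # stop the inference server
```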