
Releases: VectorInstitute/vector-inference

v0.5.0

20 Mar 20:36
45ef380

What's Changed

  • Decoupled the model config from the repo; config priority, in descending order: user-defined config, cached config, default config
  • Refactored the code base; command logic moved to helper classes
  • Retired the server launch bash script; server launching logic moved to Python
  • Updated the metrics command to use the metrics API endpoint
  • Exposed additional launch parameters in the CLI
  • Automated the Docker image build and push to Docker Hub
  • Added unit tests for _cli.py, _utils.py, and imports; improved test coverage
  • Added server launch info JSON to logging
  • Miscellaneous housekeeping fixes
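One way to read the descending config priority is a first-match fallback over config sources. This is a minimal sketch, assuming each source resolves to a config path or an empty string when absent; the function name and file names are illustrative, not actual vec-inf internals:

```shell
resolve_config() {
  # Return the first non-empty config in descending priority:
  # user-defined, then cached, then default.
  for cfg in "$@"; do
    if [ -n "$cfg" ]; then
      echo "$cfg"
      return 0
    fi
  done
  return 1
}

# A user-defined config takes precedence over cached and default configs.
resolve_config "user-models.yaml" "cached-models.yaml" "default-models.yaml"
```

With no user-defined config (an empty first argument), the cached config would win, falling back to the default config last.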

New Models

  • VLM:
    • Phi-3.5-vision-instruct
    • Molmo-7B-D-0924
    • InternVL2_5-8B
    • glm-4v-9b
    • deepseek-vl2
    • deepseek-vl2-small
  • Reward Modelling:
    • Qwen2.5-Math-PRM-7B

New Contributors

Other Contributors

@amrit110 @XkunW @fcogidi @xeon27 @kohankhaki

Full Changelog: v0.4.1...v0.5.0

v0.4.1

14 Feb 19:28

What's Changed

New Contributors

Full Changelog: v0.4.0...v0.4.1

v0.4.0.post1

28 Nov 19:23
dba901b
  • Fixed an incorrect dependency
  • Updated README files

v0.4.0

28 Nov 18:21
d221dae
  • Onboarded various new models and new model types: text embedding models and reward reasoning models.
  • Added a metrics command that streams performance metrics for the inference server.
  • Enabled more launch command options: --max-num-seqs, --model-weights-parent-dir, --pipeline-parallelism, --enforce-eager.
  • Improved support for launching custom models.
  • Improved command response time.
  • Improved visuals for list command.
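The new launch options above might be combined as follows. The model name, path, and option values are placeholders, and the exact flag syntax may differ from what vec-inf actually accepts:

```shell
# Hypothetical invocation combining the new launch options
vec-inf launch Meta-Llama-3.1-8B \
  --max-num-seqs 256 \
  --model-weights-parent-dir /path/to/model-weights \
  --pipeline-parallelism True \
  --enforce-eager True
```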

v0.3.3

03 Sep 21:53
d10758d
  • Added missing package in dependencies
  • Fixed pre-commit hooks
  • Linted and formatted code
  • Updated outdated examples

v0.3.2

03 Sep 18:27
39b98a2
  • Added support for custom models; users can now launch custom models as long as the model architecture is supported by vLLM
  • Minor updates to multi-node job launching to better support custom models
  • Add Llama3-OpenBioLLM-70B to supported model list

v0.3.1

29 Aug 13:41
f43d7bf
  • Added a model-name argument to the list command to show the default setup of a specific supported model
  • Improved command option descriptions
  • Restructured the models directory
  • Added some default values for using a custom model

v0.3.0

29 Aug 06:09
156dfa5
  • Added vec-inf CLI:

    • Install vec-inf via pip
    • launch command to launch models
    • status command to check inference server status
    • shutdown command to stop inference server
    • list command to see all available models
  • Upgraded vLLM to 0.5.4

  • Added support for new model families:

    • Llama 3.1 (Including 405B)
    • Gemma 2
    • Phi 3
    • Mistral Large
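A typical end-to-end session with the new CLI might look like the following. The model name and job ID are placeholders, and the exact argument shapes for status and shutdown are assumptions here:

```shell
# Hypothetical vec-inf workflow (placeholders throughout)
pip install vec-inf
vec-inf list                        # see all available models
vec-inf launch Meta-Llama-3.1-8B    # launch an inference server
vec-inf status 12345678             # check inference server status
vec-inf shutdown 12345678           # stop the inference server
```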

v0.2.1

06 Jul 15:58
2c43a25
  • Add CodeLlama
  • Update model variant names for Llama 2 in README

v0.2.0

04 Jul 14:29
635e13f
  • Updated the default environment to use a Singularity container; added the associated Dockerfile
  • Updated vLLM to 0.5.0, added VLM support (LLaVA-1.5 and LLaVA-NeXT), and updated example scripts
  • Refactored the repo structure for a simpler model onboarding and update process