Bringing 🤗 Text Embeddings Inference to Amazon SageMaker

Just opened a PR (huggingface/text-embeddings-inference#103) that adds SageMaker-compatible images to HF Text Embeddings Inference (TEI). It's the TEI counterpart of huggingface/text-generation-inference#147.

Implementation-wise, the routes SageMaker requires were already in place, so it mostly came down to CI changes and a couple of hacks:

  1. Added build-and-push-sagemaker-image steps to build_* workflows
  2. Added a sagemaker target to Dockerfile-cuda and a custom sagemaker_entrypoint.sh
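For context, SageMaker expects an inference container to listen on port 8080 and answer `GET /ping` (health check) and `POST /invocations` (inference). Below is a rough sketch of how one might smoke-test a locally running image against that contract using only the standard library. The base URL, the helper names, and the `{"inputs": ...}` payload shape are assumptions for illustration and are not taken from the PR.

```python
import json
import urllib.request

# Assumption: the container runs locally and is published on port 8080,
# the port SageMaker expects inference containers to listen on.
BASE_URL = "http://localhost:8080"


def ping_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET /ping health-check request."""
    return urllib.request.Request(f"{base_url}/ping", method="GET")


def invocation_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /invocations request with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(payload: dict, base_url: str = BASE_URL):
    """Hit /ping, then send one payload to /invocations and return the parsed JSON."""
    # urlopen raises HTTPError for 4xx/5xx; the explicit check rejects other non-200 codes.
    with urllib.request.urlopen(ping_request(base_url), timeout=10) as resp:
        if resp.status != 200:
            raise RuntimeError(f"/ping returned {resp.status}")
    with urllib.request.urlopen(invocation_request(payload, base_url), timeout=30) as resp:
        return json.loads(resp.read())
```

With a container up, something like `smoke_test({"inputs": "Hello, SageMaker!"})` should return the model's response, assuming the loaded model is an embedding model and accepts that payload.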

Initial tests suggest it works quite well with text embedding and reranker models (see image below for an example with BAAI/bge-reranker-base). Currently working on a notebook demo and some load/stress tests to compare HF TEI’s performance against similar solutions.
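Once a model is deployed, querying a reranker endpoint could look roughly like the sketch below. It assumes the endpoint accepts TEI's rerank payload (`query` plus `texts`) and returns a list of objects with `index` and `score` fields; the `rerank` helper and the endpoint name in the usage note are hypothetical. The client is passed in, so any object exposing `invoke_endpoint` (such as a boto3 `sagemaker-runtime` client) will do.

```python
import json


def rerank(client, endpoint_name: str, query: str, texts: list[str]) -> list[dict]:
    """Send a rerank request to a SageMaker endpoint and return results, best first.

    Assumption: the response body is a JSON list of {"index": int, "score": float}.
    """
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps({"query": query, "texts": texts}),
    )
    ranks = json.loads(response["Body"].read())
    # Sort by descending score so the most relevant text comes first.
    return sorted(ranks, key=lambda r: r["score"], reverse=True)
```

With boto3 installed and credentials configured, the client would be `boto3.client("sagemaker-runtime")`, and a call might look like `rerank(client, "tei-bge-reranker-base", "What is deep learning?", ["Deep learning is ...", "Cheese is ..."])`, where the endpoint name is made up.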

Still under review, so stay tuned!


© João Galego | Built with ❤️ using Jekyll