Just opened a PR (huggingface/text-embeddings-inference#103) to add support for SageMaker-compatible images. It's similar to huggingface/text-generation-inference#147, but for HF TEI.
Implementation-wise, since the required routes were already implemented, it was mostly just CI stuff and some hacks:
- Added build-and-push-sagemaker-image steps to the build_* workflows
- Added a sagemaker target to Dockerfile-cuda
- Added a custom sagemaker_entrypoint.sh
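For anyone wondering what an entrypoint like this typically does: SageMaker hosting expects the container to answer GET /ping and POST /invocations on port 8080, so the script mostly has to translate environment variables into router flags. Below is a minimal Python sketch of that translation, not the PR's actual shell script. The variable names HF_MODEL_ID and HF_MODEL_REVISION are assumptions borrowed from the TGI SageMaker convention, and build_router_args is a hypothetical helper; --model-id, --revision and --port are existing text-embeddings-router flags.

```python
import shlex
from typing import List, Mapping

# SageMaker routes /ping and /invocations to port 8080 inside the container.
SAGEMAKER_PORT = 8080


def build_router_args(env: Mapping[str, str]) -> List[str]:
    """Hypothetical helper: map container env vars to a text-embeddings-router command.

    HF_MODEL_ID / HF_MODEL_REVISION are assumed names, not taken from the PR.
    """
    model_id = env.get("HF_MODEL_ID")
    if not model_id:
        raise ValueError("HF_MODEL_ID must be set")
    args = [
        "text-embeddings-router",
        "--model-id", model_id,
        "--port", str(SAGEMAKER_PORT),
    ]
    # Only pin a revision when one is explicitly provided.
    revision = env.get("HF_MODEL_REVISION")
    if revision:
        args += ["--revision", revision]
    return args


# Print the command that would be exec'd for a reranker model.
print(shlex.join(build_router_args({"HF_MODEL_ID": "BAAI/bge-reranker-base"})))
# text-embeddings-router --model-id BAAI/bge-reranker-base --port 8080
```

A real entrypoint would then replace itself with the router process (e.g. os.execvp in Python, or exec in shell) so that SageMaker's stop signals reach the server directly.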
Initial tests suggest it works quite well with text embedding and reranker models (see the image below for an example with BAAI/bge-reranker-base). I'm currently working on a notebook demo and some load/stress tests to compare HF TEI's performance against similar solutions.
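As a rough idea of what such a load test could look like, here is a minimal, stdlib-only Python sketch. It fires a fixed number of requests through a pluggable send callable from a thread pool and reports throughput plus p50/p95 latency. run_load_test is a hypothetical helper, not part of TEI or the PR, and the thread-pool approach is just one simple option.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_load_test(send: Callable[[], object], total: int, concurrency: int) -> Dict[str, float]:
    """Hypothetical helper: issue `total` calls to `send` across `concurrency` threads."""
    if total < 2:
        raise ValueError("need at least 2 requests to compute percentiles")

    def timed_call(_: int) -> float:
        # Measure the latency of a single request.
        start = time.perf_counter()
        send()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies: List[float] = list(pool.map(timed_call, range(total)))
    wall = time.perf_counter() - wall_start

    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(latencies, n=100)
    return {
        "requests": float(total),
        "throughput_rps": total / wall,
        "p50_s": cuts[49],
        "p95_s": cuts[94],
    }


# Dry run against a fake 1 ms "request" to check the harness itself.
print(run_load_test(lambda: time.sleep(0.001), total=50, concurrency=8))
```

To point it at a deployed endpoint, send could wrap boto3.client("sagemaker-runtime").invoke_endpoint(EndpointName=..., ContentType="application/json", Body=...) with a TEI rerank body such as {"query": ..., "texts": [...]}. The endpoint name is a placeholder, and whether /invocations accepts that exact body is an assumption to verify against the PR. Client-side threads also add their own overhead, so a dedicated load-testing tool is a better fit for final numbers.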
Still under review, so stay tuned!