In this post, I’ll show you a neat way to deploy small language models (SLMs), or quantized versions of larger ones, on AWS Lambda using function URLs and response streaming.
📝 Read the full article on AWS Community.
👨‍💻 All code and documentation are available at github.com/JGalego/SLaMbda.
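
To give a feel for what response streaming looks like from the caller's side, here is a minimal client sketch in Python. It is not taken from the SLaMbda repository. It assumes the function URL uses the `RESPONSE_STREAM` invoke mode with auth type `NONE`, and that the function accepts a JSON body with a `prompt` field. The payload shape and the helper names `iter_text_chunks` and `stream_completion` are hypothetical.

```python
import codecs
import io
import json
import urllib.request
from typing import Iterator


def iter_text_chunks(stream: io.BufferedIOBase, chunk_size: int = 64) -> Iterator[str]:
    """Yield decoded text from a binary stream as soon as bytes are available."""
    # An incremental decoder copes with multi-byte UTF-8 characters
    # that happen to be split across two chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        # read1() returns whatever is available (up to chunk_size bytes)
        # instead of waiting for a full buffer, which suits streamed output.
        chunk = stream.read1(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def stream_completion(function_url: str, prompt: str) -> Iterator[str]:
    """POST a prompt to a Lambda function URL and yield the streamed reply."""
    # Assumption: the function expects {"prompt": "..."}; adapt to the real contract.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        function_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_text_chunks(response)
```

Calling `for piece in stream_completion(url, "Tell me a joke"): print(piece, end="", flush=True)` would then print the model's output piece by piece as it arrives, rather than waiting for the whole completion. If the function URL uses IAM auth instead, the request would also need to be SigV4-signed, which this sketch leaves out.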