Description
System Info
Hello,
Attempting to deploy the AWS prebuilt tei-cpu:2.0.1-tei1.7.0-cpu-py310-ubuntu22.04
image on a SageMaker serverless endpoint yields the following error:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| timestamp | message |
|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1747757347229 | {"timestamp":"2025-05-20T16:09:07.146790Z","level":"INFO","message":"Args { model_id: \"mix*******-**/*****-*****-****e-v1\", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: \"0.0.0.0\", port: 8080, uds_path: \"/tmp/text-embeddings-inference-server\", huggingface_hub_cache: Some(\"/data\"), payload_limit: 2000000, api_key: None, json_output: true, disable_spans: false, otlp_endpoint: None, otlp_service_name: \"text-embeddings-inference.server\", cors_allow_origin: None }","target":"text_embeddings_router","filename":"router/src/main.rs","line_number":185} |
| 1747757347443 | {"timestamp":"2025-05-20T16:09:07.443002Z","level":"INFO","message":"Starting download","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":20,"span":{"name":"download_artifacts"},"spans":[{"name":"download_artifacts"}]} |
| 1747757347443 | {"timestamp":"2025-05-20T16:09:07.443565Z","level":"INFO","message":"Downloading `1_Pooling/config.json`","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":53,"span":{"name":"download_pool_config"},"spans":[{"name":"download_artifacts"},{"name":"download_pool_config"}]} |
| 1747757347543 | {"timestamp":"2025-05-20T16:09:07.543009Z","level":"WARN","message":"Download failed: I/O error Permission denied (os error 13)","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":26,"span":{"name":"download_artifacts"},"spans":[{"name":"download_artifacts"}]} |
| 1747757348454 | {"timestamp":"2025-05-20T16:09:08.454266Z","level":"INFO","message":"Downloading `config_sentence_transformers.json`","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":77,"span":{"name":"download_new_st_config"},"spans":[{"name":"download_artifacts"},{"name":"download_new_st_config"}]} |
| 1747757348544 | {"timestamp":"2025-05-20T16:09:08.544007Z","level":"WARN","message":"Download failed: I/O error Permission denied (os error 13)","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":36,"span":{"name":"download_artifacts"},"spans":[{"name":"download_artifacts"}]} |
| 1747757348544 | {"timestamp":"2025-05-20T16:09:08.544042Z","level":"INFO","message":"Downloading `config.json`","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":40,"span":{"name":"download_artifacts"},"spans":[{"name":"download_artifacts"}]} |
| 1747757348637 | Error: Could not download model artifacts |
| 1747757348637 | Caused by: |
| 1747757348637 | 0: I/O error Permission denied (os error 13) |
| 1747757348637 | 1: Permission denied (os error 13) |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
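For reference, `os error 13` is `EACCES` on Linux, which means the process could not write to the target path. Below is a minimal sketch of a probe to run inside the container (for example locally, under a non-root user) to see which directories are writable. The choice of `/data` and `/tmp` comes from the `huggingface_hub_cache` and `uds_path` values in the log above. The helper name `can_write` is my own and is not part of TEI.

```python
import errno
import os
import tempfile


def can_write(directory: str) -> bool:
    """Return True if a temporary file can be created inside `directory`."""
    try:
        # The file is deleted automatically when the context manager exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError as exc:
        # EACCES (13 on Linux) matches the "Permission denied" in the logs.
        name = errno.errorcode.get(exc.errno, exc.errno)
        print(f"{directory}: {name} ({exc.strerror})")
        return False


# Directories taken from the Args line in the log; adjust as needed.
for path in ("/data", "/tmp"):
    print(path, "writable" if can_write(path) else "not writable")
```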
My searching turned up this section of the AWS SageMaker docs:
If the container you use for a serverless endpoint is the same one you used on an instance-based endpoint, your container may not have permissions to write files. This can happen for the following reasons:
- Your serverless endpoint fails to create or update due to a ping health check failure.
- The Amazon CloudWatch logs for the endpoint show that the container is failing to write to some file or directory due to a permissions error.
To fix this issue, you can try to add read, write, and execute permissions for other on the file or directory and then rebuild the container. You can perform the following steps to complete this process:
- In the Dockerfile you used to build your container, add the following command:
  RUN chmod o+rwX <file or directory name>
- Rebuild the container.
- Upload the new container image to Amazon ECR.
- Try to create or update the serverless endpoint again.
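Applied to this image, the steps above might look like the sketch below, which only renders the text of a derived Dockerfile. Several things here are assumptions on my part: that `/data` (the `huggingface_hub_cache` path in the log) is the directory that needs the permission change, that adding `mkdir -p` is harmless if the directory already exists, and the registry placeholder in the base image URI, which I have left unfilled.

```python
def render_dockerfile(base_image: str, writable_dirs: list[str]) -> str:
    """Render a Dockerfile that opens up the given directories for 'other'."""
    lines = [f"FROM {base_image}"]
    for directory in writable_dirs:
        # `chmod o+rwX` is the command from the AWS docs; `mkdir -p` is an
        # assumption added here in case the directory is not in the image.
        lines.append(f"RUN mkdir -p {directory} && chmod o+rwX {directory}")
    return "\n".join(lines) + "\n"


# Placeholder registry; only the tag is taken from this report.
BASE_IMAGE = (
    "<account>.dkr.ecr.<region>.amazonaws.com/"
    "tei-cpu:2.0.1-tei1.7.0-cpu-py310-ubuntu22.04"
)

print(render_dockerfile(BASE_IMAGE, ["/data"]))
```

The rendered file would still need to be built and pushed to Amazon ECR before it can be passed as `image_uri`.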
If this is indeed the missing line, can this step be added straightforwardly to the tei-cpu container available to AWS SageMaker? If not, is there a recommended way to proceed?
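One untested idea that would avoid rebuilding the image is to point the model cache at a location that is writable. The sketch below only builds the environment dictionary. It assumes that the container honors the `HUGGINGFACE_HUB_CACHE` environment variable (the log shows `huggingface_hub_cache: Some("/data")`, which suggests the path is configurable) and that `/tmp` is writable on serverless endpoints. I have not verified either assumption, and the helper name and the `/tmp/hf-cache` path are hypothetical.

```python
def serverless_env(model_id: str, cache_dir: str = "/tmp/hf-cache") -> dict[str, str]:
    """Build container environment variables with a relocated model cache."""
    return {
        "HF_MODEL_ID": model_id,
        # Assumption: TEI reads this variable for its cache directory.
        "HUGGINGFACE_HUB_CACHE": cache_dir,
    }


config = serverless_env("Snowflake/snowflake-arctic-embed-m-v1.5")
print(config)
```

The resulting dictionary would be passed as `env=config` to `HuggingFaceModel`, in place of the `config` used in the reproduction.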
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
In a SageMaker notebook:
- Ensure the latest version of the sagemaker Python SDK (2.224.2 at time of writing) is installed for access to TEI 1.7:
%pip install --upgrade sagemaker
- Execute minimal deployment reproduction (modified from TEI introductory blog post):
import boto3
from sagemaker import get_execution_role
from sagemaker.huggingface import get_huggingface_llm_image_uri
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

try:
    role = get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

config = {
    "HF_MODEL_ID": "Snowflake/snowflake-arctic-embed-m-v1.5"
}

emb_model = HuggingFaceModel(
    role=role,
    image_uri=get_huggingface_llm_image_uri("huggingface-tei-cpu"),
    env=config
)

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=6144, max_concurrency=1,
)

predictor = emb_model.deploy(
    serverless_inference_config=serverless_config
)
This fails, and the CloudWatch logs should show the same result as above.
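To pull those logs without opening the console, something like the following sketch may help. It assumes the endpoint logs land in the standard `/aws/sagemaker/Endpoints/<endpoint name>` log group and takes a boto3 CloudWatch Logs client as an argument. The function name and the endpoint name in the usage comment are placeholders of mine.

```python
def fetch_endpoint_logs(logs_client, endpoint_name: str, limit: int = 50) -> list[str]:
    """Return up to `limit` log messages for a SageMaker endpoint."""
    response = logs_client.filter_log_events(
        logGroupName=f"/aws/sagemaker/Endpoints/{endpoint_name}",
        limit=limit,
    )
    # Each event carries a "timestamp" and a "message" field.
    return [event["message"] for event in response.get("events", [])]


# Usage (requires AWS credentials; the endpoint name is a placeholder):
#   import boto3
#   logs = boto3.client("logs")
#   for line in fetch_endpoint_logs(logs, "my-endpoint-name"):
#       print(line)
```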
Endpoint test code for completeness:
data = {
    "inputs": "the mesmerizing performances of the leads keep the film grounded and keep the audience riveted .",
}
# `predictor` is the object returned by emb_model.deploy() above
res = predictor.predict(data=data)
print(f"length of embeddings: {len(res[0])}")
print(f"first 10 elements of embeddings: {res[0][:10]}")
Expected behavior
The deployment and the test endpoint invocation should both complete successfully.