Failing Deployment on AWS SageMaker Serverless Endpoint #609

Closed
awslabs/llm-hosting-container
#152
@sutsr

Description

System Info

Hello,

Attempting to deploy the AWS prebuilt tei-cpu:2.0.1-tei1.7.0-cpu-py310-ubuntu22.04 image on a SageMaker serverless endpoint yields the following error:

|   timestamp   | message |
|---------------|---------|
| 1747757347229 | {"timestamp":"2025-05-20T16:09:07.146790Z","level":"INFO","message":"Args { model_id: \"mix*******-**/*****-*****-****e-v1\", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: \"0.0.0.0\", port: 8080, uds_path: \"/tmp/text-embeddings-inference-server\", huggingface_hub_cache: Some(\"/data\"), payload_limit: 2000000, api_key: None, json_output: true, disable_spans: false, otlp_endpoint: None, otlp_service_name: \"text-embeddings-inference.server\", cors_allow_origin: None }","target":"text_embeddings_router","filename":"router/src/main.rs","line_number":185} |
| 1747757347443 | {"timestamp":"2025-05-20T16:09:07.443002Z","level":"INFO","message":"Starting download","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":20,"span":{"name":"download_artifacts"},"spans":[{"name":"download_artifacts"}]} |
| 1747757347443 | {"timestamp":"2025-05-20T16:09:07.443565Z","level":"INFO","message":"Downloading `1_Pooling/config.json`","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":53,"span":{"name":"download_pool_config"},"spans":[{"name":"download_artifacts"},{"name":"download_pool_config"}]} |
| 1747757347543 | {"timestamp":"2025-05-20T16:09:07.543009Z","level":"WARN","message":"Download failed: I/O error Permission denied (os error 13)","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":26,"span":{"name":"download_artifacts"},"spans":[{"name":"download_artifacts"}]} |
| 1747757348454 | {"timestamp":"2025-05-20T16:09:08.454266Z","level":"INFO","message":"Downloading `config_sentence_transformers.json`","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":77,"span":{"name":"download_new_st_config"},"spans":[{"name":"download_artifacts"},{"name":"download_new_st_config"}]} |
| 1747757348544 | {"timestamp":"2025-05-20T16:09:08.544007Z","level":"WARN","message":"Download failed: I/O error Permission denied (os error 13)","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":36,"span":{"name":"download_artifacts"},"spans":[{"name":"download_artifacts"}]} |
| 1747757348544 | {"timestamp":"2025-05-20T16:09:08.544042Z","level":"INFO","message":"Downloading `config.json`","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":40,"span":{"name":"download_artifacts"},"spans":[{"name":"download_artifacts"}]} |
| 1747757348637 | Error: Could not download model artifacts |
| 1747757348637 | Caused by: |
| 1747757348637 |     0: I/O error Permission denied (os error 13) |
| 1747757348637 |     1: Permission denied (os error 13) |
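
To pull the relevant lines out of a log dump like the one above, a small filter along these lines may help. This is only a sketch: the helper name `summarize_problems` is hypothetical, and the sample lines are trimmed copies of the messages above (only the `level`, `message`, and `target` fields are kept). It assumes each message is either a JSON object with `level` and `message` keys or a plain-text line, as in the output shown.

```python
import json


def summarize_problems(messages):
    """Return WARN/ERROR JSON log messages plus any non-JSON lines.

    JSON records with other levels (e.g. INFO) are skipped; lines that do
    not parse as a JSON object are kept verbatim with a RAW prefix, since
    the final error and its causes are printed as plain text.
    """
    problems = []
    for raw in messages:
        text = raw.strip()
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            record = None
        if not isinstance(record, dict):
            problems.append(f"RAW: {text}")
            continue
        level = record.get("level", "")
        if level in ("WARN", "ERROR"):
            problems.append(f"{level}: {record.get('message', '')}")
    return problems


# Trimmed sample of the messages above (fields other than level/message/target dropped).
SAMPLE_MESSAGES = [
    '{"level":"INFO","message":"Starting download","target":"text_embeddings_core::download"}',
    '{"level":"WARN","message":"Download failed: I/O error Permission denied (os error 13)","target":"text_embeddings_core::download"}',
    "Error: Could not download model artifacts",
]

for line in summarize_problems(SAMPLE_MESSAGES):
    print(line)
```

On the sample this leaves only the `Permission denied` warning and the final plain-text error, which are the lines that matter here.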

My searching turned up this section of the AWS SageMaker docs:

If the container you use for a serverless endpoint is the same one you used on an instance-based endpoint, your container may not have permissions to write files. This can happen for the following reasons:

  • Your serverless endpoint fails to create or update due to a ping health check failure.
  • The Amazon CloudWatch logs for the endpoint show that the container is failing to write to some file or directory due to a permissions error.

To fix this issue, you can try to add read, write, and execute permissions for other on the file or directory and then rebuild the container. You can perform the following steps to complete this process:

  1. In the Dockerfile you used to build your container, add the following command: RUN chmod o+rwX <file or directory name>
  2. Rebuild the container.
  3. Upload the new container image to Amazon ECR.
  4. Try to create or update the serverless endpoint again.
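
For reference, here is my understanding of what `chmod o+rwX` changes, written as a Python sketch rather than a Dockerfile line. The helper name `chmod_other_rwX` is hypothetical, and the temporary directory merely stands in for whichever path the container needs to write to. The assumption about the capital `X` is the standard one: execute is granted to "other" only for directories, or for files that already have an execute bit set for someone.

```python
import os
import stat
import tempfile


def chmod_other_rwX(path):
    """Apply the equivalent of `chmod o+rwX` to a single path.

    Adds read and write for "other"; adds execute for "other" only if the
    path is a directory or already has some execute bit set.
    Returns the new permission bits.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = mode | stat.S_IROTH | stat.S_IWOTH
    any_exec = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if os.path.isdir(path) or (mode & any_exec):
        new_mode |= stat.S_IXOTH
    os.chmod(path, new_mode)
    return new_mode


# Demo on a throwaway directory standing in for the container's cache path.
with tempfile.TemporaryDirectory() as cache_dir:
    os.chmod(cache_dir, 0o750)               # no access for "other"
    print(oct(chmod_other_rwX(cache_dir)))   # 0o757
```

Note that this handles one path only; it is not recursive, and neither is `chmod` without the `-R` flag.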

If this is indeed the missing step, could it reasonably be added to the tei-cpu container image that AWS publishes for SageMaker? If not, is there a recommended alternative?

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

In a SageMaker notebook:

  1. Ensure the latest version of the sagemaker Python SDK is installed (2.224.2 at time of writing) for access to TEI 1.7:
%pip install --upgrade sagemaker
  2. Run the minimal deployment reproduction (modified from the TEI introductory blog post):
import boto3
from sagemaker import get_execution_role
from sagemaker.huggingface import get_huggingface_llm_image_uri
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

try:
    role = get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

# Container environment: the Hugging Face model to serve
config = {
    "HF_MODEL_ID": "Snowflake/snowflake-arctic-embed-m-v1.5",
}

# Use the AWS prebuilt TEI CPU image
emb_model = HuggingFaceModel(
    role=role,
    image_uri=get_huggingface_llm_image_uri("huggingface-tei-cpu"),
    env=config,
)

# Serverless endpoint: 6 GB of memory, one concurrent invocation
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=6144,
    max_concurrency=1,
)

predictor = emb_model.deploy(
    serverless_inference_config=serverless_config
)

The deployment fails, and the CloudWatch logs should show the same error as above.

Endpoint test code for completeness:

data = {
    "inputs": "the mesmerizing performances of the leads keep the film grounded and keep the audience riveted .",
}
# `predictor` is the object returned by emb_model.deploy() above
res = predictor.predict(data=data)

print(f"length of embeddings: {len(res[0])}")
print(f"first 10 elements of embeddings: {res[0][:10]}")

Expected behavior

Expect deployment to complete successfully and test endpoint invocation to complete successfully.
