
Failing Deployment on AWS SageMaker Serverless Endpoint #609


Closed
2 of 4 tasks
sutsr opened this issue May 21, 2025 · 4 comments · Fixed by awslabs/llm-hosting-container#152
Comments

@sutsr
sutsr commented May 21, 2025

System Info

Hello,

Attempting to deploy the AWS prebuilt tei-cpu:2.0.1-tei1.7.0-cpu-py310-ubuntu22.04 image on a SageMaker serverless endpoint yields the following error:

| timestamp     | message |
|---------------|---------|
| 1747757347229 | {"timestamp":"2025-05-20T16:09:07.146790Z","level":"INFO","message":"Args { model_id: \"mix*******-**/*****-*****-****e-v1\", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: \"0.0.0.0\", port: 8080, uds_path: \"/tmp/text-embeddings-inference-server\", huggingface_hub_cache: Some(\"/data\"), payload_limit: 2000000, api_key: None, json_output: true, disable_spans: false, otlp_endpoint: None, otlp_service_name: \"text-embeddings-inference.server\", cors_allow_origin: None }","target":"text_embeddings_router","filename":"router/src/main.rs","line_number":185} |
| 1747757347443 | {"timestamp":"2025-05-20T16:09:07.443002Z","level":"INFO","message":"Starting download","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":20,"span":{"name":"download_artifacts"},"spans":[{"name":"download_artifacts"}]} |
| 1747757347443 | {"timestamp":"2025-05-20T16:09:07.443565Z","level":"INFO","message":"Downloading `1_Pooling/config.json`","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":53,"span":{"name":"download_pool_config"},"spans":[{"name":"download_artifacts"},{"name":"download_pool_config"}]} |
| 1747757347543 | {"timestamp":"2025-05-20T16:09:07.543009Z","level":"WARN","message":"Download failed: I/O error Permission denied (os error 13)","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":26,"span":{"name":"download_artifacts"},"spans":[{"name":"download_artifacts"}]} |
| 1747757348454 | {"timestamp":"2025-05-20T16:09:08.454266Z","level":"INFO","message":"Downloading `config_sentence_transformers.json`","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":77,"span":{"name":"download_new_st_config"},"spans":[{"name":"download_artifacts"},{"name":"download_new_st_config"}]} |
| 1747757348544 | {"timestamp":"2025-05-20T16:09:08.544007Z","level":"WARN","message":"Download failed: I/O error Permission denied (os error 13)","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":36,"span":{"name":"download_artifacts"},"spans":[{"name":"download_artifacts"}]} |
| 1747757348544 | {"timestamp":"2025-05-20T16:09:08.544042Z","level":"INFO","message":"Downloading `config.json`","target":"text_embeddings_core::download","filename":"core/src/download.rs","line_number":40,"span":{"name":"download_artifacts"},"spans":[{"name":"download_artifacts"}]} |
| 1747757348637 | Error: Could not download model artifacts |
| 1747757348637 | Caused by: |
| 1747757348637 |     0: I/O error Permission denied (os error 13) |
| 1747757348637 |     1: Permission denied (os error 13) |
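Each TEI log record above is a JSON object (the container runs with json_output: true), so the failing steps can be picked out programmatically instead of by eye. The sketch below is a hypothetical helper, assuming the message column has been exported as one string per line; plain-text lines such as the final "Error: ..." / "Caused by:" trace are skipped because they are not JSON.

```python
import json
from typing import Iterable, List, Tuple


def warn_records(lines: Iterable[str]) -> List[Tuple[str, List[str]]]:
    """Return (message, span names) for every WARN-level JSON log record.

    Hypothetical helper: assumes one exported log message per string.
    """
    found: List[Tuple[str, List[str]]] = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Plain-text lines (e.g. "Error: ..." or "Caused by:") are not JSON.
            continue
        if isinstance(record, dict) and record.get("level") == "WARN":
            # The span names show which download step was running when it failed.
            spans = [span.get("name", "") for span in record.get("spans", [])]
            found.append((record.get("message", ""), spans))
    return found
```

Applied to the log above, this would surface the two "Download failed: I/O error Permission denied (os error 13)" records, both inside the download_artifacts span.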

My search turned up this section of the AWS SageMaker documentation:

If the container you use for a serverless endpoint is the same one you used on an instance-based endpoint, your container may not have permissions to write files. This can happen for the following reasons:

  • Your serverless endpoint fails to create or update due to a ping health check failure.
  • The Amazon CloudWatch logs for the endpoint show that the container is failing to write to some file or directory due to a permissions error.

To fix this issue, you can try to add read, write, and execute permissions for other on the file or directory and then rebuild the container. You can perform the following steps to complete this process:

  1. In the Dockerfile you used to build your container, add the following command: RUN chmod o+rwX <file or directory name>
  2. Rebuild the container.
  3. Upload the new container image to Amazon ECR.
  4. Try to create or update the serverless endpoint again.

If this is indeed the missing line, could this step reasonably and straightforwardly be added to the tei-cpu container available on AWS SageMaker? If not, is there another recommended way to proceed?
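The o+rwX mode in the documentation's RUN command is easy to misread: lowercase r and w always add read and write permission for "other", while uppercase X adds execute (search) permission only to directories and to files that already have an execute bit set for some user. The Python sketch below reproduces those semantics for a single path, without chmod's -R recursion; add_other_rwX is a hypothetical name used only for illustration, and in an image the documented RUN chmod line remains the way to apply it.

```python
import os
import stat


def add_other_rwX(path: str) -> int:
    """Emulate `chmod o+rwX <path>` for one path (not recursive).

    Hypothetical helper for illustration; returns the new permission bits.
    """
    mode = os.stat(path).st_mode
    # o+rw: always grant read and write to "other".
    new_mode = stat.S_IMODE(mode) | stat.S_IROTH | stat.S_IWOTH
    # o+X: grant execute/search only to directories, or to files that
    # already have an execute bit for user, group, or other.
    if stat.S_ISDIR(mode) or mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        new_mode |= stat.S_IXOTH
    os.chmod(path, new_mode)
    return new_mode
```

For a directory with mode 0o755 this yields 0o757, while a plain 0o644 file becomes 0o646 and stays non-executable.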

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

In a SageMaker notebook:

  1. Ensure the latest version of the sagemaker Python SDK is installed (2.224.2 at time of writing) for access to TEI 1.7:
%pip install --upgrade sagemaker
  2. Execute the minimal deployment reproduction (modified from the TEI introductory blog post):
import boto3
from sagemaker import get_execution_role
from sagemaker.huggingface import get_huggingface_llm_image_uri
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

try:
    role = get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

config = {
    "HF_MODEL_ID": "Snowflake/snowflake-arctic-embed-m-v1.5",
}

emb_model = HuggingFaceModel(
    role=role,
    image_uri=get_huggingface_llm_image_uri("huggingface-tei-cpu"),
    env=config,
)

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=6144, max_concurrency=1,
)

predictor = emb_model.deploy(
    serverless_inference_config=serverless_config
)

This fails, and the CloudWatch logs should show the same errors as above.
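To pull only the relevant lines from CloudWatch without using the console, something like the sketch below could work. It assumes boto3's CloudWatch Logs client and the /aws/sagemaker/Endpoints/<endpoint-name> log group naming that SageMaker uses for endpoint logs; permission_errors is a hypothetical helper, not part of the SageMaker SDK.

```python
from typing import Any, Dict, List


def permission_errors(logs_client: Any, endpoint_name: str) -> List[str]:
    """Collect endpoint log messages containing "Permission denied".

    Hypothetical helper; logs_client is assumed to be boto3.client("logs").
    """
    kwargs: Dict[str, Any] = {
        # Assumed SageMaker endpoint log group naming convention.
        "logGroupName": f"/aws/sagemaker/Endpoints/{endpoint_name}",
        # Quoting the terms makes CloudWatch Logs match the exact phrase.
        "filterPattern": '"Permission denied"',
    }
    messages: List[str] = []
    while True:
        response = logs_client.filter_log_events(**kwargs)
        messages.extend(event["message"] for event in response.get("events", []))
        token = response.get("nextToken")
        if not token:
            return messages
        # Follow pagination until CloudWatch stops returning a token.
        kwargs["nextToken"] = token
```

Usage would be along the lines of permission_errors(boto3.client("logs"), "<your-endpoint-name>"), with the endpoint name taken from the SageMaker console.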

Endpoint test code for completeness:

data = {
    "inputs": "the mesmerizing performances of the leads keep the film grounded and keep the audience riveted .",
}
res = predictor.predict(data=data)

print(f"length of embeddings: {len(res[0])}")
print(f"first 10 elements of embeddings: {res[0][:10]}")
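The test code indexes res[0], which implies the endpoint returns a list with one embedding vector per input. Under that assumption, a slightly stricter check than printing could look like the sketch below; check_embeddings is a hypothetical helper.

```python
from typing import Sequence


def check_embeddings(res: Sequence[Sequence[float]], expected_inputs: int = 1) -> int:
    """Validate the response shape and return the embedding dimension.

    Hypothetical helper; assumes one vector per input, as res[0] implies.
    """
    if len(res) != expected_inputs:
        raise ValueError(f"expected {expected_inputs} embedding(s), got {len(res)}")
    dims = {len(vec) for vec in res}
    if len(dims) != 1:
        # Every vector from the same model should have the same length.
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")
    if not all(isinstance(x, (int, float)) for vec in res for x in vec):
        raise TypeError("embedding values must be numeric")
    return dims.pop()
```

With a single input string, dim = check_embeddings(res) would replace the first print statement.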

Expected behavior

Deployment should complete successfully, and the test invocation of the endpoint should return an embedding.

@alvarobartt
Member

Hey @sutsr thanks for reporting, I'll try to investigate and come back to you soon! 🤗

cc @fgbelidji for visibility!

@alvarobartt
Member

Hey @sutsr, I've reproduced the issue and can confirm it. The solution is to also include the environment variable "HUGGINGFACE_HUB_CACHE": "/opt/ml/model" in the env argument, so that the Hugging Face cache directory is /opt/ml/model, i.e. a writable location; otherwise the default HUGGINGFACE_HUB_CACHE value (/data, per the startup log above) won't work.

model = HuggingFaceModel(
    role=role,
    image_uri=get_huggingface_llm_image_uri("huggingface-tei-cpu"),
    env={
        "HF_MODEL_ID": "Snowflake/snowflake-arctic-embed-m-v1.5",
        "HUGGINGFACE_HUB_CACHE": "/opt/ml/model",
    },
)

AFAIK this should ideally be set by default, but apparently it's not, so for now you need to specify it manually; I can confirm that with the snippet above it works just fine! Feel free to close the issue if this resolves it, and I'll iterate internally with the team to make sure the environment variable is set correctly (cc @fgbelidji, @arjkesh and @pagezyhf).
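The fix works because the cache directory must be writable by the container process, and on the serverless endpoint the default path is not while /opt/ml/model is. A quick way to compare candidate cache paths, for example when testing the image locally as a non-root user, is a sketch like the one below; first_writable_dir is a hypothetical helper, and the candidate paths come from the startup log (/data) and the fix above (/opt/ml/model).

```python
import os
from typing import Iterable, Optional


def first_writable_dir(candidates: Iterable[str]) -> Optional[str]:
    """Return the first existing directory this process can write into.

    Hypothetical helper. os.access checks the real user ID, so when run
    as root it will typically report most paths as writable.
    """
    for path in candidates:
        # Writing new files into a directory needs both write and search (x).
        if os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK):
            return path
    return None
```

For example, first_writable_dir(["/data", "/opt/ml/model"]) would return the first of the two that the current user can actually write to, or None if neither is usable.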

alvarobartt self-assigned this May 26, 2025
@sutsr
Author

sutsr commented May 26, 2025

Brilliant, thanks Alvaro! I'll be able to confirm the solution tomorrow and will report back.

@sutsr
Author

sutsr commented May 27, 2025

Confirming that adding the HUGGINGFACE_HUB_CACHE environment variable, as Alvaro indicated, allows the SageMaker serverless endpoint to deploy without issue and to respond as expected when invoked.
