Tried to deploy as serverless with AWS SageMaker and got an error that Transformers does not recognize the gpt_oss model type

#9
by prem2282 - opened

Here is the deployment script I used.

import sagemaker
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

# bucket_name, hub (the Hugging Face env vars), and role are defined earlier in the notebook
sagemaker_session = sagemaker.Session(default_bucket=bucket_name)

huggingface_model = HuggingFaceModel(
    transformers_version="4.51.3",
    pytorch_version="2.6.0",
    py_version="py312",
    env=hub,
    role=role,
    sagemaker_session=sagemaker_session
)

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=6144,  # 6 GB, adjust if needed
    max_concurrency=2
)

predictor = huggingface_model.deploy(
    serverless_inference_config=serverless_config,
    endpoint_name="medical-reasoning-gpt-oss-20b-serverless"
)


Here is the error

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400)
from model with message "{
"code": 400,
"type": "InternalServerException",
"message": "The checkpoint you are trying to load has model type gpt_oss but Transformers does not recognize
this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers
is out of date.\n\nYou can update Transformers with the command pip install --upgrade transformers. If this does
not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In
this case, you can get the most up-to-date code by installing Transformers from source with the command pip install git+https://github.com/huggingface/transformers.git"

The issue comes from deploying the base model and the adapter as two separate repositories. SageMaker Serverless only downloads a single model repo, so the adapter weights are never loaded. In this state, SageMaker attempts to initialize the raw base checkpoint, which has "model_type": "gpt_oss".
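For context, the hub dict passed as env to HuggingFaceModel can only reference a single Hub repo via HF_MODEL_ID, so there is no way to point the serverless container at both a base repo and an adapter repo. A rough sketch of what such a config usually looks like (the repo id below is a placeholder, not the actual repo):

hub = {
    "HF_MODEL_ID": "your-username/medical-reasoning-gpt-oss-20b",  # placeholder repo id
    "HF_TASK": "text-generation"
}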

The solution is to merge the adapter into the base model locally (using PEFT's merge_and_unload()) and then upload the merged weights as one unified model repository. After merging, SageMaker can load the model correctly.
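For reference, here is a rough sketch of that merge step with PEFT. The repo ids are placeholders (the base is assumed to be openai/gpt-oss-20b); adjust them, the dtype, and the target repo name to your setup:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "openai/gpt-oss-20b"                                     # assumed base checkpoint
adapter_id = "your-username/gpt-oss-20b-medical-adapter"           # placeholder adapter repo
merged_id = "your-username/medical-reasoning-gpt-oss-20b-merged"   # placeholder target repo

# Load the base checkpoint and attach the fine-tuned adapter on top of it
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

# Fold the adapter weights into the base weights so a plain Transformers load works
merged = model.merge_and_unload()

# Save the merged weights and tokenizer, then upload them as one unified repo
tokenizer = AutoTokenizer.from_pretrained(base_id)
merged.save_pretrained("merged-model")
tokenizer.save_pretrained("merged-model")
merged.push_to_hub(merged_id)
tokenizer.push_to_hub(merged_id)

Once the merged repo is uploaded, point HF_MODEL_ID at it and redeploy with the same script above.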
