Description
PySDK Version
- PySDK V2 (2.x)
- PySDK V3 (3.x)
Describe the bug
The SFT finetuning example notebook hardcodes an S3 URI (s3://mc-flows-sdk-testing/...) that external users do not have access to. Any user following the notebook will hit a 403 Forbidden error immediately when registering the dataset.
The notebook should either use a publicly accessible dataset or clearly instruct users to substitute their own, with a link to the required dataset format.
To reproduce
Run the following cell from sft_finetuning_example_notebook_pysdk_prod_v3.ipynb as-is:
from sagemaker.ai_registry.dataset import DataSet

dataset = DataSet.create(
    name="demo-1",
    source="s3://mc-flows-sdk-testing/input_data/sft/sample_data_256_final.jsonl"
)
Expected behavior
The example notebook should work out of the box, or clearly guide users to supply their own dataset with instructions on the required format.
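One possible shape for the "supply your own dataset" path, as a minimal sketch: the notebook comment shown in the traceback says `source` can be a local file path or an S3 URL, so a user could write a small JSONL file locally and register that instead. The `prompt`/`completion` field names, the `write_jsonl` helper, and the file name below are placeholders for illustration only; they are not the documented SFT dataset format, which the notebook would still need to link to.

```python
import json
import tempfile
from pathlib import Path

# Placeholder records. The field names are illustrative assumptions,
# NOT the schema that DataSet.create validates against.
records = [
    {"prompt": "What is the capital of France?", "completion": "Paris."},
    {"prompt": "Translate 'hello' to Spanish.", "completion": "Hola."},
]


def write_jsonl(rows, path):
    """Write one JSON object per line (JSON Lines) and return the path."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path


# Hypothetical file name; any writable location works.
local_path = write_jsonl(records, Path(tempfile.gettempdir()) / "my_sft_dataset.jsonl")

# Registering the local file instead of the hardcoded URI would then look like
# the cell from the report (requires the SageMaker SDK and AWS credentials):
# from sagemaker.ai_registry.dataset import DataSet
# dataset = DataSet.create(name="demo-1", source=str(local_path))
```

The registration call is left commented out because it needs the SDK and credentials; only the file-writing part runs standalone.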
Screenshots or logs
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:9 │
│ │
│ 6 # Register dataset in SageMaker AI Registry │
│ 7 # This creates a versioned dataset that can be referenced by ARN │
│ 8 # Provide a source (it can be local file path or S3 URL) │
│ ❱ 9 dataset = DataSet.create( │
│ 10 │ name="demo-1", │
│ 11 │ source="s3://mc-flows-sdk-testing/input_data/sft/sample_data_256_final.jsonl" │
│ 12 ) │
│ │
│ .venv/lib/python3.11/site-packages/sagemaker/core/telemetry/telemetry_logging.py:172 in wrapper │
│ ❱ 172 │ │ │ │ │ │ raise caught_ex │
│ │
│ .venv/lib/python3.11/site-packages/sagemaker/core/telemetry/telemetry_logging.py:143 in wrapper │
│ ❱ 143 │ │ │ │ │ response = func(*args, **kwargs) │
│ │
│ .venv/lib/python3.11/site-packages/sagemaker/ai_registry/dataset.py:283 in create │
│ 280 │ │ │ │ local_path = tmp_file.name │
│ 281 │ │ │
│ 282 │ │ │ try: │
│ ❱ 283 │ │ │ │ AIRHub.download_from_s3(source, local_path) │
│ 284 │ │ │ │ cls._validate_dataset_format(local_path) │
│ 285 │ │ │ finally: │
│ 286 │ │ │ │ if os.path.exists(local_path): │
│ │
│ .venv/lib/python3.11/site-packages/sagemaker/core/telemetry/telemetry_logging.py:180 in wrapper │
│ ❱ 180 │ │ │ │ return func(*args, **kwargs) │
│ │
│ .venv/lib/python3.11/site-packages/sagemaker/ai_registry/air_hub.py:290 in download_from_s3 │
│ 287 │ │ parsed = urlparse(s3_uri) │
│ 288 │ │ bucket = parsed.netloc │
│ 289 │ │ key = parsed.path.lstrip("/") │
│ ❱ 290 │ │ AIRHub._s3_client.download_file(bucket, key, local_path) │
│ 291 │
│ │
│ .venv/lib/python3.11/site-packages/botocore/context.py:123 in wrapper │
│ ❱ 123 │ │ │ │ return func(*args, **kwargs) │
│ │
│ .venv/lib/python3.11/site-packages/boto3/s3/inject.py:223 in download_file │
│ 222 │ with S3Transfer(self, Config) as transfer: │
│ ❱ 223 │ │ return transfer.download_file( │
│ 224 │ │ │ bucket=Bucket, │
│ 225 │ │ │ key=Key, │
│ 226 │ │ │ filename=Filename, │
│ │
│ .venv/lib/python3.11/site-packages/boto3/s3/transfer.py:484 in download_file │
│ ❱ 484 │ │ │ future.result() │
│ │
│ .venv/lib/python3.11/site-packages/s3transfer/futures.py:111 in result │
│ ❱ 111 │ │ │ return self._coordinator.result() │
│ │
│ .venv/lib/python3.11/site-packages/s3transfer/futures.py:287 in result │
│ 286 │ │ if self._exception: │
│ ❱ 287 │ │ │ raise self._exception │
│ │
│ .venv/lib/python3.11/site-packages/s3transfer/tasks.py:272 in _main │
│ ❱ 272 │ │ │ self._submit(transfer_future=transfer_future, **kwargs) │
│ │
│ .venv/lib/python3.11/site-packages/s3transfer/download.py:359 in _submit │
│ 356 │ │ │ transfer_future.meta.size is None │
│ 357 │ │ │ or transfer_future.meta.etag is None │
│ 358 │ │ ): │
│ ❱ 359 │ │ │ response = client.head_object( │
│ 360 │ │ │ │ Bucket=transfer_future.meta.call_args.bucket, │
│ 361 │ │ │ │ Key=transfer_future.meta.call_args.key, │
│ 362 │ │ │ │ **transfer_future.meta.call_args.extra_args, │
│ │
│ .venv/lib/python3.11/site-packages/botocore/client.py:602 in _api_call │
│ ❱ 602 │ │ │ return self._make_api_call(operation_name, kwargs) │
│ │
│ .venv/lib/python3.11/site-packages/botocore/context.py:123 in wrapper │
│ ❱ 123 │ │ │ │ return func(*args, **kwargs) │
│ │
│ .venv/lib/python3.11/site-packages/botocore/client.py:1078 in _make_api_call │
│ 1075 │ │ │ │ 'error_code_override' │
│ 1076 │ │ │ ) or error_info.get("Code") │
│ 1077 │ │ │ error_class = self.exceptions.from_code(error_code) │
│ ❱ 1078 │ │ │ raise error_class(parsed_response, operation_name) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
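The traceback shows the failure comes from a `HeadObject` call on the bucket and key parsed out of the URI. A notebook could run the same check up front and print a readable message before calling `DataSet.create`. The sketch below is one way to do that; `split_s3_uri` and `can_read_s3_object` are hypothetical helpers, not part of the SDK, and the client is passed in so that any object exposing `head_object(Bucket=..., Key=...)`, such as `boto3.client("s3")`, can be used.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/key' into (bucket, key), mirroring the parsing
    visible in the air_hub.py frames of the traceback."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def can_read_s3_object(s3_client, s3_uri):
    """Return (ok, error_code) for a HeadObject call on the given URI.

    Hypothetical helper: s3_client is assumed to behave like a boto3 S3 client.
    """
    bucket, key = split_s3_uri(s3_uri)
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
    except Exception as exc:
        # botocore's ClientError carries the parsed service error in `.response`.
        response = getattr(exc, "response", None)
        if not isinstance(response, dict):
            raise  # not a service error; let it propagate unchanged
        return False, response.get("Error", {}).get("Code")
    return True, None
```

With boto3 installed and credentials configured, `can_read_s3_object(boto3.client("s3"), source)` would be expected to return `(False, "403")` for the URI in this report, matching the error code in the log above, at which point the notebook could tell the user to substitute their own dataset.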
System information
- SageMaker Python SDK version: SageMaker 3.5.0
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): SFTTrainer
- Framework version: N/A
- Python version: 3.11
- CPU or GPU: CPU
- Custom Docker image (Y/N): N
Additional context
Affected file: v3-examples/model-customization-examples/sft_finetuning_example_notebook_pysdk_prod_v3.ipynb