snowflake-ml-python 1.18.0
The machine learning client library that is used for interacting with Snowflake to build machine learning solutions.
pip install snowflake-ml-python
Requires Python <3.13,>=3.9
Dependencies
- anyio <5,>=3.5.0
- cachetools <6,>=3.1.1
- cloudpickle >=2.0.0
- cryptography
- fsspec[http] <2026,>=2024.6.1
- importlib_resources <7,>=6.1.1
- numpy <3,>=1.23
- packaging <25,>=20.9
- pandas <3,>=2.1.4
- platformdirs <5
- pyarrow <19.0.0
- pydantic <3,>=2.8.2
- pyjwt <3,>=2.0.0
- pytimeparse <2,>=1.1.8
- pyyaml <7,>=6.0
- retrying <2,>=1.3.3
- s3fs <2026,>=2024.6.1
- scikit-learn <1.8
- scipy <2,>=1.9
- shap <1,>=0.46.0
- snowflake-connector-python[pandas] <4,>=3.17.0
- snowflake-snowpark-python !=1.26.0,<2,>=1.17.0
- snowflake.core <2,>=1.0.2
- sqlparse <1,>=0.4
- tqdm <5
- typing-extensions <5,>=4.1.0
- xgboost <4
- altair <6,>=5; extra == "all"
- catboost <2,>=1.2.0; extra == "all"
- keras <4,>=2.0.0; extra == "all"
- lightgbm <5,>=4.1.0; extra == "all"
- mlflow <3,>=2.16.0; extra == "all"
- prophet <2,>=1.1.0; extra == "all"
- sentence-transformers <4,>=2.7.0; extra == "all"
- sentencepiece <0.2.0,>=0.1.95; extra == "all"
- streamlit <2,>=1.30.0; extra == "all"
- tensorflow <3,>=2.17.0; extra == "all"
- tokenizers <1,>=0.15.1; extra == "all"
- torch <3,>=2.0.1; extra == "all"
- torchdata <1,>=0.4; extra == "all"
- transformers !=4.51.3,<5,>=4.39.3; extra == "all"
- altair <6,>=5; extra == "altair"
- catboost <2,>=1.2.0; extra == "catboost"
- keras <4,>=2.0.0; extra == "keras"
- tensorflow <3,>=2.17.0; extra == "keras"
- torch <3,>=2.0.1; extra == "keras"
- lightgbm <5,>=4.1.0; extra == "lightgbm"
- mlflow <3,>=2.16.0; extra == "mlflow"
- prophet <2,>=1.1.0; extra == "prophet"
- streamlit <2,>=1.30.0; extra == "streamlit"
- tensorflow <3,>=2.17.0; extra == "tensorflow"
- torch <3,>=2.0.1; extra == "torch"
- torchdata <1,>=0.4; extra == "torch"
- sentence-transformers <4,>=2.7.0; extra == "transformers"
- sentencepiece <0.2.0,>=0.1.95; extra == "transformers"
- tokenizers <1,>=0.15.1; extra == "transformers"
- torch <3,>=2.0.1; extra == "transformers"
- transformers !=4.51.3,<5,>=4.39.3; extra == "transformers"
Snowflake ML Python
Snowflake ML Python is a set of tools including SDKs and underlying infrastructure to build and deploy machine learning models. With Snowflake ML Python, you can pre-process data, train, manage and deploy ML models all within Snowflake, and benefit from Snowflake’s proven performance, scalability, stability and governance at every stage of the Machine Learning workflow.
Key Components of Snowflake ML Python
The Snowflake ML Python SDK provides a number of APIs to support each stage of an end-to-end Machine Learning development and deployment process.
Snowflake ML Model Development
Snowflake ML Model Development provides a collection of python APIs enabling efficient ML model development directly in Snowflake:
- Modeling API (snowflake.ml.modeling) for data preprocessing, feature engineering, and model training in Snowflake. This includes the snowflake.ml.modeling.preprocessing module for scalable data transformations on large data sets, utilizing the compute resources of underlying Snowpark Optimized High Memory Warehouses, and a large collection of ML model development classes based on sklearn, xgboost, and lightgbm (see the sketch below).
- Framework Connectors: Optimized, secure, and performant data provisioning for the PyTorch and TensorFlow frameworks in their native data loader formats.
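For illustration, a minimal sketch of the sklearn-style Modeling API workflow; this is not a definitive recipe, and the session, table, and column names are placeholders:
from snowflake.ml.modeling.preprocessing import StandardScaler
from snowflake.ml.modeling.xgboost import XGBClassifier

# Snowpark DataFrame backed by a Snowflake table; computation runs in Snowflake.
df = session.table("MY_DB.PUBLIC.TRAINING_DATA")

scaler = StandardScaler(
    input_cols=["AGE", "INCOME"],
    output_cols=["AGE_SCALED", "INCOME_SCALED"],
)
df_scaled = scaler.fit(df).transform(df)

clf = XGBClassifier(
    input_cols=["AGE_SCALED", "INCOME_SCALED"],
    label_cols=["CHURNED"],
    output_cols=["PREDICTION"],
)
clf.fit(df_scaled)                     # training runs on Snowflake compute
predictions = clf.predict(df_scaled)   # predictions returned as a DataFrame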
Snowflake ML Ops
Snowflake ML Python contains a suite of MLOps tools. It complements the Snowflake Modeling API, providing end-to-end development-to-deployment support within Snowflake. The Snowflake ML Ops suite consists of:
- Registry: A Python API that allows secure deployment and management of models in Snowflake, supporting models trained both inside and outside of Snowflake (see the sketch after this list).
- Feature Store: A fully integrated solution for defining, managing, storing and discovering ML features derived from your data. The Snowflake Feature Store supports automated, incremental refresh from batch and streaming data sources, so that feature pipelines need be defined only once to be continuously updated with new data.
- Datasets: Datasets provide an immutable, versioned snapshot of your data suitable for ingestion by your machine learning models.
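As a rough sketch of how these components connect via the Registry's documented log_model/run flow; the model object, registry location, and DataFrames are placeholders:
from snowflake.ml.registry import Registry

reg = Registry(session=session, database_name="ML", schema_name="PUBLIC")
mv = reg.log_model(
    model,                        # e.g. a fitted sklearn or xgboost model
    model_name="MY_MODEL",
    version_name="V1",
    sample_input_data=sample_df,  # used to infer the model signature
)
predictions = mv.run(test_df)     # run inference inside Snowflake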
Getting started
Learn about all Snowflake ML feature offerings in the Developer Guide.
Have your Snowflake account ready
If you don't have a Snowflake account yet, you can sign up for a 30-day free trial account.
Installation
Snowflake ML Python is pre-installed in Container Runtime notebook environments. Learn more.
In Snowflake Warehouse notebook environments, snowflake-ml-python can be installed using the "Packages" drop-down menu.
Follow the installation instructions in the Snowflake documentation.
Python versions 3.9 to 3.12 are supported. You can use miniconda or anaconda to create a Conda environment (recommended), or virtualenv to create a virtual environment.
Conda channels
The Snowflake Anaconda Channel contains the official snowflake-ml-python package
releases. To install snowflake-ml-python from this conda channel:
conda install \
-c https://repo.anaconda.com/pkgs/snowflake \
--override-channels \
snowflake-ml-python
See the developer guide for detailed installation instructions.
The snowflake-ml-python package is also published in conda-forge.
To install snowflake-ml-python from conda-forge:
conda install \
-c https://conda.anaconda.org/conda-forge/ \
--override-channels \
snowflake-ml-python
Verifying the package
- Install cosign. This example uses the Golang installation: installing-cosign-with-go.
- Download the file from the repository, such as PyPI.
- Download the signature files from the release tag.
- Verify the signature on projects signed using the Jenkins job:
cosign verify-blob snowflake_ml_python-1.7.0.tar.gz --key snowflake-ml-python-1.7.0.pub --signature resources.linux.snowflake_ml_python-1.7.0.tar.gz.sig
NOTE: Version 1.7.0 is used as an example here. Please choose the latest version.
Release History
1.18.0
Bug Fixes
- Registry: The create_service API now validates that a model has a GPU runtime configuration and will throw a descriptive error if the configuration is missing.
Behavior Changes
New Features
- Registry (PrPr): Introducing ModelVersion.run_batch for batch inference in Snowpark Container Services.
- Experiment Tracking (PrPr): Added version_name argument to the autologging callbacks to specify the version name for the autologged model.
Deprecations
- Python 3.9 is deprecated.
1.17.0
Bug Fixes
- ML Job: Added support for retrieving details of deleted jobs, including status, compute pool, and target instances.
Behavior Changes
New Features
- Support xgboost 3.x.
- ML Job: Overhauled the MLJob.result() API with broader cross-version compatibility and support for additional data types, namely:
  - Pandas DataFrames
  - PyArrow Tables
  - NumPy arrays
  - NOTE: Requires snowflake-ml-python>=1.17.0 to be installed inside the remote container environment.
- ML Job: Enabled job submission v2 by default.
  - Jobs submitted using v2 will automatically use the latest Container Runtime image.
  - v1 behavior can be restored by setting the environment variable MLRS_USE_SUBMIT_JOB_V2 to false.
Deprecations
1.16.0
Bug Fixes
- Registry: Remove redundant pip dependency warnings when artifact_repository_map is provided for warehouse model deployments.
Behavior Changes
New Features
- Support scikit-learn < 1.8.
- ML Job: Added support for configuring the runtime image via runtime_environment (an image tag or full image URL) at submission time. Examples:
  - @remote(compute_pool, stage_name='payload_stage', runtime_environment='1.8.0')
  - submit_file('/path/to/repo/test.py', compute_pool, stage_name='payload_stage', runtime_environment='/mydb/myschema/myrepo/myimage:latest')
- Registry: Ability to mark model methods as Volatility.VOLATILE or Volatility.IMMUTABLE:
from snowflake.ml.model.volatility import Volatility
options = {
"embed_local_ml_library": True,
"relax_version": True,
"save_location": "/path/to/my/directory",
"function_type": "TABLE_FUNCTION",
"volatility": Volatility.IMMUTABLE,
"method_options": {
"predict": {
"case_sensitive": False,
"max_batch_size": 100,
"function_type": "TABLE_FUNCTION",
"volatility": Volatility.VOLATILE,
},
},
}
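For context, a hedged sketch of passing such an options dictionary when logging a model; the registry handle and names below are placeholders:
mv = reg.log_model(
    model,
    model_name="MY_MODEL",
    version_name="V1",
    sample_input_data=sample_df,
    options=options,  # the options dictionary defined above
)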
1.15.0 (09-29-2025)
Bug Fixes
Behavior Changes
- Registry: Dropping support for the deprecated conversational task type for Hugging Face models. For details, see https://github.com/huggingface/transformers/pull/31165.
New Features
1.14.0 (09-18-2025)
Bug Fixes
Behavior Changes
New Features
- ML Job: The additional_payloads argument is now deprecated in favor of imports.
1.13.0
Bug Fixes
Behavior Changes
New Features
- Registry: Log a Hugging Face model without having to load the model in memory, using huggingface_pipeline.HuggingFacePipelineModel. Requires the huggingface_hub package to be installed. To disable downloading the Hugging Face repository, provide download_snapshot=False while creating the huggingface_pipeline.HuggingFacePipelineModel object.
- Registry: Added support for XGBoost models to use enable_categorical=True with pandas DataFrames.
- Registry: Added support to display the PrivateLink inference endpoint in ModelVersion list services.
1.12.0
Bug Fixes
- Registry: Fixed an issue where the string representation of dictionary-type output columns was being incorrectly created during structured output deserialization. Now, the original data type is properly preserved.
- Registry: Fixed the inference server performance issue for wide (500+ features) and JSON inputs.
Behavior Changes
New Features
- Registry: Add an OpenAI chat completion compatible signature option for text-generation models:
from snowflake.ml.model import openai_signatures
import pandas as pd
mv = snowflake_registry.log_model(
model=generator,
model_name=...,
...,
signatures=openai_signatures.OPENAI_CHAT_SIGNATURE,
)
# create a pd.DataFrame with openai.client.chat.completions arguments like below:
x_df = pd.DataFrame.from_records(
[
{
"messages": [
{"role": "system", "content": "Complete the sentence."},
{
"role": "user",
"content": "A descendant of the Lost City of Atlantis, who swam to Earth while saying, ",
},
],
"max_completion_tokens": 250,
"temperature": 0.9,
"stop": None,
"n": 3,
"stream": False,
"top_p": 1.0,
"frequency_penalty": 0.1,
"presence_penalty": 0.2,
}
],
)
# OpenAI Chat Completion compatible output
output_df = mv.run(X=x_df)
- Model Monitoring: Added support for segment columns to enable filtered analysis.
  - Added segment_columns parameter to ModelMonitorSourceConfig to specify columns for segmenting monitoring data.
  - Segment columns must be of STRING type and exist in the source table.
  - Added methods to dynamically manage segments:
    - add_segment_column(): Add a new segment column to an existing monitor.
    - drop_segment_column(): Remove a segment column from an existing monitor.
- Experiment Tracking (PrPr): Support for logging artifacts (files and directories) with log_artifact.
- Experiment Tracking (PrPr): Support for listing artifacts in a run with list_artifacts.
- Experiment Tracking (PrPr): Support for downloading artifacts in a run with download_artifacts (a sketch follows this list).
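A minimal sketch of the artifact APIs named above, assuming an existing ExperimentTracking instance exp with an active experiment; the paths are placeholders and the exact argument signatures are assumptions:
with exp.start_run():
    exp.log_artifact("local/path/to/plot.png")  # files and directories are supported
    names = exp.list_artifacts()                # assumed no-argument form: list this run's artifacts
    exp.download_artifacts("./downloads")       # assumed argument: local target directory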
1.11.0 (08-12-2025)
Bug Fixes
- ML Job: Fix Error: Unable to retrieve head IP address if not all instances start within the timeout.
- ML Job: Fix TypeError: SnowflakeCursor.execute() got an unexpected keyword argument '_force_qmark_paramstyle' when running inside Stored Procedures.
Behavior Changes
New Features
- ModelVersion.create_service(): Made the image_repo argument optional. By default it will use a default image repo, which is being rolled out in server version 9.22+.
- Experiment Tracking (PrPr): Automatically log the model, metrics, and parameters while training Keras models with snowflake.ml.experiment.callback.keras.SnowflakeKerasCallback (a sketch follows).
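By analogy with the documented XGBoost/LightGBM autologging callbacks (see the 1.9.2 example below), a hedged sketch of the Keras callback; the constructor arguments are assumed to mirror those callbacks, and model, X, y, and sig are placeholders:
from snowflake.ml.experiment import ExperimentTracking
from snowflake.ml.experiment.callback.keras import SnowflakeKerasCallback

exp = ExperimentTracking(session=sp_session, database_name="ML", schema_name="PUBLIC")
exp.set_experiment("MY_EXPERIMENT")

callback = SnowflakeKerasCallback(
    exp, log_model=True, log_metrics=True, log_params=True,
    model_name="model_name", model_signature=sig,
)
with exp.start_run():
    model.fit(X, y, epochs=5, callbacks=[callback])  # metrics/params logged automatically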
1.10.0
Behavior Changes
- Experiment Tracking (PrPr): The import paths for the auto-logging callbacks have changed to snowflake.ml.experiment.callback.xgboost.SnowflakeXgboostCallback and snowflake.ml.experiment.callback.lightgbm.SnowflakeLightgbmCallback.
New Features
- Registry: Add progress bars for ModelVersion.create_service and ModelVersion.log_model.
- ModelRegistry: Logs emitted during ModelVersion.create_service will be written to a file. The file location will be shown in the console.
1.9.2
Bug Fixes
- DataConnector: Fix self._session related errors inside Container Runtime.
- Registry: Fix a bug when trying to pass None to array (pd.dtype('O')) in signature and pandas data handler.
New Features
- Experiment Tracking (PrPr): Automatically log the model, metrics, and parameters while training XGBoost and LightGBM models.
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

from snowflake.ml.experiment import ExperimentTracking
from snowflake.ml.experiment.callback import SnowflakeXgboostCallback, SnowflakeLightgbmCallback
exp = ExperimentTracking(session=sp_session, database_name="ML", schema_name="PUBLIC")
exp.set_experiment("MY_EXPERIMENT")
# XGBoost
callback = SnowflakeXgboostCallback(
exp, log_model=True, log_metrics=True, log_params=True, model_name="model_name", model_signature=sig
)
model = XGBClassifier(callbacks=[callback])
with exp.start_run():
model.fit(X, y, eval_set=[(X_test, y_test)])
# LightGBM
callback = SnowflakeLightgbmCallback(
exp, log_model=True, log_metrics=True, log_params=True, model_name="model_name", model_signature=sig
)
model = LGBMClassifier()
with exp.start_run():
model.fit(X, y, eval_set=[(X_test, y_test)], callbacks=[callback])
1.9.1 (07-18-2025)
Bug Fixes
- Registry: Fix a bug when trying to set the PAD token while the Hugging Face text-generation model had multiple EOS tokens. The handler now picks the first EOS token as the PAD token.
New Features
- DataConnector: DataConnector objects can now be pickled.
- Dataset: Dataset objects can now be pickled.
- Registry (PrPr): Introducing the create_service function in snowflake/ml/model/models/huggingface_pipeline.py, which logs a HF model and, upon successful logging, creates an inference service:
from snowflake.ml.model.models import huggingface_pipeline
hf_model_ref = huggingface_pipeline.HuggingFacePipelineModel(
model="gpt2",
task="text-generation", # Optional
)
hf_model_ref.create_service(
session=session,
service_name="test_service",
service_compute_pool="test_compute_pool",
image_repo="test_repo",
...
)
- Experiment Tracking (PrPr): New module for managing and tracking ML experiments in Snowflake.
from snowflake.ml.experiment import ExperimentTracking
exp = ExperimentTracking(session=sp_session, database_name="ML", schema_name="PUBLIC")
exp.set_experiment("MY_EXPERIMENT")
with exp.start_run():
exp.log_param("batch_size", 32)
exp.log_metrics("accuracy", 0.98, step=10)
exp.log_model(my_model, model_name="MY_MODEL")
- Registry: Added support for wide input (500+ features) for inference done using SPCS.
1.9.0
Bug Fixes
- Registry: Fixed a bug causing Snowpark-to-pandas DataFrame conversion to fail when the QUOTED_IDENTIFIERS_IGNORE_CASE parameter is enabled.
- Registry: Fixed duplicate UserWarning logs during model packaging.
- Registry: If a Hugging Face pipeline text-generation model doesn't contain a default chat template, a ChatML template is assigned to the tokenizer:
{% for message in messages %}
{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}
{% endfor %}
{% if add_generation_prompt %}
{{ '<|im_start|>assistant\n' }}
{% endif %}"
- Registry: Fixed SQL queries during registry initialization that were forcing warehouse requirement
Behavior Changes
- ML Job: The list_jobs() API has been modified. The scope parameter has been removed, optional database and schema parameters have been added, the return type has changed from snowpark.DataFrame to pandas.DataFrame, and the returned columns have been updated to name, status, message, database_name, schema_name, owner, compute_pool, target_instances, created_time, and completed_time.
- Registry: Set relax_version to false when pip_requirements are specified while logging a model.
- Registry: UserWarning will now be raised based on specified target_platforms (addresses spurious warnings).
New Features
- Registry: target_platforms supports TargetPlatformMode: WAREHOUSE_ONLY, SNOWPARK_CONTAINER_SERVICES_ONLY, or BOTH_WAREHOUSE_AND_SNOWPARK_CONTAINER_SERVICES.
- Registry: Introduce snowflake.ml.model.target_platform.TargetPlatform, target platform constants, and snowflake.ml.model.task.Task.
- ML Job: Single-node ML Jobs are now in GA. Multi-node support is now in PuPr.
  - Moved less frequently used job submission parameters to **kwargs.
  - Platform metrics are now enabled by default.
  - list_jobs() behavior changed; see Behavior Changes for more info.
1.8.6
Bug Fixes
- Fixed fatal errors from internal telemetry wrappers.
New Features
- Registry: Add service container info to logs.
- ML Job (PuPr): Add new submit_from_stage() API for submitting a payload from an existing stage path.
- ML Job (PuPr): Add support for snowpark.Session objects in the argument list of @remote decorated functions. The Session object will be injected from context in the job execution environment (see the sketch below).
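To illustrate the Session injection described above, a hedged sketch; the compute pool, stage, and table names are hypothetical:
from snowflake.ml.jobs import remote
from snowflake.snowpark import Session

@remote("MY_COMPUTE_POOL", stage_name="payload_stage")
def count_rows(session: Session, table_name: str) -> int:
    # `session` is injected from context inside the job execution environment
    return session.table(table_name).count()

job = count_rows("MY_DB.PUBLIC.MY_TABLE")  # returns a job handle
print(job.result())                        # blocks until the job completes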
1.8.5
Bug Fixes
- Registry: Fixed a bug when listing and deleting container services.
- Registry: Fixed explainability issue with scikit-learn pipelines, skipping explain function creation.
- Explainability: Lower the minimum streamlit version requirement to 1.30.
- Modeling: Make XGBoost a required dependency (xgboost is not a required dependency in snowflake-ml-python 1.8.4).
Behavior Changes
- ML Job (Multi-node PrPr): Rename argument num_instances to target_instances in job submission APIs, and change its type from Optional[int] to int.
New Features
- Registry: No longer checks if the snowflake-ml-python version is available in the Snowflake Conda channel when logging an SPCS-only model.
- ML Job (PuPr): Add min_instances argument to the job decorator to allow waiting for workers to be ready.
- ML Job (PuPr): Adjust polling behavior to reduce the number of SQL calls.
Deprecations
- SnowflakeLoginOptions is deprecated and will be removed in a future release.
1.8.4 (2025-05-12)
Bug Fixes
- Registry: Default enable_explainability to True when the model can be deployed to Warehouse.
- Registry: Add custom_model.partitioned_api decorator and deprecate partitioned_inference_api.
- Registry: Fixed a bug when logging PyTorch and TensorFlow models that caused UnboundLocalError: local variable 'multiple_inputs' referenced before assignment.
Behavior Changes
- ML Job (PuPr): Updated property id to be the fully qualified name; introduced new property name to represent the ML Job name.
- ML Job (PuPr): Modified list_jobs() to return the ML Job name instead of id.
- Registry: Error in log_model if enable_explainability is True and the model is only deployed to Snowpark Container Services, instead of just a user warning.
New Features
- ML Job (PuPr): Extend the @remote function decorator, submit_file() and submit_directory() to accept database and schema parameters.
- ML Job (PuPr): Support querying by fully qualified name in get_job().
- Explainability: Added visualization functions to snowflake.ml.monitoring to plot explanations in notebooks.
- Explainability: Support explain for categorical transforms for sklearn pipelines.
- Support categorical types for xgboost.DMatrix inputs.
1.8.3
New Features
- Registry: Default to the runtime CUDA version if available when logging a GPU model in Container Runtime.
- ML Job (PuPr): Added as_list argument to MLJob.get_logs() to enable retrieving logs as a list of strings.
- Registry: Support ModelVersion.run_job to run inference with a single-node Snowpark Container Services job.
- DataConnector: Removed PrPr decorators.
- Registry: Default the target platform to warehouse when logging a partitioned model.
1.8.2
New Features
- ML Job now available as a PuPr feature.
  - Add ability to retrieve results for @remote decorated functions using the new MLJobWithResult.result() API, which will return the unpickled result or raise an exception if the job execution failed.
  - A pre-created Snowpark Session is now available inside job payloads using snowflake.snowpark.context.get_active_session().
- Registry: Introducing save_location to log_model using the options argument. Users can use the save_location option to specify a local directory where the model files and configuration are written. This is useful when the default temporary directory has space limitations:
reg.log_model(
model=...,
model_name=...,
version_name=...,
...,
options={"save_location": "./model_directory"},
)
- Registry: Include model dependencies in pip requirements by default when logging in Container Runtime.
- Multi-node ML Job (PrPr): Add instance_id argument to the get_logs and show_logs methods to support multi-node log retrieval.
- Multi-node ML Job (PrPr): Add job.get_instance_status(instance_id=...) API to support multi-node status retrieval.
1.8.1 (03-26-2025)
Bug Fixes
- Registry: Fix a bug that caused an unsupported model type error while logging a sklearn model with a score_samples inference method.
- Registry: Fix a bug where model inference service creation fails on an existing and suspended service.
New Features
- ML Job (PrPr): Update Container Runtime image version to 1.0.1.
- ML Job (PrPr): Add enable_metrics argument to job submission APIs to enable publishing service metrics to Event Table. See Accessing Event Table service metrics for retrieving published metrics and Costs of telemetry data collection for cost implications.
- Registry: When creating a copy of a ModelVersion with log_model, raise an exception if unsupported arguments are provided.
1.8.0 (03-20-2025)
Bug Fixes
- Modeling: Fix a bug in some metrics that allowed an unsupported version of numpy to be installed automatically in the stored procedure, resulting in a numpy error on execution.
- Registry: Fix a bug that leads to an incorrect Model is does not have _is_inference_api error message when assigning a supported model as a property of a CustomModel.
- Registry: Fix a bug where inference is not working when models with more than 500 input features are deployed to SPCS.
Behavior Change
- Registry: With FeatureGroupSpec support, auto-inferred model signatures for transformers.Pipeline models have been updated, including:
  - Signature for the fill-mask task has been changed from
    ModelSignature( inputs=[ FeatureSpec(name="inputs", dtype=DataType.STRING), ], outputs=[ FeatureSpec(name="outputs", dtype=DataType.STRING), ], )
    to
    ModelSignature( inputs=[ FeatureSpec(name="inputs", dtype=DataType.STRING), ], outputs=[ FeatureGroupSpec( name="outputs", specs=[ FeatureSpec(name="sequence", dtype=DataType.STRING), FeatureSpec(name="score", dtype=DataType.DOUBLE), FeatureSpec(name="token", dtype=DataType.INT64), FeatureSpec(name="token_str", dtype=DataType.STRING), ], shape=(-1,), ), ], )
  - Signature for the token-classification task has been changed from
    ModelSignature( inputs=[ FeatureSpec(name="inputs", dtype=DataType.STRING), ], outputs=[ FeatureSpec(name="outputs", dtype=DataType.STRING), ], )
    to
    ModelSignature( inputs=[FeatureSpec(name="inputs", dtype=DataType.STRING)], outputs=[ FeatureGroupSpec( name="outputs", specs=[ FeatureSpec(name="word", dtype=DataType.STRING), FeatureSpec(name="score", dtype=DataType.DOUBLE), FeatureSpec(name="entity", dtype=DataType.STRING), FeatureSpec(name="index", dtype=DataType.INT64), FeatureSpec(name="start", dtype=DataType.INT64), FeatureSpec(name="end", dtype=DataType.INT64), ], shape=(-1,), ), ], )
  - Signature for the question-answering task when top_k is larger than 1 has been changed from
    ModelSignature( inputs=[ FeatureSpec(name="question", dtype=DataType.STRING), FeatureSpec(name="context", dtype=DataType.STRING), ], outputs=[ FeatureSpec(name="outputs", dtype=DataType.STRING), ], )
    to
    ModelSignature( inputs=[ FeatureSpec(name="question", dtype=DataType.STRING), FeatureSpec(name="context", dtype=DataType.STRING), ], outputs=[ FeatureGroupSpec( name="answers", specs=[ FeatureSpec(name="score", dtype=DataType.DOUBLE), FeatureSpec(name="start", dtype=DataType.INT64), FeatureSpec(name="end", dtype=DataType.INT64), FeatureSpec(name="answer", dtype=DataType.STRING), ], shape=(-1,), ), ], )
  - Signature for the text-classification task when top_k is None has been changed from
    ModelSignature( inputs=[ FeatureSpec(name="text", dtype=DataType.STRING), FeatureSpec(name="text_pair", dtype=DataType.STRING), ], outputs=[ FeatureSpec(name="label", dtype=DataType.STRING), FeatureSpec(name="score", dtype=DataType.DOUBLE), ], )
    to
    ModelSignature( inputs=[ FeatureSpec(name="text", dtype=DataType.STRING), ], outputs=[ FeatureSpec(name="label", dtype=DataType.STRING), FeatureSpec(name="score", dtype=DataType.DOUBLE), ], )
  - Signature for the text-classification task when top_k is not None has been changed from
    ModelSignature( inputs=[ FeatureSpec(name="text", dtype=DataType.STRING), FeatureSpec(name="text_pair", dtype=DataType.STRING), ], outputs=[ FeatureSpec(name="outputs", dtype=DataType.STRING), ], )
    to
    ModelSignature( inputs=[ FeatureSpec(name="text", dtype=DataType.STRING), ], outputs=[ FeatureGroupSpec( name="labels", specs=[ FeatureSpec(name="label", dtype=DataType.STRING), FeatureSpec(name="score", dtype=DataType.DOUBLE), ], shape=(-1,), ), ], )
  - Signature for the text-generation task has been changed from
    ModelSignature( inputs=[FeatureSpec(name="inputs", dtype=DataType.STRING)], outputs=[ FeatureSpec(name="outputs", dtype=DataType.STRING), ], )
    to
    ModelSignature( inputs=[ FeatureGroupSpec( name="inputs", specs=[ FeatureSpec(name="role", dtype=DataType.STRING), FeatureSpec(name="content", dtype=DataType.STRING), ], shape=(-1,), ), ], outputs=[ FeatureGroupSpec( name="outputs", specs=[ FeatureSpec(name="generated_text", dtype=DataType.STRING), ], shape=(-1,), ) ], )
- Registry: PyTorch and TensorFlow models now expect a single tensor input/output by default when logging to the Model Registry. To use multiple tensors (previous behavior), set options={"multiple_inputs": True}.
  Example with single tensor input:
  from typing import cast

  import torch

  class TorchModel(torch.nn.Module):
      def __init__(self, n_input: int, n_hidden: int, n_out: int, dtype: torch.dtype = torch.float32) -> None:
          super().__init__()
          self.model = torch.nn.Sequential(
              torch.nn.Linear(n_input, n_hidden, dtype=dtype),
              torch.nn.ReLU(),
              torch.nn.Linear(n_hidden, n_out, dtype=dtype),
              torch.nn.Sigmoid(),
          )

      def forward(self, tensor: torch.Tensor) -> torch.Tensor:
          return cast(torch.Tensor, self.model(tensor))

  # Sample usage:
  data_x = torch.rand(size=(batch_size, n_input))

  # Log model with single tensor
  reg.log_model(
      model=model,
      ...,
      sample_input_data=data_x
  )

  # Run inference with single tensor
  mv.run(data_x)

  For multiple tensor inputs/outputs, use:
  reg.log_model(
      model=model,
      ...,
      sample_input_data=[data_x_1, data_x_2],
      options={"multiple_inputs": True}
  )
- Registry: Default enable_explainability to False when the model can be deployed to Snowpark Container Services.
New Features
- Registry: Added support for single torch.Tensor, tensorflow.Tensor and tensorflow.Variable as input or output data.
- Registry: Support the xgboost.DMatrix datatype for XGBoost models.
1.7.5 (03-06-2025)
- Support Python 3.12.
- Explainability: Support native and snowflake.ml.modeling sklearn pipelines.
Bug Fixes
- Registry: Fixed a compatibility issue when using snowflake-ml-python 1.7.0 or greater to save a tensorflow.keras model with keras 2.x: if relax_version is set to (or defaults to) True, and a newer version of snowflake-ml-python is available in the Snowflake Anaconda Channel, the model could not be run in Snowflake. If you have such a model, you could use the latest version of snowflake-ml-python and call ModelVersion.load to load it back and re-log it. Alternatively, you can prevent this issue by setting relax_version=False when saving the model.
- Registry: Removed the validation that disallows data that does not have non-null values being passed to ModelVersion.run.
- ML Job (PrPr): No longer requires the CREATE STAGE privilege if stage_name points to an existing stage.
- ML Job (PrPr): Fixed a bug causing some payload source and entrypoint path combinations to be erroneously rejected with ValueError(f"{self.entrypoint} must be a subpath of {self.source}").
- ML Job (PrPr): Fixed a bug in the Ray cluster startup config which caused certain Runtime APIs to fail.
New Features
- Registry: Added support for handling Hugging Face model configurations with auto-mapping functionality.
- Registry: Added support for keras 3.x models with tensorflow and pytorch backends.
1.7.4 (01-28-2025)
- FileSet: snowflake.ml.fileset.FileSet has been deprecated and will be removed in a future version. Use snowflake.ml.dataset.Dataset and snowflake.ml.data.DataConnector instead.
- Registry: ModelVersion.run on a service would require redeploying the service once the account opts into nested functions.
Bug Fixes
- Registry: Fixed an issue where the Hugging Face pipeline was loaded using an incorrect dtype.
- Registry: Fixed an issue where only 1 row was used when inferring the model signature in the modeling model.
New Features
- Add new snowflake.ml.jobs preview API for running headless workloads on SPCS using Container Runtime for ML.
- Added guardrails option to the Cortex complete function, enabling Cortex Guard support.
- Model Monitoring: Expose the Model Monitoring Python API by default.
1.7.3 (2025-01-08)
- Added lowercase versions of Cortex functions; added a deprecation warning to the capitalized versions.
- Bumped the requirements of fsspec and s3fs to >=2024.6.1,<2026.
- Bumped the requirement of mlflow to >=2.16.0,<3.
- Registry: Support 500+ features for model registry.
- Feature Store: Add support for cluster_by for feature views.
Bug Fixes
- Registry: Fixed a bug when providing a non-range index pandas DataFrame as the input to ModelVersion.run.
- Registry: Improved random model version name generation to prevent collisions.
- Registry: Fix an issue when inferring a signature or running inference with Snowpark data that has a column whose type is ARRAY and contains NULL values.
- Registry: ModelVersion.run now accepts a fully qualified service name.
- Monitoring: Fix issue in SDK with creating monitors using fully qualified names.
- Registry: Fix error in log_model for sklearn models that only do data pre-processing, including pre-processing-only pipeline models, due to default explainability enablement.
New Features
- Added user_files argument to Registry.log_model for including images or any extra files with the model.
- Registry: Added support for handling Hugging Face model configurations with auto-mapping functionality.
- DataConnector: Add new DataConnector.from_sql() constructor.
- Registry: Provided new arguments to the snowflake.ml.model.model_signature.infer_signature method to specify a row limit to be used when inferring the signature.
1.7.2 (2024-11-21)
Bug Fixes
- Model Explainability: Fix an issue where explain is enabled for scikit-learn pipelines whose task is UNKNOWN and fails later when invoked.
New Features
- Registry: Support asynchronous model inference service creation with the block option in ModelVersion.create_service(), set to True by default.
- Registry: Allow specifying batch_size when running inference using a sentence-transformers model.
1.7.1 (2024-11-05)
Bug Fixes
- Registry: Null values are now allowed in the dataframe used in model signature inference. Null values will be ignored and the others will be used to infer the signature.
- Registry: Pandas Extension DTypes (pandas.StringDType(), pandas.BooleanDType(), etc.) are now supported in model signature inference.
- Registry: Null values are now allowed in the dataframe used to predict.
- Data: Fix missing snowflake.ml.data.* module exports in wheel.
- Dataset: Fix missing snowflake.ml.dataset.* module exports in wheel.
- Registry: Fix the issue that tf_keras.Model is not recognized as a keras model when logging.
New Features
- Registry: The enable_monitoring option is set to False by default. This will gate access to preview features of Model Monitoring.
- Model Monitoring: show_model_monitors Registry method. This feature is still in Private Preview.
- Registry: Support pd.Series in input and output data.
- Model Monitoring: add_monitor Registry method. This feature is still in Private Preview.
- Model Monitoring: resume and suspend ModelMonitor. This feature is still in Private Preview.
- Model Monitoring: get_monitor Registry method. This feature is still in Private Preview.
- Model Monitoring: delete_monitor Registry method. This feature is still in Private Preview.
1.7.0 (10-22-2024)
Behavior Change
- Generic: Require Python >= 3.9.
- Data Connector: Update to_torch_dataset and to_torch_datapipe to add a dimension for scalar data. This allows for more seamless integration with the PyTorch DataLoader, which creates batches by stacking the inputs of each batch.
Examples:
ds = connector.to_torch_dataset(shuffle=False, batch_size=3)
- Input: "col1": [10, 11, 12]
  - Previous batch: array([10., 11., 12.]) with shape (3,)
  - New batch: array([[10.], [11.], [12.]]) with shape (3, 1)
- Input: "col2": [[0, 100], [1, 110], [2, 200]]
  - Previous batch: array([[ 0, 100], [ 1, 110], [ 2, 200]]) with shape (3, 2)
  - New batch: No change
- Model Registry: External access integrations are optional when creating a model inference service in Snowflake >= 8.40.0.
- Model Registry: Deprecate build_external_access_integration in favor of build_external_access_integrations in ModelVersion.create_service().
Bug Fixes
- Registry: Updated the log_model API to accept both signature and sample_input_data parameters.
- Feature Store: ExampleHelper uses a fully qualified path for the table name. Changed weather features aggregation from 1d to 1h.
- Data Connector: Return numpy arrays with appropriate object types instead of lists for multi-dimensional data from to_torch_dataset and to_torch_datapipe.
- Model explainability: Incompatibility between SHAP 0.42.1 and XGB 2.1.1 resolved by using the latest SHAP 0.46.0.
New Features
- Registry: Support passing a variable number of keyword arguments to the ModelContext class. Example usage:
import json

import pandas as pd

from snowflake.ml.model import custom_model

mc = custom_model.ModelContext(
    config='local_model_dir/config.json',
    m1=model1,
)

class ExamplePipelineModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)
        v = open(self.context['config']).read()
        self.bias = json.loads(v)['bias']

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        model_output = self.context['m1'].predict(input)
        return pd.DataFrame({'output': model_output + self.bias})
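A short usage sketch for the class above; model1 and input_df are placeholders:
pipeline_model = ExamplePipelineModel(context=mc)
output_df = pipeline_model.predict(input_df)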
- Model Development: Upgrade scikit-learn in the UDTF backend for the log_loss metric. As a result, the eps argument is now ignored.
- Data Connector: Add the option of passing a None-sized batch to to_torch_dataset for better interoperability with the PyTorch DataLoader.
- Model Registry: Support pandas.CategoricalDtype.
  - Limitations:
    - The native categorical data handling by XGBoost using enable_categorical=True is not supported. Instead, please use sklearn.pipeline to preprocess the categorical datatype and log the pipeline with the XGBoost model.
- Registry: It is now possible to pass signatures and sample_input_data at the same time to capture background data for explainability and data lineage.
1.6.4 (2024-10-17)
Bug Fixes
- Registry: Fix an issue that leads to an incident when using ModelVersion.run with a service.
1.6.3 (2024-10-07)
- Model Registry (PrPr) has been removed.
Bug Fixes
- Registry: Fix a bug where, when a package whose name does not follow PEP-508 is provided when logging the model, an unexpected normalization happens.
- Registry: Fix not a valid remote uri error when logging mlflow models.
- Registry: Fix a bug when ModelVersion.run is called in a nested way.
- Registry: Fix an issue that leads to log_model failure when the local package version contains parts other than the base version.
- Fix issue where sample_weights were not being applied to search estimators.
- Model explainability: Fix a bug which creates explain as a function instead of a table function when enabled by default.
- Model explainability: Update lightgbm binary classification to return non-JSON values, based on customer feedback.
New Features
- Data: Improve DataConnector.to_pandas() performance when loading from Snowpark DataFrames.
- Model Registry: Allow users to set a model task while using log_model.
- Feature Store: FeatureView supports ON_CREATE or ON_SCHEDULE initialize mode.
1.6.2 (2024-09-04)
Bug Fixes
- Modeling: Support XGBoost versions larger than 2.
- Data: Fix multiple-epoch iteration over DataConnector.to_torch_datapipe() DataPipes.
- Generic: Fix a bug where an invalid name provided to an argument expecting a fully qualified name was parsed wrongly. Now it raises an exception correctly.
- Model Explainability: Handle explanations for multiclass XGBoost classification models.
- Model Explainability: Workarounds and better error handling for XGB>2.1.0 not working with SHAP==0.42.1.
New Features
- Data: Add top-level exports for DataConnector and DataSource to snowflake.ml.data.
- Data: Add native batching support via batch_size and drop_last_batch arguments to DataConnector.to_torch_dataset().
- Feature Store: update_feature_view() supports taking a feature view object as argument.
1.6.1 (2024-08-12)
Bug Fixes
- Feature Store: Support large metadata blobs when generating datasets.
- Feature Store: Added a hidden knob in FeatureView as kwargs for setting a customized refresh_mode.
- Registry: Fix an error message in Model Version run when function_name is not mentioned and the model has multiple target methods.
- Cortex inference: snowflake.cortex.Complete now only uses the REST API for streaming, and the use_rest_api_experimental flag is no longer needed.
- Feature Store: Add a new API, FeatureView.list_columns(), which lists all column information.
- Data: Fix DataFrame ingestion with ArrowIngestor.
New Features
- Enable set_params to set the parameters of the underlying sklearn estimator, if the snowflake-ml model has been fit.
- Data: Add snowflake.ml.data.ingestor_utils module with utility functions helpful for DataIngestor implementations.
- Data: Add new to_torch_dataset() connector to DataConnector to replace the deprecated DataPipe.
- Registry: The enable_explainability option is set to True by default for XGBoost, LightGBM and CatBoost as a PuPr feature.
- Registry: Option to enable_explainability when registering SHAP-supported sklearn models (see the sketch below).
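For illustration, a hedged sketch of opting into explainability when logging a supported model; the registry handle and names are placeholders:
mv = reg.log_model(
    xgb_model,
    model_name="MY_XGB_MODEL",
    version_name="V1",
    sample_input_data=sample_df,
    options={"enable_explainability": True},
)
explanations = mv.run(test_df, function_name="explain")  # run the generated explain method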
1.6.0 (2024-07-29)
Bug Fixes
- Modeling: SimpleImputer can impute integer columns with integer values.
- Registry: Fix an issue when providing a pandas DataFrame whose index does not start from 0 as the input to ModelVersion.run.
New Features
- Feature Store: Add overloads so APIs accept both an object and name/version. Impacted APIs include read_feature_view(), refresh_feature_view(), get_refresh_history(), resume_feature_view(), suspend_feature_view(), delete_feature_view().
- Feature Store: Add docstring inline examples for all public APIs.
- Feature Store: Add new utility class ExampleHelper to help with loading source data, to simplify public notebooks.
- Registry: Option to enable_explainability when registering XGBoost models as a pre-PuPr feature.
- Feature Store: Add new API update_entity().
- Registry: Option to enable_explainability when registering CatBoost models as a pre-PuPr feature.
- Feature Store: Add new argument warehouse to the FeatureView constructor to overwrite the default warehouse. Also add a new column 'warehouse' to the output of list_feature_views().
- Registry: Add support for logging a model from a model version.
- Modeling: Distributed Hyperparameter Optimization now announces a GA refresh version. The latest memory-efficient version no longer has the 10GB training limitation for datasets. To turn it off, please run:
  from snowflake.ml.modeling._internal.snowpark_implementations import (
      distributed_hpo_trainer,
  )
  distributed_hpo_trainer.ENABLE_EFFICIENT_MEMORY_USAGE = False
- Registry: Option to enable_explainability when registering LightGBM models as a pre-PuPr feature.
- Data: Add new snowflake.ml.data preview module which contains data reading utilities like DataConnector (see the sketch after this list). DataConnector provides efficient connectors from Snowpark DataFrame and Snowpark ML Dataset to external frameworks like PyTorch, TensorFlow, and Pandas. Create DataConnector instances using the classmethod constructors DataConnector.from_dataset() and DataConnector.from_dataframe().
- Data: Add new DataConnector.from_sources() classmethod constructor for constructing from DataSource objects.
- Data: Add new ingestor_class arg to DataConnector classmethod constructors for easier DataIngestor injection.
- Dataset: DatasetReader now subclasses the new DataConnector class.
  - Add optional limit arg to DatasetReader.to_pandas().
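A brief sketch of the DataConnector flow described in the Data bullets above; snowpark_df is a placeholder Snowpark DataFrame, and the top-level import shown was added in a later release:
from snowflake.ml.data import DataConnector

dc = DataConnector.from_dataframe(snowpark_df)    # or DataConnector.from_dataset(ds)
pandas_df = dc.to_pandas()                        # materialize as a pandas DataFrame
torch_pipe = dc.to_torch_datapipe(batch_size=32)  # stream batches into PyTorch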
Behavior Changes
- Feature Store: Change some positional parameters to keyword arguments in the following APIs:
- Entity(): desc.
- FeatureView(): timestamp_col, refresh_freq, desc.
- FeatureStore(): creation_mode.
- update_entity(): desc.
- register_feature_view(): block, overwrite.
- list_feature_views(): entity_name, feature_view_name.
- get_refresh_history(): verbose.
- retrieve_feature_values(): spine_timestamp_col, exclude_columns, include_feature_view_timestamp_col.
- generate_training_set(): save_as, spine_timestamp_col, spine_label_cols, exclude_columns, include_feature_view_timestamp_col.
- generate_dataset(): version, spine_timestamp_col, spine_label_cols, exclude_columns, include_feature_view_timestamp_col, desc, output_type.
1.5.4 (2024-07-11)
Bug Fixes
- Model Registry (PrPr): Fix 401 Unauthorized issue when deploying a model to SPCS.
- Feature Store: Downgrades exceptions to warnings for a few property setters in feature view. Now you can set desc, refresh_freq and warehouse for draft feature views.
- Modeling: Fix an issue with calling OrdinalEncoder with categories as a dictionary and a pandas DataFrame.
- Modeling: Fix an issue with calling OneHotEncoder with categories as a dictionary and a pandas DataFrame.
New Features
- Registry: Allow overriding device_map and device when loading huggingface pipeline models.
- Registry: Add set_alias method to ModelVersion instance to set an alias on a model version.
- Registry: Add unset_alias method to ModelVersion instance to unset an alias on a model version.
- Registry: Add partitioned_inference_api allowing users to create partitioned inference functions in registered models. Enables model inference methods as table functions with vectorized process methods in registered models.
- Feature Store: Add 3 more columns, refresh_freq, refresh_mode and scheduling_state, to the result of list_feature_views().
- Feature Store: update_feature_view() supports updating the description.
- Feature Store: Add new API refresh_feature_view().
- Feature Store: Add new API get_refresh_history().
- Feature Store: Add generate_training_set() API for generating table-backed feature snapshots.
- Feature Store: Add DeprecationWarning for generate_dataset(..., output_type="table").
- Model Development: OrdinalEncoder supports a list of array-likes for the categories argument.
- Model Development: OneHotEncoder supports a list of array-likes for the categories argument.
1.5.3 (06-17-2024)
Bug Fixes
- Modeling: Fix an issue causing lineage information to be missing for Pipeline, GridSearchCV, SimpleImputer, and RandomizedSearchCV.
- Registry: Fix an issue that leads to an incorrect result when using a pandas DataFrame with over 100,000 rows as the input of the ModelVersion.run method in a Stored Procedure.
New Features
- Registry: Add support for TIMESTAMP_NTZ model signature data type, allowing timestamp input and output.
- Dataset: Add DatasetVersion.label_cols and DatasetVersion.exclude_cols properties.
1.5.2 (06-10-2024)
Bug Fixes
- Registry: Fix an issue that made it impossible to log a model in a stored procedure.
- Modeling: Quick fix for import snowflake.ml.modeling.parameters.enable_anonymous_sproc not being importable due to a package dependency error.
1.5.1 (05-22-2024)
Bug Fixes
- Dataset: Fix snowflake.connector.errors.DataError: Query Result did not match expected number of rows when accessing DatasetVersion properties when the case-insensitive SHOW VERSIONS IN DATASET check matches multiple version names.
- Dataset: Fix bug in SnowFS bulk file read when used with DuckDB.
- Registry: Fixed a bug when loading old models.
- Lineage: Fix Dataset source lineage propagation through snowpark.DataFrame transformations.
Behavior Changes
- Feature Store: Convert clear() into a private function. Also make it delete feature views and entities only.
- Feature Store: Use NULL as the default value for the timestamp tag value.
New Features
- Feature Store: Added new snowflake.ml.feature_store.setup_feature_store() API to assist Feature Store RBAC setup.
- Feature Store: Add output_type argument to FeatureStore.generate_dataset() to allow generating data snapshots as Datasets or Tables.
- Registry: log_model, get_model, delete_model now support fully qualified names.
- Modeling: Supports anonymous stored procedures during fit calls so that modeling does not require permissions to operate on the schema. Please call:
  import snowflake.ml.modeling.parameters.enable_anonymous_sproc  # noqa: F401
1.5.0 (05-01-2024)
Bug Fixes
- Registry: Fix invalid parameter 'SHOW_MODEL_DETAILS_IN_SHOW_VERSIONS_IN_MODEL' error.
Behavior Changes
- Model Development: The behavior of fit_transform for all estimators has changed. Firstly, it now covers all estimators that contain this function; secondly, the output is the union of pandas DataFrame and Snowpark DataFrame.
Model Registry (PrPr)
snowflake.ml.registry.artifact and related snowflake.ml.model_registry.ModelRegistry APIs have been removed.
- Removed snowflake.ml.registry.artifact module.
- Removed ModelRegistry.log_artifact(), ModelRegistry.list_artifacts(), ModelRegistry.get_artifact().
- Removed artifacts argument from ModelRegistry.log_model().
Dataset (PrPr)
snowflake.ml.dataset.Dataset has been redesigned to be backed by Snowflake Dataset entities.
- New Datasets can be created with Dataset.create() and existing Datasets may be loaded with Dataset.load().
- Datasets now maintain an immutable selected_version state. The Dataset.create_version() and Dataset.load_version() APIs return new Dataset objects with the requested selected_version state.
- Added dataset.create_from_dataframe() and dataset.load_dataset() convenience APIs as shortcuts for creating and loading Datasets with a pre-selected version.
- Dataset.materialized_table and Dataset.snapshot_table no longer exist, with Dataset.fully_qualified_name as the closest equivalent.
- Dataset.df no longer exists. Instead, use DatasetReader.read.to_snowpark_dataframe().
- Dataset.owner has been moved to Dataset.selected_version.owner.
- Dataset.desc has been moved to DatasetVersion.selected_version.comment.
- Dataset.timestamp_col, Dataset.label_cols, Dataset.feature_store_metadata, and Dataset.schema_version have been removed.
Feature Store (PrPr)
- FeatureStore.generate_dataset argument list has been changed to match the new snowflake.ml.dataset.Dataset definition:
  - materialized_table has been removed and replaced with name and version.
  - name moved to the first positional argument.
  - save_mode has been removed as merge behavior is no longer supported. The new behavior is always errorifexists.
- Change feature view version type from str to FeatureViewVersion. It is a restricted string literal.
- Remove as_dataframe arg from FeatureStore.list_feature_views(); it now always returns the result as a DataFrame.
- Combines a few metadata tags into a new tag: SNOWML_FEATURE_VIEW_METADATA. This will make previously created feature views not readable by the new SDK.
New Features
- Registry: Add export method to ModelVersion instance to export model files.
- Registry: Add load method to ModelVersion instance to load the underlying object from the model.
- Registry: Add Model.rename method to Model instance to rename or move a model.
Dataset (PrPr)
- Added Snowpark DataFrame integration using Dataset.read.to_snowpark_dataframe().
- Added Pandas DataFrame integration using Dataset.read.to_pandas().
- Added PyTorch and TensorFlow integrations using Dataset.read.to_torch_datapipe() and Dataset.read.to_tf_dataset() respectively.
- Added fsspec-style file integration using Dataset.read.files() and Dataset.read.filesystem() (a combined sketch follows).
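Putting the read integrations above together, a hedged sketch; the dataset name and version are hypothetical, and the load_dataset arguments are assumed from the convenience API described earlier:
from snowflake.ml import dataset

ds = dataset.load_dataset(session, "MY_DATASET", "V1")  # load with a pre-selected version
pandas_df = ds.read.to_pandas()
snowpark_df = ds.read.to_snowpark_dataframe()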
Feature Store
- Use the new tag_reference_internal to speed up metadata lookup.
1.4.1 (2024-04-18)
New Features
- Registry: Add support for catboost models (catboost.CatBoostClassifier, catboost.CatBoostRegressor).
- Registry: Add support for lightgbm models (lightgbm.Booster, lightgbm.LGBMClassifier, lightgbm.LGBMRegressor).
Bug Fixes
- Registry: Fix a bug that caused the relax_version option to not work.
Behavior Changes
- Feature Store: update_feature_view takes refresh_freq and warehouse as arguments.
1.4.0 (2024-04-08)
Bug Fixes
- Registry: Fix a bug where, when multiple models are called from the same query, models other than the first one would have incorrect results. This fix only works for newly logged models.
- Modeling: When registering a model, only the method(s) mentioned in save_model are added to the model signature in SnowML models.
- Modeling: Fix a bug where, when n_jobs is not 1, the model cannot execute methods such as predict, predict_log_proba, and other batch inference methods. n_jobs is now automatically set to 1 because vectorized UDFs currently don't support the joblib parallel backend.
- Modeling: Fix a bug where batch inference methods cannot infer the datatype when the first row of data contains NULL.
- Modeling: Match Distributed HPO output column names with the Snowflake identifier.
- Modeling: Relax package versions for all Distributed HPO methods if the installed version is not available in the Snowflake conda channel.
- Modeling: Add sklearn as a required dependency for the LightGBM package.
Behavior Changes
- Registry: The apply method is no longer logged by default when logging an xgboost model. If it is required, it can be specified manually when logging the model via log_model(..., options={"target_methods": ["apply", ...]}).
- Feature Store: register_entity returns an entity object.
- Feature Store: register_feature_view block=true becomes the default.
New Features
- Registry: Add support for sentence-transformers models (sentence_transformers.SentenceTransformer).
- Registry: A version name is no longer required when logging a model. If not provided, a random human-readable ID will be generated.
1.3.1 (2024-03-21)
New Features
- FileSet: snowflake.ml.fileset.sfcfs.SFFileSystem can now be used in UDFs and stored procedures.
1.3.0 (2024-03-12)
Bug Fixes
- Registry: Fix a bug where modules in code_paths could not be correctly imported when calling log_model.
- Registry: Fix an incorrect error message when validating an input Snowpark DataFrame with an array feature.
- Model Registry: Fix an issue when deploying a model to SPCS where some files did not have proper permissions.
- Model Development: Relax package versions for all inference methods if the installed version is not available in the Snowflake conda channel.
Behavior Changes
- Registry: When running a model's method, the value-range-based input validation that prevents input from overflowing is now optional rather than enforced. This should improve performance and should not cause problems for most kinds of models. If you want to enable this check as before, specify strict_input_validation=True when calling run.
- Registry: By default, relax_version=True when logging a model instead of using the specific local dependency versions. This improves dependency versioning by using versions available in Snowflake. To switch back to the previous behavior and use specific local dependency versions, specify relax_version=False when calling log_model.
- Model Development: The behavior of fit_predict for all estimators has changed. Firstly, it now covers all estimators that contain this function; secondly, the output is the union of pandas DataFrame and Snowpark DataFrame.
New Features
- FileSet: snowflake.ml.fileset.sfcfs.SFFileSystem can now be serialized with pickle.
1.2.3 (2024-02-26)
Bug Fixes
- Registry: Providing a Decimal Type column to a DOUBLE or FLOAT feature will no longer error out, but will auto-cast with warnings.
- Registry: Improve the error message when specifying the currently unsupported pip_requirements argument.
- Model Development: Fix precision_recall_fscore_support incorrect results when average="samples".
- Model Registry: Fix an issue that leads to description, metrics or tags not being correctly returned in newly created Model Registry (PrPr) due to Snowflake BCR 2024_01.
Behavior Changes
- Feature Store: FeatureStore.suspend_feature_view and FeatureStore.resume_feature_view no longer mutate the input feature view argument. The updated status is only reflected in the returned feature view object.
New Features
- Model Development: Support the score_samples method for all the classes, including Pipeline, GridSearchCV, RandomizedSearchCV, PCA, IsolationForest, ...
- Registry: Support deleting a version of a model.
1.2.2 (2024-02-13)
New Features
- Model Registry: Support providing external access integrations when deploying a model to SPCS. This is required to make the deployment process work, since SPCS by default denies all network connections. The following endpoints must be allowed to make deployment work: docker.com:80, docker.com:443, anaconda.com:80, anaconda.com:443, anaconda.org:80, anaconda.org:443, pypi.org:80, pypi.org:443. If you are using a snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel object, the following endpoints are also required to be allowed: huggingface.com:80, huggingface.com:443, huggingface.co:80, huggingface.co:443.
1.2.1 (2024-01-25)
New Features
- Model Development: Infers output column data type for transformers when possible.
- Registry: The relax_version option is available in the options argument when logging the model.
1.2.0 (2024-01-11)
Bug Fixes
- Model Registry: Fix "XGBoost version not compiled with GPU support" error when running CPU inference against open-source XGBoost models deployed to SPCS.
- Model Registry: Fix model deployment to SPCS on Windows machines.
New Features
- Model Development: Introduced XGBoost external memory training feature. This feature enables training XGBoost models on large datasets that don't fit into memory.
- Registry: New Registry class named snowflake.ml.registry.Registry, providing similar APIs as the old one but working with the new MODEL object in Snowflake SQL. Also, we are providing snowflake.ml.model.Model and snowflake.ml.model.ModelVersion to represent a model and a specific version of a model.
- Model Development: Add support for fit_predict method in AgglomerativeClustering, DBSCAN, and OPTICS classes.
- Model Development: Add support for fit_transform method in MDS, SpectralEmbedding and TSNE classes.
Additional Notes
- Model Registry: snowflake.ml.registry.model_registry.ModelRegistry has been deprecated starting from version 1.2.0. It will stay in the Private Preview phase. For future implementations, kindly utilize snowflake.ml.registry.Registry, except when specifically required. The old model registry will be removed once all its primary functionalities are fully integrated into the new registry.
1.1.2 (2023-12-18)
Bug Fixes
- Generic: Fix the issue that stack trace is hidden by telemetry unexpectedly.
- Model Development: Execute model signature inference without materializing full dataframe in memory.
- Model Registry: Fix occasional 'snowflake-ml-python library does not exist' error when deploying to SPCS.
Behavior Changes
- Model Registry: When calling predict with a Snowpark DataFrame, both inferred and normalized column names are accepted.
- Model Registry: When logging a Snowpark ML Modeling Model, sample input data or a manually provided signature will be ignored since they are not necessary.
New Features
- Model Development: SQL implementation of the binary precision_score metric.
1.1.1 (2023-12-05)
Bug Fixes
- Model Registry: The predict target method on registered models is now compatible with unsupervised estimators.
- Model Development: Fix confusion_matrix incorrect results when the row count is not divisible by the batch size.
New Features
- Introduced the passthrough_col param in the Modeling API. This new param is helpful in scenarios requiring automatic input_cols inference, but needing to avoid using specific columns, like index columns, during training or inference.
1.1.0 (2023-12-01)
Bug Fixes
- Model Registry: Fix pandas DataFrame input not handling the first row properly.
- Model Development: OrdinalEncoder and LabelEncoder output_columns do not need to be valid Snowflake identifiers. They would previously be excluded if the normalized name did not match the name specified in output_columns.
New Features
- Model Registry: Add support for invoking a public endpoint on an SPCS service by providing the `enable_ingress` SPCS deployment option.
- Model Development: Add support for distributed HPO: `GridSearchCV` and `RandomizedSearchCV` execution will be distributed on multi-node warehouses.
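A minimal sketch of the distributed search, assuming a Snowpark DataFrame `train_df` and hypothetical column names and grid values:

```python
# Hedged sketch: the Snowpark ML GridSearchCV wrapper; the search is executed
# in the warehouse and distributed across nodes where available.
from snowflake.ml.modeling.model_selection import GridSearchCV
from snowflake.ml.modeling.xgboost import XGBClassifier

search = GridSearchCV(
    estimator=XGBClassifier(),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5]},
    label_cols=["LABEL"],
    output_cols=["PREDICTION"],
)
search.fit(train_df)
```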
1.0.12 (2023-11-13)
Bug Fixes
- Model Registry: Fix a regression where container logs were not shown during model deployment to SPCS.
- Model Development: Increase the column capacity of `OrdinalEncoder`.
- Model Registry: Fix an unbound `batch_size` error when deploying a model other than a Hugging Face Pipeline or an LLM with GPU on SPCS.
Behavior Changes
- Model Registry: Raise an early error when deploying to SPCS with a database or schema name that starts with an underscore.
- Model Registry: The `conda-forge` channel is now automatically added to the channel list when deploying to SPCS.
- Model Registry: `relax_version` no longer strips all version specifiers; instead, it relaxes an `==x.y.z` specifier to `>=x.y, <(x+1)`.
- Model Registry: A Python runtime with a different patch level but the same major and minor version no longer triggers a warning when loading the model via the Model Registry, and is considered usable when deploying to SPCS.
- Model Registry: When logging a `snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel` object, the versions of locally installed libraries are no longer picked up as model dependencies; instead, a set of pre-defined dependencies is used to improve the user experience.
New Features
- Model Registry: Enable best-effort SPCS job/service log streaming when logging level is set to INFO.
1.0.11 (2023-10-27)
New Features
- Model Registry: Add a public `log_artifact()` method.
- Model Development: Add support for `kneighbors`.
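A rough sketch of the call shape; the wrapper class, column names, and DataFrames are hypothetical and the exact signature is an assumption:

```python
# Hedged sketch: kneighbors on a Snowpark ML neighbors wrapper.
from snowflake.ml.modeling.neighbors import NearestNeighbors

nn = NearestNeighbors(n_neighbors=3, input_cols=["X1", "X2"])
nn.fit(train_df)
neighbors = nn.kneighbors(query_df)  # neighbor distances and indices
```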
Behavior Changes
- Model Registry: Change the `log_model()` argument from `TrainingDataset` to a list of `Artifact`.
- Model Registry: Change `get_training_dataset()` to `get_artifact()`.
Bug Fixes
- Model Development: Fix support for XGBoost and LightGBM models using SKLearn Grid Search and Randomized Search model selectors.
- Model Development: DecimalType is now supported as a DataType.
- Model Development: Fix metrics compatibility with Snowpark DataFrames that use Snowflake identifiers.
- Model Registry: Resolve `delete_deployment` not deleting the SPCS service in certain cases.
1.0.10 (2023-10-13)
Behavior Changes
- Model Development: precision_score, recall_score, f1_score, fbeta_score, precision_recall_fscore_support, mean_absolute_error, mean_squared_error, and mean_absolute_percentage_error metric calculations are now distributed.
- Model Registry: `deploy` will now return a `Deployment` with deployment information.
New Features
- Model Registry: When the model signature is auto-inferred, it will be printed to the log for reference.
- Model Registry: For SPCS deployments, `Deployment` details will contain `image_name`, `service_spec`, and `service_function_sql`.
Bug Fixes
- Model Development: Fix an issue leading to UTF-8 decoding errors when using modeling modules on Windows.
- Model Development: Fix an issue where alias definitions caused a `SnowparkSQLUnexpectedAliasException` during inference.
- Model Registry: Fix an issue where signature inference could be incorrect when using a Snowpark DataFrame as sample input.
- Model Registry: Fix overly strict data type validation when predicting. Now, for example, if the signature declares an INT8 feature and an INT64 dataframe is provided, prediction will not fail as long as all values are within range.
1.0.9 (2023-09-28)
Behavior Changes
- Model Development: log_loss metric calculation is now distributed.
Bug Fixes
- Model Registry: Fix an issue where building images failed with specific Docker setups.
- Model Registry: Fix an issue where the local ML library could not be embedded when it was imported via `zipimport`.
- Model Registry: Fix out-of-date documentation about the `platform` argument of the `deploy` function.
- Model Registry: Fix an issue where a GPU-trained PyTorch model could not be deployed to a platform without a GPU.
1.0.8 (2023-09-15)
Bug Fixes
- Model Development: Ordinal encoder can be used with mixed input column types.
- Model Development: Fix an issue when the sklearn default value is `np.nan`.
- Model Registry: Fix an issue where an incorrect Docker executable was used when building images.
- Model Registry: Fix an issue where specifying the `token` argument when using `snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel` with `transformers < 4.32.0` had no effect.
- Model Registry: Fix an issue where an incorrect system function call was used when deploying to SPCS.
- Model Registry: Fix an issue when using a `transformers.pipeline` that does not have a `tokenizer`.
- Model Registry: Fix an incorrectly inferred image repository name during model deployment to SPCS.
- Model Registry: Fix a GPU resource retention issue caused by failed or stuck previous deployments in SPCS.
1.0.7 (2023-09-05)
Bug Fixes
- Model Development & Model Registry: Fix an error related to `pandas.io.json.json_normalize`.
- Allow disabling telemetry.
1.0.6 (2023-09-01)
New Features
- Model Registry: Add the `create_if_not_exists` parameter to the constructor.
- Model Registry: Added the `get_or_create_model_registry` API.
- Model Registry: Added support for GPU inference when deploying XGBoost (`xgboost.XGBModel` and `xgboost.Booster`), PyTorch (`torch.nn.Module` and `torch.jit.ScriptModule`), and TensorFlow (`tensorflow.Module` and `tensorflow.keras.Model`) models to Snowpark Container Services.
- Model Registry: When inferring a model signature, a `Sequence` of built-in types, `numpy.ndarray`, `torch.Tensor`, or `tensorflow.Tensor` can be used instead of only a `List` of them.
- Model Registry: Added the `get_training_dataset` API.
- Model Development: The size of a metrics result can now exceed the previous 8MB limit.
- Model Registry: Added support for saving/loading/deploying a HuggingFace pipeline object (`transformers.Pipeline`) and our wrapper for it (`snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel`). Use the wrapper to specify configurations; the model for the pipeline will then be loaded dynamically when deploying (a sketch follows the task list below). Currently, the following tasks are supported for logging without manually specifying model signatures:
- "conversational"
- "fill-mask"
- "question-answering"
- "summarization"
- "table-question-answering"
- "text2text-generation"
- "text-classification" (alias "sentiment-analysis" available)
- "text-generation"
- "token-classification" (alias "ner" available)
- "translation"
- "translation_xx_to_yy"
- "zero-shot-classification"
Bug Fixes
- Model Development: Fixed a bug when using `SimpleImputer` with numpy >= 1.25.
- Model Development: Fixed a bug when inferring the type of label columns.
Behavior Changes
- Model Registry: `log_model()` now returns a `ModelReference` object instead of a model ID.
- Model Registry: When deploying a model with only one target method, the `target_method` argument can be omitted.
- Model Registry: When using a snowflake-ml-python version newer than what is available in the Snowflake Anaconda Channel, the `embed_local_ml_library` option will be set to `True` automatically if not specified.
- Model Registry: When deploying a model to Snowpark Container Services with GPU, the default value of `num_workers` will be 1.
- Model Registry: `keep_order` and `output_with_input_features` in the deploy options have been removed. The behavior is now controlled by the type of the input when calling `model.predict()`. If the input is a `pandas.DataFrame`, the behavior is the same as `keep_order=True` and `output_with_input_features=False` before. If the input is a `snowpark.DataFrame`, the behavior is the same as `keep_order=False` and `output_with_input_features=True` before.
- Model Registry: When logging and deploying PyTorch (`torch.nn.Module` and `torch.jit.ScriptModule`) and TensorFlow (`tensorflow.Module` and `tensorflow.keras.Model`) models, we no longer accept models whose input and output are lists of tensors. Instead, we now accept models whose input is one or more tensors as positional arguments and whose output is a tensor or a tuple of tensors (see the sketch below). The input and output dataframes when predicting remain the same as before: every column is an array feature containing a tensor.
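A hypothetical module illustrating the accepted shape: positional tensor inputs and a single tensor output.

```python
# Hedged illustration only; not an API of snowflake-ml-python itself.
import torch


class TwoInputModel(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(2, 1)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # Two positional tensor inputs; one tensor output.
        return self.linear(torch.cat([x, y], dim=1))
```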
1.0.5 (2023-08-17)
New Features
- Model Registry: Added support for saving/loading/deploying xgboost `Booster` models.
- Model Registry: Added support for getting the model name and the model version from model references.
Bug Fixes
- Model Registry: Restore the database/schema back to the session after `create_model_registry()`.
- Model Registry: Fixed an issue where the UDF name created when deploying a model was not identical to what was provided and could not be correctly dropped when the deployment was dropped.
- connection_params.SnowflakeLoginOptions(): Added support for `private_key_path`.
1.0.4 (2023-07-28)
New Features
- Model Registry: Added support for saving/loading/deploying TensorFlow models (`tensorflow.Module`).
- Model Registry: Added support for saving/loading/deploying MLflow PyFunc models (`mlflow.pyfunc.PyFuncModel`).
- Model Development: Input dataframes can now be joined against data loaded from staged files.
- Model Development: Added support for non-English languages.
Bug Fixes
- Model Registry: Fix an issue where model dependencies were incorrectly reported as unresolvable on certain platforms.
1.0.3 (2023-07-14)
Behavior Changes
- Model Registry: When predicting with a model whose output is a list of NumPy ndarrays, the output is no longer flattened; instead, every ndarray acts as a feature (column) in the output.
New Features
- Model Registry: Added support for saving/loading/deploying PyTorch models (`torch.nn.Module` and `torch.jit.ScriptModule`).
Bug Fixes
- Model Registry: Fix an issue where the model registry could not be created when the database or schema name provided to `create_model_registry` contained special characters.
- Model Registry: Fix an issue where `get_model_description` returned with additional quotes.
- Model Registry: Fix an incorrect error message when attempting to remove an unset tag of a model.
- Model Registry: Fix a typo in the default deployment table name.
- Model Registry: A Snowpark dataframe used as sample input, or as input to the `predict` method, that contains a column with the Snowflake `NUMBER(precision, scale)` data type where `scale = 0` no longer leads to an error, and is now correctly recognized as the `INT64` data type in the model signature.
- Model Registry: Fix an issue that prevented a model logged on a system whose default encoding is not UTF-8 compatible from being deployed.
- Model Registry: Added an earlier and better error message when any file name in the model, or the file name of the model itself, contains characters that cannot be encoded using ASCII. Deploying such a model is currently not supported.
1.0.2 (2023-06-22)
Behavior Changes
- Model Registry: Prohibit non-snowflake-native models from being logged.
- Model Registry: The `_use_local_snowml` parameter in the options of `deploy()` has been removed.
- Model Registry: An `embed_local_ml_library` parameter, defaulting to `False`, has been added to the options of `log_model()`. With this set to `False` (the default), the version of the local snowflake-ml-python library will be recorded and used when deploying the model. With this set to `True`, the local snowflake-ml-python library will be embedded into the logged model and used when you load or deploy the model.
New Features
- Model Registry: A new optional argument named `code_paths` has been added to the arguments of `log_model()` for users to specify additional code paths to be imported when loading and deploying the model (a sketch follows the metrics list below).
- Model Registry: A new optional argument named `options` has been added to the arguments of `log_model()` to specify any additional options when saving the model.
- Model Development: Added metrics:
- d2_absolute_error_score
- d2_pinball_score
- explained_variance_score
- mean_absolute_error
- mean_absolute_percentage_error
- mean_squared_error
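A hedged sketch of logging with the new arguments; the registry handle, model, and paths are hypothetical:

```python
# Hedged sketch: code_paths imports extra user code at load/deploy time, and
# options carries additional save-time settings such as embed_local_ml_library.
registry.log_model(
    model_name="my_model",
    model_version="v1",
    model=clf,
    code_paths=["./src/feature_utils"],
    options={"embed_local_ml_library": True},
)
```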
Bug Fixes
- Model Development: `accuracy_score()` now works when the given label column names are lists containing a single value.
1.0.1 (2023-06-16)
Behavior Changes
- Model Development: Changed Metrics APIs to imitate sklearn metrics modules: the `accuracy_score()`, `confusion_matrix()`, `precision_recall_fscore_support()`, and `precision_score()` methods move from their respective modules to `metrics.classification` (see the sketch after this list).
- Model Registry: The default table/stage created by the Registry now uses "SYSTEM" as a prefix.
- Model Registry: The `get_model_history()` method has been enhanced to include the history of model deployment.
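A sketch of the sklearn-style call shape on a Snowpark DataFrame of predictions; the DataFrame and column names are hypothetical, and the exact parameter names are an assumption:

```python
# Hedged sketch: classification metrics computed over a Snowpark DataFrame.
from snowflake.ml.modeling.metrics.classification import accuracy_score

acc = accuracy_score(
    df=pred_df,
    y_true_col_names="LABEL",
    y_pred_col_names="PREDICTION",
)
```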
New Features
- Model Registry: A flag named `replace_udf`, defaulting to `False`, has been added to the options of `deploy()`. Setting this to `True` allows overwriting an existing UDF with the same name when deploying.
- Model Development: Added metrics:
- f1_score
- fbeta_score
- recall_score
- roc_auc_score
- roc_curve
- log_loss
- precision_recall_curve
- Model Registry: A new argument named `permanent` has been added to the arguments of `deploy()`. Setting this to `True` allows the creation of a permanent deployment without needing to specify the UDF location (see the combined sketch below).
- Model Registry: A new method `list_deployments()` has been added to enumerate all permanent deployments originating from a specific model.
- Model Registry: A new method `get_deployment()` has been added to fetch a deployment by its deployment name.
- Model Registry: A new method `delete_deployment()` has been added to remove an existing permanent deployment.
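A combined sketch of the workflow; the method names come from the entries above, while the exact argument shapes and the `registry` and `model_ref` handles are assumptions:

```python
# Hedged sketch of the permanent-deployment lifecycle.
model_ref.deploy(
    deployment_name="my_deployment",
    target_method="predict",
    permanent=True,
)
deployments = registry.list_deployments("my_model")  # enumerate for a model
info = registry.get_deployment("my_deployment")      # fetch by name
registry.delete_deployment("my_deployment")          # remove
```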
1.0.0 (2023-06-09)
Behavior Changes
- Model Registry: The `predict()` method moves from Registry to ModelReference.
- Model Registry: The `_snowml_wheel_path` parameter in the options of `deploy()` is replaced with `_use_local_snowml`, with a default value of `False`. Setting this to `True` has the same effect as uploading local SnowML code when executing the model in the warehouse.
- Model Registry: Removed the `id` field from the `ModelReference` constructor.
- Model Development: Preprocessing and Metrics move to the modeling package: `snowflake.ml.modeling.preprocessing` and `snowflake.ml.modeling.metrics`.
- Model Development: The `get_sklearn_object()` method is renamed to `to_sklearn()`, `to_xgboost()`, and `to_lightgbm()` for the respective native models (see the sketch below).
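A short sketch of the renamed accessors; the wrapper estimator and DataFrame are hypothetical:

```python
# Hedged sketch: pulling the fitted native model out of a modeling wrapper.
from snowflake.ml.modeling.xgboost import XGBClassifier

clf = XGBClassifier(label_cols=["LABEL"]).fit(train_df)
native = clf.to_xgboost()  # previously get_sklearn_object()
# sklearn-based wrappers expose to_sklearn(); lightgbm wrappers, to_lightgbm()
```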
New Features
- Added PolynomialFeatures transformer to the snowflake.ml.modeling.preprocessing module.
- Added metrics:
- accuracy_score
- confusion_matrix
- precision_recall_fscore_support
- precision_score
Bug Fixes
- Model Registry: Model version can now be any string (not required to be a valid identifier).
- Model Deployment: The `deploy()` and `predict()` methods now correctly escape identifiers.
0.3.2 (2023-05-23)
Behavior Changes
- Use cloudpickle to serialize and deserialize models throughout the codebase; the dependency on joblib has been removed.
New Features
- Model Deployment: Added support for snowflake.ml models.
0.3.1 (2023-05-18)
Behavior Changes
- Standardized the registry API with the following:
- Create & open registry take the same set of arguments
- Create & open can choose the schema to use
- set_tag, set_metric, etc. now explicitly call out argument names such as tag_name and metric_name
New Features
- Changes to support Python 3.9 and 3.10
- Added KBinsDiscretizer
- Support for deployment of XGBoost models & int8 types of data
0.3.0 (2023-05-11)
Behavior Changes
- Big Model Registry Refresh
- Fixed API discrepancies between register_model & log_model.
- Models can be referred to by name + version (no opaque internal ID is required)
New Features
- Model Registry: Added support for saving/loading/deploying SKL & XGB models
0.2.3 (2023-04-27)
Bug Fixes
- Allow using OneHotEncoder along with sklearn style estimators in a pipeline.
New Features
- Model Registry: Added support for delete_model. Use delete_artifact = False to unregister the model without deleting the underlying model data.
0.2.2 (2023-04-11)
New Features
- Initial version of snowflake-ml modeling package.
- Provide support for training most scikit-learn and xgboost estimators and transformers.
Bug Fixes
- Minor fixes in preprocessing package.
0.2.1 (2023-03-23)
New Features
- New in Preprocessing:
- SimpleImputer
- Covariance Matrix
- Optimization of Ordinal Encoder client computations.
Bug Fixes
- Minor fixes in OneHotEncoder.
0.2.0 (2023-02-27)
New Features
- Model Registry
- PyTorch & Tensorflow connector file generic FileSet API
- New to Preprocessing:
- Binarizer
- Normalizer
- Pearson correlation Matrix
- Optimization in Ordinal Encoder to cache vocabulary in temp tables.
0.1.3 (2023-02-02)
New Features
- Initial version of transformers including:
- Label Encoder
- Max Abs Scaler
- Min Max Scaler
- One Hot Encoder
- Ordinal Encoder
- Robust Scaler
- Standard Scaler