Machine learning classes

class o2sclpy.bgmm_sklearn

Use scikit-learn to generate a Bayesian Gaussian mixture model of a specified set of data.

This is an experimental interface to provide easier interaction with C++.

components(v)

For a point (or set of points) specified in v, use the Gaussian mixture to compute the probability that the point belongs to each component, returned as a contiguous numpy array. For each point, these probabilities sum to 1.
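A minimal one-dimensional sketch of what components() and predict() compute. It assumes normal component densities and assumes that components() returns posterior membership probabilities in the manner of scikit-learn's predict_proba(). The helper functions and mixture parameters below are hypothetical stand-ins, not the o2sclpy methods. Each weighted component density is divided by the total mixture density, which is why the entries sum to 1, and the predicted label is the index of the largest entry.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a one-dimensional normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def components(x, weights, means, sds):
    """Posterior probability that x came from each component (hypothetical helper).

    Each weighted component density is divided by the total mixture
    density, so the returned entries sum to 1.
    """
    weighted = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(weighted)
    return [p / total for p in weighted]

def predict(x, weights, means, sds):
    """Index of the most probable component, i.e. the argmax of components()."""
    probs = components(x, weights, means, sds)
    return max(range(len(probs)), key=probs.__getitem__)

# A hypothetical two-component mixture: weights, means, standard deviations
weights = [0.3, 0.7]
means = [-2.0, 1.0]
sds = [0.5, 1.0]

probs = components(-1.5, weights, means, sds)
print(probs, sum(probs))
print(predict(-1.5, weights, means, sds), predict(2.0, weights, means, sds))
```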

get_data()

Return the properties of the Gaussian mixture model as contiguous numpy arrays. This function returns, in order, the weights, the means, the covariances, the precisions (the inverse of the covariances), and the Cholesky decomposition of the precisions.
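The relationships among the arrays returned by get_data() can be checked for a single 2x2 covariance matrix. This plain-Python sketch (hypothetical helpers and values) inverts the covariance to get the precision, then forms a lower-triangular Cholesky factor L with precision = L L^T. scikit-learn stores its own triangular factor in precisions_cholesky_, so the convention of the corresponding array from get_data() may differ from the one used here.

```python
import math

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def cholesky_2x2(m):
    """Lower-triangular L with m = L L^T for a symmetric positive-definite 2x2 matrix."""
    l11 = math.sqrt(m[0][0])
    l21 = m[1][0] / l11
    l22 = math.sqrt(m[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def matmul_2x2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# A hypothetical covariance matrix for one mixture component
cov = [[2.0, 0.6], [0.6, 1.0]]
prec = inverse_2x2(cov)        # precision: the inverse of the covariance
chol = cholesky_2x2(prec)      # lower-triangular Cholesky factor of the precision

# Rebuild the precision from its factor as L L^T
chol_t = [[chol[0][0], chol[1][0]], [chol[0][1], chol[1][1]]]
print(prec)
print(matmul_2x2(chol, chol_t))
```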

log_pdf(x)

Given the vector or vectors specified in x, return the per-sample average log likelihood of the data as a single floating point value.

o2graph_to_bgmm(o2scl, amp, link, args)

The function providing the ‘to-bgmm’ command for o2graph.

predict(v)

Predict the labels (the index of the Gaussian) given a vector or vectors v and return them in a one-dimensional numpy array with data type int64.

sample(n_samples=1)

Sample the Gaussian mixture model, returning a tuple with two elements: the first is a 2D array of the coordinates of the new samples, and the second is a 1D array of the labels for each new sample.

score_samples(x)

Given a vector (or list of vectors) in x, return the log likelihood at each point as a numpy array.
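In scikit-learn, the average log likelihood returned by score() equals the mean of score_samples() over the same points, and the descriptions of log_pdf() and score_samples() above suggest the same relationship here. A one-dimensional sketch with hypothetical helpers, using log-sum-exp to avoid underflow:

```python
import math

def log_normal(x, mean, sd):
    """Log density of a one-dimensional normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))

def score_samples(xs, weights, means, sds):
    """Log of the mixture density at each point (hypothetical helper).

    log p(x) = log sum_k w_k N(x; mu_k, sd_k), computed with log-sum-exp.
    """
    out = []
    for x in xs:
        terms = [math.log(w) + log_normal(x, m, s)
                 for w, m, s in zip(weights, means, sds)]
        top = max(terms)
        out.append(top + math.log(sum(math.exp(t - top) for t in terms)))
    return out

def log_pdf(xs, weights, means, sds):
    """Per-sample average log likelihood: the mean of score_samples()."""
    scores = score_samples(xs, weights, means, sds)
    return sum(scores) / len(scores)

weights, means, sds = [0.3, 0.7], [-2.0, 1.0], [0.5, 1.0]
xs = [-2.1, -1.5, 0.4, 1.2, 2.5]
print(score_samples(xs, weights, means, sds))
print(log_pdf(xs, weights, means, sds))
```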

set_data(in_data, verbose=0, n_components=2, covariance_type='full', tol=0.001, reg_covar=1e-06, max_iter=100, n_init=1)

Fit the mixture model to the specified input data, a numpy array of shape (n_samples,n_coordinates).

set_data_str(in_data, options)

Set the input data to fit the mixture model, using a string to specify the keyword arguments.

class o2sclpy.classify_sklearn_dtc

Classify a data set using scikit-learn’s decision tree classifier.

See https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html .

eval(v)

Evaluate the classifier at point v. If self.outformat is equal to list, then the output is a Python list, otherwise, the output is a numpy array.

eval_list(v)

Evaluate the classifier at the array of points stored in v.

load(filename, obj_prefix)

Load the classifier from an HDF5 file named filename, where it is stored as a string named obj_prefix.

save(filename, obj_prefix='classify_sklearn_dtc')

Save the classifier to an HDF5 file named filename, where it is stored as a string named obj_prefix.

set_data(in_data, out_data, outformat='numpy', verbose=0, test_size=0.0, criterion='gini', splitter='best', max_depth=None, max_features=None, random_state=None)

Set the input and output data to train the classifier.

The variable in_data should be an array of shape (n_points,n_dim), and out_data can be of shape (n_points) or (n_points,1).

AWS, 12/4/24: I’m not sure if this class works with more than one output label.
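The default criterion='gini' is presumably passed on to scikit-learn, which chooses splits that reduce the Gini impurity 1 - sum_k p_k^2, where p_k is the fraction of training points in a node with label k. A minimal sketch of that quantity, with hypothetical helper names that are not part of o2sclpy:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate split; lower is better."""
    n = len(left) + len(right)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n

print(gini_impurity([0, 0, 1, 1]))      # a 50/50 node: 0.5
print(split_impurity([0, 0], [1, 1]))   # a perfect split: 0.0
```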

set_data_str(in_data, out_data, options)

Set the input and output data to train the classifier, using a string to specify the keyword arguments.

verbose = 0

Verbosity parameter (default 0)

class o2sclpy.classify_sklearn_gnb

Classify a data set using scikit-learn’s Gaussian naive Bayes classifier.

See https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html .

eval(v)

Evaluate the classifier at point v.

eval_list(v)

Evaluate the classifier at the array of points stored in v.

load(filename, obj_prefix='classify_sklearn_gnb')

Load the classifier from an HDF5 file named filename, where it is stored as a string named obj_prefix.

save(filename, obj_prefix='classify_sklearn_gnb')

Save the classifier to an HDF5 file named filename, where it is stored as a string named obj_prefix.

set_data(in_data, out_data, outformat='numpy', test_size=0.0, priors=None, var_smoothing=1e-09, verbose=0, transform_in='none')

Set the input and output data to train the classifier.

The variable in_data should be an array of shape (n_points,n_dim), and out_data can be of shape (n_points) or (n_points,1).

AWS, 12/4/24: I’m not sure if this class works with more than one output label.
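For reference, Gaussian naive Bayes models each feature within each class as an independent normal distribution and predicts the label with the largest log prior plus summed log densities. In scikit-learn, var_smoothing is multiplied by the largest per-feature variance of the training data and the result is added to every class variance. The plain-Python sketch below follows that recipe; the function names are hypothetical, and priors are estimated from class frequencies, as with priors=None.

```python
import math

def fit_gnb(X, y, var_smoothing=1e-9):
    """Fit per-class priors, means, and variances (hypothetical helper).

    var_smoothing times the largest per-feature variance of the whole
    data set is added to every class variance for numerical stability.
    """
    n, d = len(X), len(X[0])
    overall = [sum(row[j] for row in X) / n for j in range(d)]
    max_var = max(sum((row[j] - overall[j]) ** 2 for row in X) / n for j in range(d))
    eps = var_smoothing * max_var
    model = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        m = len(rows)
        means = [sum(r[j] for r in rows) / m for j in range(d)]
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / m + eps
                     for j in range(d)]
        model[label] = (m / n, means, variances)
    return model

def predict_gnb(model, x):
    """Label maximizing log prior + sum of per-feature normal log densities."""
    def log_joint(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for xj, mu, var in zip(x, means, variances):
            total -= 0.5 * math.log(2.0 * math.pi * var) + (xj - mu) ** 2 / (2.0 * var)
        return total
    return max(model, key=log_joint)

X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [4.0, 5.0], [4.2, 5.1], [3.9, 4.8]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gnb(X, y)
print(predict_gnb(model, [1.1, 2.0]), predict_gnb(model, [4.1, 5.0]))
```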

set_data_str(in_data, out_data, options)

Set the input and output data to train the classifier, using a string to specify the keyword arguments.

class o2sclpy.classify_sklearn_mlpc

Classify a data set using scikit-learn’s multi-layer perceptron classifier.

See https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html .

eval(v)

Evaluate the classifier at point v.

eval_list(v)

Evaluate the classifier at the array of points stored in v.

load(filename, obj_prefix)

Load the classifier from an HDF5 file named filename, where it is stored as a string named obj_prefix.

save(filename, obj_prefix='classify_sklearn_mlpc')

Save the classifier to an HDF5 file named filename, where it is stored as a string named obj_prefix.

set_data(in_data, out_data, transform_in='none', outformat='numpy', test_size=0.0, hlayers=(100,), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', max_iter=200, random_state=None, verbose=False, early_stopping=False, n_iter_no_change=10, tol=0.0001)

Set the input and output data to train the classifier.

The variable in_data should be an array of shape (n_points,n_dim), and out_data can be of shape (n_points) or (n_points,1).

AWS, 12/4/24: I don’t think this class works with more than one output label.

set_data_str(in_data, out_data, options)

Set the input and output data to train the classifier, using a string to specify the keyword arguments.

class o2sclpy.gmm_sklearn

Use scikit-learn to generate a Gaussian mixture model of a specified set of data.

This is an experimental interface to provide easier interaction with C++.

components(v)

For a point (or set of points) specified in v, use the Gaussian mixture to compute the probability that the point belongs to each component, returned as a contiguous numpy array. For each point, these probabilities sum to 1.

get_data()

Return the properties of the Gaussian mixture model as contiguous numpy arrays. This function returns, in order, the weights, the means, the covariances, the precisions (the inverse of the covariances), and the Cholesky decomposition of the precisions.

log_pdf(x)

Given the vector or vectors specified in x, return the per-sample average log likelihood of the data as a single floating point value.

o2graph_to_gmm(o2scl, amp, link, args)

The function providing the ‘to-gmm’ command for o2graph.

predict(v)

Predict the labels (the index of the Gaussian) given a vector or vectors v and return them in a one-dimensional numpy array with data type int64.

sample(n_samples=1)

Sample the Gaussian mixture model, returning a tuple with two elements: the first is a 2D array of the coordinates of the new samples, and the second is a 1D array of the labels for each new sample.
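Sampling a mixture is a two-stage process: choose a component with probability equal to its weight, then draw a point from that component's Gaussian. A minimal sketch assuming diagonal covariances for brevity (a full covariance would be handled through a Cholesky factor). The function and its argument layout are hypothetical, but the return value mirrors the (coordinates, labels) tuple described above.

```python
import random

def sample_gmm(weights, means, sds, n_samples=1, rng=None):
    """Draw samples from a Gaussian mixture with diagonal covariances.

    Returns a tuple (points, labels): points is a list of coordinate lists
    and labels[i] is the index of the component that produced points[i].
    """
    rng = rng or random.Random()
    # Stage 1: pick a component index for each sample, weighted by mixture weight
    labels = rng.choices(range(len(weights)), weights=weights, k=n_samples)
    # Stage 2: draw each coordinate from the chosen component's normal distribution
    points = [[rng.gauss(mu, sd) for mu, sd in zip(means[k], sds[k])]
              for k in labels]
    return points, labels

weights = [0.25, 0.75]
means = [[0.0, 0.0], [5.0, 5.0]]
sds = [[1.0, 1.0], [0.5, 2.0]]
points, labels = sample_gmm(weights, means, sds, n_samples=5, rng=random.Random(42))
print(points)
print(labels)
```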

score_samples(x)

Given a vector (or list of vectors) in x, return the log likelihood at each point as a numpy array.

set_data(in_data, verbose=0, n_components=2, covariance_type='full', tol=0.001, reg_covar=1e-06, max_iter=100, n_init=1)

Fit the mixture model to the specified input data, a numpy array of shape (n_samples,n_coordinates).

set_data_str(in_data, options)

Set the input data to fit the mixture model, using a string to specify the keyword arguments.

class o2sclpy.nflows_nsf

A neural spline flow probability distribution from normflows, which uses PyTorch.

This class is experimental.

This code was originally based on https://github.com/VincentStimper/normalizing-flows/blob/master/examples/circular_nsf.ipynb .

log_pdf(x)

Return the log likelihood

The value x can be a single point, expressed as a one-dimensional list or numpy array, or a series of points specified as a numpy array.

If x contains only one point, then only a single floating point value is returned. Otherwise, the return type is a list or numpy array, depending on the value of outformat.

pdf(x)

Return the likelihood

sample(n_samples=1)

Sample the distribution

The output is a list or numpy array, depending on which option was specified to set_data() or set_data_str(). The list or numpy array is only one-dimensional if n_samples is 1.

set_data(in_data, verbose=0, num_layers=20, num_hidden_channels=128, max_iter=20000, outformat='numpy', adam_lr=0.0001, adam_decay=0.0001)

Fit the normalizing flow to the specified input data, a numpy array of shape (n_samples,n_coordinates).

The parameter adam_lr is the Adam learning rate (the PyTorch default is 1.0e-3), and adam_decay is the Adam weight decay (the PyTorch default is 0).

set_data_str(in_data, options='')

Set the input data to train the flow, using a string to specify the keyword arguments.

class o2sclpy.kde_sklearn

Use scikit-learn to generate a KDE.

This is an experimental interface to provide easier interaction with C++.

Todo

  • Fix the comparison between sklearn and scipy, making sure they both produce the same log_pdf() in the correct conditions. Ensure the integral is normalized when appropriate.

See https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html .

get_bandwidth()

Return the bandwidth

log_pdf(x)

Return the log likelihood

pdf(x)

Return the likelihood
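As a reminder of what pdf() and log_pdf() evaluate, here is a one-dimensional Gaussian KDE in plain Python with a fixed bandwidth. It is a sketch under simplifying assumptions: it ignores the transform and metric options and the bw_array argument of set_data(), and the function names are hypothetical.

```python
import math

def kde_pdf(x, data, bandwidth):
    """Gaussian kernel density estimate at x in one dimension.

    Each data point contributes a normal density whose standard deviation
    is the bandwidth; dividing by len(data) keeps the integral equal to 1.
    """
    norm = len(data) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data) / norm

def kde_log_pdf(x, data, bandwidth):
    """Natural log of the KDE at x."""
    return math.log(kde_pdf(x, data, bandwidth))

data = [-1.0, -0.2, 0.3, 0.9, 2.0]
print(kde_pdf(0.0, data, bandwidth=0.5))
print(kde_log_pdf(0.0, data, bandwidth=0.5))
```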

sample(n_samples=1)

Sample the KDE

set_data(in_data, bw_array, verbose=0, kernel='gaussian', metric='euclidean', outformat='numpy', transform='unit', bandwidth='none')

Fit the KDE to the specified input data, a numpy array of shape (n_samples,n_coordinates).

set_data_str(in_data, bw_array, options)

Set the input data to fit the KDE, using a string to specify the keyword arguments.

class o2sclpy.kde_scipy

Use scipy to generate a KDE.

This is an experimental and very simplified interface, mostly to provide easier interaction with C++.

get_bandwidth()

Return the bandwidth

log_pdf(x)

Return the log likelihood

pdf(x)

Return the likelihood

sample(n_samples=1)

Sample the KDE
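Sampling from a Gaussian KDE amounts to picking one of the data points (in proportion to its weight, when weights are given) and adding kernel noise. scipy.stats.gaussian_kde.resample() works this way, using the kernel covariance in place of a scalar bandwidth. A one-dimensional sketch with a hypothetical helper, not the kde_scipy code:

```python
import random

def sample_kde(data, bandwidth, n_samples=1, weights=None, rng=None):
    """Draw samples from a one-dimensional Gaussian KDE.

    Pick a data point (optionally in proportion to its weight), then add
    normal noise whose standard deviation is the bandwidth.
    """
    rng = rng or random.Random()
    centers = rng.choices(data, weights=weights, k=n_samples)
    return [rng.gauss(c, bandwidth) for c in centers]

data = [-1.0, 0.0, 3.0]
# With all the weight on the last point, every sample lands near 3.0
print(sample_kde(data, bandwidth=0.1, n_samples=4, weights=[0.0, 0.0, 1.0],
                 rng=random.Random(1)))
```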

set_data(in_data, verbose=0, weights=None, outformat='numpy', bw_method=None, transform='unit')

Fit the KDE to the specified input data, a numpy array of shape (n_samples,n_coordinates).

set_data_str(in_data, weights, options)

Set the input data to fit the KDE, using a string to specify the keyword arguments.

string_to_dict(s)

Convert a string to a dictionary, converting strings to values when necessary.
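A hypothetical parser in the same spirit, assuming comma-separated key=value pairs (the syntax actually accepted by string_to_dict() and the set_data_str() methods may differ). It converts integers, floats, booleans, and None, and keeps everything else as a string. This naive split on commas cannot handle values that themselves contain commas, such as hlayers=(100,) or the default GP kernel string.

```python
def convert_value(value):
    """Convert a string to bool, None, int, or float when possible."""
    if value in ('True', 'False'):
        return value == 'True'
    if value == 'None':
        return None
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # leave non-numeric values as strings

def string_to_dict_sketch(s):
    """Parse 'key1=value1,key2=value2' into a dict (hypothetical format)."""
    result = {}
    if not s.strip():
        return result
    for item in s.split(','):
        key, value = (part.strip() for part in item.split('=', 1))
        result[key] = convert_value(value)
    return result

print(string_to_dict_sketch("n_components=3,covariance_type=full,tol=1e-4"))
```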

class o2sclpy.interpm_sklearn_dtr

Interpolate one or many multidimensional data sets using scikit-learn’s decision tree regression.

See https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html .
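Decision tree regression yields a piecewise-constant interpolant: each leaf predicts the mean output of the training points that reach it, and with criterion='squared_error' splits are chosen to reduce the squared error. A depth-one sketch in one dimension (a hypothetical helper, far simpler than scikit-learn's implementation):

```python
def best_stump(xs, ys):
    """Fit a depth-1 regression tree (a single split) to one-dimensional data.

    Every midpoint between consecutive distinct sorted x values is tried;
    the split with the smallest total squared error is kept, and each side
    predicts the mean of its y values.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - lmean) ** 2 for y in left)
               + sum((y - rmean) ** 2 for y in right))
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean

f = best_stump([0, 1, 2, 3, 4, 5], [1.0, 1.1, 0.9, 5.0, 5.2, 4.8])
print(f(0.5), f(4.5))
```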

eval(v)

Evaluate the regression at point v.

eval_list(v)

Evaluate the regression at the array of points stored in v.

load(filename, obj_name)

Load the interpolation settings from a file

save(filename, obj_name)

Save the interpolation settings to an HDF5 file

set_data(in_data, out_data, outformat='numpy', verbose=0, test_size=0.0, criterion='squared_error', splitter='best', max_depth=None, random_state=None)

Set the input and output data to train the interpolator

set_data_str(in_data, out_data, options)

Set the input and output data to train the interpolator, using a string to specify the keyword arguments.

class o2sclpy.interpm_sklearn_gp

Interpolate one or many multidimensional data sets using a Gaussian process from scikit-learn.

See https://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html .

eval(v)

Evaluate the GP at point v.

eval_list(v)

Evaluate the GP at the array of points stored in v.

eval_unc(v)

Evaluate the GP and its uncertainty at point v.

AWS, 3/27/24: Keep in mind that o2scl::interpm_python.eval_unc() expects the return type to be a tuple of numpy arrays.
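For a zero-mean Gaussian process with kernel k and noise term alpha added to the diagonal, the posterior mean at x is k_*^T (K + alpha I)^{-1} y and the variance is k(x,x) - k_*^T (K + alpha I)^{-1} k_*. The plain-Python sketch below computes both through a Cholesky factorization and returns them as a (mean, standard deviation) tuple, in the spirit of eval_unc(). It assumes fixed hyperparameters (scikit-learn would optimize the 1.0*RBF(1.0) kernel within its bounds) and omits normalize_y; all names are hypothetical.

```python
import math

def rbf(a, b, amplitude=1.0, length=1.0):
    """Squared-exponential kernel, like 1.0*RBF(1.0) in scikit-learn notation."""
    return amplitude * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(m):
    """Lower-triangular L with m = L L^T (m symmetric positive definite)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def gp_eval_unc(x_train, y_train, x, alpha=1e-10):
    """Posterior mean and standard deviation of a zero-mean GP at point x."""
    K = [[rbf(a, b) + (alpha if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    L = cholesky(K)
    k_star = [rbf(a, x) for a in x_train]
    # With v = L^{-1} k_star and w = L^{-1} y: mean = v . w, var = k(x,x) - v . v
    v = solve_lower(L, k_star)
    w = solve_lower(L, y_train)
    mean = sum(vi * wi for vi, wi in zip(v, w))
    var = rbf(x, x) - sum(vi * vi for vi in v)
    return mean, math.sqrt(max(var, 0.0))  # clip tiny negative round-off

x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [0.0, 0.8, 0.9, 0.1]
print(gp_eval_unc(x_train, y_train, 1.5))
```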

load(filename, obj_name)

Load the interpolation settings from a file

save(filename, obj_name)

Save the interpolation settings to an HDF5 file

set_data(in_data, out_data, kernel='1.0*RBF(1.0,(1e-2,1e2))', test_size=0.0, normalize_y=True, transform_in='none', alpha=1e-10, transform_out='none', outformat='numpy', verbose=0, random_state=None)

Set the input and output data to train the interpolator

set_data_str(in_data, out_data, options)

Set the input and output data to train the interpolator, using a string to specify the keyword arguments.

class o2sclpy.interpm_sklearn_mlpr

Interpolate one or many multidimensional data sets using scikit-learn’s multi-layer perceptron regressor.

eval(v)

Evaluate the MLP at point v.

eval_list(v)

Evaluate the MLP at the array of points stored in v.

eval_unc(v)

Empty function because this interpolator does not currently provide uncertainties

load(filename, obj_name)

Load the interpolation settings from a file

save(filename, obj_name)

Save the interpolation settings to an HDF5 file

set_data(in_data, out_data, outformat='numpy', test_size=0.0, hlayers=(100,), activation='relu', transform_in='none', transform_out='none', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='adaptive', max_iter=500, random_state=1, verbose=0, early_stopping=True, n_iter_no_change=10)

Set the input and output data to train the interpolator

set_data_str(in_data, out_data, options)

Set the input and output data to train the interpolator, using a string to specify the keyword arguments.

class o2sclpy.interpm_tf_dnn

Interpolate one or many multidimensional data sets using a deep neural network from TensorFlow.

eval(v)

Evaluate the NN at point v.

eval_list(v)

Evaluate the NN at the array of points stored in v.

eval_unc(v)

Empty function because this interpolator does not currently provide uncertainties

load(filename)

Load the interpolation settings from a file

(No custom object support)

save(filename)

Save the interpolation settings to a file

(No custom object support)

set_data(in_data, out_data, outformat='numpy', verbose=0, activations=['relu', 'relu'], batch_size=None, epochs=100, transform_in='none', transform_out='none', test_size=0.0, evaluate=False, hlayers=[8, 8], loss='mean_squared_error', es_min_delta=0.0001, es_patience=100, es_start=50)

Set the input and output data to train the interpolator

Some activation functions and their output ranges: relu, [0,∞); sigmoid, [0,1]; tanh, [-1,1].

Transformations: quantile transforms the data to [0,1], and MinMaxScaler transforms the data to [a,b].
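The activation ranges and rescalings listed above can be illustrated in a few lines of plain Python. These are stand-alone sketches with hypothetical names, not the TensorFlow or scikit-learn code; in particular, scikit-learn's QuantileTransformer interpolates between estimated quantiles rather than using the simple ranks below.

```python
import math

def relu(x):
    """Rectified linear unit; outputs lie in [0, inf)."""
    return max(0.0, x)

def sigmoid(x):
    """Logistic function; outputs lie in [0, 1]. Written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def tanh(x):
    """Hyperbolic tangent; outputs lie in [-1, 1]."""
    return math.tanh(x)

def min_max_scale(values, a=0.0, b=1.0):
    """Linearly map values onto [a, b], as a min-max scaler does per feature."""
    lo, hi = min(values), max(values)
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]

def quantile_transform(values):
    """Map each value to its empirical rank scaled to [0, 1] (distinct values assumed)."""
    order = sorted(values)
    n = len(values)
    return [order.index(v) / (n - 1) for v in values]

print(relu(-2.0), sigmoid(0.0), tanh(0.0))
print(min_max_scale([2.0, 4.0, 10.0], a=-1.0, b=1.0))
print(quantile_transform([10.0, 2.0, 4.0]))
```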

set_data_str(in_data, out_data, options)

Set the input and output data to train the interpolator, using a string to specify the keyword arguments.