sklearn_utils
Utility functions, preprocessing steps, and classes that I need in my research and development projects with scikit-learn.
Examples
If you want to scale your data based on reference values, you can use StandardScalerByLabel. For example, I scale all blood samples relative to the healthy samples.
from sklearn_utils.preprocessing import StandardScalerByLabel
preprocessing = StandardScalerByLabel('healthy')
X_t = preprocessing.fit_transform(X, y)
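To make the idea of scaling by reference values concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the helper name scale_by_reference is hypothetical, and it assumes that "scaling by label" means computing the mean and standard deviation on the reference rows only and then applying them to every row.

```python
from statistics import mean, pstdev


def scale_by_reference(X, y, reference_label):
    """Standardize every column of X using statistics of the reference rows only."""
    reference_rows = [row for row, label in zip(X, y) if label == reference_label]
    columns = list(zip(*reference_rows))
    means = [mean(col) for col in columns]
    # Population standard deviation; constant columns fall back to 1.0
    # so that the division below is always defined.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in X
    ]


X = [[1.0, 10.0], [3.0, 10.0], [5.0, 13.0]]
y = ["healthy", "healthy", "sick"]
X_scaled = scale_by_reference(X, y, "healthy")
```

In this sketch the "sick" row is expressed in units of the healthy samples' spread, which is the point of choosing a reference label.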
Or you may want to get a list of dicts back at the end of an sklearn pipeline, after a set of operations and feature selection.
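from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
from sklearn_utils.preprocessing import InverseDictVectorizer

vect = DictVectorizer(sparse=False)
skb = SelectKBest(k=100)
pipe = Pipeline([
    ('vect', vect),
    ('skb', skb),
    ('inv_vect', InverseDictVectorizer(vect, skb))
])
X_t = pipe.fit_transform(X, y)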
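The last pipeline step has to map the selected columns back to feature names. The following sketch shows that idea in plain Python. The helper rows_to_dicts is hypothetical, and it assumes the names come from the vectorizer's feature_names_ attribute and the mask from the selector's get_support(); how InverseDictVectorizer handles zero values or sparse input is not covered here.

```python
def rows_to_dicts(rows, feature_names, support_mask):
    """Rebuild one dict per row from the columns that survived selection."""
    kept_names = [name for name, keep in zip(feature_names, support_mask) if keep]
    return [dict(zip(kept_names, row)) for row in rows]


feature_names = ["glucose", "lactate", "urea"]  # e.g. vect.feature_names_
support_mask = [True, False, True]              # e.g. skb.get_support()
rows = [[5.1, 2.0], [4.8, 2.4]]                 # selector output: two kept columns

records = rows_to_dicts(rows, feature_names, support_mask)
```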
For more features, check the documentation.
Documentation
The documentation of the project is available at http://sklearn-utils.rtfd.io.
API Documentation
Preprocessing
- class sklearn_utils.preprocessing.DictInput(transformer, feature_selection=False, sparse=False)
  Bases: sklearn.base.TransformerMixin
  Converts a preprocessing step to accept a list of dicts.
- class sklearn_utils.preprocessing.FoldChangeScaler(reference_label, bounds=(-10, 10))
  Bases: sklearn.base.TransformerMixin
  Scales each measured value by its distance to the mean, expressed as a fold change. Useful when you want standard scaling but have no variance.
- class sklearn_utils.preprocessing.FeatureRenaming(names, case_sensetive=False)
  Bases: sklearn.base.TransformerMixin
  Preprocessing step that renames features.
- class sklearn_utils.preprocessing.StandardScalerByLabel(reference_label)
  Bases: sklearn.preprocessing.data.StandardScaler
  StandardScaler that uses only the samples with the given label as its reference.
- class sklearn_utils.preprocessing.FunctionalEnrichmentAnalysis(reference_label, feature_groups, method='fisher_exact', alternative='two-sided', filter_func=None)
  Bases: sklearn.base.TransformerMixin
  Functional Enrichment Analysis
  - __init__(reference_label, feature_groups, method='fisher_exact', alternative='two-sided', filter_func=None)
    Parameters: - reference_label – label of the reference values used in the calculation
    - method – only the Fisher exact test is available so far
    - feature_groups – list of dict where keys are the new features and values are lists of old features
    - filter_func – function that returns True or False
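Since the Fisher exact test is the only method listed, a short sketch of the test itself may help. This is a standalone pure-Python version of the two-sided test on a 2x2 table, not the code this class runs. How the class builds its contingency table is not stated here, so the layout in the example (rows: features inside or outside a group; columns: changed or unchanged relative to the reference) is purely a hypothetical illustration.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, total = a + b, a + c, a + b + c + d

    def probability(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    observed = probability(a)
    low, high = max(0, row1 - (total - col1)), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p for p in (probability(x) for x in range(low, high + 1))
        if p <= observed * (1 + 1e-7)
    )


# Hypothetical counts: 8 of 10 in-group features changed, 1 of 6 outside did.
p_value = fisher_exact_two_sided([[8, 2], [1, 5]])
```

A small p-value here would suggest the group is enriched for changed features under the assumed table layout.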
- class sklearn_utils.preprocessing.FeatureMerger(features, strategy='mean')
  Bases: sklearn.base.TransformerMixin
  Merges features based on the given strategy.
Utils
- sklearn_utils.utils.filter_by_label(X, y, ref_label, reverse=False)
  Selects items with the given label from the dataset.
  Parameters: - X – dataset
  - y – labels
  - ref_label – reference label
  - reverse (bool) – if False, selects the items labelled ref_label; otherwise eliminates them
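The behaviour of the reverse flag can be sketched in a few lines. The function below is a hypothetical stand-in, not the library's code, and it assumes the result is returned as a pair of filtered data and filtered labels; the real return shape may differ.

```python
def filter_by_label_sketch(X, y, ref_label, reverse=False):
    """Keep rows labelled ref_label, or every other row when reverse is True."""
    pairs = [
        (row, label) for row, label in zip(X, y)
        if (label == ref_label) != reverse
    ]
    X_kept = [row for row, _ in pairs]
    y_kept = [label for _, label in pairs]
    return X_kept, y_kept


X = [{"a": 1}, {"a": 2}, {"a": 3}]
y = ["healthy", "sick", "healthy"]

healthy_X, healthy_y = filter_by_label_sketch(X, y, "healthy")
other_X, other_y = filter_by_label_sketch(X, y, "healthy", reverse=True)
```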
- sklearn_utils.utils.average_by_label(X, y, ref_label)
  Calculates the average dictionary from a list of dictionaries for the given label.
  Parameters: - X (List[Dict]) – dataset
  - y (list) – labels
  - ref_label – reference label
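As an illustration of what an "average dictionary" could look like, the sketch below averages each key over the rows carrying the reference label. The function name is hypothetical, and it assumes every selected dict has the same keys; a key missing from some rows would be averaged as if it were zero there.

```python
from collections import defaultdict


def average_by_label_sketch(X, y, ref_label):
    """Average each key over the dicts whose label equals ref_label."""
    selected = [row for row, label in zip(X, y) if label == ref_label]
    totals = defaultdict(float)
    for row in selected:
        for key, value in row.items():
            totals[key] += value
    # An empty selection yields an empty dict, so no division by zero occurs.
    return {key: total / len(selected) for key, total in totals.items()}


X = [{"a": 1.0, "b": 4.0}, {"a": 3.0, "b": 8.0}, {"a": 100.0, "b": 100.0}]
y = ["healthy", "healthy", "sick"]
healthy_average = average_by_label_sketch(X, y, "healthy")
```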
- sklearn_utils.utils.map_dict(d, key_func=None, value_func=None, if_func=None)
  Parameters: - d (dict) – dictionary
  - key_func (func) – function that will run on the keys.
  - value_func (func) – function that will run on the values.
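A dictionary-mapping helper of this kind might look like the sketch below. It is an assumption-laden illustration, not the library's source: it assumes each callback receives both the key and the value, and that if_func acts as a filter on items. Check the project source for the actual calling convention before relying on it.

```python
def map_dict_sketch(d, key_func=None, value_func=None, if_func=None):
    """Return a new dict with transformed keys and values, optionally filtered."""
    # Assumed convention: every callback is called as func(key, value).
    key_func = key_func or (lambda k, v: k)
    value_func = value_func or (lambda k, v: v)
    if_func = if_func or (lambda k, v: True)
    return {
        key_func(k, v): value_func(k, v)
        for k, v in d.items() if if_func(k, v)
    }


sample = {"Glucose": 5.0, "Lactate": 0.0}
result = map_dict_sketch(
    sample,
    key_func=lambda k, v: k.lower(),
    value_func=lambda k, v: v * 2,
    if_func=lambda k, v: v > 0,
)
```

Applying the same function to every element of a list of dicts gives the list variant.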
- sklearn_utils.utils.map_dict_list(ds, key_func=None, value_func=None, if_func=None)
  Parameters: - ds (List[Dict]) – list of dicts
  - key_func (func) – function that will run on the keys.
  - value_func (func) – function that will run on the values.
Noise
- class sklearn_utils.noise.SelectNotKBest(**kwargs)
  Bases: sklearn.base.TransformerMixin
  Selects all features except the best K features.
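The selection rule can be sketched without scikit-learn: rank features by score, drop the top k, and keep the rest in column order. The helper below is hypothetical and takes precomputed scores; how the class itself obtains scores and what it accepts through **kwargs is not shown here.

```python
def not_k_best_indices(scores, k):
    """Indices of all features except the k highest-scoring ones, in column order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = set(ranked[:k])
    return [i for i in range(len(scores)) if i not in best]


scores = [0.2, 3.5, 1.1, 0.7]
kept = not_k_best_indices(scores, k=2)
```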
- class sklearn_utils.noise.NoiseGenerator(noise_func, noise_func_args)
  Bases: sklearn.base.TransformerMixin
  Adds noise to a dataset.
  - __init__(noise_func, noise_func_args)
    Adds noise to data.
    Parameters: - noise_func – a function that generates noise with the same shape as the data
    - noise_func_args – arguments of the noise function
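To illustrate the contract between a noise function and its arguments, here is a small pure-Python sketch. Everything in it is hypothetical: the names gaussian_noise and add_noise, and in particular the convention that the noise function receives the row and column counts followed by the extra arguments. The class may pass the shape differently.

```python
import random


def gaussian_noise(n_rows, n_cols, sigma, seed):
    """Generate an n_rows x n_cols grid of zero-mean Gaussian noise."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(n_cols)] for _ in range(n_rows)]


def add_noise(X, noise_func, noise_func_args):
    """Add element-wise noise that has the same shape as X."""
    # Assumed convention: noise_func(n_rows, n_cols, *noise_func_args)
    noise = noise_func(len(X), len(X[0]), *noise_func_args)
    return [
        [x + n for x, n in zip(row, noise_row)]
        for row, noise_row in zip(X, noise)
    ]


X = [[1.0, 2.0], [3.0, 4.0]]
X_noisy = add_noise(X, gaussian_noise, (0.1, 42))
```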