sklearn_utils
Utility functions, preprocessing steps, and classes that I need in my research and development projects with scikit-learn.
Examples
If you want to scale your data based on reference values, you can use StandardScalerByLabel. For example, I scale all blood samples relative to the healthy samples.
from sklearn_utils.preprocessing import StandardScalerByLabel

# Scale every sample using the samples labeled 'healthy' as the reference
preprocessing = StandardScalerByLabel('healthy')
X_t = preprocessing.fit_transform(X, y)
Or you may want a list of dicts at the end of an sklearn pipeline, after a set of operations and feature selection.
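from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
from sklearn_utils.preprocessing import InverseDictVectorizer

vect = DictVectorizer(sparse=False)
skb = SelectKBest(k=100)
pipe = Pipeline([
    ('vect', vect),
    ('skb', skb),
    # Maps the selected columns back to a list of dicts
    ('inv_vect', InverseDictVectorizer(vect, skb))
])
X_t = pipe.fit_transform(X, y)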
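To make the idea behind the last pipeline step concrete, here is a minimal, library-free sketch of how a matrix of selected columns could be turned back into a list of dicts. The helper `rows_to_dicts` and its arguments are hypothetical and are not part of sklearn_utils. With scikit-learn, the names and the mask could come from `DictVectorizer.get_feature_names_out()` and `SelectKBest.get_support()`.

```python
def rows_to_dicts(rows, feature_names, support_mask):
    """Rebuild dicts from rows that contain only the selected columns.

    feature_names: names of all columns before feature selection.
    support_mask: booleans, True where a column survived selection.
    (Hypothetical helper for illustration, not the library's code.)
    """
    # Names of the columns that were kept, in their original order
    kept = [name for name, keep in zip(feature_names, support_mask) if keep]
    return [dict(zip(kept, row)) for row in rows]


feature_names = ['glucose', 'lactate', 'urea']
support_mask = [True, False, True]  # 'lactate' was dropped by selection
rows = [[5.1, 2.0], [4.8, 2.4]]

restored = rows_to_dicts(rows, feature_names, support_mask)
```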
For more features, see the documentation.
Documentation
The documentation of the project is available at http://sklearn-utils.rtfd.io .
API Documentation
Preprocessing
class sklearn_utils.preprocessing.DictInput(transformer, feature_selection=False, sparse=False)
Bases: sklearn.base.TransformerMixin
Converts a preprocessing step to accept a list of dicts.
class sklearn_utils.preprocessing.FoldChangeScaler(reference_label, bounds=(-10, 10))
Bases: sklearn.base.TransformerMixin
Scales each measured value by its distance to the mean, expressed as a fold change. Useful when you want something like standard scaling but have no variance to scale by.
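The exact formula is not stated here, so the following is only a sketch of one common reading of fold-change scaling: each value becomes `(value - reference mean) / reference mean` and is then clipped to `bounds`. The function `fold_change_scale`, the list-of-dicts input, the clipping behavior, and the assumption that reference means are nonzero are all illustrative choices, not the library's confirmed behavior.

```python
def fold_change_scale(X, y, reference_label, bounds=(-10, 10)):
    """Scale values relative to the reference mean, then clip to bounds.

    Hypothetical sketch: assumes X is a list of dicts sharing the same
    keys and that every reference mean is nonzero.
    """
    ref_rows = [row for row, label in zip(X, y) if label == reference_label]
    if not ref_rows:
        raise ValueError('no sample carries the reference label')
    low, high = bounds
    # Per-feature mean over the reference samples only
    means = {key: sum(row[key] for row in ref_rows) / len(ref_rows)
             for key in ref_rows[0]}
    return [
        {
            # Relative distance to the reference mean, clipped to the bounds
            key: min(max((value - means[key]) / means[key], low), high)
            for key, value in row.items()
        }
        for row in X
    ]


X = [{'glucose': 4.0}, {'glucose': 6.0}, {'glucose': 100.0}]
y = ['healthy', 'healthy', 'sick']
X_t = fold_change_scale(X, y, 'healthy')
```

In this toy data the reference mean is 5.0, so the third sample would scale to 19 and is clipped to the upper bound.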
class sklearn_utils.preprocessing.FeatureRenaming(names, case_sensetive=False)
Bases: sklearn.base.TransformerMixin
Preprocessing step to rename features.
class sklearn_utils.preprocessing.StandardScalerByLabel(reference_label)
Bases: sklearn.preprocessing.data.StandardScaler
A StandardScaler that is fitted using only the samples with the given label.
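As a rough illustration of scaling by a reference label, the sketch below computes the mean and standard deviation from the reference rows only and then standardizes every row with them. The function `scale_by_label` is hypothetical and uses only the standard library; falling back to a scale of 1.0 for constant columns is an assumption of this sketch.

```python
from statistics import mean, pstdev


def scale_by_label(X, y, reference_label):
    """Standardize all rows with statistics taken from the reference rows.

    Hypothetical sketch, not the library's implementation.
    """
    ref_rows = [row for row, label in zip(X, y) if label == reference_label]
    if not ref_rows:
        raise ValueError('no sample carries the reference label')
    columns = list(zip(*ref_rows))
    means = [mean(col) for col in columns]
    # Population standard deviation; constant columns fall back to 1.0
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


X = [[1.0, 10.0], [3.0, 10.0], [8.0, 40.0]]
y = ['healthy', 'healthy', 'sick']
X_t = scale_by_label(X, y, 'healthy')
```

The healthy rows end up centered around zero, while the remaining row is expressed as a distance from the healthy mean in units of the healthy standard deviation.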
class sklearn_utils.preprocessing.FunctionalEnrichmentAnalysis(reference_label, feature_groups, method='fisher_exact', alternative='two-sided', filter_func=None)
Bases: sklearn.base.TransformerMixin
Functional Enrichment Analysis
__init__(reference_label, feature_groups, method='fisher_exact', alternative='two-sided', filter_func=None)
Parameters: - reference_label – label of the reference values in the calculation
- method – only the Fisher exact test is available so far
- feature_groups – list of dicts where keys are new features and values are lists of old features
- filter_func – function that returns True or False
class
sklearn_utils.preprocessing.
FeatureMerger
(features, strategy='mean')[source]¶ Bases:
sklearn.base.TransformerMixin
Merge some features based on given strategy.
Utils
sklearn_utils.utils.filter_by_label(X, y, ref_label, reverse=False)
Select items with the given label from the dataset.
Parameters: - X – dataset
- y – labels
- ref_label – reference label
- reverse (bool) – if False, selects the items labeled ref_label; otherwise eliminates them
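The selection logic described by these parameters can be sketched in a few lines. `filter_by_label_sketch` is a hypothetical stand-in, and returning the filtered samples together with their labels is an assumption; the return type of the real function is not documented here.

```python
def filter_by_label_sketch(X, y, ref_label, reverse=False):
    """Keep (or, with reverse=True, drop) the samples labeled ref_label.

    Hypothetical stand-in; returns the filtered samples and labels.
    """
    pairs = [(x, label) for x, label in zip(X, y)
             if (label == ref_label) != reverse]
    X_sel = [x for x, _ in pairs]
    y_sel = [label for _, label in pairs]
    return X_sel, y_sel


X = [{'a': 1}, {'a': 2}, {'a': 3}]
y = ['healthy', 'sick', 'healthy']

kept = filter_by_label_sketch(X, y, 'healthy')
dropped = filter_by_label_sketch(X, y, 'healthy', reverse=True)
```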
sklearn_utils.utils.average_by_label(X, y, ref_label)
Calculates the average dictionary from a list of dictionaries for the given label.
Parameters: - X (List[Dict]) – dataset
- y (list) – labels
- ref_label – reference label
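A minimal sketch of this kind of averaging is shown below. `average_by_label_sketch` is a hypothetical name, and treating a key that is missing from a sample as 0 is an assumption of the sketch rather than documented behavior.

```python
def average_by_label_sketch(X, y, ref_label):
    """Average the dicts whose label equals ref_label, key by key.

    Hypothetical stand-in; a key missing from a sample counts as 0.
    """
    ref_rows = [x for x, label in zip(X, y) if label == ref_label]
    if not ref_rows:
        raise ValueError('no sample carries the reference label')
    totals = {}
    for row in ref_rows:
        for key, value in row.items():
            totals[key] = totals.get(key, 0.0) + value
    return {key: total / len(ref_rows) for key, total in totals.items()}


X = [{'a': 1.0, 'b': 2.0}, {'a': 3.0, 'b': 4.0}, {'a': 100.0, 'b': 100.0}]
y = ['healthy', 'healthy', 'sick']
avg = average_by_label_sketch(X, y, 'healthy')
```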
sklearn_utils.utils.map_dict(d, key_func=None, value_func=None, if_func=None)
Parameters: - d (dict) – dictionary
- key_func (func) – function that will run on keys.
- value_func (func) – function that will run on values.
sklearn_utils.utils.map_dict_list(ds, key_func=None, value_func=None, if_func=None)
Parameters: - ds (List[Dict]) – list of dicts
- key_func (func) – function that will run on keys.
- value_func (func) – function that will run on values.
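The two mapping helpers can be pictured with the sketch below. The names `map_dict_sketch` and `map_dict_list_sketch` are hypothetical, and two details are assumptions because they are not documented here: every callback is called as `func(key, value)`, and `if_func` acts as a filter that keeps an item when it returns a truthy value.

```python
def map_dict_sketch(d, key_func=None, value_func=None, if_func=None):
    """Transform keys and values of a dict, optionally filtering items.

    Assumption: each callback is called as func(key, value).
    """
    key_func = key_func or (lambda k, v: k)
    value_func = value_func or (lambda k, v: v)
    if_func = if_func or (lambda k, v: True)
    return {key_func(k, v): value_func(k, v)
            for k, v in d.items() if if_func(k, v)}


def map_dict_list_sketch(ds, key_func=None, value_func=None, if_func=None):
    """Apply map_dict_sketch to every dict in a list."""
    return [map_dict_sketch(d, key_func, value_func, if_func) for d in ds]


d = {'Glucose': 5.0, 'Lactate': 0.0}
mapped = map_dict_sketch(
    d,
    key_func=lambda k, v: k.lower(),   # normalize feature names
    value_func=lambda k, v: v * 2,     # rescale values
    if_func=lambda k, v: v > 0,        # drop non-positive measurements
)
mapped_list = map_dict_list_sketch([d, {'Urea': 1.5}],
                                   key_func=lambda k, v: k.lower())
```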
Noise
class sklearn_utils.noise.SelectNotKBest(**kwargs)
Bases: sklearn.base.TransformerMixin
Selects all features except the best K features.
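The idea of keeping everything except the top-scoring features can be sketched without any library. `select_not_k_best` and its precomputed `scores` argument are hypothetical. With scikit-learn, a similar mask could be obtained by inverting `SelectKBest.get_support()`, though whether the class does exactly that is not stated here.

```python
def select_not_k_best(X, scores, k):
    """Drop the k highest-scoring columns and keep the rest.

    Hypothetical sketch: scores holds one precomputed score per column.
    """
    # Indices of the k best columns by score
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = set(ranked[:k])
    keep = [i for i in range(len(scores)) if i not in best]
    return [[row[i] for i in keep] for row in X]


X = [[1, 2, 3], [4, 5, 6]]
scores = [0.9, 0.1, 0.5]  # column 0 is the single best feature
X_rest = select_not_k_best(X, scores, k=1)
```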
class sklearn_utils.noise.NoiseGenerator(noise_func, noise_func_args)
Bases: sklearn.base.TransformerMixin
Adds noise to the dataset.
__init__(noise_func, noise_func_args)
Add noise to data.
Parameters: - noise_func – a function that generates noise with the same shape as the data
- noise_func_args – arguments of the noise function
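How a noise function and its arguments might fit together is sketched below. Everything here is an assumption made for illustration: the helper names `gaussian_noise` and `add_noise`, the convention that the noise function receives the data shape followed by the extra arguments, and the use of Gaussian noise.

```python
import random


def gaussian_noise(shape, sigma, seed=None):
    """Return a rows x cols nested list of zero-mean Gaussian noise."""
    rng = random.Random(seed)
    n_rows, n_cols = shape
    return [[rng.gauss(0.0, sigma) for _ in range(n_cols)]
            for _ in range(n_rows)]


def add_noise(X, noise_func, noise_func_args):
    """Add noise of the same shape as X, element by element.

    Assumption: noise_func is called as noise_func(shape, *noise_func_args).
    """
    shape = (len(X), len(X[0]))
    noise = noise_func(shape, *noise_func_args)
    return [[value + n for value, n in zip(row, noise_row)]
            for row, noise_row in zip(X, noise)]


X = [[1.0, 2.0], [3.0, 4.0]]
X_noisy = add_noise(X, gaussian_noise, (0.1, 42))  # sigma=0.1, seed=42
```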