PyData Seattle 2023

skbase - a workbench for creating scikit-learn like parametric objects and libraries
04-26, 15:30–17:00 (America/Los_Angeles), St. Helens

skbase provides a meta-toolkit that makes it easy to build your own package following scikit-learn design patterns, e.g., parametric, composable objects and fittable objects. It contains standalone BaseObject and BaseEstimator base classes, base class templates for writing your own base classes, templateable test classes and object checks, tools for object retrieval and inspection, and more.


The workshop will walk the audience through an example of creating their own package with parametric objects, custom base classes and objects inheriting from these, and a full testing framework.

The workshop will also showcase the core functionality of skbase (https://github.com/sktime/skbase), which is organized into submodules:

  • skbase.base provides BaseObject, a parametric object with get_params/set_params, a tag system, etc.; BaseEstimator, for objects with fit, is_fitted, and get_fitted_params; and mixin classes such as BaseMetaObject for homogeneous and heterogeneous composites (e.g., ensembles, pipelines, graph objects).
  • skbase.lookup provides search tools such as all_objects, which retrieves all BaseObject-s with certain tags from a project.
  • skbase.validate provides tools for validating and comparing BaseObject-s and collections of BaseObject-s.
  • skbase.testing provides tools for testing BaseObject-s, and for setting up testing frameworks and object checkers for dependent base classes.
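
To give a rough feel for the pattern behind BaseObject and BaseEstimator, here is a minimal, dependency-free sketch. It does not import skbase and is not skbase's implementation: the classes ParametricObject and MeanEstimator, the shrinkage parameter, and the get_tags helper are hypothetical names chosen for illustration. Only the method names get_params, set_params, fit, is_fitted and get_fitted_params, and the idea of a tag system, are taken from the description above.

```python
import inspect


class ParametricObject:
    """Hypothetical stand-in for a scikit-learn style base class (not skbase code)."""

    # Class-level tags: static metadata describing the object type.
    _tags = {"object_type": "object"}

    @classmethod
    def _param_names(cls):
        # Named constructor arguments define the parameters, as in scikit-learn.
        sig = inspect.signature(cls.__init__)
        named = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
        return sorted(
            name for name, p in sig.parameters.items()
            if name != "self" and p.kind in named
        )

    def get_params(self):
        # Read parameters back from attributes of the same name.
        return {name: getattr(self, name) for name in self._param_names()}

    def set_params(self, **params):
        valid = self._param_names()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"invalid parameter {name!r} for {type(self).__name__}")
            setattr(self, name, value)
        return self

    @classmethod
    def get_tags(cls):
        # Merge tags along the inheritance chain; child classes override parents.
        tags = {}
        for klass in reversed(cls.__mro__):
            tags.update(klass.__dict__.get("_tags", {}))
        return tags


class MeanEstimator(ParametricObject):
    """Hypothetical fittable object: estimates a (shrunken) mean."""

    _tags = {"object_type": "estimator"}

    def __init__(self, shrinkage=0.0):
        self.shrinkage = shrinkage
        self._is_fitted = False

    @property
    def is_fitted(self):
        return self._is_fitted

    def fit(self, values):
        mean = sum(values) / len(values)
        # Fitted state is stored in attributes ending in an underscore.
        self.mean_ = (1.0 - self.shrinkage) * mean
        self._is_fitted = True
        return self

    def get_fitted_params(self):
        if not self.is_fitted:
            raise RuntimeError("call fit before get_fitted_params")
        return {"mean": self.mean_}


est = MeanEstimator(shrinkage=0.5)
est.set_params(shrinkage=0.25)
est.fit([2.0, 4.0, 6.0])
print(est.get_params())         # {'shrinkage': 0.25}
print(est.get_fitted_params())  # {'mean': 3.0}
print(est.get_tags())           # {'object_type': 'estimator'}
```

The sketch keeps hyperparameters (constructor arguments), fitted state (attributes set in fit) and tags (class-level metadata) separate. That separation is what lets generic tooling, such as lookup by tag or automated test suites, work across every object in a package.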

Prior Knowledge Expected

Previous knowledge expected

Franz Kiraly is a research scientist with an interest in open source toolbox design, model quality assurance, and time-series-related modelling tasks. He is also the founder and a core developer of the sktime package, and a core developer of the skbase package.

Jonathan is a data scientist at DSML Research where he oversees internal data science operations and outreach efforts by delivering seminars and public talks on the latest topics in Data Science and Machine Learning. In the past he's worked as a consultant for General Assembly, the NYPD, Amber Capital and Advent International to help them productionize their data and develop their internal analytics capabilities. He's the author of the KerasBeats deep learning package and has helped contribute to sktime and tensorflowjs. He has an MS in Analytics from Georgia Tech and resides in the NYC area. His particular passion is time series modeling since he believes it's the most practical way to align innovations in data science with business interests.