alkaline-ml

Conda envs in PySpark

3 reasons you should be deploying your Conda environments for your PySpark jobs

Posted on July 2, 2018

If you’ve only ever tinkered with Hadoop within the context of a sandbox, you may never have encountered one of the inevitabilities of enterprise-scale distributed computing: different machines have different configurations. Even when synchronized with tools such as Puppet, datanodes in a Hadoop cluster may not be a mirror image... [Read More]

Tags: python pyspark tutorials

An intro to dummy encoding with Skoot

Using Skoot to accelerate your ML pre-processing workflow

Posted on June 18, 2018

This post will introduce you to dummy coding in skoot, one of my projects dedicated to helping machine learning practitioners automate as much of their workflow as possible. Those who have worked in the field for a while know that 80-90% of a data scientist’s time is spent... [Read More]

Tags: skoot python machine-learning dummy-encoding tutorials