Kubeflow’s missing helm chart

KubeCon is here

This week is KubeCon EU ’24 in Paris!

We’re following the latest trends in cloud-native networking, security, observability and more. One question matters most to us, however: with Artificial Intelligence applications more popular than ever, how do we take advantage of Kubernetes when building them?

AI uses the cloud quite intensively:

  • Development requires cloud notebook environments

  • Deploying inference services requires autoscaling deployments

  • Training jobs may require Spark or Ray clusters

  • Generative AI requires hardware accelerators and vector DBs

The exciting thing for developers is that these needs can be met by platforms that run inside Kubernetes. A lot of this is centred around the Kubeflow project.

Kubeflow

I like to describe Kubeflow as a “cloud platform in a box”.

The Kubeflow console. (source: kubeflow.org)

For machine learning engineers it contains all of the services needed to iterate through the machine learning engineering flow — notebook servers, training pipelines, artifact stores, and inference servers. But cloud platforms are complicated, even ones that fit into a box. That box contains:

  1. Istio and Dex: Foundational networking and auth tools which secure users and isolate them from each other’s data

  2. A cloud console: The central dashboard is complete with features to give users access to projects, with various levels of permissions

  3. Lots of Kubernetes operators: Extensions to Kubernetes that let Kubeflow users work with machine learning abstractions such as inference endpoints, training jobs, or notebook servers

This is a lot to manage even for experienced K8s users. Installing Kubeflow is challenging because Istio must be fully initialised before installing the K8s operators. If this is not done correctly the system will fail in confusing ways.

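As a result, going from zero to Kubeflow is hard. Common package managers like Helm let a chart declare dependencies, but they don’t natively wait for one component to be fully ready before installing the next, which is the kind of ordering Kubeflow needs.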
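To make the ordering problem concrete, here is a minimal Python sketch that groups components into install "waves", where each wave depends only on earlier ones. The component names and dependency edges are hypothetical and chosen for illustration; only the rule that Istio comes before the operators is taken from the text above. It is not how any particular installer is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical component names and dependency edges, for illustration only.
# Each key maps to the set of components that must be ready before it.
DEPENDS_ON = {
    "istio": set(),
    "dex": {"istio"},
    "central-dashboard": {"istio", "dex"},
    "notebook-operator": {"istio"},
    "training-operator": {"istio"},
}


def install_waves(depends_on):
    """Group components into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        # Everything whose dependencies are already marked done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # A real installer would wait for readiness here before moving on.
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(install_waves(DEPENDS_ON), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

With these assumed edges, Istio lands in the first wave on its own, Dex and the two operators in the second, and the dashboard in the third. The hard part in practice is the "wait for readiness" step between waves, which is exactly what a plain package install does not give you.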

Bootstrapping

Whilst a simple Helm chart cannot get you started with Kubeflow, we can use Helm in the context of cluster bootstrapping.

Cluster bootstrapping is an area of interest in its own right for Kubernetes users. The key is to ensure that you can both create a new instance of a cluster for your use case (e.g. when a new team needs a Kubeflow instance) and maintain it afterwards. This is challenging because of the complexity of working with the cloud in general, on top of your own K8s cluster.

Fortunately, we now have a solution for bootstrapping Kubeflow: Introducing Treebeardtech’s Kubeflow bootstrap chart!

Kubeflow-bootstrap

Kubeflow-bootstrap solves some key challenges with bootstrapping Kubeflow:

  • It orders the installation of components to avoid errors

  • It provides industry-standard interfaces for slotting into your infrastructure as code (e.g. Terraform)

  • It can be extended to handle add-ons that enable features such as GPUs

Our contribution to the Kubeflow management problem is combining Helm with ArgoCD. ArgoCD is a GitOps tool. It runs as a daemon inside your cluster and ensures that the state of the cluster matches the state defined in a remote repository. In this case, it watches a Helm registry. ArgoCD balances a smooth initial setup with an answer to the long-term question posed by Day 2 operations: how do we keep updating and reconfiguring our platform?
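The GitOps idea of "make the cluster match the declared state" can be sketched as a small diff-and-apply loop. This is a toy model in Python, not ArgoCD’s implementation: the resource names, the `chartVersion` field, and both helper functions are hypothetical and exist only for illustration.

```python
def plan_sync(desired, live):
    """Return the actions needed to make `live` match `desired`.

    Both arguments map a resource name to its spec (any comparable value).
    """
    actions = []
    for name in sorted(desired.keys() | live.keys()):
        if name not in live:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != live[name]:
            actions.append(("update", name))
    return actions


def apply_actions(desired, live, actions):
    """Apply planned actions to a copy of `live` and return the new state."""
    new_live = dict(live)
    for verb, name in actions:
        if verb == "delete":
            del new_live[name]
        else:  # "create" and "update" both copy the desired spec
            new_live[name] = desired[name]
    return new_live


if __name__ == "__main__":
    # Hypothetical resources: what the repository declares vs. what is running.
    desired = {
        "kubeflow-core": {"chartVersion": "1.1"},
        "gpu-addon": {"chartVersion": "0.3"},
    }
    live = {
        "kubeflow-core": {"chartVersion": "1.0"},
        "old-addon": {"chartVersion": "0.1"},
    }
    actions = plan_sync(desired, live)
    print(actions)
    live = apply_actions(desired, live, actions)
    print(plan_sync(desired, live))  # nothing left to do once converged
```

Running the loop again after a sync yields an empty plan, which is the property that makes Day 2 changes manageable: you edit the declared state, and the daemon works out what needs to change.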

If you want to try out Kubeflow, this project will get you started even if you’re a novice with Kubernetes. It contains three main entry points for users at different stages of their journey with Kubeflow:

  1. The Terraform interface provides a one-command experience that sets up an entire Argo deployment and Kubeflow instance on whichever cluster you point it at

  2. The Helm chart interface gives more established teams, who can run ArgoCD themselves, a higher level of control

  3. For teams that already have advanced ArgoCD installations, we provide a lower-level core chart that lets them customise our Kubeflow packages more directly

Kubeflow-bootstrap architecture

Try out Kubeflow Bootstrap

The best part of this project is that by default it runs in small and inexpensive environments. You don’t even have to install the dependencies yourself: if you have access to GitHub Codespaces, or a device with 2 CPUs and Docker installed, you can try out Kubeflow using the devcontainer in the repo.

Fun and Profit

When I’ve shown people that they can run Kubeflow in a GitHub codespace, I see reactions of surprise, delight, and sometimes awe. Yes, it’s awesome, but there’s more to it.

Kubeflow is the missing layer between your Kubernetes environment and your machine learning team. If your organisation becomes effective at using platforms like Kubeflow, you will deliver AI products more scalably. The Kubernetes space is so rich with adjacent products and hosting platforms that we believe giving people a nudge to start will help them take the path that leads to a production deployment.