Building Machine Learning Pipelines with Kubeflow

Published: 2020-06-24 22:37:36 | Category: Patterns | Source: 51cto

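Summary: In a previous article, I introduced Kubeflow, a machine learning platform for teams that need to build machine learning pipelines. In this article, we look at how to take an existing machine learning project and turn it into a Kubeflow machine learning pipeline, which can then be deployed on Kubernetes. As you work through this exercise, please consider…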

% conda env create -f environment.yml
% source activate kubeflow-mnist
% python preprocessing.py --data_dir=/path/to/data
% python train.py --data_dir=/path/to/data

Now let's review the steps in our pipeline:

1. Git clone the repository
2. Download and preprocess the training and test data
3. Train and evaluate

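Before we start writing code, we need a high-level understanding of Kubeflow pipelines.

A pipeline is made up of connected components. The output of one component becomes the input of another, and each component runs inside a container (Docker, in this case). What happens is that we execute a Docker image, which we will specify shortly, that contains everything needed to run preprocessing.py and train.py. Naturally, each of these two stages gets its own component.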

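We also need an additional image to git clone the project. We could bake the project into the Docker image instead, but in a real project that would likely cause the image size to balloon.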
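To make the idea of connected components concrete, here is a minimal sketch in plain Python. It is not the Kubeflow Pipelines SDK: the `Component` class, the `execution_order` helper, the `example/git` image name and the repository URL are all made up for illustration. It only shows that each step is an image plus a command, and that the "output feeds input" links determine the order in which the steps can run.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List


@dataclass
class Component:
    """One pipeline step: a container image plus the command run inside it."""
    name: str
    image: str
    command: List[str]
    # Names of the upstream components whose output this component consumes.
    inputs: List[str] = field(default_factory=list)


def execution_order(components: List[Component]) -> List[str]:
    """Order components so that every producer runs before its consumers."""
    graph: Dict[str, List[str]] = {c.name: c.inputs for c in components}
    return list(TopologicalSorter(graph).static_order())


# Placeholder image names and URL; substitute your own.
ML_IMAGE = "your-user-name/kubeflow-mnist"
GIT_IMAGE = "example/git"

# Declared out of order on purpose: the links, not the list order, decide.
pipeline = [
    Component("train", ML_IMAGE,
              ["python", "train.py", "--data_dir=/path/to/data"],
              inputs=["preprocess"]),
    Component("preprocess", ML_IMAGE,
              ["python", "preprocessing.py", "--data_dir=/path/to/data"],
              inputs=["git-clone"]),
    Component("git-clone", GIT_IMAGE,
              ["git", "clone", "https://example.com/your/repo.git"]),
]

print(execution_order(pipeline))  # ['git-clone', 'preprocess', 'train']
```

A real pipeline definition would hand these pieces to Kubeflow rather than sort them by hand, but the mental model is the same: containers as nodes, data dependencies as edges.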

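Speaking of Docker images, we should create one first.

Step 0: Create a Docker image

If you only want to try things out, this step is optional, because I have already prepared an image on Docker Hub. Here is the Dockerfile in full: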

FROM tensorflow/tensorflow:1.14.0-gpu-py3
LABEL MAINTAINER "Benjamin Tan <benjamintanweihao@gmail.com>"
SHELL ["/bin/bash", "-c"]

# Set the locale
RUN echo 'Acquire {http::Pipeline-Depth "0";};' >> /etc/apt/apt.conf
RUN DEBIAN_FRONTEND="noninteractive"
RUN apt-get update && apt-get -y install --no-install-recommends locales && locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

RUN apt-get install -y --no-install-recommends \
    wget git python3-pip openssh-client python3-setuptools google-perftools && \
    rm -rf /var/lib/apt/lists/*

# install conda
WORKDIR /tmp
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-4.7.12-Linux-x86_64.sh -O ~/miniconda.sh && \
    /bin/bash ~/miniconda.sh -b -p /opt/conda && \
    rm ~/miniconda.sh && \
    ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
    echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc

# build conda environments
COPY environment.yml /tmp/kubeflow-mnist/conda/
RUN /opt/conda/bin/conda update -n base -c defaults conda
RUN /opt/conda/bin/conda env create -f /tmp/kubeflow-mnist/conda/environment.yml
RUN /opt/conda/bin/conda clean -afy

# Cleanup
RUN rm -rf /workspace/{nvidia,docker}-examples && \
    rm -rf /usr/local/nvidia-examples && \
    rm /tmp/kubeflow-mnist/conda/environment.yml

# switch to the conda environment
RUN echo "conda activate kubeflow-mnist" >> ~/.bashrc
ENV PATH /opt/conda/envs/kubeflow-mnist/bin:$PATH
RUN /opt/conda/bin/activate kubeflow-mnist

# make /bin/sh symlink to bash instead of dash:
RUN echo "dash dash/sh boolean false" | debconf-set-selections && \
    DEBIAN_FRONTEND=noninteractive dpkg-reconfigure dash

# Set the new Allocator
ENV LD_PRELOAD /usr/lib/x86_64-linux-gnu/libtcmalloc.so.

The key point about this Dockerfile is that the Conda environment is set up and ready to use. To build the image:

% docker build -t your-user-name/kubeflow-mnist . -f Dockerfile
% docker push your-user-name/kubeflow-mnist
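Once the image is built, one quick way to confirm that the Conda environment really is the one in use is to run a small check with the image's Python interpreter. The sketch below is an illustration rather than part of the original project: the function name `conda_env_active` is made up, and it assumes the environment is named kubeflow-mnist and lives under /opt/conda/envs, as in the Dockerfile. It is a heuristic, not a guarantee.

```python
import os
import sys
from typing import Mapping


def conda_env_active(expected: str,
                     environ: Mapping[str, str] = os.environ,
                     prefix: str = sys.prefix) -> bool:
    """Heuristically check that Python is running from the given Conda env."""
    # `conda activate` exports CONDA_DEFAULT_ENV with the environment name.
    if environ.get("CONDA_DEFAULT_ENV") == expected:
        return True
    # Otherwise fall back to the interpreter's install prefix,
    # e.g. /opt/conda/envs/<name> when the env's bin directory is first on PATH.
    return os.path.basename(os.path.normpath(prefix)) == expected


if __name__ == "__main__":
    ok = conda_env_active("kubeflow-mnist")
    print("kubeflow-mnist environment active:", ok)
```

The `environ` and `prefix` parameters exist only so the check can be exercised with fake values outside the container.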

Now, let's create our first component!

The following code snippets can be found in pipeline.py.

Step 1: Git Clone
