The Configuration file
An ELBO task is configured through a YAML file. Let's look at its contents.
The configuration file, typically named elbo.yaml, has the following properties:
Option: name
  Description: The name of your ML training task.
  Example: "Hello, ELBO 💪"

Option: gpu_class
  Description: The class of GPU you want to request. This can be one of the following:
    • Economy - Economy-class GPUs such as the Tesla K80 and Tesla M60. These can be used for simple training tasks or for testing. They usually cost less than $1 per hour.
    • MidRange - Mid-range GPUs such as V100s or equivalent. These are more powerful and can be used for more compute-intensive tasks. They also have more GPU RAM (24 GB+), which is useful for generative models.
    • HighEnd - The latest and most powerful GPU compute environments, typically an Nvidia A100. These can be very expensive, roughly $9 to $30 per hour depending on usage.
    • All - Shows all the GPU options that are available.
  Example: MidRange

Option: setup (Optional)
  Description: A setup script that is run before the training code is called.
  Example: sudo apt-get install fish

Option: requirements (Optional)
  Description: The path to a requirements.txt file that lists all the dependencies of the training code.
  Example (contents of requirements.txt):
    pandas
    numpy
    torch
    pytorch_lightning
    tqdm
    torchvision
    wandb

Option: run
  Description: The main training code. The task execution calls this file directly.
  Example: main.py

Option: task_dir
  Description: The directory that contains this task, usually the current directory. This directory is zipped and uploaded to run the task. Make sure all the files and scripts needed to run the training code are present in this directory.
  Example: .

Option: artifacts
  Description: The directory where your code places model checkpoints, plots, generated files, and so on. The ELBO service packages this directory and saves it for you to download after the task is complete.
  Example: artifacts

Option: keep_alive
  Description: Setting this to True ensures that the node running the job is not stopped after the job is complete.
  Example: True
​
Tip: If you are submitting the task for the first time, you may want to run the training task on an Economy class machine and then move to higher classes when you see the model converging after a few epochs.
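The tip above could look like the following sketch for a first submission. It uses only properties from the table and leaves out the ones marked Optional. Whether ELBO accepts a file without setup and requirements is an assumption here, not something this page confirms.

```yaml
# First submission: request the cheapest GPU class to check that the task
# runs end to end before paying for faster hardware.
name: "Train MNIST Classifier"

# Start with Economy. Once the model is converging after a few epochs,
# change this value to MidRange or HighEnd and resubmit.
gpu_class: Economy

run: main.py
task_dir: .
artifacts: artifacts
```

Only the gpu_class value changes between the trial run and the later runs, so the rest of the file can stay as it is.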
Here is a sample configuration with comments on what each property means:
#
# ELBO Sample Config File for MNIST Classifier Task
#
# All paths are relative to where the `elbo.yaml` file is placed

name: "Train MNIST Classifier"

# The GPU class to use - Economy, MidRange, HighEnd, All
gpu_class: Economy

# The script to run for setting up the environment. For example - installing packages
# on Ubuntu
setup: setup.sh

# The PIP requirements file. ELBO will install the requirements specified in this
# file before launching the task.
requirements: requirements.txt

# The main entry point in the task. Once the script exits or terminates, the task
# is considered complete.
run: main.py

# The task directory, relative to this file. This directory will be tar-balled and sent to ELBO task executor for
# execution
task_dir: .

# Artifacts directory. This is the directory that will be copied over as output. All model related files -
# checkpoints, generated samples, evaluation results etc. should be placed in this directory.
artifacts: ~/artifacts
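The sample above does not set keep_alive. A minimal sketch of adding it is shown here, assuming it is a top-level property written the same way as the others, with the True value taken from the properties table:

```yaml
# Assumed placement: top level of elbo.yaml, alongside name, run, and so on.
# Keeps the node running after the job is complete instead of stopping it.
keep_alive: True
```

This page does not describe how a kept-alive node is stopped later, so it is worth checking that before enabling the option on a HighEnd machine.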
​