
ELBO AI API and Command Line Documentation

Welcome

Making ML stuff cheap and easy 💪

Looking for an image generation API? Sign up here 👇

ELBO.AI is a service that makes training ML models easier and cheaper. Use our service to train your models, receive timely training notifications, choose compute types that fit your budget, and integrate with our API to add the service to your workflow.

Want to jump right in?

Jump into the quick start docs and make your first task submission:

Want to deep dive?

Dive a little deeper and start exploring our API reference to get an idea of everything that's possible with the API:

Quick Start

This guide helps you set up the ELBO environment on your local machine.

Good to know: We are just getting started with this service and are actively building it. If you face any problems with the service or API, please reach out to us at [email protected]

Get your API keys

Your API requests are authenticated using API keys. Any request that doesn't include an API key will return an HTTP Authentication error.

Sign up for an account (with a 14-day trial period). You can get the API key at any time from your account on the website.

Setup your Virtual Environment

We recommend running Python in a virtual environment, or using conda. To install virtualenv, run:

pip3 install virtualenv virtualenvwrapper

And create an environment using:

virtualenv -p python3 .venv

or, if virtualenv is not on your PATH:

~/Library/Python/3.9/bin/virtualenv -p python3 .venv

This creates a virtual Python environment in the .venv folder. To activate this environment use the command:

. .venv/bin/activate

Or the following if you are using the fish shell:

. .venv/bin/activate.fish

If you hit a Command not found error while running virtualenv, try running virtualenv from the user install location. This happens if the package was installed in the user path instead of the system-wide path.

~/Library/Python/3.9/bin/virtualenv

Install the library

The best way to interact with our API is to use our elbo library. You can install it with the command below:

pip3 install elbo --upgrade

Good to know: The elbo package still resides in the test PyPI repository. We will move it to the official repository once we are out of beta development.

Login to ELBO

Use the command line tool to log in.

elbo login

This will prompt you to enter your token. The token can be obtained by logging into the ELBO welcome page.

Make your first task submission

Try out one of the sample ML submissions from our examples GitHub repository. First, clone the repository:

git clone https://github.com/elbo-ai/elbo-examples.git
cd elbo-examples/pytorch/mnist_classifier/

Submit the sample task:

elbo run --config elbo.yaml

Here is sample output from the command, which prompts you to choose from a list of compute options offered by our providers:

elbo.client is starting 'Train MNIST Classifier' submission ...
elbo.client Hey Anu 👋, welcome!
elbo.client is uploading sources from ....
elbo.client upload successful.

elbo.client number of compute choices - 28
? Please choose: (Use arrow keys)
 »  $ 0.0028/hour        Micro (for testing)   2 cpu     1Gb mem    0Gb gpu-mem AWS (spot)
    $ 0.0150/hour     Standard (for testing)   1 cpu     2Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
    $ 0.0770/hour        Micro (for testing)   2 cpu     1Gb mem    0Gb gpu-mem AWS
    $ 0.2700/hour           Nvidia Tesla K80   4 cpu    61Gb mem   12Gb gpu-mem AWS (spot)
    $ 0.6100/hour         Nvidia Quadro 4000  16 cpu    32Gb mem    8Gb gpu-mem TensorDock
    $ 0.9000/hour           Nvidia Tesla K80   4 cpu    61Gb mem   12Gb gpu-mem AWS
    $ 0.9180/hour                Nvidia V100   8 cpu    61Gb mem   16Gb gpu-mem AWS (spot)
    $ 0.9200/hour         Nvidia Quadro 5000   2 cpu     4Gb mem   16Gb gpu-mem FluidStack
    $ 0.9600/hour               Nvidia A5000   2 cpu    16Gb mem   24Gb gpu-mem TensorDock
    $ 1.4900/hour               Nvidia A4000  12 cpu    64Gb mem   16Gb gpu-mem FluidStack
    $ 1.4940/hour                 Nvidia A40   2 cpu    12Gb mem   48Gb gpu-mem TensorDock
    $ 1.5000/hour         Nvidia Quadro 6000   8 cpu    32Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
    $ 1.5140/hour               Nvidia A6000   2 cpu    16Gb mem   48Gb gpu-mem TensorDock
    $ 2.1600/hour        8x Nvidia Tesla K80  32 cpu   488Gb mem   12Gb gpu-mem AWS (spot)
    $ 3.0000/hour      2x Nvidia Quadro 6000  16 cpu    64Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
    $ 3.0600/hour                Nvidia V100   8 cpu    61Gb mem   16Gb gpu-mem AWS
    $ 3.6720/hour             4x Nvidia V100  32 cpu   244Gb mem   16Gb gpu-mem AWS (spot)
    $ 3.7460/hour             7x Nvidia V100   6 cpu     8Gb mem   16Gb gpu-mem TensorDock
    $ 4.3200/hour       16x Nvidia Tesla K80  64 cpu   732Gb mem   12Gb gpu-mem AWS (spot)
    $ 4.5000/hour      3x Nvidia Quadro 6000  20 cpu    96Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
    $ 6.0000/hour      4x Nvidia Quadro 6000  24 cpu   128Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
    $ 7.3440/hour             8x Nvidia V100  64 cpu   488Gb mem   16Gb gpu-mem AWS (spot)
    $ 7.9200/hour        8x Nvidia Tesla K80  32 cpu   488Gb mem   12Gb gpu-mem AWS
    $ 9.8318/hour             8x Nvidia A100  96 cpu  1152Gb mem   80Gb gpu-mem AWS (spot)
    $13.0360/hour             4x Nvidia V100  32 cpu   244Gb mem   16Gb gpu-mem AWS
    $14.4000/hour       16x Nvidia Tesla K80  64 cpu   732Gb mem   12Gb gpu-mem AWS
    $24.4800/hour             8x Nvidia V100  64 cpu   488Gb mem   16Gb gpu-mem AWS
    $32.7726/hour             8x Nvidia A100  96 cpu  1152Gb mem   80Gb gpu-mem AWS

That's it! 🥳 Monitor your task's progress using elbo show <task_id>.

Good to know: The list of compute options is sorted by price-to-performance ratio, best first. Note that the cheapest option is not always the best choice, and neither is the most expensive one.

API Reference

The abstract elbo.ElboModel

ElboModel is an abstract class that allows the ELBO service to automatically checkpoint your training.

Extending your model class

Extend the abstract ElboModel along with nn.Module in your PyTorch model class. With this, you will be required to implement two methods:

  • save_state - This method should save the state of the model and other state information needed.

  • load_state - This method should load the state of the model from the input directory.

Good to know: These methods will be called by the training loop at periodic intervals to keep saving the state of training. Please make sure anything that's needed for training to resume from a previous checkpoint is saved and loaded through these methods.

import os

import torch
import torch.nn as nn
from elbo import ElboModel  # the abstract elbo.ElboModel described above


class MNISTClassifier(ElboModel, nn.Module):
    def get_artifacts_directory(self):
        return 'artifacts'

    def save_state(self):
        model_path = os.path.join(self.get_artifacts_directory(), "mnist_model")
        torch.save(self.state_dict(), model_path)
        print(f"Saving model to {model_path}")

    def load_state(self, state_dir):
        # Load from the directory passed in by the service
        model_path = os.path.join(state_dir, "mnist_model")
        print(f"Loading model from {model_path}")
        self.load_state_dict(torch.load(model_path))
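
As a quick local sanity check (outside of the ELBO service), you can exercise the two methods directly. This is only an illustrative sketch; it assumes the class above is fully defined with its layers, which the snippet omits:

import os

# Round-trip the checkpoint: save from one instance, load into a fresh one.
model = MNISTClassifier()
os.makedirs(model.get_artifacts_directory(), exist_ok=True)
model.save_state()

restored = MNISTClassifier()
restored.load_state(model.get_artifacts_directory())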

CLI Reference

Use the command-line tool to run tasks, show task status, cancel tasks and SSH into tasks.

Help

elbo --help

(.venv) joy@elbo ~> elbo
Usage: elbo [OPTIONS] COMMAND [ARGS]...

  elbo.ai - Train more, pay less

Options:
  --help  Show this message and exit.

Commands:
  balance   Show the users balance
  create    Create an instance and get SSH access to it.
  download  Download the artifacts for the task.
  kill      Stop the task.
  login     Login to the ELBO service.
  notebook  Start a Jupyter Lab session.
  ps        Show list of all tasks.
  run       Submit a task specified by the config file.
  show      Show the task.
  ssh       SSH into the machine running the task.
  status    Get ELBO server status.

Start a notebook

elbo notebook

(.venv) joy@elbo ~/p/elbo-examples (main)> elbo notebook
elbo.client creating notebook using config at project [email protected]:elbo-ai/elbo-examples.git ...
elbo.client cloning [email protected]:elbo-ai/elbo-examples.git to /var/folders/8f/vcfd13292kl6p93zxf1yypl40000gn/T/tmpfl7mum90 ...
elbo.client Submitting notebook run config : /var/folders/8f/vcfd13292kl6p93zxf1yypl40000gn/T/tmpfl7mum90/notebook/elbo.yaml
elbo.client is starting 'Start a jupyter notebook' submission ...
elbo.client Hey Anu 👋, welcome!
elbo.client is uploading sources from /var/folders/8f/vcfd13292kl6p93zxf1yypl40000gn/T/tmpfl7mum90/notebook/....
elbo.client upload successful.

elbo.client number of compute choices - 28
? Please choose:  $ 0.4200/hour         Nvidia Quadro 4000   2 cpu     4Gb mem    8Gb gpu-mem FluidStack
elbo.client compute node ip 216.153.51.67
elbo.client task with ID 125 is submitted successfully.
elbo.client ----------------------------------------------
elbo.client ssh using - ssh [email protected] -p 2222
elbo.client scp using - scp [email protected] -p 2222
elbo.client password: BZ7qNxpVJAsAXEequQ
elbo.client ----------------------------------------------

elbo.client here are URLS for task logs ...
elbo.client setup logs        - http://216.153.51.67/setup
elbo.client requirements logs - http://216.153.51.67/requirements
elbo.client task logs         - http://216.153.51.67/task

elbo.client TIP: 💡 see task details with command: `elbo show 125`

elbo.client ⏳ It may take a minute or two for the node to be reachable.

elbo.client node started ..

elbo.client Notebook URL = http://216.153.51.67:8080/?token=5824d0cfbbc3ed1710969d4cfe8404c6dfdcc37e206d931d

Run a task

elbo run --config <config_file_path>

(.venv) joy@elbo ~/p/elbo-examples (main)> elbo run --config pytorch/mnist_classifier/elbo.yaml
elbo.client is starting 'Train MNIST Classifier' submission ...
elbo.client Hey Anu 👋, welcome!
elbo.client is uploading sources from pytorch/mnist_classifier/....
elbo.client upload successful.

elbo.client number of compute choices - 27
? Please choose: (Use arrow keys)
 »  $ 0.0028/hour        Micro (for testing)   2 cpu     1Gb mem    0Gb gpu-mem AWS (spot)
    $ 0.0150/hour     Standard (for testing)   1 cpu     2Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
    $ 0.0770/hour        Micro (for testing)   2 cpu     1Gb mem    0Gb gpu-mem AWS
    $ 0.2700/hour           Nvidia Tesla K80   4 cpu    61Gb mem   12Gb gpu-mem AWS (spot)
    $ 0.7220/hour               Nvidia A4000   2 cpu     4Gb mem   16Gb gpu-mem TensorDock
    $ 0.9000/hour           Nvidia Tesla K80   4 cpu    61Gb mem   12Gb gpu-mem AWS
    $ 0.9180/hour                Nvidia V100   8 cpu    61Gb mem   16Gb gpu-mem AWS (spot)
    $ 0.9200/hour         Nvidia Quadro 5000   2 cpu     4Gb mem   16Gb gpu-mem FluidStack
    $ 0.9600/hour               Nvidia A5000   2 cpu    16Gb mem   24Gb gpu-mem TensorDock
    $ 1.4940/hour                 Nvidia A40   2 cpu    12Gb mem   48Gb gpu-mem TensorDock
    $ 1.5000/hour         Nvidia Quadro 6000   8 cpu    32Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
    $ 1.5140/hour               Nvidia A6000   2 cpu    16Gb mem   48Gb gpu-mem TensorDock
    $ 2.1600/hour        8x Nvidia Tesla K80  32 cpu   488Gb mem   12Gb gpu-mem AWS (spot)
    $ 3.0000/hour      2x Nvidia Quadro 6000  16 cpu    64Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
    $ 3.0600/hour                Nvidia V100   8 cpu    61Gb mem   16Gb gpu-mem AWS
    $ 3.6720/hour             4x Nvidia V100  32 cpu   244Gb mem   16Gb gpu-mem AWS (spot)
    $ 3.7460/hour             7x Nvidia V100   6 cpu     8Gb mem   16Gb gpu-mem TensorDock
    $ 4.3200/hour       16x Nvidia Tesla K80  64 cpu   732Gb mem   12Gb gpu-mem AWS (spot)
    $ 4.5000/hour      3x Nvidia Quadro 6000  20 cpu    96Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
    $ 6.0000/hour      4x Nvidia Quadro 6000  24 cpu   128Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
    $ 7.3440/hour             8x Nvidia V100  64 cpu   488Gb mem   16Gb gpu-mem AWS (spot)
    $ 7.9200/hour        8x Nvidia Tesla K80  32 cpu   488Gb mem   12Gb gpu-mem AWS
    $ 9.8318/hour             8x Nvidia A100  96 cpu  1152Gb mem   80Gb gpu-mem AWS (spot)
    $13.0360/hour             4x Nvidia V100  32 cpu   244Gb mem   16Gb gpu-mem AWS.

Cancel a task

elbo kill <task_id>

(.venv) joy@elbo ~/p/elbo-examples (main)> elbo kill 153
elbo.client Stopping task - 153
elbo.client Task with id=153 is marked for cancellation.

Show task attributes

elbo show <task_id>

(.venv) joy@elbo ~/p/elbo-examples (main)> elbo show 123
elbo.client Fetching task - 123
elbo.client Task with id = 123:
Billed Cost             : 0.2100000
Billed Upto Time        : 03/07/22 12:58
Bucket Key              : [email protected]/elbo-archive-26b7975b.tgz
Completion Time         : 03/07/22 12:58
Compute Type            : FluidStack None Nvidia Quadro 4000x1(8Gb) CPU=2(4Gb) Cost=0.42 Cost/Transistor=0.028767123287671233 CUDA Cores=2304
Config File Path        : None
Cost Per Hour           : 0.4200000
Created Time            : 03/07/22 12:27
Customer Billed         : True
Instance ID             : recbPXDkeuR3SBTV7
Instance Type           : Dedicated
Keep Alive              : True
Last Modified Time      : 03/07/22 12:26
Name                    : Start a jupyter notebook
Password                : ou4zebZ2XCoaMhdDrQ
Previous Task ID        : None
Provider                : FluidStack
Record ID               : 185
Requirements Log Path   : http://216.153.51.67/requirements
Run Time                : 00h:31m:28s
SSH Only                : False
Session ID              : 6381c2c835e340f6957542720dee8d13
Setup Log Path          : http://216.153.51.67/setup
Status                  : Archived
Submission Time         : 03/07/22 12:26
Target File Path        : [email protected]/elbo-6381c2c835e340f6957542720dee8d13-artifacts.tgz
Task ID                 : 123
Task Log Path           : http://216.153.51.67/task
Total Cost              : 0.2170000
User ID                 : [email protected]
ip                      : 216.153.51.67

Download task artifacts

elbo download <task_id>

(.venv) joy@elbo ~/p/elbo-examples (main)> elbo download 159
elbo.client Downloading Artifacts for - 159
elbo.client Artifacts for task id = 159 downloaded to /var/folders/8f/vcfd13292kl6p93zxf1yypl40000gn/T/tmp8uetxm69/elbo-3dc59be0e9b545378b6a679345175f1c-artifacts.tgz

Show running task

elbo ps -r

(.venv) joy@elbo ~> elbo ps -r
elbo.client your running tasks:
+---------+--------------+---------------+----------+---------------------+---------+---------------------+------------------------+-----------------+----------+------------+
| Task ID | Compute Type | Cost Per Hour | Provider |     Start Time      | Status  |   Submission Time   |       Task Name        | Completion Time | Run Time | Total Cost |
+---------+--------------+---------------+----------+---------------------+---------+---------------------+------------------------+-----------------+----------+------------+
|   153   | Economy GPU  |    0.0028     |   AWS    | 02/04/2022 14:26 PM | Running | 02/04/2022 14:25 PM |    SSH only session    |                 |          |            |
|   155   | Economy GPU  |    0.0028     |   AWS    | 02/04/2022 14:29 PM | Running | 02/04/2022 14:28 PM |    SSH only session    |                 |          |            |
|   156   | Economy GPU  |     0.27      |   AWS    | 02/04/2022 14:31 PM | Running | 02/04/2022 14:29 PM | Train MNIST Classifier |                 |          |            |
|   158   | Economy GPU  |     0.27      |   AWS    | 02/04/2022 15:05 PM | Running | 02/04/2022 15:04 PM | Train MNIST Classifier |                 |          |            |
|   159   | Economy GPU  |    0.0028     |   AWS    | 02/04/2022 15:45 PM | Running | 02/04/2022 15:45 PM | Train MNIST Classifier |                 |          |            |
+---------+--------------+---------------+----------+---------------------+---------+---------------------+------------------------+-----------------+----------+------------+

SSH into a running task

(.venv) joy@elbo ~> elbo ssh 159
elbo.client Trying to SSH into task 159...
elbo.client SSH:
elbo.client Running Command : ssh [email protected] -p 2222
elbo.client Enter this password when prompted: elbo
Warning: Permanently added '[44.234.188.107]:2222' (ED25519) to the list of known hosts.
[email protected]'s password:
Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-1061-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

root@6b63a53b9691:~#

Show ELBO Server Status

(.venv) joy@elbo ~> elbo status
elbo.client Membership: ✅
elbo.client Database  : ✅
elbo.client Server    : ✅

The ELBO Tracker

The ELBO Tracker lets you monitor your ML tasks on your phone using the ELBO Tracker app.

The ELBO Tracker API is an easy way to monitor your tasks from your phone. Using the API you can log messages, key metrics (numbers), and images. These show up in your task list in the app.

Please install the ELBO Tracker App and log in with your authentication token to see the results of your tasks.

To start, instantiate a tracker:

from elbo.tracker.tracker import TaskTracker
tracker = TaskTracker("Hello World")

Where "Hello World" is the experiment name. Now to log a message just do:

tracker.log_message("Hi there! 👋")

That's it! Logging a metric or an image is just as simple:

tracker.log_key_metric("Accuracy", 100.0)

tracker.log_image("An AI generated image of a Cat 🐱", "images/aicat.png")

And finally, upload the logs using:

tracker.upload_logs()

Make sure you don't forget this step. The upload_logs() API can be called as many times as you would like; each call appends to the existing logs. For example, if you are training a model, it may make sense to call this API every epoch, as in the sketch below.
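
Here is a minimal sketch of a training loop that logs a metric and uploads the logs every epoch. The train_one_epoch helper, model, and train_data are placeholders for your own code, not part of the ELBO API:

from elbo.tracker.tracker import TaskTracker

tracker = TaskTracker("MNIST training run")

num_epochs = 10
for epoch in range(num_epochs):
    # train_one_epoch is a hypothetical helper from your own training code
    loss = train_one_epoch(model, train_data)
    tracker.log_message(f"Finished epoch {epoch}")
    tracker.log_key_metric("Loss", float(loss))
    tracker.upload_logs()  # appends this epoch's entries to the task logs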

Once this is done, you can see the results in the app!

The task list view
The task details view

The elbo.ElboEpochIterator

The iterator class wraps a traditional Python iterator. Add it to your training epoch loop.

Usage

The ElboEpochIterator takes the following arguments:

  • The range of epochs, usually range(0, num_epochs)

  • The PyTorch model it is training

  • save_state_interval - How often the model state should be saved to the artifacts directory. A value of 5 means the model state is saved every 5 epochs.

import elbo.elbo
from torchvision import datasets, transforms

# MNISTClassifier, train and test are defined elsewhere in the script
# (see "The abstract elbo.ElboModel" for the model class).

if __name__ == '__main__':
    print("Training MNIST classifier")
    train_data = datasets.MNIST("data", train=True, transform=transforms.ToTensor(), download=True)
    test_data = datasets.MNIST("data", train=False, transform=transforms.ToTensor(), download=True)
    model = MNISTClassifier()
    num_epochs = 10

    for epoch in elbo.elbo.ElboEpochIterator(range(0, num_epochs), model, save_state_interval=1):
        loss = train(model, train_data)
        print(f"Epoch = {epoch} Loss = {loss}")

    test(model, test_data)
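
The example above assumes train and test helpers defined elsewhere in the script. Here is a minimal sketch of what train could look like; the batch size, optimizer, and loss function are assumptions for illustration, not part of the ELBO API:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader


def train(model, train_data, batch_size=64, lr=1e-3):
    # One pass over the training data; returns the average loss.
    loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    total_loss = 0.0
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)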

API Reference

You can use the ELBO Python API to automatically save the state of your training.

PyTorch

We currently support only PyTorch, but please reach out to us if you need other frameworks. There are two main concepts: the abstract elbo.ElboModel and the elbo.ElboEpochIterator, each covered in its own page.

Good to know: These APIs are being built and may change as we enhance them.

The Configuration file

The ELBO configuration is specified in a YAML configuration file. Let's look at its contents.

The configuration file, typically named elbo.yaml, has the following properties:

  • name - The name of your ML training task. Example: "Hello, ELBO 💪"

  • gpu_class - The class of GPU you want to request (example: MidRange). This can be one of the following:

      • Economy - Economy class GPUs such as the Tesla K80 or Tesla M60. These can be used for simple training tasks or just for testing purposes. Usually, these cost less than $1 per hour.

      • MidRange - Mid-range GPUs such as V100s or equivalent. These are more powerful GPUs and can be used for more compute-intensive tasks. These GPUs also have more GPU RAM (24Gb+), which is useful for generative models.

      • HighEnd - The latest and greatest GPU compute environments, typically an Nvidia A100. These can be very expensive, roughly $9 - $30 per hour depending on usage.

      • All - This option shows all the GPU options that are available.

  • setup (optional) - A setup script that will be run prior to calling the training code. Example: sudo apt-get install fish

  • requirements (optional) - A requirements.txt file path that lists all the dependencies of the training code. Example: requirements.txt

  • run - The main training code. The task execution will call this file directly. Example: main.py

  • task_dir - The directory where this task is present, usually the current directory. This directory will be zipped and uploaded for running the task. Please make sure all the files and scripts needed to run the training code are present in this directory. Example: .

  • artifacts - The directory where your code will place model checkpoints, plots, generated files, etc. The ELBO service will package this directory and save it for you to download after the task is complete. Example: artifacts

  • keep_alive - Setting this to True ensures that the node running the job is not stopped after the job is complete. Example: True

Tip: If you are submitting the task for the first time, you may want to run the training task on an Economy class machine and then move to higher classes when you see the model converging after a few epochs.

Here is a sample configuration with comments on what each property means:

#
# ELBO Sample Config File for MNIST Classifier Task
#
# All paths are relative to where the `elbo.yaml` file is placed

name: "Train MNIST Classifier"

# The GPU class to use - Economy, MidRange, HighEnd, All
gpu_class: Economy

# The script to run for setting up the environment. For example - installing packages 
# on Ubuntu
setup: setup.sh

# The PIP requirements file. ELBO will install the requirements specified in this 
# file before launching the task.
requirements: requirements.txt

# The main entry point in the task. Once the script exits or terminates, the task
# is considered complete.
run: main.py

# The task directory, relative to this file. This directory will be tar-balled and sent to ELBO task executor for
# execution
task_dir: .

# Artifacts directory. This is the directory that will be copied over as output. All model related files - 
# checkpoints, generated samples, evaluation results etc. should be placed in this directory. 
artifacts: ~/artifacts

And here is a sample requirements.txt listing the task's dependencies:

pandas
numpy
torch
pytorch_lightning
tqdm
torchvision
wandb