ELBO AI API and Command Line Documentation

The Configuration file

The ELBO configuration is specified in YAML in a configuration file. Let's look at its contents.

The configuration file, typically named elbo.yaml, has the following properties:

Option
Description
Examples

name

The name of your ML training task.

"Hello, ELBO 💪"

gpu_class

The class of GPU you want to request. This can be one of the following:

  • Economy - Economy class GPUs such as the Tesla K80 and Tesla M60. These can be used for simple training tasks or just for testing purposes. Usually, these cost less than $1 per hour.

  • MidRange - Mid-range GPUs such as the V100 or equivalent. These are more powerful GPUs and can be used for more compute-intensive tasks. They also have more GPU RAM (24Gb+), which is useful for generative models.

  • HighEnd - The latest and greatest GPU compute environments, typically an Nvidia A100. These can be very expensive, roughly $9 to $30 per hour depending on usage.

  • All - This option shows all the GPU options that are available.

Tip: If you are submitting the task for the first time, you may want to run the training task on an Economy class machine and then move to higher classes when you see the model converging after a few epochs.

MidRange

setup (Optional)

A setup script that will be run prior to calling the training code.

sudo apt-get install fish

requirements (Optional)

The path to a requirements.txt file that lists all the dependencies of the training code. For example, the file may contain:

    pandas
    numpy
    torch
    pytorch_lightning
    tqdm
    torchvision
    wandb

run

The main training code. The task execution will call this file directly.

main.py

task_dir

The directory where this task is present, usually the current directory. This directory will be zipped and uploaded for running the task.

Please make sure all the files and scripts needed to run the training code are present in this directory.

.

artifacts

The directory where your code will place model checkpoints, plots, generated files, etc. The ELBO service will package this directory and save it for you to download after the task is complete.

artifacts

keep_alive

Setting this to True ensures that the node running the job is not stopped after the job is complete.

True

Here is a sample configuration with comments on what each property means:

    #
    # ELBO Sample Config File for MNIST Classifier Task
    #
    # All paths are relative to where the `elbo.yaml` file is placed
    
    name: "Train MNIST Classifier"
    
    # The GPU class to use - Economy, MidRange, HighEnd, All
    gpu_class: Economy
    
    # The script to run for setting up the environment. For example - installing packages 
    # on Ubuntu
    setup: setup.sh
    
    # The PIP requirements file. ELBO will install the requirements specified in this 
    # file before launching the task.
    requirements: requirements.txt
    
    # The main entry point in the task. Once the script exits or terminates, the task
    # is considered complete.
    run: main.py
    
    # The task directory, relative to this file. This directory will be tar-balled and sent to ELBO task executor for
    # execution
    task_dir: .
    
    # Artifacts directory. This is the directory that will be copied over as output. All model related files - 
    # checkpoints, generated samples, evaluation results etc. should be placed in this directory. 
    artifacts: ~/artifacts
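
Before submitting a task, it can help to sanity-check a configuration against the options documented above. The sketch below is not part of the elbo library: `validate_config` is a hypothetical helper, the configuration is passed in as an already-parsed dictionary (for example, the result of loading elbo.yaml with a YAML parser), and treating every option not marked Optional as required is an assumption.

```python
# Hypothetical pre-submission check; not part of the elbo library.
GPU_CLASSES = {"Economy", "MidRange", "HighEnd", "All"}

# Assumption: options not marked "(Optional)" in the table are required.
REQUIRED_KEYS = ("name", "gpu_class", "run", "task_dir", "artifacts")


def validate_config(config):
    """Return a list of human-readable problems found in a parsed elbo.yaml."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing option: {key}")
    gpu_class = config.get("gpu_class")
    if gpu_class is not None and gpu_class not in GPU_CLASSES:
        problems.append(f"unknown gpu_class: {gpu_class}")
    return problems


# The same values as the sample MNIST configuration.
sample = {
    "name": "Train MNIST Classifier",
    "gpu_class": "Economy",
    "setup": "setup.sh",
    "requirements": "requirements.txt",
    "run": "main.py",
    "task_dir": ".",
    "artifacts": "~/artifacts",
}
print(validate_config(sample))  # prints []
```

An empty list means no problems were found by these checks; it does not guarantee that the ELBO service will accept the file.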
    

    Welcome

    Making ML stuff cheap and easy 💪

    Looking for an image generation API? Sign up here 👇

    ELBO.AI is a service that makes training of ML models easier and cheaper. Use our service to train your models, receive timely training notifications, choose compute types that fit your budget and integrate with our API to add the service to your workflow.

    Want to jump right in?

    Jump into the Quick Start docs and make your first task submission.

    Quick Start

    Here is a guide to help you set up the ELBO environment on your local machine.

    Good to know: We are just getting started with this service and are actively building it. If you face any problems with the service or API, please reach out to us at [email protected]

    Get your API keys

    Your API requests are authenticated using API keys. Any request that doesn't include an API key will return an HTTP Authentication error.

    Sign up for an account (with a 14-day trial period). You can get your API key from the website at any time.

    Set up your Virtual Environment

    It's better to run Python in a virtual environment or use conda. To install virtualenv, run:

        pip3 install virtualenv virtualenvwrapper

    And create an environment using:

        virtualenv -p python3 .venv

    or, if virtualenv is not in your path:

        ~/Library/Python/3.9/bin/virtualenv -p python3 .venv

    This creates a virtual Python environment in the .venv folder. To activate this environment use the command:

        . .venv/bin/activate

    Or the following if you are using the fish shell:

        . .venv/bin/activate.fish

    If you hit a Command not found error while running virtualenv, try running virtualenv from the user install location. This happens if the package was installed in the user path instead of the system global path. For example:

        ~/Library/Python/3.9/bin/virtualenv

    Install the library

    The best way to interact with our API is to use our elbo library. You can install it using the command line below:

        pip3 install elbo --upgrade

    Good to know: The elbo package still resides in the test pypi repository. We will move it to the official repository once we are out of beta development.

    Login to ELBO

    Use the command line tool to log in.

        elbo login

    This will prompt you to enter your token. The token can be obtained by logging into the ELBO welcome page.

    Make your first task submission

    Try out one of the sample ML submissions from our examples GitHub repository. First clone the repository:

        git clone https://github.com/elbo-ai/elbo-examples.git

    Submit the sample task:

        cd elbo-examples/pytorch/mnist_classifier/
        elbo run --config elbo.yaml

    Here is a sample output of the command, which prompts with a list of compute options from our providers:

        elbo.client is starting 'Train MNIST Classifier' submission ...
        elbo.client Hey Anu 👋, welcome!
        elbo.client is uploading sources from ....
        elbo.client upload successful.
        
        elbo.client number of compute choices - 28
        ? Please choose: (Use arrow keys)
         »  $ 0.0028/hour        Micro (for testing)   2 cpu     1Gb mem    0Gb gpu-mem AWS (spot)
            $ 0.0150/hour     Standard (for testing)   1 cpu     2Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
            $ 0.0770/hour        Micro (for testing)   2 cpu     1Gb mem    0Gb gpu-mem AWS
            $ 0.2700/hour           Nvidia Tesla K80   4 cpu    61Gb mem   12Gb gpu-mem AWS (spot)
            $ 0.6100/hour         Nvidia Quadro 4000  16 cpu    32Gb mem    8Gb gpu-mem TensorDock
            $ 0.9000/hour           Nvidia Tesla K80   4 cpu    61Gb mem   12Gb gpu-mem AWS
            $ 0.9180/hour                Nvidia V100   8 cpu    61Gb mem   16Gb gpu-mem AWS (spot)
            $ 0.9200/hour         Nvidia Quadro 5000   2 cpu     4Gb mem   16Gb gpu-mem FluidStack
            $ 0.9600/hour               Nvidia A5000   2 cpu    16Gb mem   24Gb gpu-mem TensorDock
            $ 1.4900/hour               Nvidia A4000  12 cpu    64Gb mem   16Gb gpu-mem FluidStack
            $ 1.4940/hour                 Nvidia A40   2 cpu    12Gb mem   48Gb gpu-mem TensorDock
            $ 1.5000/hour         Nvidia Quadro 6000   8 cpu    32Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
            $ 1.5140/hour               Nvidia A6000   2 cpu    16Gb mem   48Gb gpu-mem TensorDock
            $ 2.1600/hour        8x Nvidia Tesla K80  32 cpu   488Gb mem   12Gb gpu-mem AWS (spot)
            $ 3.0000/hour      2x Nvidia Quadro 6000  16 cpu    64Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
            $ 3.0600/hour                Nvidia V100   8 cpu    61Gb mem   16Gb gpu-mem AWS
            $ 3.6720/hour             4x Nvidia V100  32 cpu   244Gb mem   16Gb gpu-mem AWS (spot)
            $ 3.7460/hour             7x Nvidia V100   6 cpu     8Gb mem   16Gb gpu-mem TensorDock
            $ 4.3200/hour       16x Nvidia Tesla K80  64 cpu   732Gb mem   12Gb gpu-mem AWS (spot)
            $ 4.5000/hour      3x Nvidia Quadro 6000  20 cpu    96Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
            $ 6.0000/hour      4x Nvidia Quadro 6000  24 cpu   128Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
            $ 7.3440/hour             8x Nvidia V100  64 cpu   488Gb mem   16Gb gpu-mem AWS (spot)
            $ 7.9200/hour        8x Nvidia Tesla K80  32 cpu   488Gb mem   12Gb gpu-mem AWS
            $ 9.8318/hour             8x Nvidia A100  96 cpu  1152Gb mem   80Gb gpu-mem AWS (spot)
            $13.0360/hour             4x Nvidia V100  32 cpu   244Gb mem   16Gb gpu-mem AWS
            $14.4000/hour       16x Nvidia Tesla K80  64 cpu   732Gb mem   12Gb gpu-mem AWS
            $24.4800/hour             8x Nvidia V100  64 cpu   488Gb mem   16Gb gpu-mem AWS
            $32.7726/hour             8x Nvidia A100  96 cpu  1152Gb mem   80Gb gpu-mem AWS

    That's it! 🥳 Monitor your task progression using elbo show <task_id>.

    Good to know: The list of compute options is sorted in order of best price to performance. Note that the cheapest option may not always be the best, nor is the most expensive option.

    Want to deep dive?

    Dive a little deeper and start exploring our API reference to get an idea of everything that's possible with the API.

    API Reference

    You can use the ELBO Python API to automatically save the state of the training.

    PyTorch

    We currently support only PyTorch, but please reach out to us if you need other frameworks. There are two main concepts:

  • The abstract elbo.ElboModel

  • The elbo.ElboEpochIterator

    Good to know: These APIs are being built and may change as we enhance them.
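
To make the two API concepts (the abstract elbo.ElboModel and the elbo.ElboEpochIterator) concrete, the sketch below shows the general pattern of an epoch iterator that periodically asks a model to save its state. It is a plain-Python stand-in, not the elbo implementation: `CheckpointingIterator` and `ToyModel` are hypothetical names, and saving after every `save_state_interval` completed epochs is an assumption about the behaviour this reference describes.

```python
class ToyModel:
    """Hypothetical stand-in for a model that implements save_state()."""

    def __init__(self):
        self.epochs_trained = 0
        self.saved_after_epochs = []

    def save_state(self):
        # A real model would write its weights to the artifacts directory.
        self.saved_after_epochs.append(self.epochs_trained)


class CheckpointingIterator:
    """Wraps an iterable of epochs and checkpoints the model periodically."""

    def __init__(self, epochs, model, save_state_interval=1):
        self.epochs = epochs
        self.model = model
        self.save_state_interval = save_state_interval

    def __iter__(self):
        for completed, epoch in enumerate(self.epochs, start=1):
            yield epoch  # the caller's loop body trains for one epoch here
            # Control returns here once the loop body has finished the epoch.
            if completed % self.save_state_interval == 0:
                self.model.save_state()


model = ToyModel()
for epoch in CheckpointingIterator(range(0, 10), model, save_state_interval=5):
    model.epochs_trained = epoch + 1  # stands in for one epoch of training

print(model.saved_after_epochs)  # prints [5, 10]
```

Keeping the checkpoint logic inside the iterator means the training loop itself stays unchanged apart from the object it iterates over.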

    The abstract elbo.ElboModel

    An ElboModel is an abstract class that allows the ELBO service to automatically checkpoint your training.

    Extending your model class

    Extend the abstract ElboModel along with nn.Module in your PyTorch model class. With this you will be required to implement two methods:

  • save_state - This method should save the state of the model and any other state information needed.

  • load_state - This method should load the state of the model from the input directory.

    Good to know: These methods will be called by the training loop at periodic intervals to keep saving the state of training. Please make sure anything that's needed for training to resume from a previous checkpoint is saved and loaded through these methods.

        import os

        import torch
        from torch import nn

        # ElboModel is provided by the elbo package.
        class MNISTClassifier(ElboModel, nn.Module):
            def get_artifacts_directory(self):
                return 'artifacts'

            def save_state(self):
                model_path = os.path.join(self.get_artifacts_directory(), "mnist_model")
                torch.save(self.state_dict(), model_path)
                print(f"Saving model to {model_path}")

            def load_state(self, state_dir):
                model_path = os.path.join(self.get_artifacts_directory(), "mnist_model")
                print(f"Loading model from {model_path}")
                self.load_state_dict(torch.load(model_path))

    The elbo.ElboEpochIterator

    The iterator class wraps a regular Python iterator. Add it to your training epoch loop.

    Usage

    The elbo.ElboEpochIterator takes the following arguments:

  • The range of epochs, usually range(0, num_epochs)

  • The PyTorch model it is training

  • save_state_interval - How often the model state should be saved to the artifacts directory. A value of 5 means the model state will be saved every 5 epochs.

        from torchvision import datasets, transforms

        # train() and test() are your own training and evaluation functions.
        if __name__ == '__main__':
            print("Training MNIST classifier")
            train_data = datasets.MNIST("data", train=True, transform=transforms.ToTensor(), download=True)
            test_data = datasets.MNIST("data", train=False, transform=transforms.ToTensor(), download=True)
            model = MNISTClassifier()
            num_epochs = 10

            for epoch in elbo.elbo.ElboEpochIterator(range(0, num_epochs), model, save_state_interval=1):
                loss = train(model, train_data)
                print(f"Epoch = {epoch} Loss = {loss}")

            test(model, test_data)

    The ELBO Tracker

    The ELBO Tracker allows you to monitor your ML tasks on your phone using the ELBO Tracker App.

    The ELBO Tracker API is an easy way to monitor your tasks on your phone. Using the API you can log messages, key metrics (numbers), and images. These show up in your task list on your phone.

    Please install the ELBO Tracker App and log in with your authentication token to see the results of your tasks.

    To start, create an instance of the Tracker:

        from elbo.tracker.tracker import TaskTracker
        tracker = TaskTracker("Hello World")

    Where "Hello World" is the experiment name. Now to log a message just do:

        tracker.log_message("Hi there! 👋")

    That's it! Logging a metric or an image is just as simple:

        tracker.log_key_metric("Accuracy", 100.0)

        tracker.log_image("An AI generated image of a Cat 🐱", "images/aicat.png")

    And finally, upload the logs using:

        tracker.upload_logs()

    Make sure you don't forget this step. The upload_logs() API can be called as many times as you would like. Each time it will append to the existing logs. For example, if you are training a model, it may make sense to call this API every epoch.

    Once this is done, you can see the results in your App!

    The task list view
    The task details view
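
The per-epoch upload pattern suggested in this section can be sketched as follows. So that the example runs without the elbo package or an account, it uses a hypothetical `RecordingTracker` stand-in that only mirrors the method names shown here (log_message, log_key_metric, upload_logs); with the real library you would pass a TaskTracker instance instead. The loss values are made up for illustration.

```python
class RecordingTracker:
    """Hypothetical stand-in exposing the same method names as TaskTracker."""

    def __init__(self, experiment_name):
        self.experiment_name = experiment_name
        self.pending = []   # entries logged since the last upload
        self.uploaded = []  # entries "sent" so far

    def log_message(self, message):
        self.pending.append(("message", message))

    def log_key_metric(self, name, value):
        self.pending.append(("metric", name, value))

    def upload_logs(self):
        # Each call appends to what was uploaded before.
        self.uploaded.extend(self.pending)
        self.pending.clear()


def train_with_tracking(tracker, losses):
    """Log a message and a metric per epoch, uploading after every epoch."""
    for epoch, loss in enumerate(losses):
        tracker.log_message(f"Finished epoch {epoch}")
        tracker.log_key_metric("Loss", loss)
        tracker.upload_logs()


tracker = RecordingTracker("Hello World")
train_with_tracking(tracker, [0.9, 0.5, 0.3])  # illustrative loss values
print(len(tracker.uploaded))  # prints 6
```

Uploading inside the loop means progress is visible while training is still running, rather than only after the final epoch.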

    CLI Reference

    Use the command-line tool to run tasks, show task status, cancel tasks and SSH into tasks.

    Help

    elbo --help

    Start a notebook

    elbo notebook

    Run a task

    elbo run --config <config_file_path>

    Cancel a task

    elbo kill <task_id>

    Show task attributes

    elbo show <task_id>
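
The Cost Per Hour and Run Time attributes reported by elbo show can be combined into a rough cost estimate. The sketch below is an illustration, not ELBO's billing logic: `estimate_cost` and `parse_run_time` are hypothetical helpers, the `00h:31m:28s` run-time format is taken from the sample elbo show output in this document, and charging per completed minute is an assumption.

```python
import re
from decimal import Decimal


def parse_run_time(text):
    """Convert a run time such as '00h:31m:28s' into whole seconds."""
    match = re.fullmatch(r"(\d+)h:(\d+)m:(\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised run time: {text!r}")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def estimate_cost(cost_per_hour, run_time):
    """Estimate cost, assuming billing per completed minute (an assumption)."""
    whole_minutes = parse_run_time(run_time) // 60
    return Decimal(cost_per_hour) * whole_minutes / 60


print(estimate_cost("0.42", "00h:31m:28s"))  # prints 0.217
```

For the sample task this is consistent with the Total Cost of 0.2170000 shown in the sample output, though the agreement may be coincidental.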

    Download task artifacts

    elbo download <task_id>
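
The sample elbo download session in this document saves the artifacts as a .tgz archive. A minimal sketch for unpacking such an archive with Python's standard tarfile module follows; `extract_artifacts` is a hypothetical helper, and the demo archive is created locally purely so that the example is runnable.

```python
import os
import tarfile
import tempfile


def extract_artifacts(archive_path, destination):
    """Unpack a gzip-compressed tar archive and return the extracted names."""
    os.makedirs(destination, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(destination)
    return sorted(names)


# Build a tiny stand-in archive so the example runs without a real task.
workdir = tempfile.mkdtemp()
checkpoint = os.path.join(workdir, "mnist_model")
with open(checkpoint, "w") as handle:
    handle.write("placeholder checkpoint")
archive_path = os.path.join(workdir, "demo-artifacts.tgz")
with tarfile.open(archive_path, "w:gz") as archive:
    archive.add(checkpoint, arcname="artifacts/mnist_model")

extracted = extract_artifacts(archive_path, os.path.join(workdir, "out"))
print(extracted)  # prints ['artifacts/mnist_model']
```

With a real task, `archive_path` would be the path printed by elbo download.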

    Show running tasks

    elbo ps -r

    SSH into a running task

    elbo ssh <task_id>

    Show ELBO Server Status

    elbo status

    Sample sessions for these commands follow, in the same order:

    (.venv) joy@elbo ~> elbo
    Usage: elbo [OPTIONS] COMMAND [ARGS]...
    
      elbo.ai - Train more, pay less
    
    Options:
      --help  Show this message and exit.
    
    Commands:
      balance   Show the users balance
      create    Create an instance and get SSH access to it.
      download  Download the artifacts for the task.
      kill      Stop the task.
      login     Login to the ELBO service.
      notebook  Start a Jupyter Lab session.
      ps        Show list of all tasks.
      run       Submit a task specified by the config file.
      show      Show the task.
      ssh       SSH into the machine running the task.
      status    Get ELBO server status.
    (.venv) joy@elbo ~/p/elbo-examples (main)> elbo notebook
    elbo.client creating notebook using config at project [email protected]:elbo-ai/elbo-examples.git ...
    elbo.client cloning [email protected]:elbo-ai/elbo-examples.git to /var/folders/8f/vcfd13292kl6p93zxf1yypl40000gn/T/tmpfl7mum90 ...
    elbo.client Submitting notebook run config : /var/folders/8f/vcfd13292kl6p93zxf1yypl40000gn/T/tmpfl7mum90/notebook/elbo.yaml
    elbo.client is starting 'Start a jupyter notebook' submission ...
    elbo.client Hey Anu 👋, welcome!
    elbo.client is uploading sources from /var/folders/8f/vcfd13292kl6p93zxf1yypl40000gn/T/tmpfl7mum90/notebook/....
    elbo.client upload successful.
    
    elbo.client number of compute choices - 28
    ? Please choose:  $ 0.4200/hour         Nvidia Quadro 4000   2 cpu     4Gb mem    8Gb gpu-mem FluidStack
    elbo.client compute node ip 216.153.51.67
    elbo.client task with ID 125 is submitted successfully.
    elbo.client ----------------------------------------------
    elbo.client ssh using - ssh [email protected] -p 2222
    elbo.client scp using - scp [email protected] -p 2222
    elbo.client password: BZ7qNxpVJAsAXEequQ
    elbo.client ----------------------------------------------
    
    elbo.client here are URLS for task logs ...
    elbo.client setup logs        - http://216.153.51.67/setup
    elbo.client requirements logs - http://216.153.51.67/requirements
    elbo.client task logs         - http://216.153.51.67/task
    
    elbo.client TIP: 💡 see task details with command: `elbo show 125`
    
    elbo.client ⏳ It may take a minute or two for the node to be reachable.
    
    elbo.client node started ..
    
    elbo.client Notebook URL = http://216.153.51.67:8080/?token=5824d0cfbbc3ed1710969d4cfe8404c6dfdcc37e206d931d
    (.venv) joy@elbo ~/p/elbo-examples (main)> elbo run --config pytorch/mnist_classifier/elbo.yaml
    elbo.client is starting 'Train MNIST Classifier' submission ...
    elbo.client Hey Anu 👋, welcome!
    elbo.client is uploading sources from pytorch/mnist_classifier/....
    elbo.client upload successful.
    
    elbo.client number of compute choices - 27
    ? Please choose: (Use arrow keys)
     »  $ 0.0028/hour        Micro (for testing)   2 cpu     1Gb mem    0Gb gpu-mem AWS (spot)
        $ 0.0150/hour     Standard (for testing)   1 cpu     2Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
        $ 0.0770/hour        Micro (for testing)   2 cpu     1Gb mem    0Gb gpu-mem AWS
        $ 0.2700/hour           Nvidia Tesla K80   4 cpu    61Gb mem   12Gb gpu-mem AWS (spot)
        $ 0.7220/hour               Nvidia A4000   2 cpu     4Gb mem   16Gb gpu-mem TensorDock
        $ 0.9000/hour           Nvidia Tesla K80   4 cpu    61Gb mem   12Gb gpu-mem AWS
        $ 0.9180/hour                Nvidia V100   8 cpu    61Gb mem   16Gb gpu-mem AWS (spot)
        $ 0.9200/hour         Nvidia Quadro 5000   2 cpu     4Gb mem   16Gb gpu-mem FluidStack
        $ 0.9600/hour               Nvidia A5000   2 cpu    16Gb mem   24Gb gpu-mem TensorDock
        $ 1.4940/hour                 Nvidia A40   2 cpu    12Gb mem   48Gb gpu-mem TensorDock
        $ 1.5000/hour         Nvidia Quadro 6000   8 cpu    32Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
        $ 1.5140/hour               Nvidia A6000   2 cpu    16Gb mem   48Gb gpu-mem TensorDock
        $ 2.1600/hour        8x Nvidia Tesla K80  32 cpu   488Gb mem   12Gb gpu-mem AWS (spot)
        $ 3.0000/hour      2x Nvidia Quadro 6000  16 cpu    64Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
        $ 3.0600/hour                Nvidia V100   8 cpu    61Gb mem   16Gb gpu-mem AWS
        $ 3.6720/hour             4x Nvidia V100  32 cpu   244Gb mem   16Gb gpu-mem AWS (spot)
        $ 3.7460/hour             7x Nvidia V100   6 cpu     8Gb mem   16Gb gpu-mem TensorDock
        $ 4.3200/hour       16x Nvidia Tesla K80  64 cpu   732Gb mem   12Gb gpu-mem AWS (spot)
        $ 4.5000/hour      3x Nvidia Quadro 6000  20 cpu    96Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
        $ 6.0000/hour      4x Nvidia Quadro 6000  24 cpu   128Gb mem    0Gb gpu-mem Linode (~ 9 mins to provision)
        $ 7.3440/hour             8x Nvidia V100  64 cpu   488Gb mem   16Gb gpu-mem AWS (spot)
        $ 7.9200/hour        8x Nvidia Tesla K80  32 cpu   488Gb mem   12Gb gpu-mem AWS
        $ 9.8318/hour             8x Nvidia A100  96 cpu  1152Gb mem   80Gb gpu-mem AWS (spot)
        $13.0360/hour             4x Nvidia V100  32 cpu   244Gb mem   16Gb gpu-mem AWS
    (.venv) joy@elbo ~/p/elbo-examples (main)> elbo kill 153
    elbo.client Stopping task - 153
    elbo.client Task with id=153 is marked for cancellation.
    (.venv) joy@elbo ~/p/elbo-examples (main)> elbo show 123
    elbo.client Fetching task - 123
    elbo.client Task with id = 123:
    Billed Cost             : 0.2100000
    Billed Upto Time        : 03/07/22 12:58
    Bucket Key              : [email protected]/elbo-archive-26b7975b.tgz
    Completion Time         : 03/07/22 12:58
    Compute Type            : FluidStack None Nvidia Quadro 4000x1(8Gb) CPU=2(4Gb) Cost=0.42 Cost/Transistor=0.028767123287671233 CUDA Cores=2304
    Config File Path        : None
    Cost Per Hour           : 0.4200000
    Created Time            : 03/07/22 12:27
    Customer Billed         : True
    Instance ID             : recbPXDkeuR3SBTV7
    Instance Type           : Dedicated
    Keep Alive              : True
    Last Modified Time      : 03/07/22 12:26
    Name                    : Start a jupyter notebook
    Password                : ou4zebZ2XCoaMhdDrQ
    Previous Task ID        : None
    Provider                : FluidStack
    Record ID               : 185
    Requirements Log Path   : http://216.153.51.67/requirements
    Run Time                : 00h:31m:28s
    SSH Only                : False
    Session ID              : 6381c2c835e340f6957542720dee8d13
    Setup Log Path          : http://216.153.51.67/setup
    Status                  : Archived
    Submission Time         : 03/07/22 12:26
    Target File Path        : [email protected]/elbo-6381c2c835e340f6957542720dee8d13-artifacts.tgz
    Task ID                 : 123
    Task Log Path           : http://216.153.51.67/task
    Total Cost              : 0.2170000
    User ID                 : [email protected]
    ip                      : 216.153.51.67
    (.venv) joy@elbo ~/p/elbo-examples (main)> elbo download 159
    elbo.client Downloading Artifacts for - 159
    elbo.client Artifacts for task id = 159 downloaded to /var/folders/8f/vcfd13292kl6p93zxf1yypl40000gn/T/tmp8uetxm69/elbo-3dc59be0e9b545378b6a679345175f1c-artifacts.tgz
    (.venv) joy@elbo ~> elbo ps -r
    elbo.client your running tasks:
    +---------+--------------+---------------+----------+---------------------+---------+---------------------+------------------------+-----------------+----------+------------+
    | Task ID | Compute Type | Cost Per Hour | Provider |     Start Time      | Status  |   Submission Time   |       Task Name        | Completion Time | Run Time | Total Cost |
    +---------+--------------+---------------+----------+---------------------+---------+---------------------+------------------------+-----------------+----------+------------+
    |   153   | Economy GPU  |    0.0028     |   AWS    | 02/04/2022 14:26 PM | Running | 02/04/2022 14:25 PM |    SSH only session    |                 |          |            |
    |   155   | Economy GPU  |    0.0028     |   AWS    | 02/04/2022 14:29 PM | Running | 02/04/2022 14:28 PM |    SSH only session    |                 |          |            |
    |   156   | Economy GPU  |     0.27      |   AWS    | 02/04/2022 14:31 PM | Running | 02/04/2022 14:29 PM | Train MNIST Classifier |                 |          |            |
    |   158   | Economy GPU  |     0.27      |   AWS    | 02/04/2022 15:05 PM | Running | 02/04/2022 15:04 PM | Train MNIST Classifier |                 |          |            |
    |   159   | Economy GPU  |    0.0028     |   AWS    | 02/04/2022 15:45 PM | Running | 02/04/2022 15:45 PM | Train MNIST Classifier |                 |          |            |
    +---------+--------------+---------------+----------+---------------------+---------+---------------------+------------------------+-----------------+----------+------------+
    (.venv) joy@elbo ~> elbo ssh 159
    elbo.client Trying to SSH into task 159...
    elbo.client SSH:
    elbo.client Running Command : ssh [email protected] -p 2222
    elbo.client Enter this password when prompted: elbo
    Warning: Permanently added '[44.234.188.107]:2222' (ED25519) to the list of known hosts.
    [email protected]'s password:
    Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-1061-aws x86_64)
    
     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/advantage
    This system has been minimized by removing packages and content that are
    not required on a system that users do not log into.
    
    To restore this content, you can run the 'unminimize' command.
    
    The programs included with the Ubuntu system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.
    
    Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
    applicable law.
    
    root@6b63a53b9691:~#
    (.venv) joy@elbo ~> elbo status
    elbo.client Membership: ✅
    elbo.client Database  : ✅
    elbo.client Server    : ✅