Making ML stuff cheap and easy 💪
Looking for image generation API, sign up here 👇
ELBO.AI is a service that makes training of ML models easier and cheaper. Use our service to train your models, receive timely training notifications, choose compute types that fit your budget and integrate with our API to add the service to your workflow.
Jump into the quick start docs and make your first task submission:
Dive a little deeper and start exploring our API reference to get an idea of everything that's possible with the API:
Here is a guide to help you set up the ELBO environment on your local machine.
Your API requests are authenticated using API keys. Any request that doesn't include an API key will return an HTTP Authentication error.
Sign up for an account (with a 14-day trial period). You can get your API key from the website at any time.
It's better to run Python in a virtual environment or use conda. To install virtualenv, run:
pip3 install virtualenv virtualenvwrapper
And create an environment using:
virtualenv -p python3 .venv
or, if virtualenv is not in your path:
~/Library/Python/3.9/bin/virtualenv -p python3 .venv
This creates a virtual Python environment in the .venv folder. To activate this environment, use the command:
. .venv/bin/activate
Or the following if you are using the fish shell:
. .venv/bin/activate.fish
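If virtualenv is not installed, Python's built-in venv module can create a comparable environment. The following is a minimal sketch that assumes the same .venv folder name; the helper name create_env is made up for illustration.

```python
import venv
from pathlib import Path


def create_env(path=".venv", with_pip=True):
    """Create a virtual environment at `path` using only the standard library."""
    # EnvBuilder is the programmatic form of `python3 -m venv <path>`.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(path)
    return Path(path)
```

Calling create_env() from the project folder produces a .venv directory that is activated with the same commands shown above.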
The best way to interact with our API is to use our elbo library. You can install it using the command below:
pip3 install elbo --upgrade
Use the command line tool to login.
elbo login
This will prompt you to enter your token. The token can be obtained by logging into the ELBO welcome page.
Try out one of the sample ML submissions from our examples GitHub repository. First, clone the repository:
git clone https://github.com/elbo-ai/elbo-examples.git
cd elbo-examples/pytorch/mnist_classifier/
Submit the sample task:
elbo run --config elbo.yaml
Here is sample output from the command, which prompts you to choose from a list of compute options offered by our providers:
elbo.client is starting 'Train MNIST Classifier' submission ...
elbo.client Hey Anu 👋, welcome!
elbo.client is uploading sources from ....
elbo.client upload successful.
elbo.client number of compute choices - 28
? Please choose: (Use arrow keys)
» $ 0.0028/hour Micro (for testing) 2 cpu 1Gb mem 0Gb gpu-mem AWS (spot)
$ 0.0150/hour Standard (for testing) 1 cpu 2Gb mem 0Gb gpu-mem Linode (~ 9 mins to provision)
$ 0.0770/hour Micro (for testing) 2 cpu 1Gb mem 0Gb gpu-mem AWS
$ 0.2700/hour Nvidia Tesla K80 4 cpu 61Gb mem 12Gb gpu-mem AWS (spot)
$ 0.6100/hour Nvidia Quadro 4000 16 cpu 32Gb mem 8Gb gpu-mem TensorDock
$ 0.9000/hour Nvidia Tesla K80 4 cpu 61Gb mem 12Gb gpu-mem AWS
$ 0.9180/hour Nvidia V100 8 cpu 61Gb mem 16Gb gpu-mem AWS (spot)
$ 0.9200/hour Nvidia Quadro 5000 2 cpu 4Gb mem 16Gb gpu-mem FluidStack
$ 0.9600/hour Nvidia A5000 2 cpu 16Gb mem 24Gb gpu-mem TensorDock
$ 1.4900/hour Nvidia A4000 12 cpu 64Gb mem 16Gb gpu-mem FluidStack
$ 1.4940/hour Nvidia A40 2 cpu 12Gb mem 48Gb gpu-mem TensorDock
$ 1.5000/hour Nvidia Quadro 6000 8 cpu 32Gb mem 0Gb gpu-mem Linode (~ 9 mins to provision)
$ 1.5140/hour Nvidia A6000 2 cpu 16Gb mem 48Gb gpu-mem TensorDock
$ 2.1600/hour 8x Nvidia Tesla K80 32 cpu 488Gb mem 12Gb gpu-mem AWS (spot)
$ 3.0000/hour 2x Nvidia Quadro 6000 16 cpu 64Gb mem 0Gb gpu-mem Linode (~ 9 mins to provision)
$ 3.0600/hour Nvidia V100 8 cpu 61Gb mem 16Gb gpu-mem AWS
$ 3.6720/hour 4x Nvidia V100 32 cpu 244Gb mem 16Gb gpu-mem AWS (spot)
$ 3.7460/hour 7x Nvidia V100 6 cpu 8Gb mem 16Gb gpu-mem TensorDock
$ 4.3200/hour 16x Nvidia Tesla K80 64 cpu 732Gb mem 12Gb gpu-mem AWS (spot)
$ 4.5000/hour 3x Nvidia Quadro 6000 20 cpu 96Gb mem 0Gb gpu-mem Linode (~ 9 mins to provision)
$ 6.0000/hour 4x Nvidia Quadro 6000 24 cpu 128Gb mem 0Gb gpu-mem Linode (~ 9 mins to provision)
$ 7.3440/hour 8x Nvidia V100 64 cpu 488Gb mem 16Gb gpu-mem AWS (spot)
$ 7.9200/hour 8x Nvidia Tesla K80 32 cpu 488Gb mem 12Gb gpu-mem AWS
$ 9.8318/hour 8x Nvidia A100 96 cpu 1152Gb mem 80Gb gpu-mem AWS (spot)
$13.0360/hour 4x Nvidia V100 32 cpu 244Gb mem 16Gb gpu-mem AWS
$14.4000/hour 16x Nvidia Tesla K80 64 cpu 732Gb mem 12Gb gpu-mem AWS
$24.4800/hour 8x Nvidia V100 64 cpu 488Gb mem 16Gb gpu-mem AWS
$32.7726/hour 8x Nvidia A100 96 cpu 1152Gb mem 80Gb gpu-mem AWS
That's it! 🥳 Monitor your task's progress using elbo show <task_id>.
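To compare the compute options against a budget, the listing can be parsed offline. The sketch below assumes the line layout shown in the sample output above (price, name, cpu, mem, gpu-mem, provider). The names ComputeOption, parse_options and cheapest_option are illustrative and are not part of the elbo library.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches lines such as:
# "$ 0.2700/hour Nvidia Tesla K80 4 cpu 61Gb mem 12Gb gpu-mem AWS (spot)"
OPTION_PATTERN = re.compile(
    r"\$\s*(?P<price>\d+\.\d+)/hour\s+(?P<name>.+?)\s+"
    r"(?P<cpu>\d+)\s+cpu\s+(?P<mem>\d+)Gb\s+mem\s+"
    r"(?P<gpu_mem>\d+)Gb\s+gpu-mem\s+(?P<provider>\S+)"
)


@dataclass
class ComputeOption:
    price_per_hour: float
    name: str
    cpus: int
    mem_gb: int
    gpu_mem_gb: int
    provider: str
    spot: bool


def parse_options(listing: str) -> List[ComputeOption]:
    """Turn each recognizable line of the listing into a ComputeOption."""
    options = []
    for line in listing.splitlines():
        match = OPTION_PATTERN.search(line)
        if match is None:
            continue  # skip prompts and log lines
        options.append(ComputeOption(
            price_per_hour=float(match["price"]),
            name=match["name"],
            cpus=int(match["cpu"]),
            mem_gb=int(match["mem"]),
            gpu_mem_gb=int(match["gpu_mem"]),
            provider=match["provider"],
            spot="(spot)" in line,
        ))
    return options


def cheapest_option(options: List[ComputeOption],
                    min_gpu_mem_gb: int = 0,
                    max_price: Optional[float] = None) -> Optional[ComputeOption]:
    """Return the cheapest option that meets the constraints, or None."""
    candidates = [
        o for o in options
        if o.gpu_mem_gb >= min_gpu_mem_gb
        and (max_price is None or o.price_per_hour <= max_price)
    ]
    return min(candidates, key=lambda o: o.price_per_hour, default=None)


# A few lines copied from the sample output above.
SAMPLE_LISTING = """\
» $ 0.0028/hour Micro (for testing) 2 cpu 1Gb mem 0Gb gpu-mem AWS (spot)
$ 0.2700/hour Nvidia Tesla K80 4 cpu 61Gb mem 12Gb gpu-mem AWS (spot)
$ 0.9180/hour Nvidia V100 8 cpu 61Gb mem 16Gb gpu-mem AWS (spot)
$ 0.9600/hour Nvidia A5000 2 cpu 16Gb mem 24Gb gpu-mem TensorDock
$13.0360/hour 4x Nvidia V100 32 cpu 244Gb mem 16Gb gpu-mem AWS
"""
```

For example, cheapest_option(parse_options(SAMPLE_LISTING), min_gpu_mem_gb=16) picks the spot V100 entry from this sample.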
An ElboModel is an abstract class that allows the ELBO service to automatically checkpoint your training.
Extend the abstract ElboModel along with nn.Module in your PyTorch model class. You are then required to implement two methods:
save_state - This method should save the state of the model and any other state information needed.
load_state - This method should load the state of the model from the input directory.
import os

import torch
from torch import nn
from elbo.elbo import ElboModel  # import path assumed; check the elbo package


class MNISTClassifier(ElboModel, nn.Module):
    def get_artifacts_directory(self):
        return 'artifacts'

    def save_state(self):
        model_path = os.path.join(self.get_artifacts_directory(), "mnist_model")
        torch.save(self.state_dict(), model_path)
        print(f"Saving model to {model_path}")

    def load_state(self, state_dir):
        # This example reloads from the artifacts directory and does not use state_dir.
        model_path = os.path.join(self.get_artifacts_directory(), "mnist_model")
        print(f"Loading model from {model_path}")
        self.load_state_dict(torch.load(model_path))
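The save and load contract can be tried out without PyTorch or the elbo package. The sketch below uses a hypothetical stand-in base class, CheckpointableModel, that mirrors the two required methods; it is not the library's ElboModel, and the JSON file format is chosen only to keep the example dependency-free.

```python
import json
import os
from abc import ABC, abstractmethod


class CheckpointableModel(ABC):
    """Hypothetical stand-in for ElboModel with the same two required methods."""

    @abstractmethod
    def save_state(self):
        """Write everything needed to resume training."""

    @abstractmethod
    def load_state(self, state_dir):
        """Restore the state previously written to `state_dir`."""


class CounterModel(CheckpointableModel):
    """Toy model whose only state is a step counter."""

    def __init__(self, artifacts_dir):
        self.artifacts_dir = artifacts_dir
        self.steps = 0

    def get_artifacts_directory(self):
        return self.artifacts_dir

    def save_state(self):
        os.makedirs(self.get_artifacts_directory(), exist_ok=True)
        path = os.path.join(self.get_artifacts_directory(), "state.json")
        with open(path, "w") as f:
            json.dump({"steps": self.steps}, f)

    def load_state(self, state_dir):
        with open(os.path.join(state_dir, "state.json")) as f:
            self.steps = json.load(f)["steps"]
```

A fresh CounterModel that calls load_state on the directory written by save_state picks up the saved step count, which is the round trip a checkpointing service relies on.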
Use the command-line tool to run tasks, show task status, cancel tasks and SSH into tasks.
elbo --help
(.venv) joy@elbo ~> elbo
Usage: elbo [OPTIONS] COMMAND [ARGS]...
elbo.ai - Train more, pay less
Options:
--help Show this message and exit.
Commands:
balance Show the users balance
create Create an instance and get SSH access to it.
download Download the artifacts for the task.
kill Stop the task.
login Login to the ELBO service.
notebook Start a Jupyter Lab session.
ps Show list of all tasks.
run Submit a task specified by the config file.
show Show the task.
ssh SSH into the machine running the task.
status Get ELBO server status.
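The CLI can also be driven from a script. The following is a minimal sketch using Python's subprocess module; the subcommand names are copied from the help output above, while the helper names build_elbo_command and run_elbo are made up for illustration.

```python
import shutil
import subprocess

# Subcommands listed in the `elbo --help` output above.
ELBO_COMMANDS = {
    "balance", "create", "download", "kill", "login", "notebook",
    "ps", "run", "show", "ssh", "status",
}


def build_elbo_command(subcommand, *args):
    """Return the argument list for an elbo invocation, e.g. ['elbo', 'show', '123']."""
    if subcommand not in ELBO_COMMANDS:
        raise ValueError(f"unknown elbo subcommand: {subcommand}")
    return ["elbo", subcommand, *[str(a) for a in args]]


def run_elbo(subcommand, *args):
    """Run the elbo CLI and return the CompletedProcess with captured output."""
    if shutil.which("elbo") is None:
        raise RuntimeError("elbo CLI not found on PATH; install it with pip3 install elbo")
    return subprocess.run(
        build_elbo_command(subcommand, *args),
        capture_output=True,
        text=True,
        check=False,
    )
```

For example, run_elbo("show", 123).stdout would hold the text printed by elbo show 123. Commands that prompt for input, such as run, are better started directly in a terminal.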
elbo notebook
(.venv) joy@elbo ~/p/elbo-examples (main)> elbo notebook
elbo.client creating notebook using config at project git@github.com:elbo-ai/elbo-examples.git ...
elbo.client cloning git@github.com:elbo-ai/elbo-examples.git to /var/folders/8f/vcfd13292kl6p93zxf1yypl40000gn/T/tmpfl7mum90 ...
elbo.client Submitting notebook run config : /var/folders/8f/vcfd13292kl6p93zxf1yypl40000gn/T/tmpfl7mum90/notebook/elbo.yaml
elbo.client is starting 'Start a jupyter notebook' submission ...
elbo.client Hey Anu 👋, welcome!
elbo.client is uploading sources from /var/folders/8f/vcfd13292kl6p93zxf1yypl40000gn/T/tmpfl7mum90/notebook/....
elbo.client upload successful.
elbo.client number of compute choices - 28
? Please choose: $ 0.4200/hour Nvidia Quadro 4000 2 cpu 4Gb mem 8Gb gpu-mem FluidStack
elbo.client compute node ip 216.153.51.67
elbo.client task with ID 125 is submitted successfully.
elbo.client ----------------------------------------------
elbo.client ssh using - ssh [email protected] -p 2222
elbo.client scp using - scp [email protected] -p 2222
elbo.client password: BZ7qNxpVJAsAXEequQ
elbo.client ----------------------------------------------
elbo.client here are URLS for task logs ...
elbo.client setup logs - http://216.153.51.67/setup
elbo.client requirements logs - http://216.153.51.67/requirements
elbo.client task logs - http://216.153.51.67/task
elbo.client TIP: 💡 see task details with command: `elbo show 125`
elbo.client ⏳ It may take a minute or two for the node to be reachable.
elbo.client node started ..
elbo.client Notebook URL = http://216.153.51.67:8080/?token=5824d0cfbbc3ed1710969d4cfe8404c6dfdcc37e206d931d
elbo run --config <config_file_path>
(.venv) joy@elbo ~/p/elbo-examples (main)> elbo run --config pytorch/mnist_classifier/elbo.yaml
elbo.client is starting 'Train MNIST Classifier' submission ...
elbo.client Hey Anu 👋, welcome!
elbo.client is uploading sources from pytorch/mnist_classifier/....
elbo.client upload successful.
elbo.client number of compute choices - 27
? Please choose: (Use arrow keys)
» $ 0.0028/hour Micro (for testing) 2 cpu 1Gb mem 0Gb gpu-mem AWS (spot)
$ 0.0150/hour Standard (for testing) 1 cpu 2Gb mem 0Gb gpu-mem Linode (~ 9 mins to provision)
$ 0.0770/hour Micro (for testing) 2 cpu 1Gb mem 0Gb gpu-mem AWS
$ 0.2700/hour Nvidia Tesla K80 4 cpu 61Gb mem 12Gb gpu-mem AWS (spot)
$ 0.7220/hour Nvidia A4000 2 cpu 4Gb mem 16Gb gpu-mem TensorDock
$ 0.9000/hour Nvidia Tesla K80 4 cpu 61Gb mem 12Gb gpu-mem AWS
$ 0.9180/hour Nvidia V100 8 cpu 61Gb mem 16Gb gpu-mem AWS (spot)
$ 0.9200/hour Nvidia Quadro 5000 2 cpu 4Gb mem 16Gb gpu-mem FluidStack
$ 0.9600/hour Nvidia A5000 2 cpu 16Gb mem 24Gb gpu-mem TensorDock
$ 1.4940/hour Nvidia A40 2 cpu 12Gb mem 48Gb gpu-mem TensorDock
$ 1.5000/hour Nvidia Quadro 6000 8 cpu 32Gb mem 0Gb gpu-mem Linode (~ 9 mins to provision)
$ 1.5140/hour Nvidia A6000 2 cpu 16Gb mem 48Gb gpu-mem TensorDock
$ 2.1600/hour 8x Nvidia Tesla K80 32 cpu 488Gb mem 12Gb gpu-mem AWS (spot)
$ 3.0000/hour 2x Nvidia Quadro 6000 16 cpu 64Gb mem 0Gb gpu-mem Linode (~ 9 mins to provision)
$ 3.0600/hour Nvidia V100 8 cpu 61Gb mem 16Gb gpu-mem AWS
$ 3.6720/hour 4x Nvidia V100 32 cpu 244Gb mem 16Gb gpu-mem AWS (spot)
$ 3.7460/hour 7x Nvidia V100 6 cpu 8Gb mem 16Gb gpu-mem TensorDock
$ 4.3200/hour 16x Nvidia Tesla K80 64 cpu 732Gb mem 12Gb gpu-mem AWS (spot)
$ 4.5000/hour 3x Nvidia Quadro 6000 20 cpu 96Gb mem 0Gb gpu-mem Linode (~ 9 mins to provision)
$ 6.0000/hour 4x Nvidia Quadro 6000 24 cpu 128Gb mem 0Gb gpu-mem Linode (~ 9 mins to provision)
$ 7.3440/hour 8x Nvidia V100 64 cpu 488Gb mem 16Gb gpu-mem AWS (spot)
$ 7.9200/hour 8x Nvidia Tesla K80 32 cpu 488Gb mem 12Gb gpu-mem AWS
$ 9.8318/hour 8x Nvidia A100 96 cpu 1152Gb mem 80Gb gpu-mem AWS (spot)
$13.0360/hour 4x Nvidia V100 32 cpu 244Gb mem 16Gb gpu-mem AWS.
elbo kill <task_id>
(.venv) joy@elbo ~/p/elbo-examples (main)> elbo kill 153
elbo.client Stopping task - 153
elbo.client Task with id=153 is marked for cancellation.
elbo show <task_id>
(.venv) joy@elbo ~/p/elbo-examples (main)> elbo show 123
elbo.client Fetching task - 123
elbo.client Task with id = 123:
Billed Cost : 0.2100000
Billed Upto Time : 03/07/22 12:58
Bucket Key : [email protected]/elbo-archive-26b7975b.tgz
Completion Time : 03/07/22 12:58
Compute Type : FluidStack None Nvidia Quadro 4000x1(8Gb) CPU=2(4Gb) Cost=0.42 Cost/Transistor=0.028767123287671233 CUDA Cores=2304
Config File Path : None
Cost Per Hour : 0.4200000
Created Time : 03/07/22 12:27
Customer Billed : True
Instance ID : recbPXDkeuR3SBTV7
Instance Type : Dedicated
Keep Alive : True
Last Modified Time : 03/07/22 12:26
Name : Start a jupyter notebook
Password : ou4zebZ2XCoaMhdDrQ
Previous Task ID : None
Provider : FluidStack
Record ID : 185
Requirements Log Path : http://216.153.51.67/requirements
Run Time : 00h:31m:28s
SSH Only : False
Session ID : 6381c2c835e340f6957542720dee8d13
Setup Log Path : http://216.153.51.67/setup
Status : Archived
Submission Time : 03/07/22 12:26
Target File Path : [email protected]/elbo-6381c2c835e340f6957542720dee8d13-artifacts.tgz
Task ID : 123
Task Log Path : http://216.153.51.67/task
Total Cost : 0.2170000
User ID : [email protected]
ip : 216.153.51.67
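The key and value lines printed by elbo show are easy to read into a dictionary for scripting. The sketch below assumes the "Key : Value" layout and the 00h:31m:28s run time format seen in the sample above; the function names are illustrative. The cost estimate is simply hourly rate times run time, and it may differ from the Billed Cost the service reports.

```python
import re


def parse_show_output(text):
    """Map each 'Key : Value' line to a dictionary entry."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # ignore lines without the separator
            fields[key.strip()] = value.strip()
    return fields


def parse_run_time(run_time):
    """Convert a string like '00h:31m:28s' to seconds."""
    match = re.fullmatch(r"(\d+)h:(\d+)m:(\d+)s", run_time.strip())
    if match is None:
        raise ValueError(f"unexpected run time format: {run_time!r}")
    hours, minutes, seconds = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def estimate_cost(fields):
    """Rough cost estimate: hourly rate multiplied by run time in hours."""
    rate = float(fields["Cost Per Hour"])
    return rate * parse_run_time(fields["Run Time"]) / 3600


# A few fields copied from the sample output above.
SAMPLE_SHOW = """\
Cost Per Hour : 0.4200000
Run Time : 00h:31m:28s
Status : Archived
Task ID : 123
"""
```

With the sample fields, the run time parses to 1888 seconds, giving an estimate of about $0.22 at $0.42 per hour.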
elbo download <task_id>
(.venv) joy@elbo ~/p/elbo-examples (main)> elbo download 159
elbo.client Downloading Artifacts for - 159
elbo.client Artifacts for task id = 159 downloaded to /var/folders/8f/vcfd13292kl6p93zxf1yypl40000gn/T/tmp8uetxm69/elbo-3dc59be0e9b545378b6a679345175f1c-artifacts.tgz
elbo ps -r
(.venv) joy@elbo ~> elbo ps -r
elbo.client your running tasks:
+---------+--------------+---------------+----------+---------------------+---------+---------------------+------------------------+-----------------+----------+------------+
| Task ID | Compute Type | Cost Per Hour | Provider | Start Time | Status | Submission Time | Task Name | Completion Time | Run Time | Total Cost |
+---------+--------------+---------------+----------+---------------------+---------+---------------------+------------------------+-----------------+----------+------------+
| 153 | Economy GPU | 0.0028 | AWS | 02/04/2022 14:26 PM | Running | 02/04/2022 14:25 PM | SSH only session | | | |
| 155 | Economy GPU | 0.0028 | AWS | 02/04/2022 14:29 PM | Running | 02/04/2022 14:28 PM | SSH only session | | | |
| 156 | Economy GPU | 0.27 | AWS | 02/04/2022 14:31 PM | Running | 02/04/2022 14:29 PM | Train MNIST Classifier | | | |
| 158 | Economy GPU | 0.27 | AWS | 02/04/2022 15:05 PM | Running | 02/04/2022 15:04 PM | Train MNIST Classifier | | | |
| 159 | Economy GPU | 0.0028 | AWS | 02/04/2022 15:45 PM | Running | 02/04/2022 15:45 PM | Train MNIST Classifier | | | |
+---------+--------------+---------------+----------+---------------------+---------+---------------------+------------------------+-----------------+----------+------------+
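The table printed by elbo ps can be turned into a list of dictionaries for further processing. This is a sketch that assumes the pipe-delimited layout shown above; parse_ps_table and total_hourly_cost are hypothetical helper names.

```python
def _split_row(line):
    """Split '| a | b |' into ['a', 'b'], keeping empty cells."""
    return [cell.strip() for cell in line.strip()[1:-1].split("|")]


def parse_ps_table(text):
    """Parse the ASCII table into one dict per task, keyed by column header."""
    rows = [line for line in text.splitlines() if line.strip().startswith("|")]
    if not rows:
        return []
    header = _split_row(rows[0])
    return [dict(zip(header, _split_row(row))) for row in rows[1:]]


def total_hourly_cost(tasks):
    """Sum the 'Cost Per Hour' column across the listed tasks."""
    return sum(float(task["Cost Per Hour"]) for task in tasks)


# Two rows copied from the sample output above.
SAMPLE_PS = """\
| Task ID | Compute Type | Cost Per Hour | Provider | Start Time | Status | Submission Time | Task Name | Completion Time | Run Time | Total Cost |
| 153 | Economy GPU | 0.0028 | AWS | 02/04/2022 14:26 PM | Running | 02/04/2022 14:25 PM | SSH only session | | | |
| 156 | Economy GPU | 0.27 | AWS | 02/04/2022 14:31 PM | Running | 02/04/2022 14:29 PM | Train MNIST Classifier | | | |
"""
```

Summing the hourly cost of running tasks this way gives a quick view of the current spend rate.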
(.venv) joy@elbo ~> elbo ssh 159
elbo.client Trying to SSH into task 159...
elbo.client SSH:
elbo.client Running Command : ssh [email protected] -p 2222
elbo.client Enter this password when prompted: elbo
Warning: Permanently added '[44.234.188.107]:2222' (ED25519) to the list of known hosts.
[email protected]'s password:
Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-1061-aws x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.
To restore this content, you can run the 'unminimize' command.
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
root@6b63a53b9691:~#
(.venv) joy@elbo ~> elbo status
elbo.client Membership: ✅
elbo.client Database : ✅
elbo.client Server : ✅
The ELBO Tracker lets you monitor your ML tasks on your phone using the ELBO Tracker App.
The ELBO Tracker API is an easy way to monitor your tasks on your phone. Using the API, you can log messages, key metrics (numbers), and images. These show up in your task list on your phone.
Please install the ELBO Tracker App and log in with your authentication token to see the results of your tasks.
To start, create an instance of the tracker:
from elbo.tracker.tracker import TaskTracker
tracker = TaskTracker("Hello World")
Here, "Hello World" is the experiment name. To log a message:
tracker.log_message("Hi there! 👋")
That's it! Logging a metric or an image is just as simple:
tracker.log_key_metric("Accuracy", 100.0)
tracker.log_image("An AI generated image of a Cat 🐱", "images/aicat.png")
And finally, upload the logs using:
tracker.upload_logs()
Make sure you don't forget this step. The upload_logs() API can be called as many times as you like; each call appends to the existing logs. For example, if you are training a model, it may make sense to call this API every epoch.
Once this is done, you can see the results in the app!
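Putting the calls above together, a training loop might log a metric and upload the logs once per epoch. The sketch below takes the tracker as a parameter so it can be dry-run without the app. RecordingTracker is a hypothetical stand-in that only records calls, and train_with_tracking is an illustrative helper, not part of the elbo library.

```python
def train_with_tracking(tracker, num_epochs, train_one_epoch):
    """Run the epochs, logging the loss and uploading logs after each one."""
    tracker.log_message(f"Starting training for {num_epochs} epochs")
    losses = []
    for epoch in range(num_epochs):
        loss = train_one_epoch(epoch)
        losses.append(loss)
        tracker.log_key_metric("Loss", loss)
        # Upload every epoch so progress shows up while training is running.
        tracker.upload_logs()
    return losses


class RecordingTracker:
    """Stand-in exposing the same method names as TaskTracker; records calls only."""

    def __init__(self):
        self.calls = []

    def log_message(self, message):
        self.calls.append(("log_message", message))

    def log_key_metric(self, name, value):
        self.calls.append(("log_key_metric", name, value))

    def upload_logs(self):
        self.calls.append(("upload_logs",))
```

With the real library, the same function would be called with TaskTracker("Hello World") in place of RecordingTracker().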
The iterator class wraps a standard Python iterator. Add it to your training epoch loop.
The elbo.elbo.ElboEpochIterator takes the following arguments:
The range of epochs, usually range(0, num_epochs)
The PyTorch model being trained
save_state_interval - How often the model state should be saved to the artifacts directory. A value of 5 means the model state will be saved every 5 epochs.
import elbo.elbo
from torchvision import datasets, transforms

# MNISTClassifier, train() and test() are defined elsewhere in the example.
if __name__ == '__main__':
    print("Training MNIST classifier")
    train_data = datasets.MNIST("data", train=True, transform=transforms.ToTensor(), download=True)
    test_data = datasets.MNIST("data", train=False, transform=transforms.ToTensor(), download=True)
    model = MNISTClassifier()
    num_epochs = 10
    for epoch in elbo.elbo.ElboEpochIterator(range(0, num_epochs), model, save_state_interval=1):
        loss = train(model, train_data)
        print(f"Epoch = {epoch} Loss = {loss}")
        test(model, test_data)
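To make the idea of a checkpointing iterator concrete, here is one way such a wrapper could work. This is an illustrative sketch and not the elbo implementation: the class name CheckpointingEpochIterator is hypothetical, and it assumes the state is saved after every save_state_interval completed epochs.

```python
class CheckpointingEpochIterator:
    """Yield epochs from `epochs`, calling model.save_state() at a fixed interval."""

    def __init__(self, epochs, model, save_state_interval=1):
        if save_state_interval < 1:
            raise ValueError("save_state_interval must be at least 1")
        self._epochs = epochs
        self._model = model
        self._interval = save_state_interval

    def __iter__(self):
        for completed, epoch in enumerate(self._epochs, start=1):
            yield epoch
            # Execution resumes here once the loop body for `epoch` has finished.
            if completed % self._interval == 0:
                self._model.save_state()
```

Any object with a save_state() method can be passed as the model. With range(0, 10) and an interval of 5, save_state() runs after the 5th and 10th epochs, provided the loop runs to completion.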
You can use the ELBO Python API to automatically save the state of your training.
We currently support only PyTorch, but please reach out to us if you need other frameworks. There are two main concepts: the ElboModel class and the epoch iterator.
The ELBO configuration is specified in YAML in a configuration file. Let's look at its contents.
The configuration file, typically named elbo.yaml, has the following properties:

name
The name of your ML training task.
Example: "Hello, ELBO 💪"

gpu_class
The class of GPU you want to request. This can be one of the following:
Economy - Economy-class GPUs such as the Tesla K80 or Tesla M60. These can be used for simple training tasks or just for testing. They usually cost less than $1 per hour.
MidRange - Mid-range GPUs: V100s or equivalent. These are more powerful GPUs that can be used for more compute-intensive tasks. They also have more GPU RAM (24Gb+), which is useful for generative models.
HighEnd - The latest and greatest GPU compute environments, typically an Nvidia A100. These can be very expensive, roughly $9 - $30 per hour depending on usage.
All - This option shows all the GPU options that are available.
Example: MidRange

setup (Optional)
A setup script that will be run before the training code is called.
Example: sudo apt-get install fish

requirements (Optional)
The path to a requirements.txt file that lists all the dependencies of the training code.

run
The main training code. The task execution will call this file directly.
Example: main.py

task_dir
The directory where this task is present, usually the current directory. This directory will be zipped and uploaded for running the task.
Please make sure all the files and scripts needed to run the training code are present in this directory.
Example: .

artifacts
The directory where your code will place model checkpoints, plots, generated files, etc. The ELBO service will package this directory and save it for you to download after the task is complete.
Example: artifacts

keep_alive
Setting this to True ensures the node running the job is not stopped after the job is complete.
Example: True
Here is a sample configuration with comments on what each property means:
#
# ELBO Sample Config File for MNIST Classifier Task
#
# All paths are relative to where the `elbo.yaml` file is placed
name: "Train MNIST Classifier"
# The GPU class to use - Economy, MidRange, HighEnd, All
gpu_class: Economy
# The script to run for setting up the environment. For example - installing packages
# on Ubuntu
setup: setup.sh
# The PIP requirements file. ELBO will install the requirements specified in this
# file before launching the task.
requirements: requirements.txt
# The main entry point in the task. Once the script exits or terminates, the task
# is considered complete.
run: main.py
# The task directory, relative to this file. This directory will be tar-balled and sent to ELBO task executor for
# execution
task_dir: .
# Artifacts directory. This is the directory that will be copied over as output. All model related files -
# checkpoints, generated samples, evaluation results etc. should be placed in this directory.
artifacts: ~/artifacts
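Before submitting, a configuration can be sanity-checked locally. The sketch below works on a plain dictionary and rests on two assumptions that the text does not spell out: that every property not marked Optional is required, and that keep_alive is optional because the sample config omits it. validate_config is an illustrative helper, not part of the elbo library.

```python
GPU_CLASSES = {"Economy", "MidRange", "HighEnd", "All"}

# Assumption: properties not marked (Optional) in the reference are required.
REQUIRED_KEYS = ("name", "gpu_class", "run", "task_dir", "artifacts")
# Assumption: keep_alive is optional, since the sample config leaves it out.
OPTIONAL_KEYS = ("setup", "requirements", "keep_alive")


def validate_config(config):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing required property: {key}")
    for key in sorted(set(config) - set(REQUIRED_KEYS) - set(OPTIONAL_KEYS)):
        problems.append(f"unknown property: {key}")
    gpu_class = config.get("gpu_class")
    if gpu_class is not None and gpu_class not in GPU_CLASSES:
        problems.append(
            f"gpu_class must be one of {sorted(GPU_CLASSES)}, got {gpu_class!r}"
        )
    return problems


# The sample configuration above, written as a dictionary.
SAMPLE_CONFIG = {
    "name": "Train MNIST Classifier",
    "gpu_class": "Economy",
    "setup": "setup.sh",
    "requirements": "requirements.txt",
    "run": "main.py",
    "task_dir": ".",
    "artifacts": "~/artifacts",
}
```

If PyYAML is installed, the dictionary can be read from the file with yaml.safe_load before being passed to validate_config.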
The requirements.txt file lists one package per line, for example:
pandas
numpy
torch
pytorch_lightning
tqdm
torchvision
wandb