The iterator class is a decorator on top of a traditional Python iterator. Add this to your training epoch loop.
Making ML stuff cheap and easy 💪
ELBO is a service that makes training ML models easier and cheaper. Use it to train your models, receive timely training notifications, choose compute types that fit your budget, and integrate with our API to add the service to your workflow.
The ELBO Tracker lets you monitor your ML tasks on your phone using the ELBO Tracker app.
The ELBO Tracker API is an easy way to monitor your tasks on your phone. Using the API, you can log messages, key metrics (numbers), and images. These show up in the task list on your phone.
To start, create an instance of the Tracker:
Here "Hello World" is the experiment name. To log a message, just do:
That's it! Logging a metric or an image is just as simple:
And finally, upload the logs using:
Make sure you don't forget this step. The upload_logs() API can be called as many times as you like. Each call appends to the existing logs. For example, if you are training a model, it may make sense to call this API every epoch.
Once this is done, you can see the results in the app!
Please install the ELBO Tracker app and log in to see the results of your tasks.
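The log-then-upload pattern described above can be sketched as follows. This is a minimal illustration, not the ELBO client: `SketchTracker`, `log_message`, and `log_metric` are hypothetical stand-ins invented for this example, and only the name `upload_logs()` comes from the text. The point it shows is that each upload appends to what was already sent, so calling it once per epoch is safe.

```python
from dataclasses import dataclass, field


@dataclass
class SketchTracker:
    """Hypothetical stand-in for the ELBO Tracker (not the real client)."""

    experiment_name: str
    pending: list = field(default_factory=list)   # entries logged since the last upload
    uploaded: list = field(default_factory=list)  # stands in for the remote task log

    def log_message(self, message: str) -> None:
        self.pending.append(("message", message))

    def log_metric(self, name: str, value: float) -> None:
        self.pending.append(("metric", name, value))

    def upload_logs(self) -> int:
        """Append pending entries to the uploaded log and return how many were sent."""
        sent = len(self.pending)
        self.uploaded.extend(self.pending)
        self.pending.clear()
        return sent


def train(tracker: SketchTracker, epochs: int) -> None:
    for epoch in range(epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        tracker.log_message(f"Finished epoch {epoch}")
        tracker.log_metric("loss", loss)
        tracker.upload_logs()  # once per epoch; each call appends to earlier logs


tracker = SketchTracker("Hello World")
train(tracker, epochs=3)
```

Uploading inside the loop means that a task which stops midway still has its earlier epochs recorded.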
An ElboModel is an abstract class that allows the ELBO service to automatically checkpoint your training.
Extend the abstract ElboModel along with nn.Module in your PyTorch model class. You will then be required to implement two methods:
save_state - This method should save the state of the model and any other state information needed.
load_state - This method should load the state of the model from the input directory.
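The two-method contract above can be sketched as follows. This is an illustration under stated assumptions: `CheckpointableModel` is a hypothetical stand-in for ElboModel, the directory arguments and the `state.json` file name are invented for the example, and plain JSON is used so the sketch runs without PyTorch. In a real PyTorch model, `save_state` would typically call `torch.save(self.state_dict(), path)` and `load_state` would call `self.load_state_dict(torch.load(path))`.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class CheckpointableModel(ABC):
    """Hypothetical stand-in for ElboModel: two required checkpoint hooks."""

    @abstractmethod
    def save_state(self, output_dir: str) -> None:
        """Write everything needed to resume training into output_dir."""

    @abstractmethod
    def load_state(self, input_dir: str) -> None:
        """Restore the state previously written to input_dir."""


class TinyModel(CheckpointableModel):
    def __init__(self) -> None:
        self.weights = [0.0, 0.0]
        self.epoch = 0

    def save_state(self, output_dir: str) -> None:
        os.makedirs(output_dir, exist_ok=True)
        with open(os.path.join(output_dir, "state.json"), "w") as f:
            json.dump({"weights": self.weights, "epoch": self.epoch}, f)

    def load_state(self, input_dir: str) -> None:
        with open(os.path.join(input_dir, "state.json")) as f:
            state = json.load(f)
        self.weights = state["weights"]
        self.epoch = state["epoch"]


# Round trip: save from one instance, restore into a fresh one.
with tempfile.TemporaryDirectory() as checkpoint_dir:
    model = TinyModel()
    model.weights, model.epoch = [0.5, -1.25], 3
    model.save_state(checkpoint_dir)

    restored = TinyModel()
    restored.load_state(checkpoint_dir)
```

Saving the epoch counter alongside the weights is what lets a resumed task continue from where it stopped rather than from epoch zero.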
We currently support only PyTorch, but please reach out to us if you need other frameworks. There are two main concepts:
Here is a guide to help you set up the ELBO environment on your local machine.
Your API requests are authenticated using API keys. Any request that doesn't include an API key will return an HTTP Authentication error.
And create an environment using:
or, if virtualenv is not in your path:
This creates a virtual Python environment in the .venv folder. To activate this environment, use the command:
Or the following if you are using the fish shell:
The best way to interact with our API is to use our elbo library. You can install it using the command line below:
Use the command-line tool to log in.
Submit the sample task:
Here is a sample output of the command that prompts with a list of compute options from our providers:
That's it! 🥳 Monitor your task's progress using elbo show <task_id>.
Good to know: We are just getting started with this service and are actively building it. If you face any problems with the service or API, please reach out to us.
Sign up for an account (with a 14-day trial period). You can get your API key at any time on the website.
It's best to run Python in a virtual environment. To install your virtual environment, run:
This will prompt you to enter your token. The token can be obtained by logging into the ELBO website.
Try out one of the sample ML submissions from our GitHub repository. First, clone the repository:
Use the command-line tool to run tasks, show task status, cancel tasks and SSH into tasks.
The ELBO configuration is specified in YAML in a configuration file. Let's look at its contents.
The configuration file, typically named elbo.yaml, has the following properties:
name: The name of your ML training task. Example: "Hello, ELBO 💪"
gpu_class: The class of GPU you want to request. Example: MidRange. This can be one of the following:
- Economy: Economy-class GPUs such as the Tesla K80 or Tesla M60. These can be used for simple training tasks or just for testing purposes. Usually, these cost less than $1 per hour.
- MidRange: Mid-range GPUs such as V100s or equivalent. These are more powerful GPUs and can be used for more compute-intensive tasks. They also have more GPU RAM (24 GB+), which is useful for generative models.
- HighEnd: The latest and most powerful GPU compute environments, typically an NVIDIA A100. These can be very expensive, roughly $9 to $30 per hour depending on usage.
- All: This option shows all the GPU options that are available.
setup (Optional): A setup script that will be run prior to calling the training code. Example: sudo apt-get install fish
requirements (Optional): The path to a requirements.txt file that lists all the dependencies of the training code.
run: The main training code. The task execution will call this file directly. Example: main.py
task_dir: The directory where this task is present, usually the current directory. This directory will be zipped and uploaded for running the task. Please make sure all the files and scripts needed to run the training code are present in this directory. Example: .
artifacts: The directory where your code will place model checkpoints, plots, generated files, and so on. The ELBO service will package this directory and save it for you to download after the task is complete. Example: artifacts
keep_alive: Setting this to True ensures that the node running the job is not stopped after the job is complete. Example: True
Here is a sample configuration with comments on what each property means:
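As a rough sketch, the properties described above could be combined into an elbo.yaml like the one below. The values are the examples given for each property; the `requirements.txt` path is an assumption, since no example value is given for that property, and the exact syntax accepted by the service should be checked against a working sample.

```yaml
# Name of the ML training task
name: "Hello, ELBO 💪"
# GPU class to request: Economy, MidRange, HighEnd, or All
gpu_class: MidRange
# (Optional) Setup script run before the training code is called
setup: sudo apt-get install fish
# (Optional) Path to the file listing the training code's dependencies (assumed path)
requirements: requirements.txt
# Main training code; the task execution calls this file directly
run: main.py
# Directory that is zipped and uploaded to run the task
task_dir: .
# Directory for checkpoints, plots, and generated files, saved for download afterwards
artifacts: artifacts
# Keep the node running after the job completes
keep_alive: True
```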