The future of AI is distributed, said Ion Stoica, co-founder, executive chairman and president of Anyscale, on the first day of VB Transform. And that’s because the growth in model complexity shows no signs of slowing down.
“For the past couple of years, the compute requirements to train a state-of-the-art model, depending on the data set, grow between 10 times and 35 times every 18 months,” he said.
Just five years ago, the largest models fit on a single GPU; fast forward to today, and it takes hundreds or even thousands of GPUs just to fit the parameters of the most advanced models. PaLM, the Pathways Language Model from Google, has 540 billion parameters — and that’s only about half the size of the largest models, which top 1 trillion parameters. The company uses more than 6,000 chips to train its most recent model.
Even if these models stopped growing and GPUs continued to progress at the same rapid rate as in previous years, it would still take about 19 years before a single GPU was powerful enough to run these state-of-the-art models, Stoica added.
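That 19-year figure follows from simple compounding arithmetic. Here is a rough back-of-envelope version, assuming a state-of-the-art model needs on the order of 6,000 accelerators today and that single-chip performance doubles roughly every 18 months — illustrative assumptions based on the figures above, not Stoica’s exact inputs:

```python
import math

# Back-of-envelope sketch: if a frozen state-of-the-art model needs ~6,000
# chips today, how long until one chip is 6,000x faster, assuming a
# Moore's-law-style doubling every 18 months?
chips_needed_today = 6000
doubling_period_years = 1.5

doublings = math.log2(chips_needed_today)   # ~12.6 doublings required
years = doublings * doubling_period_years   # ~18.8 years
print(f"{doublings:.1f} doublings -> about {years:.0f} years")
```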
“Fundamentally, this is a huge gap, which is growing month by month, between the demands of machine learning applications and the capabilities of a single processor or a single server,” he said. “There’s no other way to support these workloads than distributing them. It’s as simple as that. Writing these distributed applications is hard. It’s even harder than before, actually.”
The unique challenges of scaling applications and workloads
There are multiple stages in building a machine learning application, from data labeling and preprocessing to training, hyperparameter tuning, serving, reinforcement learning and so on — and each of these stages needs to scale. Typically, each step requires a different distributed system. To build end-to-end machine learning pipelines or applications, it’s now necessary not only to stitch these systems together but also to manage each of them, and to develop against a variety of APIs. All of this adds a tremendous amount of complexity to an AI/ML project.
The mission of the open-source Ray distributed computing project, and of Anyscale, is to make scaling these distributed computing workloads easier, Stoica said.
“With Ray, we tried to provide a compute framework on which you can build these applications end-to-end,” he said. “Anyscale is basically providing a hosted, managed Ray, and of course security features and tools to make the development, deployment and management of these applications easier.”
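To make that concrete, here is a minimal sketch of what two pipeline stages — parallel preprocessing and a small hyperparameter sweep — can look like on a single Ray cluster. The `preprocess` and `train` functions are placeholder stand-ins for illustration, not Anyscale’s code; only the `ray.init`, `@ray.remote`, `.remote` and `ray.get` calls are Ray’s actual API.

```python
import ray

ray.init()  # start a local Ray instance or connect to an existing cluster

@ray.remote
def preprocess(shard):
    # Placeholder preprocessing: in practice, cleaning/tokenizing a data shard.
    return [record.lower() for record in shard]

@ray.remote
def train(data, learning_rate):
    # Placeholder training run; real code would request accelerators via
    # @ray.remote(num_gpus=...) and return checkpoints/metrics.
    return {"lr": learning_rate, "rows": sum(len(d) for d in data)}

shards = [["Cat", "Dog"], ["Fish", "Bird"]]
prepped = ray.get([preprocess.remote(s) for s in shards])    # scale preprocessing
trials = [train.remote(prepped, lr) for lr in (1e-3, 1e-4)]  # scale the sweep
print(ray.get(trials))
```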
Hybrid stateful and stateless computation
The company recently launched a serverless product, which abstracts away the underlying infrastructure, eliminating the need to worry about where functions will run and easing the burden on developers and programmers as they scale. But with the infrastructure made transparent, functions are limited in what they can do: they run a computation, write the data back to S3, for instance, and then they’re gone. Many applications, however, require stateful operators.
For instance, training, which involves a great deal of data, would become far too expensive if that data were written back to S3 after each iteration, or even just moved from GPU memory into the machine’s main memory, because of the overhead of moving the data in and out and of serializing and deserializing it.
“Ray, from day one, was also built around these kinds of operators, which can keep the state and can update the state continuously, which in software engineering lingo we call ‘actors,’” he said. “Ray has always supported this dual mode of stateless and stateful computation.”
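A brief sketch of that distinction using Ray’s actor API: the class below keeps its weights in the worker’s memory between calls, so nothing has to be written back to S3 or re-serialized between iterations. The `Trainer` class and its dummy update are illustrative placeholders, not code from Ray or Anyscale.

```python
import ray

ray.init()

@ray.remote
class Trainer:
    """Stateful 'actor': parameters live in the worker's memory across calls."""
    def __init__(self, size):
        self.weights = [0.0] * size
        self.steps = 0

    def train_step(self, batch):
        # Dummy update; no serialization or S3 round trip between iterations.
        self.weights = [w + 0.01 * b for w, b in zip(self.weights, batch)]
        self.steps += 1
        return self.steps

trainer = Trainer.remote(size=4)  # the actor's state is created once
for batch in ([1, 2, 3, 4], [4, 3, 2, 1]):
    steps_done = ray.get(trainer.train_step.remote(batch))
print("iterations run:", steps_done)
```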
What inning is AI implementation in?
There’s a temptation to say that AI implementation has finally reached the walking stage, pushed ahead in the AI transformation journey by the recent acceleration in digital growth, but we’ve just seen the tip of the iceberg, Stoica said. There’s still a gap between the current market size and the opportunity, similar to the state of big data about 10 years ago.
“It’s taking time because the time [needed] is not only for developing tools,” he said. “It’s training people. Training experts. That takes even more time. If you look at big data and what happened, eight years ago a lot of universities started to provide degrees in data science. And of course there are a lot of courses now, AI courses, but I think that you’ll see more and more applied AI and data courses, of which there aren’t many today.”
Learn more about how distributed AI is helping companies ramp up their business strategy and catch up on all Transform sessions by registering for a free virtual pass right here.