DeepMind’s AlphaCode AI Can Program Like Human Coders
December 14, 2022
DeepMind researchers have trained an AI to solve computer coding challenges about as well as the average human competitor. In a paper published last week in the journal Science, the group from Google’s AI division described how AlphaCode performed when pitted against human programmers, ranking in the top 54.3 percent in simulated contests, commensurate with “approximately human-level performance.” “This performance in competitions approximately corresponds to a novice programmer with a few months to a year of training,” according to Science, which notes that about half the humans who compete in coding contests could outperform the AI.
Since the challenges presented to humans in coding contests are usually quite specific, a big hurdle in training a code-capable AI was imparting enough general familiarity so that it could tackle a range of problems. As Ars Technica puts it, “an AI trained on one class of challenge would fail when asked to tackle an unrelated challenge.”
To generalize across challenges, DeepMind treated them like a language-translation problem. “AlphaCode was designed to have two parts: one that ingested the description and converted it to an internal representation, and a second that used the internal representation to generate functional code,” Ars Technica explains.
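For a concrete picture of that two-part structure, the sketch below mirrors it with a small, publicly available encoder-decoder model (Salesforce’s CodeT5, used here purely as a stand-in; AlphaCode’s own, much larger models are not public). The encoder turns the problem statement into an internal representation, and the decoder generates candidate code from it.

```python
# Illustrative sketch only: a generic "description in, code out" encoder-decoder,
# standing in for AlphaCode's two-part design. The checkpoint below is a small public
# CodeT5 model, not anything released by DeepMind.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-small")

problem = "Given a list of integers on one line, print the sum of the even values."

# Part one: encode the natural-language description into an internal representation.
inputs = tokenizer(problem, return_tensors="pt")

# Part two: decode that representation into several candidate programs.
candidates = model.generate(
    **inputs, do_sample=True, max_new_tokens=128, num_return_sequences=4
)
for c in candidates:
    print(tokenizer.decode(c, skip_special_tokens=True))
```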
The system was then trained on more than 700GB of code from GitHub (which might not sound like much, but code is dense).
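As a rough illustration of that stage, the snippet below streams a local pile of source files as plain text, the way a language model’s pretraining corpus might be fed in. The directory path and file filter are placeholders, not DeepMind’s actual pipeline.

```python
# Hypothetical sketch of the pretraining data step: walk a local mirror of source code
# and yield each file as plain text for a language model to train on. The path and the
# .py filter are placeholder assumptions, not DeepMind's corpus or pipeline.
import pathlib

def iter_code_text(root: str = "github_mirror/"):
    for path in pathlib.Path(root).rglob("*.py"):
        try:
            yield path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # skip unreadable files rather than aborting the whole pass
```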
DeepMind mounted its own programming contests, feeding the results into the system: the problem description, the working code, the code that failed, and the test cases they were checked against.
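Read literally, each of those fine-tuning examples would bundle a problem statement, submissions labeled as passing or failing, and the tests they were judged against. The dataclass below is a guess at the shape of such a record; the field names are illustrative, not DeepMind’s schema.

```python
# Illustrative record for one contest problem, as described above. Field names are
# assumptions made for this example, not DeepMind's actual data format.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ContestProblem:
    description: str                                              # the problem statement
    correct_solutions: List[str] = field(default_factory=list)    # code that passed
    incorrect_solutions: List[str] = field(default_factory=list)  # code that failed
    tests: List[Tuple[str, str]] = field(default_factory=list)    # (stdin, expected stdout)
```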
“Similar approaches had been tried previously, but DeepMind indicates that it was just able to throw more resources at the training,” writes Ars Technica, quoting the Science article as saying that “a key driver of AlphaCode’s performance came from scaling the number of model samples to orders of magnitude more than previous work.”
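The scaling idea itself is simple: draw an enormous number of candidate programs per problem so the later filtering stages have plenty to sift through. The loop below is a toy version; sample_batch is a hypothetical wrapper around whatever sampling call the model exposes, and the counts are arbitrary rather than AlphaCode’s real numbers.

```python
# Toy sketch of large-scale sampling: keep drawing candidate programs in batches until a
# target count is reached. sample_batch is a hypothetical function returning generated
# programs; num_samples and batch_size are arbitrary illustrative values.
from typing import Callable, List

def sample_candidates(
    sample_batch: Callable[[int], List[str]],
    num_samples: int = 100_000,
    batch_size: int = 512,
) -> List[str]:
    candidates: List[str] = []
    while len(candidates) < num_samples:
        candidates.extend(sample_batch(batch_size))
    return candidates[:num_samples]
```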
Even then, “on its own, the resulting code wasn’t always especially good”: more than 40 percent of the solutions either exhausted the system’s memory or failed to produce an answer within a reasonable time frame. After some initial filtering, only about one percent of AlphaCode’s candidate programs passed the basic tests.
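In code, that first filter amounts to running every candidate against a problem’s example tests and discarding anything that crashes, runs too long, or prints the wrong answer. The sketch below uses subprocess with a timeout as a crude stand-in for a real sandbox with memory limits.

```python
# Rough sketch of the filtering step: execute a candidate program against example tests
# and reject it on a crash, a timeout, or a wrong answer. This is a simplified stand-in
# for a proper sandbox, not DeepMind's evaluation harness.
import subprocess
from typing import List, Tuple

def passes_examples(candidate_src: str, tests: List[Tuple[str, str]],
                    time_limit: float = 2.0) -> bool:
    for stdin_text, expected in tests:
        try:
            result = subprocess.run(
                ["python3", "-c", candidate_src],
                input=stdin_text, capture_output=True, text=True, timeout=time_limit,
            )
        except subprocess.TimeoutExpired:
            return False  # too slow
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return False  # crashed or produced the wrong output
    return True

# survivors = [c for c in candidates if passes_examples(c, problem_tests)]
```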
A second level of filtering involved code clustering: “solutions that worked would often be similar to each other and thus form a cluster of similar code within the vast sea of potential code,” notes Ars Technica, explaining that since incorrect answers are randomly distributed, “the system identified the 10 largest clusters of related code that produced the same output given a set of inputs and picked a single example from each cluster.”
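A minimal version of that clustering idea: run each surviving program on a shared set of probe inputs, group programs whose outputs match exactly, keep the ten largest groups, and submit one program from each. In the sketch below, run_program is a hypothetical helper that executes a candidate on a single input and returns its output.

```python
# Simplified sketch of output-based clustering: programs that behave identically on a set
# of probe inputs land in the same cluster; the largest clusters are assumed to contain
# the real solutions. run_program is a hypothetical execution helper.
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

def pick_submissions(candidates: List[str], probe_inputs: List[str],
                     run_program: Callable[[str, str], str], k: int = 10) -> List[str]:
    clusters: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for src in candidates:
        signature = tuple(run_program(src, x) for x in probe_inputs)  # behavioral fingerprint
        clusters[signature].append(src)
    largest = sorted(clusters.values(), key=len, reverse=True)[:k]
    return [group[0] for group in largest]  # one representative submission per cluster
```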
The end result was successful, working programs in just under a third of cases. The task was “resource-intensive,” according to Ars Technica, which writes that “training the system took over 2,000 petaflops and sucked down about 16 times the annual power budget of a U.S. household.”
The researchers concluded that even more resources could further improve the results.