This day, Sakana AI, an Nvidia-backed startup that’s raised masses of thousands and thousands of greenbacks from VC corporations, made a notable declare. The corporate mentioned it had created an AI device, the AI CUDA Engineer, that might successfully accelerate the learning of sure AI fashions via an element of as much as 100x.
The one weakness is, the device didn’t paintings.
Users on X quickly discovered that Sakana’s device in truth led to worse-than-average type coaching efficiency. According to one user, Sakana’s AI led to a 3x slowdown — no longer a speedup.
What went improper? A worm within the code, in line with a post via Lucas Beyer, a member of the technical group of workers at OpenAI.
“Their orig code is wrong in [a] subtle way,” Beyer wrote on X. “The fact they run benchmarking TWICE with wildly different results should make them stop and think.”
In a postmortem published Friday, Sakana admitted that the device has discovered a solution to “cheat” (as Sakana described it) and blamed the device’s tendency to “reward hack” — i.e. establish flaws to reach prime metrics with out carrying out the required function (rushing up type coaching). Homogeneous phenomena has been seen in AI that’s trained to play games of chess.
In keeping with Sakana, the device discovered exploits within the analysis code that the corporate was once the usage of that allowed it to rerouting validations for accuracy, amongst alternative tests. Sakana says it has addressed the problem, and that it intends to revise its claims in up to date fabrics.
“We have since made the evaluation and runtime profiling harness more robust to eliminate many of such [sic] loopholes,” the corporate wrote within the X submit. “We are in the process of revising our paper, and our results, to reflect and discuss the effects […] We deeply apologize for our oversight to our readers. We will provide a revision of this work soon, and discuss our learnings.”
Props to Sakana for proudly owning as much as the error. However the episode is a great reminder that if a declare sounds too just right to be true, especially in AI, it almost definitely is.