• 0 Posts
  • 6 Comments
Joined 2 years ago
Cake day: June 16th, 2023



  • Dylan’s just being deliberately obtuse. DeepSeek developed a way to increase training efficiency and backed it up by quoting the training cost in terms of the market price of the GPU time. They didn’t include the cost of the rest of their datacenter, researcher salaries, etc., because why would you include those numbers when evaluating model training efficiency???

    The training efficiency improvement passes the sniff test based on the theory in their paper, and people have done back-of-the-envelope calculations that also agree with the outcome. There’s little reason to doubt it. In fact, people have made the opposite criticism: that none of DeepSeek’s optimizations are individually groundbreaking, and that all they did was “merely engineering” by putting a dozen or so known optimization ideas together.
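    That back-of-the-envelope calculation really is just GPU-hours times the market rental rate. A minimal sketch, assuming (from memory, not the original comment) the figures DeepSeek reported for V3: roughly 2.788M H800 GPU-hours priced at about $2 per GPU-hour:

    ```python
    # Back-of-the-envelope training cost: GPU-hours * market rental rate.
    # Both numbers below are assumptions recalled from DeepSeek's V3
    # technical report, not figures stated in the comment above.
    gpu_hours = 2.788e6      # assumed total H800 GPU-hours for the run
    rate_per_hour = 2.00     # assumed market price per H800 GPU-hour, USD
    total_cost = gpu_hours * rate_per_hour
    print(f"Estimated training cost: ${total_cost / 1e6:.3f}M")  # ≈ $5.576M
    ```

    Note what the formula deliberately excludes: capital expenditure on the cluster, salaries, and failed experiments. That is exactly the scoping choice the comment defends, since it isolates the cost of the single training run being measured.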