5 comments

  • LuxBennu 2 hours ago
    I run Whisper large-v3 on an M2 Max (96GB), and even with inference alone, memory gets tight on longer audio; I can only imagine what fine-tuning looks like. Does 64GB vs. 96GB make a meaningful difference for Gemma 4 fine-tuning, or does it just push the OOM wall back a bit? I've been wanting to try local fine-tuning on Apple Silicon, but the tooling gap has kept me on inference only so far.
    • MediaSquirrel 2 hours ago
      Memory usage increases quadratically with sequence length, so keeping sequences short during fine-tuning prevents memory blowups. On my 64GB machine, I'm limited to input sequences of about 2,000 tokens, given that the average output for my fine-tuning task is around 1,000 tokens (~3k tokens total).
      • LuxBennu 6 minutes ago
        Ah, that makes sense; quadratic scaling is brutal. So with 96GB I'd probably get somewhere around 4-5k total sequence length before hitting the wall, which is still pretty limiting for anything multimodal. Do you use gradient checkpointing, or is it not worth the speed tradeoff at these sizes?
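For anyone wanting to sanity-check the 64GB vs. 96GB estimate in this sub-thread, here is a rough back-of-the-envelope sketch. It assumes the sequence-dependent part of memory grows exactly with the square of the total length and that everything else (weights, optimizer state, OS) is a fixed cost. The function name `max_seq_len` and the 40 GB fixed-overhead figure are hypothetical, chosen only for illustration; nobody in the thread reported a real overhead number.

```python
import math


def max_seq_len(known_len, known_mem_gb, new_mem_gb, fixed_gb=0.0):
    """Scale a known max sequence length to a new memory budget.

    Assumes sequence-dependent memory grows as length**2 and that
    fixed_gb (weights, optimizer state, OS) does not grow at all.
    """
    if not 0 <= fixed_gb < known_mem_gb:
        raise ValueError("fixed_gb must be in [0, known_mem_gb)")
    # Only the memory left after the fixed cost scales with length**2.
    ratio = (new_mem_gb - fixed_gb) / (known_mem_gb - fixed_gb)
    return int(known_len * math.sqrt(ratio))


# ~3k total tokens fit on 64GB (figure from the thread).
print(max_seq_len(3000, 64, 96))               # no fixed overhead -> 3674
print(max_seq_len(3000, 64, 96, fixed_gb=40))  # hypothetical 40 GB fixed -> 4582
```

Under these assumptions, the extra 32GB buys more headroom the larger the fixed cost is, since all of the added memory goes to the sequence-dependent part. The 4-5k guess is only reached if a large share of the 64GB is already taken by fixed overhead.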
  • craze3 2 hours ago
    Nice! I've been wanting to try local audio fine-tuning. Hopefully it works with music vocals too.
  • yousifa 2 hours ago
    This is super cool, will definitely try it out! Nice work
  • dsabanin 2 hours ago
    Thanks for doing this. Looks interesting, I'm going to check it out soon.
    • MediaSquirrel 2 hours ago
      You are welcome! It was a fun side quest.
  • pivoshenko 1 hour ago
    nice!