Hello everyone! I post-trained a small LLM (Qwen2.5-0.5B-Instruct) to speak like Gen Z using SFT + RL. Training was done in Google Colab on the cheapest GPU runtime.
Honestly, you could probably get better results by simply prompting a frontier model; I wasn't optimizing for quality, given the smallest model, the cheapest GPU, and synthetic data. But it was a fun learning exercise in how you can actually run RL to improve a model without a huge investment, thanks to recent advancements in AI tooling and infrastructure.
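To make the RL half concrete: one cheap way to drive it is a rule-based reward that scores completions by how much Gen Z slang they contain, which a trainer like TRL's GRPOTrainer can consume via its `reward_funcs` argument. This is a minimal sketch of my own, not the post's actual reward — the slang list, the cap, and the scaling are all illustrative assumptions.

```python
# Hypothetical rule-based reward for RL post-training (e.g. GRPO):
# score a completion by how many Gen Z slang terms it contains.
# The slang list and the weighting below are illustrative assumptions,
# not the reward actually used in the notebook.

SLANG = {"fr", "no cap", "bet", "lowkey", "highkey", "rizz",
         "bussin", "slay", "mid", "sus", "fam", "deadass"}

def genz_reward(completion: str) -> float:
    """Return a score in [0, 1] from the number of slang terms present.

    Uses crude substring matching, which can over-match (e.g. "fr"
    inside "from") -- fine for a sketch, not for a real reward.
    """
    text = completion.lower()
    hits = sum(1 for term in SLANG if term in text)
    # Cap at 3 hits so the model isn't rewarded for pure slang spam.
    return min(hits, 3) / 3.0
```

A reward like this is noisy on purpose: RL only needs a signal that slang-heavy completions score higher on average, and capping the count keeps the model from collapsing into keyword stuffing.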
Overall, training the model cost me less than $2 on Colab's pay-as-you-go plan, which was surprisingly cheap.
The example notebook is on GitHub — feel free to give it a try on your own free plan!