Meta synthetically generated a lot of PHP by translating existing Python code, and used it as training data for Llama 3. Meta writes a huge amount of PHP internally.
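Translation tends to be much easier for LLMs than unconstrained generation. So if you can translate a large amount of code and filter out the bad translations, you can train on what survives and the model learns to generate the target language directly. If you also translate the unit tests and run them against the translated code, you get another layer of error checking.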
https://arxiv.org/abs/2407.21783
See Figure 8, which shows a Python-to-PHP translation example.
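A rough sketch of the translate-then-filter idea, to make the two layers of checking concrete. This is not Meta's pipeline: the names `Sample`, `Translated`, and `translate_and_filter` are made up for illustration, and the translator and both checks are passed in as plain callables so the sketch makes no assumptions about which model or PHP tooling is used.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Sample:
    source: str        # original code, e.g. a Python function
    source_tests: str  # unit tests written for the original code


@dataclass
class Translated:
    code: str   # translated code, e.g. PHP
    tests: str  # translated unit tests


def translate_and_filter(
    samples: Iterable[Sample],
    translate: Callable[[str], str],         # hypothetical: wraps an LLM call
    syntax_ok: Callable[[str], bool],        # hypothetical: parser or linter
    tests_pass: Callable[[str, str], bool],  # hypothetical: test runner
) -> List[Translated]:
    """Keep only translations that pass both a syntax check and their tests."""
    kept: List[Translated] = []
    for sample in samples:
        code = translate(sample.source)
        tests = translate(sample.source_tests)

        # Layer 1: cheap filter, drop anything that does not even parse.
        if not (syntax_ok(code) and syntax_ok(tests)):
            continue

        # Layer 2: run the translated tests against the translated code.
        if not tests_pass(code, tests):
            continue

        kept.append(Translated(code=code, tests=tests))
    return kept
```

In a real setup, `syntax_ok` could plausibly shell out to `php -l` and `tests_pass` to a runner such as PHPUnit, but those choices are assumptions here, not something the note or the paper specifies. Whatever survives both layers is what you would train on.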