LLM pretraining data-mixture Sankey | ChatGPT Image 2 Prompt | BgRemovit