How Data Drives LLM Pretraining: Methods, Tips, and Best Practices