Google Interview Question

How would you optimize a large- scale model for real-time inference while maintaining a strict latency budget