SmartNICs are on the rise as a packet processing platform, with the trend towards a uniform P4 programming model. However, unleashing SmartNIC packet processing performance in P4 is a formidable task. Traditional SmartNIC optimizations rely on low-level program tuning, but P4 abstractions operate at one level above. At the same time, today's P4 optimizations primarily focus on resource packing rather than performance tuning. We develop Pipeleon, an automated performance optimization framework for P4 programmable SmartNICs. We introduce techniques that are tailored to the performance characteristics of SmartNICs, and further leverage dynamic workload patterns for profile-guided optimization. Pipeleon pinpoints program hotspots at the P4 level and computes runtime optimization plans to specialize the program layout based on the latest profile. We have prototyped Pipeleon and applied it to optimize two popular P4 SmartNICs - -Nvidia BlueField2 and Netronome Agilio CX - -as well as a software SmartNIC emulator extended based on BMv2. Our results show that Pipeleon significantly improves SmartNIC packet processing performance in realistic scenarios.