Direct localization (DLOC) methods, which use the observed data to localize a source at an unknown position in a one-step procedure, generally outperform their indirect two-step counterparts (e.g., using time-difference of arrivals). However, underwater acoustic DLOC methods require prior knowledge of the environment, and are computationally costly, hence slow. We propose, what is to the best of our knowledge, the first data-driven DLOC method. Inspired by classical and contemporary optimal model-based DLOC solutions, and leveraging the capabilities of convolutional neural networks (CNNs), we devise a holistic CNN-based solution. Our method includes a specifically-tailored input structure, architecture, loss function, and a progressive training procedure, which are of independent interest in the broader context of machine learning. We demonstrate that our method outperforms attractive alternatives, and asymptotically matches the performance of an oracle optimal model-based solution.