Sorting is one of the fundamental problems in computer science. In this paper we present the first wait-free algorithm that sorts an input array of size N using P ≤ N processors in optimal running time. Known sorting algorithms, when made wait-free through previously established transformation techniques, have complexity O(log³ N). The randomized algorithm we present here, when run in the CRCW PRAM model, executes in optimal O(log N) time when P = N and in O((N log N)/P) time otherwise. The wait-free property guarantees that the sort will complete despite any delays or failures incurred by the processors. This property is highly desirable from an operating systems point of view, since it allows oblivious thread scheduling, as well as thread creation and deletion, without compromising the algorithm's correctness. We further present a variant of the algorithm that is shown to suffer no more than O(√P) contention when run synchronously.