The first challenge is the cost of shared memory accesses. In UPC, a shared memory access is usually translated into a call to the runtime system (RTS). xlupc implements a compiler optimization called locality analysis, which determines at compile time which array accesses are local. For those accesses, the compiler emits ordinary loads and stores to local addresses instead of the expensive RTS calls. The second challenge is the upc_forall control construct provided by UPC. A upc_forall statement includes an affinity test that determines, at run time, which iterations each thread executes. We present a compile-time analysis, implemented in xlupc, that eliminates affinity tests when possible. Results on simple benchmarks show that the two optimizations, applied together, can yield substantial improvements in application execution time, sometimes 1000% or more.
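To make the affinity test concrete, the following is a minimal sketch in plain C, not xlupc's actual generated code. It assumes an integer affinity expression `i`, for which UPC assigns iteration `i` to thread `i % THREADS`. The parameters `threads` and `mythread` stand in for UPC's `THREADS` and `MYTHREAD`, and the function names and the `owner` array are hypothetical, introduced only for illustration. The first function evaluates the test on every iteration; the second shows one way the test could be removed, by starting each thread at its first owned iteration and stepping by the thread count.

```c
#include <assert.h>
#include <stddef.h>

/* Naive lowering (illustrative): every thread walks all n iterations and
   evaluates the affinity test for each one. Returns the number of tests. */
size_t run_with_affinity_test(size_t n, int threads, int mythread, int *owner)
{
    assert(threads > 0 && mythread >= 0 && mythread < threads);
    size_t tests = 0;
    for (size_t i = 0; i < n; i++) {
        tests++;                                   /* one test per iteration */
        if (i % (size_t)threads == (size_t)mythread)
            owner[i] = mythread;                   /* stands in for the loop body */
    }
    return tests;
}

/* Test-free form (illustrative): each thread visits only the iterations it
   owns, so no per-iteration test is needed. Returns the iterations executed. */
size_t run_strip_mined(size_t n, int threads, int mythread, int *owner)
{
    assert(threads > 0 && mythread >= 0 && mythread < threads);
    size_t executed = 0;
    for (size_t i = (size_t)mythread; i < n; i += (size_t)threads) {
        owner[i] = mythread;                       /* same body, no affinity test */
        executed++;
    }
    return executed;
}
```

Calling both functions once per simulated thread fills `owner` identically, but the first form performs `n` tests on every thread, while the second touches only about `n / threads` iterations per thread. Whether a given upc_forall loop can be rewritten this way depends on the form of its affinity expression, which is what the compile-time analysis has to establish.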