This is thanks to the reference collapsing rules. Assume that U is a non-reference type; then:
If T = U,    then T & = U & and T && = U &&.
If T = U &,  then T & = U & and T && = U &.
If T = U &&, then T & = U & and T && = U &&.
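The three rows above can be checked at compile time. This is only a minimal sketch: the alias templates add_lref and add_rref are hypothetical helpers introduced for illustration, and U is arbitrarily chosen to be int.

```cpp
#include <type_traits>

// Hypothetical helpers (illustration only): append & or && to T,
// letting the compiler apply reference collapsing.
template <typename T> using add_lref = T &;
template <typename T> using add_rref = T &&;

using U = int;  // assumption: any non-reference type would do

// T = U
static_assert(std::is_same<add_lref<U>, U &>::value, "T & should be U &");
static_assert(std::is_same<add_rref<U>, U &&>::value, "T && should be U &&");

// T = U &
static_assert(std::is_same<add_lref<U &>, U &>::value, "T & should be U &");
static_assert(std::is_same<add_rref<U &>, U &>::value, "T && should collapse to U &");

// T = U &&
static_assert(std::is_same<add_lref<U &&>, U &>::value, "T & should collapse to U &");
static_assert(std::is_same<add_rref<U &&>, U &&>::value, "T && should be U &&");
```

If the translation unit compiles, every row of the table holds for this U. The only way to end up with an rvalue reference is to combine && with a non-reference type or with another &&.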
Therefore, if the argument you pass is an lvalue of type U, then T must be deduced as U & in order for T && to become U &. This is the only choice, since lvalues cannot bind to rvalue references. On the other hand, if your argument is an rvalue of type U, then T is deduced as U, so that T && becomes U && and your argument can bind.
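To see the deduction in action, here is a small sketch. The probe functions and check_deduction are made-up names for illustration; they simply report what T, and hence T &&, turned out to be for a given argument.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical probe: true if T itself was deduced as an lvalue reference.
template <typename T>
bool deduced_as_lvalue_reference(T &&) {
    return std::is_lvalue_reference<T>::value;
}

// Hypothetical probe: true if the parameter type T && is an rvalue reference.
template <typename T>
bool param_is_rvalue_reference(T &&) {
    return std::is_rvalue_reference<T &&>::value;
}

inline void check_deduction() {
    int x = 0;

    // Lvalue argument: T = int &, so T && collapses to int &.
    assert(deduced_as_lvalue_reference(x));
    assert(!param_is_rvalue_reference(x));

    // Rvalue argument: T = int, so T && is int &&.
    assert(!deduced_as_lvalue_reference(42));
    assert(param_is_rvalue_reference(42));

    // std::move(x) is an rvalue too, so it behaves like the literal.
    assert(param_is_rvalue_reference(std::move(x)));
}
```

Calling check_deduction() from main should pass silently, assuming assertions are enabled (NDEBUG not defined).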
The key point is that the matching reference type is T && (and not T!). However, since arg itself is a named variable and thus an lvalue, you must use std::forward<T>(arg) to produce an expression with the same value category as the one with which your function was called.
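A rough sketch of the difference std::forward makes. The sink overloads and the two wrapper templates are hypothetical names chosen for this example; each sink just reports which overload was selected.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload set: tells us whether it received an lvalue or an rvalue.
inline std::string sink(const std::string &) { return "lvalue"; }
inline std::string sink(std::string &&) { return "rvalue"; }

// Passes arg along by name. A named variable is always an lvalue,
// so the rvalue overload is never chosen.
template <typename T>
std::string without_forward(T && arg) {
    return sink(arg);
}

// Restores the original value category: std::forward<T> yields an lvalue
// when T is an lvalue reference and an rvalue otherwise.
template <typename T>
std::string with_forward(T && arg) {
    return sink(std::forward<T>(arg));
}
```

With std::string s; both without_forward(s) and with_forward(s) select the lvalue overload. For a temporary such as std::string("abc"), only with_forward reaches the rvalue overload, while without_forward still ends up in the lvalue one.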