I mentioned earlier that you want a weaker ordering constraint for implementing a mutex. This is the main use case for the acquire-and-release constraints. They provide a pair of boundaries around a set of operations.
If one thread performs an acquire operation on an address and later a release operation on the same address, everything it did between the two must be visible to any other thread that subsequently performs an acquire operation on that address. More formally, no load or store written after the acquire may be moved before it, and every load and store written before the release must complete before the release itself becomes visible.
When implementing a mutex, for example, you'll perform an acquire operation when locking and a release operation when unlocking. This approach ensures that any updates to data structures protected by the mutex are visible to other threads when they hold the mutex.
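To make the locking pattern concrete, here is a minimal sketch of a spinlock built on C11's `atomic_flag`. The names `spinlock`, `spinlock_lock`, `spinlock_unlock`, and `counter_increment` are my own, chosen for illustration, and a real mutex would sleep rather than spin when contended.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock; all names here are illustrative. */
typedef struct {
    atomic_flag held;
} spinlock;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static inline bool spinlock_try_lock(spinlock *l)
{
    /* test_and_set returns the previous value, so false means we took
     * the lock. Acquire: nothing after this may be moved before it. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static inline void spinlock_lock(spinlock *l)
{
    while (!spinlock_try_lock(l)) {
        /* Busy-wait until the current holder clears the flag. */
    }
}

static inline void spinlock_unlock(spinlock *l)
{
    /* Release: everything written while holding the lock is visible
     * to the next thread whose acquire observes the cleared flag. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Example of protected state: a plain (non-atomic) counter. */
static spinlock counter_lock = SPINLOCK_INIT;
static long counter;

long counter_increment(void)
{
    spinlock_lock(&counter_lock);
    long value = ++counter;        /* ordinary access, guarded by the lock */
    spinlock_unlock(&counter_lock);
    return value;
}
```

Note that `counter` itself is not atomic. The acquire on lock and the release on unlock are what make the plain increment safe to share between threads.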
For efficiency, there are a couple of extra variants. One is consume. This is a (much!) weaker form of an acquire ordering. Like an acquire ordering, it ensures that loads that come after it in program order are not moved before it, but only when they depend on the value that it loaded. For example, if you perform a consume operation to load a pointer and then you dereference it, the dereference won't be moved before the consume. A later load that doesn't depend on that pointer gets no such guarantee.
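The pointer case might look something like the following sketch. The `struct config` type and the `publish_config` and `read_verbosity` functions are hypothetical, invented for this example. I believe mainstream compilers such as GCC and Clang currently implement consume by promoting it to acquire, so treat this as an illustration of the semantics rather than a guaranteed performance win.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload type, used only for illustration. */
struct config {
    int verbosity;
};

/* Static storage, so this starts out as a null pointer. */
static _Atomic(struct config *) current_config;

/* Writer: fill in *c completely, then publish the pointer with release
 * so the initialised fields are visible before the pointer is. */
void publish_config(struct config *c)
{
    atomic_store_explicit(&current_config, c, memory_order_release);
}

/* Reader: the dereference depends on the loaded pointer, so it is
 * ordered after the consume load. Loads that do not depend on `c`
 * get no ordering guarantee from this operation. */
int read_verbosity(int fallback)
{
    struct config *c =
        atomic_load_explicit(&current_config, memory_order_consume);
    return c ? c->verbosity : fallback;
}
```

If the reader also needed to see unrelated data written before the publish, consume would not be enough and the load would have to use `memory_order_acquire` instead.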
The other variant is a combined acquire-and-release. The rationale for including this in the specification gave a bitfield that represents a large set of locks as a use case. You would use a single acquire-and-release operation when updating a word in the bitfield. For example, you could protect every field in a structure with one bit in a bitfield, and then use a single compare-and-exchange operation with acquire-release semantics to release the locks on some fields and acquire the locks on others.
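Here is a rough sketch of that bitfield idea, assuming one 32-bit word where bit i set means that field i is locked. The `field_locks` variable and the `swap_field_locks` function are hypothetical names, and the sketch assumes that the caller really does hold every lock it asks to drop.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit i set means field i is currently locked. */
static _Atomic uint32_t field_locks;

/* Atomically release the locks in `drop` and acquire the locks in
 * `take`. Returns false, changing nothing, if any lock in `take`
 * would still be held once `drop` has been released. */
bool swap_field_locks(uint32_t drop, uint32_t take)
{
    uint32_t old = atomic_load_explicit(&field_locks, memory_order_relaxed);
    for (;;) {
        uint32_t remaining = old & ~drop;
        if (remaining & take)
            return false;              /* a wanted lock is taken */
        uint32_t desired = remaining | take;
        /* One operation that is both a release (for the fields we
         * give up) and an acquire (for the fields we take). */
        if (atomic_compare_exchange_weak_explicit(
                &field_locks, &old, desired,
                memory_order_acq_rel, memory_order_relaxed))
            return true;
        /* On failure `old` now holds the current value, so retry. */
    }
}
```

For example, `swap_field_locks(0x1, 0x4)` gives up the lock on field 0 and takes the lock on field 2 in one step. Doing the same thing with separate release and acquire operations would need two atomic updates to the word.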