- 3.1 Motivation for Virtual Threads
- 3.2 Virtual Thread Execution Model
- 3.3 Using Thread Class to Create Virtual Threads
- 3.4 Using Thread Builders to Create Virtual Threads
- 3.5 Using Thread Factory to Create Threads
- 3.6 Using Thread Executor Services
- 3.7 Scalability of Throughput with Virtual Threads
- 3.8 Best Practices for Using Virtual Threads
- Review Questions
3.8 Best Practices for Using Virtual Threads
In this section, we summarize some of the dos and don’ts of using virtual threads. Ultimately, benchmarking the performance of the concurrent application is the best way to determine any gains from using virtual threads. However, use of virtual threads boosts the throughput of one-thread-per-task-based applications under the following conditions:
- Sufficiently large number of virtual threads
- Frequent short-lived blocking tasks
These two conditions result in a high ratio of virtual threads to platform threads. The virtual threads are frequently unmounted, so that their carrier threads can be scheduled to mount other virtual threads that are ready to execute their tasks, effectively increasing the throughput of the application.
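To make these conditions concrete, here is a minimal sketch (assuming Java 21 or later; the class name ThroughputSketch, the task count, and the 100 ms delay are illustrative, not from Example 3.8) that submits many short-lived blocking tasks to a virtual-thread-per-task executor and measures the elapsed wall-clock time:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

public class ThroughputSketch {
    // Runs 'count' tasks that each block for ~100 ms, and returns the
    // elapsed wall-clock time for all of them to complete.
    static Duration run(int count) {
        Instant start = Instant.now();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, count).forEach(i -> executor.submit(() -> {
                try {
                    // Short-lived blocking operation: the virtual thread is
                    // unmounted here, freeing its carrier thread.
                    TimeUnit.MILLISECONDS.sleep(100);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        } // close() waits for all submitted tasks to complete.
        return Duration.between(start, Instant.now());
    }

    public static void main(String[] args) {
        System.out.println("Elapsed: " + run(10_000).toMillis() + " ms");
    }
}
```

Because each blocked virtual thread is unmounted, the many 100 ms sleeps largely overlap instead of running serially on the small pool of carrier threads; the exact elapsed time depends on the machine.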
Avoid Pinning of Virtual Threads
As we have seen, a virtual thread is designed to be unmounted from its carrier thread when it executes a blocking operation, thereby allowing the JVM thread scheduler to mount another virtual thread on the carrier thread. However, in some situations a virtual thread cannot be unmounted from its associated carrier thread; this is called pinning. A pinned virtual thread monopolizes its carrier thread, preventing it from servicing other virtual threads. Pinning can potentially impact both the scalability and the performance of a concurrent application, especially if progressively more virtual threads become pinned so that their associated carrier threads cannot service other virtual threads.
Pinning of a virtual thread to its carrier thread can primarily occur in the following two contexts:
When the virtual thread is running code inside a synchronized block or method.
When the virtual thread is calling a native method or a foreign function. (This topic is beyond the scope of this book and will not be discussed further.)
Note that pinning does not render the application incorrect. Carrier threads have bounded availability (i.e., only a finite number of them can be created), and since pinning reduces the number of carrier threads available for executing virtual threads, it can have a negative impact on the scalability of the application, especially if it is frequent and long-lived. Note also that a pinned virtual thread does not block its associated carrier thread unless it executes a blocking operation. If that happens, the carrier thread remains idle while blocked, further increasing the impact of pinning.
Example 3.8 illustrates both pinning of virtual threads in a synchronized block and how refactoring the code to use a reentrant lock can alleviate the problem. The example prints the schedule trace of carrier threads on which a virtual thread is mounted during the execution of its task.
In Example 3.8, a blocking operation is defined by the method blockingOp() at (3). The method returns a string of the form "worker-n -> worker-m" that identifies the carrier threads the virtual thread was mounted on before and after the blocking operation, respectively.
Pinning in Synchronized Block
Example 3.8 defines a task at (4) that uses a synchronized block at (6) to implement a critical region; at most one thread can be executing the synchronized block at any time. The code in the task traces the carrier threads that the virtual thread was mounted on at various points in the code: before obtaining the lock of the synchronized object at (5), after obtaining the lock at (7), executing the blocking operation at (8), and after the completion of the synchronized block at (9). Note that only one thread at a time can execute the synchronized block; other threads trying to obtain the lock of the synchronized object are blocked and have to wait their turn to execute the synchronized block.
If an application is run with the system property jdk.tracePinnedThreads set to the value short or full on the command line, the JVM prints stack trace information that identifies a carrier thread that gets blocked while its virtual thread is pinned.
>java -Djdk.tracePinnedThreads=short MyApp
Example 3.8 is run with the above flag having the value short for tracing pinned threads. The scheduling trace of the carrier threads is printed at (10). For example, we see the following output for virtual thread #28:
Thread[#35,ForkJoinPool-1-worker-7,5,CarrierThreads]
vt.VTPinningDemo.lambda$0(VTPinningDemo.java:41) <== monitors:1
[10:27:31] INFO: vt #28: LockAcquiring(worker-7 -> worker-7) ->
BlockingOp(worker-7 -> worker-7) -> worker-7
The first two lines above show that carrier thread #35, named worker-7, was blocked during the execution of the blocking operation in the synchronized block. From the output we can see that virtual thread #28 is pinned to this blocked carrier thread.
The last two lines show that virtual thread #28 was pinned to carrier thread worker-7 during the entire execution of the synchronized block: when acquiring the lock of the synchronized object, during the blocking operation, and after the synchronized block. It is important to note that not only was the virtual thread pinned to its carrier thread, but the carrier thread was also blocked during the blocking operation. Pinning not only takes the associated carrier thread out of scheduling for other virtual threads, but during a blocking operation, it is also idle for the duration of the blocking period.
Similarly, pinning of the virtual threads can be traced in all runs of the task containing the synchronized block. Note that each virtual thread is assigned to a new carrier thread as virtual threads get pinned executing the synchronized block.
Example 3.8 Pinning of Virtual Threads
package vt;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.logging.Logger;
import java.util.stream.IntStream;

public class VTPinningDemo {
  private static final Logger logger =
      Logger.getLogger(VTPinningDemo.class.getName());
  static {
    System.setProperty("java.util.logging.SimpleFormatter.format",
                       "[%1$tT] %4$s: %5$s%n");
  }

  public static final int NUMBER_OF_VT = 8;                          // (1)
  public static final int DURATION = 1000;                           // (2)

  // Blocking operation:
  private static String blockingOp() {                               // (3)
    try {
      var ctNameBefore = getCarrierThreadName();
      TimeUnit.MILLISECONDS.sleep(DURATION);
      var ctNameAfter = getCarrierThreadName();
      return String.format("%s -> %s", ctNameBefore, ctNameAfter);
    } catch (InterruptedException e) {
      e.printStackTrace();
    }
    return "? -> ?";
  }

  // Task uses synchronized block:
  static final Runnable task1 = () -> {                              // (4)
    String ctBeforeLock = "", ctAfterLock = "", ctAfterSynch = "",
           blockTrace = "";
    ctBeforeLock = getCarrierThreadName();                           // (5)
    synchronized (VTPinningDemo.class) {                             // (6)
      ctAfterLock = getCarrierThreadName();                          // (7)
      blockTrace = blockingOp();                                     // (8)
    }
    ctAfterSynch = getCarrierThreadName();                           // (9)
    logger.info(String.format(                                       // (10)
        "vt %4s: LockAcquiring(%s -> %s) -> BlockingOp(%s) -> %s",
        vtID(), ctBeforeLock, ctAfterLock, blockTrace, ctAfterSynch));
  };

  // Reentrant lock:
  public static final ReentrantLock lock = new ReentrantLock();      // (11)

  // Task uses reentrant lock:
  static final Runnable task2 = () -> {                              // (12)
    String ctBeforeLock = "", ctAfterLock = "", ctAfterUnlock = "",
           blockTrace = "";
    ctBeforeLock = getCarrierThreadName();                           // (13)
    lock.lock();                                                     // (14)
    ctAfterLock = getCarrierThreadName();                            // (15)
    try {
      blockTrace = blockingOp();                                     // (16)
    } finally {
      lock.unlock();
    }
    ctAfterUnlock = getCarrierThreadName();                          // (17)
    logger.info(String.format(                                       // (18)
        "vt %4s: LockAcquiring(%s -> %s) -> BlockingOp(%s) -> %s",
        vtID(), ctBeforeLock, ctAfterLock, blockTrace, ctAfterUnlock));
  };

  public static void main(String[] args) {                           // (19)
    logger.info("-----Synchronized block-----");
    try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
      IntStream.range(0, NUMBER_OF_VT).forEach(i -> executor.submit(task1));
    }
    logger.info("-----Reentrant lock-----");
    try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
      IntStream.range(0, NUMBER_OF_VT).forEach(i -> executor.submit(task2));
    }
  }

  static String vtID() {
    return "#" + Thread.currentThread().threadId();
  }

  static String getCarrierThreadName() {
    var vtInfo = Thread.currentThread().toString();
    return vtInfo.substring(vtInfo.indexOf('w'));
  }
}
Probable output from the program (edited to fit in page width):
[10:27:30] INFO: -----Synchronized block-----
Thread[#35,ForkJoinPool-1-worker-7,5,CarrierThreads]
vt.VTPinningDemo.lambda$0(VTPinningDemo.java:41) <== monitors:1
[10:27:31] INFO: vt #28: LockAcquiring(worker-7 -> worker-7) ->
BlockingOp(worker-7 -> worker-7) -> worker-7
[10:27:32] INFO: vt #23: LockAcquiring(worker-2 -> worker-2) ->
BlockingOp(worker-2 -> worker-2) -> worker-2
[10:27:33] INFO: vt #29: LockAcquiring(worker-8 -> worker-8) ->
BlockingOp(worker-8 -> worker-8) -> worker-8
[10:27:35] INFO: vt #26: LockAcquiring(worker-5 -> worker-5) ->
BlockingOp(worker-5 -> worker-5) -> worker-5
[10:27:36] INFO: vt #21: LockAcquiring(worker-1 -> worker-1) ->
BlockingOp(worker-1 -> worker-1) -> worker-1
[10:27:37] INFO: vt #27: LockAcquiring(worker-6 -> worker-6) ->
BlockingOp(worker-6 -> worker-6) -> worker-6
[10:27:38] INFO: vt #24: LockAcquiring(worker-3 -> worker-3) ->
BlockingOp(worker-3 -> worker-3) -> worker-3
[10:27:39] INFO: vt #25: LockAcquiring(worker-4 -> worker-4) ->
BlockingOp(worker-4 -> worker-4) -> worker-4
[10:27:39] INFO: -----Reentrant lock-----
[10:27:40] INFO: vt #38: LockAcquiring(worker-4 -> worker-4) ->
BlockingOp(worker-4 -> worker-1) -> worker-1
[10:27:41] INFO: vt #39: LockAcquiring(worker-3 -> worker-7) ->
BlockingOp(worker-7 -> worker-1) -> worker-1
[10:27:42] INFO: vt #41: LockAcquiring(worker-1 -> worker-7) ->
BlockingOp(worker-7 -> worker-1) -> worker-1
[10:27:43] INFO: vt #40: LockAcquiring(worker-6 -> worker-7) ->
BlockingOp(worker-7 -> worker-1) -> worker-1
[10:27:44] INFO: vt #42: LockAcquiring(worker-5 -> worker-7) ->
BlockingOp(worker-7 -> worker-1) -> worker-1
[10:27:45] INFO: vt #43: LockAcquiring(worker-8 -> worker-7) ->
BlockingOp(worker-7 -> worker-1) -> worker-1
[10:27:46] INFO: vt #44: LockAcquiring(worker-2 -> worker-7) ->
BlockingOp(worker-7 -> worker-1) -> worker-1
[10:27:47] INFO: vt #45: LockAcquiring(worker-7 -> worker-7) ->
BlockingOp(worker-7 -> worker-1) -> worker-1
Avoiding Pinning with a Reentrant Lock
Example 3.8 defines a task at (12) that uses a reentrant lock (declared at (11)) instead of a synchronized block to implement a critical region. The task uses the classical idiom for using a reentrant lock:
lock.lock(); // Acquire the lock.
try {
// Critical region
} finally {
lock.unlock(); // Release the lock.
}
The lock() method of the ReentrantLock class is a blocking operation: other threads wait if the lock is already taken. If a virtual thread blocks because the lock is taken, the virtual thread is unmounted, and the JVM thread scheduler can schedule its carrier thread to service other virtual threads. There is no pinning involved.
As before, the code in the task traces the carrier threads that the virtual thread was mounted on at various points in the code: before obtaining the lock at (13), after obtaining the lock at (15), executing the blocking operation at (16), and after the lock is released at (17).
The scheduling trace of the carrier threads is printed at (18). For example, we see the following output for virtual thread #38:
[10:27:40] INFO: vt #38: LockAcquiring(worker-4 -> worker-4) ->
BlockingOp(worker-4 -> worker-1) -> worker-1
The trace for acquiring the lock shows that virtual thread #38 was mounted on carrier thread worker-4 and most probably acquired the lock straight away, since the trace shows the same carrier thread before and after acquiring the lock.
The trace for the blocking operation shows that virtual thread #38 was mounted on carrier thread worker-4 before the blocking operation and was mounted on carrier thread worker-1 when it was allowed to resume execution after blocking. While it was blocked, its carrier thread worker-4 could be scheduled to execute other virtual threads. Again, there is no pinning during the blocking operation.
The unlock() method of the ReentrantLock class is not a blocking operation. The scheduling trace shows that virtual thread #38 continued execution while mounted on carrier thread worker-1.
Similarly, the scheduling traces of the other virtual threads show that there is no pinning when using a reentrant lock in other runs of the task. Note that since the virtual threads were not pinned, their associated carrier threads can be scheduled to service other virtual threads waiting to execute, as evident from the runs where the same carrier thread was involved in the execution of several other virtual threads.
Avoid Using Virtual Threads for CPU-Bound Tasks
The benefit of virtual threads is best harnessed when they execute frequent short-lived blocking operations, as that is what virtual threads are designed for. In the Java APIs, I/O operations and blocking operations on relevant data structures have been refactored to unmount virtual threads, without any explicit action on the part of the application. Long-running CPU-intensive tasks will not unmount virtual threads, so virtual threads provide no additional advantage for them; such tasks are best executed by platform threads.
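As a sketch of the alternative (the class name, method name, and workload are illustrative), a CPU-bound task can instead be submitted to a fixed pool of platform threads sized to the number of available cores:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CpuBoundSketch {
    // CPU-intensive task: it performs no blocking operation, so a virtual
    // thread running it would never be unmounted anyway.
    static long sumOfSquares(long n) {
        long sum = 0;
        for (long i = 1; i <= n; i++) {
            sum += i * i;
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        // A fixed pool of platform threads, one per core, suits CPU-bound work.
        try (ExecutorService executor = Executors.newFixedThreadPool(cores)) {
            Future<Long> result = executor.submit(() -> sumOfSquares(1_000_000));
            System.out.println("Result: " + result.get());
        }
    }
}
```

With one platform thread per core, CPU-bound tasks run without the context-switching overhead that oversubscribing the cores would incur.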
Avoid Pooling of Virtual Threads
A thread pool manages a fixed number of threads to limit the number of tasks that can execute concurrently, also called limiting concurrency. It does not create new threads; it only allocates new tasks to existing threads as they become available.
Platform threads are a scarce resource, expensive to create and destroy. In contrast, virtual threads are lightweight, cheap to create and destroy, and designed specifically for the one-thread-per-task model of concurrent programming. Using a thread pool for virtual threads thus serves no purpose.
However, if it is necessary to limit the number of virtual threads that can execute concurrently, the interested reader should refer to the API of the java.util.concurrent.Semaphore class, as the Concurrency API does not provide any executor service that allows a fixed number of virtual threads.
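For example, a Semaphore can bound how many virtual threads are inside a critical section at once. The following sketch (the limit of 10 and all class, field, and method names are illustrative) tracks the highest observed concurrency to show that it never exceeds the number of permits:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class LimitedConcurrencySketch {
    static final int MAX_CONCURRENT = 10;
    static final Semaphore permits = new Semaphore(MAX_CONCURRENT);
    static final AtomicInteger active = new AtomicInteger();
    static final AtomicInteger observedMax = new AtomicInteger();

    static void limitedTask() {
        try {
            permits.acquire();     // Blocks (and unmounts) if no permit is free.
            try {
                int now = active.incrementAndGet();
                observedMax.accumulateAndGet(now, Math::max);
                Thread.sleep(10);  // Stand-in for real work.
            } finally {
                active.decrementAndGet();
                permits.release();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, 100)
                     .forEach(i -> executor.submit(LimitedConcurrencySketch::limitedTask));
        }
        System.out.println("Observed max concurrency: " + observedMax.get());
    }
}
```

Note that this still creates one cheap virtual thread per task; only the number of tasks concurrently inside the guarded section is limited, which is the recommended alternative to pooling virtual threads.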
Minimize Using Thread-Local Variables with Virtual Threads
A thread-local variable allows a thread to store a value that is only accessible within the scope of that thread: each thread has a private copy of the variable, thus ensuring thread safety.
Virtual threads work with thread-local variables, but since virtual threads can be created in the thousands, the sheer number of copies of each thread-local variable can significantly increase memory consumption, especially if the data stored in the thread-local variables has a large memory footprint. This issue is less of a problem with platform threads, as they are seldom created in such large numbers as virtual threads.
For details on thread-local variables and their usage, the curious reader should refer to the API of the java.lang.ThreadLocal<T> class.
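As a brief illustration (the class and method names are hypothetical), each virtual thread sees its own, independently initialized copy of a thread-local variable:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalSketch {
    // Each thread, platform or virtual, lazily gets its own StringBuilder copy.
    static final ThreadLocal<StringBuilder> buffer =
        ThreadLocal.withInitial(StringBuilder::new);

    // Appends text to the calling thread's private buffer and returns its contents.
    static String appendAndRead(String text) {
        buffer.get().append(text);
        return buffer.get().toString();
    }

    public static void main(String[] args) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 3; i++) {
                int id = i;
                // Each submitted task runs in a fresh virtual thread, so each
                // starts with an empty buffer of its own.
                executor.submit(() ->
                    System.out.println("vt " + Thread.currentThread().threadId()
                        + ": " + appendAndRead("task-" + id)));
            }
        }
    }
}
```

Every one of the three virtual threads prints only its own "task-n" value, because no thread ever sees another thread's copy; with thousands of virtual threads, each such private copy adds to the memory footprint.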
Avoid Substituting Virtual Threads for Platform Threads
Substituting virtual threads for platform threads is not always the answer to improving performance, because virtual threads are not faster than platform threads. Under the right circumstances, namely a large number of concurrent tasks that perform short-lived blocking operations, virtual threads can substantially increase the scalability of one-virtual-thread-per-task-based concurrent applications.
Future releases of Java aim to alleviate many of the issues regarding virtual threads that have been raised in this section.
