Modern computers have multiple processing cores. Assuming there is enough work for each core to remain continuously busy, the scheduler assigns each core a queue of computational work items, or threads, to complete.
While executing these threads, a program can spawn new threads or work items. These are separate threads that can be processed simultaneously. They may need to feed results back to the spawning program or remain completely separate indefinitely. Typically, these child threads are assigned to the same processing core as the parent.
All this assumes that every core is kept busy. That holds as long as no thread ends, or as long as new threads are spawned at least as fast as existing threads finish. In the real world, though, the long-term workload is rarely that simple, especially on end-user computing devices. Eventually, a processing core will likely complete all of its assigned tasks. When this happens, rather than sitting idle and wasting potential performance, it checks the work queues of the other processing cores and steals a work item from one of them.
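That loop can be sketched in Python. This is a minimal illustration under invented names, not how any real runtime implements work stealing: each worker owns a deque, pops its own work from the newest end, and, when its deque is empty, tries to steal from the oldest end of the other workers' deques. The worker count, the `steal` helper, and the squaring task are all assumptions for the example, and it relies on CPython's deque `pop`/`popleft` operations being thread-safe.

```python
import collections
import random
import threading

NUM_WORKERS = 4                          # hypothetical core count for this sketch
queues = [collections.deque() for _ in range(NUM_WORKERS)]
results = []                             # completed task outputs
results_lock = threading.Lock()

def steal(wid):
    """Try the other workers' queues in random order; take the oldest item."""
    victims = [i for i in range(NUM_WORKERS) if i != wid]
    random.shuffle(victims)
    for v in victims:
        try:
            return queues[v].popleft()   # steal from the opposite end the owner uses
        except IndexError:
            continue
    return None                          # every queue was empty

def worker(wid):
    while True:
        try:
            task = queues[wid].pop()     # own queue first, newest item (LIFO)
        except IndexError:
            task = steal(wid)
            if task is None:
                return                   # no work anywhere: this core goes idle
        with results_lock:
            results.append(task * task)  # the "work": square the input

# Pile all the work onto core 0's queue; the other workers must steal to help.
queues[0].extend(range(20))
threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results) == [n * n for n in range(20)])  # True: each task ran exactly once
```

Even though every task starts on worker 0's queue, the idle workers steal from its oldest end, so the backlog drains in parallel.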
Benefits and Downsides
Work stealing means that an idle processing core will actively seek out work to complete. This prevents a potentially large fraction of the overall processor from sitting idle, which is helpful. Work stealing can come with some costs, though. For example, the new processing core will likely have to load any relevant data into its cache memory.
This can take time, especially if the data has to be fetched from system RAM rather than served by a shared cache tier. It's possible that the original core could have resumed that work item within that timeframe, leading to faster overall execution. That can even be true if the core from which the work item was stolen had never begun processing it, because some cached values may be shared between parent and child threads.
Several programming languages have runtimes that can schedule work directly on dedicated processors; the Cilk programming language, the Rust Tokio runtime, and the .NET Task Parallel Library can all do this. Alternatively, the operating system may be in charge of scheduling actual processor time, with the program simply adding tasks to a pool of “worker threads,” which are themselves scheduled by the operating system.
This happens in systems where the program does not have dedicated, direct access to processing cores but must share them with other processes. Extra care must be taken in this scenario to ensure that a work item isn't repeatedly stolen while it sits idle in a queue.
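The pooled-worker model can be shown with Python's standard library: the program only submits tasks to a pool of worker threads, and the operating system decides when each of those threads actually runs. Note that `ThreadPoolExecutor` feeds its workers from a single shared queue rather than performing per-core work stealing, so this illustrates only the division of labor described above, with the `work` function invented as a stand-in task.

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    # A stand-in task; a real work item would do something substantial.
    return n * n

# The program adds tasks to the pool; the OS schedules the four worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(8)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```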
There are various approaches to selecting which work items to steal. In the original concept, an idle core picks another core at random and, if that core has one or more work items in its queue, takes the last one. Whether the parent or the child thread ends up being stolen depends on the spawning policy: the originating processor may either execute the child immediately and push the parent's continuation to its queue, or push the child to the queue and continue executing the parent.
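The two spawning policies can be contrasted with a toy sketch in which tasks are plain Python callables; the function names here are invented for illustration. When the child is pushed to the queue and the parent keeps running, a thief would take the child; when the child runs immediately and the parent's continuation is pushed, a thief would take the parent.

```python
import collections

def spawn_child_stealing(queue, continue_parent, child):
    # Push the child to the queue; the parent keeps running.
    # An idle core stealing from this queue would get the child.
    queue.append(child)
    continue_parent()

def spawn_continuation_stealing(queue, continue_parent, child):
    # Run the child immediately; push the parent's continuation.
    # An idle core stealing from this queue would get the parent.
    queue.append(continue_parent)
    child()

log, q = [], collections.deque()
spawn_child_stealing(q, lambda: log.append("parent"), lambda: log.append("child"))
q.pop()()   # no thief arrived, so the owner runs the queued child itself
print(log)  # ['parent', 'child']

log, q = [], collections.deque()
spawn_continuation_stealing(q, lambda: log.append("parent"), lambda: log.append("child"))
q.pop()()   # the owner resumes the queued parent continuation
print(log)  # ['child', 'parent']
```

Either way every task runs exactly once; the policy only changes which task waits in the queue and is therefore exposed to thieves.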
It can all be summed up as work stealing: a load-balancing technique that ensures the workload is spread evenly among the available processors, so that every processor is doing something to help.
Work stealing is a process that happens automatically on multicore CPUs. Each core has a queue of tasks to perform. When a core completes its tasks, it steals another task from the queue of a different core. This helps prevent a situation in which some cores sit idle while others still have a queue of tasks to perform.