scaling.shared_array
Unified cross-process numpy arrays with pluggable OS backends.
This module provides a single SharedArray facade for sharing
numpy.ndarray buffers between Python processes, backed by one of
three interchangeable mechanisms:
- shm: multiprocessing.shared_memory.SharedMemory (named POSIX shared memory, stored under /dev/shm).
- mmap: anonymous mmap (MAP_SHARED) (no filesystem footprint, inherited by forked children).
- memfd: Linux memfd_create(2) (anonymous, file-descriptor based, compatible with both fork and spawn).
The active backend is chosen once at process startup, either explicitly or
by auto-detection. All SharedArray instances created afterwards use
that backend transparently.
Why three backends?
Each backend trades off three concerns:
- Storage location: whether the buffer lives in /dev/shm (a tmpfs partition of fixed and often small size) or in the address space of the process.
- Process-start compatibility: whether the buffer is reachable from children started with fork, with spawn, or both.
- Platform: Linux, macOS, or Windows.
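The pluggable design lets the same code run unchanged across:
- Tightly constrained Docker containers where /dev/shm is too small.
- Python 3.14+ on Linux, where the default start method changes from fork to forkserver. Like spawned children, forkserver children do not inherit the parent's buffers.
- Cross-platform developer workstations.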
fork vs spawn — a refresher
When multiprocessing starts child processes, two fundamentally
different mechanisms can be used.
fork (default on Linux up to Python 3.13, available on macOS)
fork() duplicates the parent’s address space (copy-on-write at the page
level). The child inherits every Python object, file descriptor, mmap
region and OS resource the parent had at the moment of the fork. No data
is serialized, and because pages are copied only when written, the operation
stays cheap even for multi-gigabyte processes.
For SharedArray, a forked child simply calls
SharedArray.load() to rebuild the numpy view on a buffer it already
has in its address space.
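The fork path can be illustrated with the standard library alone. The sketch below is not gridr's implementation. It assumes Linux or macOS with the fork start method and NumPy installed, and the helper name `fork_fill` is hypothetical. The parent allocates an anonymous MAP_SHARED mapping. The forked child wraps the inherited mapping in a numpy view and writes to it. The parent then sees the result without any serialization.

```python
import mmap
import multiprocessing as mp

import numpy as np


def fork_fill(n: int) -> np.ndarray:
    """Hypothetical helper: a forked child fills a shared int64 buffer with squares."""
    nbytes = n * np.dtype(np.int64).itemsize
    # Anonymous MAP_SHARED mapping: no name and no file, reachable only through fork.
    buf = mmap.mmap(-1, nbytes, flags=mmap.MAP_SHARED)

    def child() -> None:
        # The mapping is already in the child's address space (inherited at fork);
        # only a new numpy view has to be built on top of it.
        view = np.frombuffer(buf, dtype=np.int64, count=n)
        view[:] = np.arange(n, dtype=np.int64) ** 2

    ctx = mp.get_context("fork")  # Linux/macOS only
    proc = ctx.Process(target=child)
    proc.start()
    proc.join()
    if proc.exitcode != 0:
        raise RuntimeError(f"child exited with code {proc.exitcode}")

    # Copy the values out, so no numpy view pins the mapping when it is closed.
    result = np.frombuffer(buf, dtype=np.int64, count=n).copy()
    buf.close()
    return result


if __name__ == "__main__":
    print(fork_fill(5))  # the squares written by the child process
```

The child function is not pickled here. With fork, the whole process image, closure included, is duplicated.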
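spawn (default on Windows and macOS, opt-in on Linux)
spawn starts a fresh Python interpreter and re-imports the main module (the
code under if __name__ == "__main__": is not re-run). The child knows nothing of the parent’s state; whatever it needs
must be transmitted explicitly, via pickle or an fd-passing mechanism. The forkserver method, the Linux
default from Python 3.14, has the same constraint: its children are forked from a clean server process, not from the parent.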
For SharedArray, the parent calls
SharedArray.get_passing_payload() to obtain a serializable
description, and the child rebuilds the array with
SharedArray.from_payload(). Whether this works depends on the chosen
backend:
- shm: works (the segment is re-attached by name).
- memfd: works, but the file descriptor must be transmitted via multiprocessing.reduction.send_handle() or SCM_RIGHTS.
- mmap: does not work (anonymous mappings have no transmissible handle).
Default start method by platform
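| Platform | Python ≤ 3.13 | Python 3.14+ | Other modes |
|---|---|---|---|
| Linux | fork | forkserver | spawn, plus fork or forkserver (whichever is not the default) |
| macOS | spawn | spawn | fork, forkserver |
| Windows | spawn | spawn | none |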
Backend reference
shm — multiprocessing.shared_memory
A named shared memory segment, stored under /dev/shm on Linux.
Strengths
- Works on every Python platform (Linux, macOS, Windows).
- Compatible with both fork and spawn: children re-attach by name.
- Inspectable from the shell with ls /dev/shm (Linux).
Weaknesses
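- Constrained by the total size of /dev/shm: 64 MB by default in Docker.
- Segments must be explicitly unlinked, or they leak until reboot.
- CPython issue before 3.13: the multiprocessing resource tracker also registers segments that a process merely attaches to. It may therefore unlink a segment at shutdown while other processes are still using it. Python 3.13 adds a track parameter to SharedMemory to opt out.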
Best for: development environments, cross-platform deployments,
spawn-based pipelines with ample /dev/shm.
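The following is a minimal sketch of the shm mechanism that uses multiprocessing.shared_memory directly. It is not gridr's implementation. NumPy is assumed, and the helper name `roundtrip_by_name` is hypothetical. The second handle, opened by name only, stands in for what a spawned child does. On Python 3.13+ it passes track=False so that the attaching side does not register the segment with the resource tracker. The creator stays responsible for unlinking.

```python
import sys
from multiprocessing import shared_memory

import numpy as np


def roundtrip_by_name(shape=(4, 4), dtype=np.float32):
    """Hypothetical helper: create a named segment, then re-attach to it by name."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    creator = shared_memory.SharedMemory(create=True, size=nbytes)
    src = np.ndarray(shape, dtype=dtype, buffer=creator.buf)
    src[:] = 1.5

    # A spawned child would only need this string (plus shape and dtype).
    name = creator.name
    extra = {"track": False} if sys.version_info >= (3, 13) else {}
    attached = shared_memory.SharedMemory(name=name, **extra)
    dst = np.ndarray(shape, dtype=dtype, buffer=attached.buf)
    total = float(dst.sum())

    # Drop numpy views first: close() fails while the buffer is still exported.
    del src, dst
    attached.close()   # detach only; the segment still exists
    creator.close()
    creator.unlink()   # explicit cleanup, otherwise the segment leaks
    return name, total
```

With the defaults, roundtrip_by_name() returns the segment name and 24.0 (sixteen float32 elements set to 1.5). After the call, the name no longer resolves to a segment.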
mmap — anonymous MAP_SHARED
An anonymous mapping created with mmap.mmap(-1, size, MAP_SHARED). No
name, no filesystem entry; visible only to the creator and its forked
descendants.
Strengths
- Zero filesystem footprint, independent of /dev/shm.
- Released automatically when the last reference goes out of scope.
Weaknesses
- fork-only: not usable with spawn or on Windows.
- Cannot be re-attached after the creator dies.
Best for: Linux containers with tight /dev/shm and a fork start
method.
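The fork-only restriction follows from the fact that an anonymous mapping has no name or descriptor that a spawned child could use. Here is a small standard-library illustration, not gridr code. The helper name is hypothetical. Pickling the mmap object itself fails, so inheritance through fork is the only way to share it.

```python
import mmap
import pickle


def try_to_pickle_anonymous_mapping(size: int = 4096) -> str:
    """Show that an anonymous MAP_SHARED mapping cannot be serialized for another process."""
    buf = mmap.mmap(-1, size, flags=mmap.MAP_SHARED)
    try:
        pickle.dumps(buf)
    except TypeError as exc:
        # CPython refuses: there is no name or fd that could be transmitted.
        return str(exc)
    finally:
        buf.close()
    return "pickled"  # not expected on CPython
```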
memfd — memfd_create + fd passing
A Linux-only backend using memfd_create(2) to allocate anonymous memory
accessible through a file descriptor, then mapped with mmap.
Strengths
- Independent of /dev/shm.
- Compatible with both fork (fd inherited) and spawn (fd transmissible).
- Inspectable through /proc/<pid>/fd.
- Cleaned up automatically once the last fd is closed and the last mapping is unmapped.
Weaknesses
- Linux only (kernel ≥ 3.17, glibc ≥ 2.27).
- spawn requires explicit fd-passing code in the parent.
Best for: Linux deployments preparing for Python 3.14, where forkserver
replaces fork as the default, while keeping anonymous shared memory.
Backend comparison
| Property | shm | mmap | memfd |
|---|---|---|---|
| OS resource | Named segment in /dev/shm | Address-space mapping only | Anonymous fd + mapping |
| Uses /dev/shm | Yes | No | No |
| Works with fork | Yes | Yes | Yes |
| Works with spawn | Yes (by name) | No | Yes (by fd-passing) |
| Linux | Yes | Yes | Yes |
| macOS | Yes | Yes | No |
| Windows | Yes | No | No |
| Cleanup | Manual | Automatic | Automatic |
Backend selection
The backend is chosen once and reused for the lifetime of the process. Three mechanisms drive the choice, in order of precedence:
1. The GRIDR_SHARED_BACKEND environment variable.
2. An explicit call to set_backend().
3. Auto-detection if neither of the above is set.
Auto-detection logic
In order:
1. If the platform is Windows: shm.
2. If /dev/shm reports strictly more than GRIDR_SHM_MIN_FREE bytes free (default 64 MiB): shm.
3. If fork is available: mmap.
4. If memfd_create is available: memfd.
5. Otherwise: mmap, with a warning.
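The detection order can be expressed as a short function. This is a hypothetical re-implementation for illustration only, not gridr's code. The function names are made up, and free space is measured with shutil.disk_usage. "fork is available" is read as "fork appears in multiprocessing.get_all_start_methods()".

```python
import multiprocessing as mp
import os
import shutil
import sys
import warnings
from typing import Optional

DEFAULT_SHM_MIN_FREE = 64 * 1024 * 1024  # 64 MiB, the documented default


def shm_free_bytes(path: str = "/dev/shm") -> int:
    """Free bytes on the filesystem behind /dev/shm, or 0 if it is missing."""
    try:
        return shutil.disk_usage(path).free
    except OSError:
        return 0  # e.g. macOS, where /dev/shm does not exist


def auto_detect_backend(min_free: Optional[int] = None) -> str:
    """Hypothetical sketch of the documented auto-detection order."""
    if min_free is None:
        min_free = int(os.environ.get("GRIDR_SHM_MIN_FREE", DEFAULT_SHM_MIN_FREE))
    if sys.platform == "win32":
        return "shm"
    if shm_free_bytes() > min_free:  # strictly more than the threshold
        return "shm"
    if "fork" in mp.get_all_start_methods():
        return "mmap"
    if hasattr(os, "memfd_create"):
        return "memfd"
    warnings.warn("no preferred shared-memory backend available; falling back to mmap")
    return "mmap"
```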
Usage
Basic example (single process)
import numpy as np
from gridr.scaling.shared_array import SharedArray
sa = SharedArray(
shape=(1024, 1024),
dtype=np.float32,
name=SharedArray.build_name(prefix="buffer"),
)
sa.create()
sa.array[:] = 0.0
sa.destroy()
Sharing across forked workers
With the default fork start method on Linux, no special handling is
required:
import multiprocessing as mp
import numpy as np
from gridr.scaling.shared_array import SharedArray
def worker(sa, idx):
    sa.load()
    try:
        sa.array[idx] = idx ** 2
    finally:
        sa.close()  # detach the local view; the shared buffer stays alive

if __name__ == "__main__":
    sa = SharedArray(shape=(100,), dtype=np.int64,
                     name=SharedArray.build_name("squares"))
    sa.create()
    ctx = mp.get_context("fork")
    with ctx.Pool(4) as pool:
        pool.starmap(worker, [(sa, i) for i in range(100)])
    sa.destroy()
Sharing across spawned workers
With spawn, the child receives only what the parent transmits. The
parent calls SharedArray.get_passing_payload(), and the child rebuilds
the array with SharedArray.from_payload(). This works for shm
and memfd backends; mmap raises RuntimeError.
For memfd, the file descriptor must additionally be transferred to the
child via multiprocessing.reduction.send_handle() or
SCM_RIGHTS over a Unix-domain socket.
Cleaning up registered buffers
import numpy as np
from gridr.scaling.shared_array import SharedArray, create_and_register

buffers = []
try:
    sa1 = create_and_register((512, 512), np.float32, buffers, prefix="grid")
    sa2 = create_and_register((256, 256), np.uint8, buffers, prefix="mask")
    # ... pipeline ...
finally:
    # Release every registered buffer, even if the pipeline raised.
    SharedArray.clear_buffers(buffers)
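For shm-backed buffers, releasing a buffer amounts to closing and unlinking its named segment. The following standard-library sketch mirrors the register-and-clear pattern above. The helper names `register_segment` and `clear_segments` are hypothetical. It is an illustration, not gridr's clear_buffers().

```python
from multiprocessing import shared_memory


def register_segment(size: int, register: list) -> shared_memory.SharedMemory:
    """Hypothetical analogue of create_and_register() for raw shm segments."""
    seg = shared_memory.SharedMemory(create=True, size=size)
    register.append(seg)
    return seg


def clear_segments(register: list) -> None:
    """Close and unlink every registered segment, tolerating ones already removed."""
    while register:
        seg = register.pop()
        seg.close()
        try:
            seg.unlink()
        except FileNotFoundError:
            pass  # another process already unlinked it


if __name__ == "__main__":
    segments = []
    try:
        register_segment(1024, segments)
        register_segment(2048, segments)
        # ... pipeline ...
    finally:
        clear_segments(segments)  # runs even if the pipeline raises
```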
Concurrent access
SharedArray does not provide synchronization. The caller is
responsible for consistency. Common patterns:
- Disjoint write regions: workers each write to a disjoint slice. No locking needed.
- Phased access: write phase, multiprocessing.Barrier, then read phase.
- Per-region atomic flags: fine-grained progress tracking using atomic flags placed in a second shared buffer.
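The first two patterns combine naturally. The sketch below uses the standard library and NumPy with the fork start method, so Linux or macOS. Each worker writes only its own slice of an anonymous shared buffer, waits on a multiprocessing Barrier, and only then reads the whole buffer. The buffer layout and the `phased_sum` helper are hypothetical and not part of gridr.

```python
import mmap
import multiprocessing as mp

import numpy as np


def phased_sum(n_workers: int = 4, chunk: int = 1000) -> list:
    """Each worker fills a disjoint slice, then (after a barrier) sums everything."""
    n_data = n_workers * chunk
    # Layout: [data: n_data int64 | one result slot per worker: n_workers int64]
    buf = mmap.mmap(-1, (n_data + n_workers) * 8, flags=mmap.MAP_SHARED)
    ctx = mp.get_context("fork")
    # The timeout avoids hanging forever if a worker dies before reaching the barrier.
    barrier = ctx.Barrier(n_workers, timeout=30)

    def worker(rank: int) -> None:
        view = np.frombuffer(buf, dtype=np.int64)
        data, results = view[:n_data], view[n_data:]
        # Write phase: disjoint slices, so no lock is needed.
        data[rank * chunk:(rank + 1) * chunk] = rank + 1
        # Wait until every worker has finished writing.
        barrier.wait()
        # Read phase: all writes are complete and visible.
        results[rank] = int(data.sum())

    procs = [ctx.Process(target=worker, args=(r,)) for r in range(n_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    if any(p.exitcode != 0 for p in procs):
        raise RuntimeError("a worker failed")

    sums = np.frombuffer(buf, dtype=np.int64)[n_data:].tolist()
    buf.close()
    return sums
```

Without the barrier, a fast worker could sum the buffer while slower ones are still writing and report a partial total.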
Environment variables
- GRIDR_SHARED_BACKEND
Forces the backend used by all subsequent SharedArray instances. Accepted values: shm, mmap, memfd.
- GRIDR_SHM_MIN_FREE
Threshold in bytes used by backend auto-detection. The shm backend is selected only when /dev/shm has strictly more than this many bytes free. Defaults to 67108864 (64 MiB).
Compatibility
Python 3.10+
Linux: all three backends
macOS: shm, mmap
Windows: shm only
See also
multiprocessing.shared_memory
mmap
memfd_create(2)
multiprocessing.reduction
- class gridr.scaling.shared_array.SharedArray(shape, dtype, name, array_slice=None, _backend=None)
Process-shared numpy array with a pluggable backend.
Drop-in replacement for the previous SharedMemoryArray. The active backend is determined by get_backend() and can be controlled via set_backend() or the GRIDR_SHARED_BACKEND env var.
- property array: ndarray | None
Numpy view onto the shared buffer.
Returns the writable numpy.ndarray exposing the underlying shared memory. If an array_slice was provided at construction, the sliced sub-view is returned; otherwise the full-shape array is returned.
- Returns:
The shared array view, or None if the resource has not been allocated yet (no create() or load() call) or has been released via close() or destroy().
- Return type:
numpy.ndarray or None
Notes
The returned array shares memory with all other processes attached to the same buffer. Modifications are visible immediately to every attached process; no synchronization is performed by this property.
- classmethod build_name(prefix=None)
Generate a name for a memory segment that is unique with high probability.
The name is constructed using a class-level counter, an optional prefix, the current timestamp, and a UUID4 string to maximize uniqueness. The class counter is incremented with each call.
- Parameters:
prefix (str, optional) – An optional string prefix to include in the generated name. Defaults to None, resulting in an empty prefix.
- Returns:
A unique string suitable for use as a shared memory segment name. Example: “1-my_prefix-202310-2715-3000-abcdef12-3456-7890-abcd-ef1234567890”
- Return type:
str
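The naming scheme described above (counter, optional prefix, timestamp, UUID4) can be sketched as follows. This is a hypothetical module-level version of the classmethod. The timestamp format and separators are assumptions, so the output will not match the documented example exactly.

```python
import itertools
import time
import uuid
from typing import Optional

_counter = itertools.count(1)  # stands in for the class-level counter


def build_name(prefix: Optional[str] = None) -> str:
    """Hypothetical sketch: <counter>-<prefix>-<timestamp>-<uuid4>."""
    count = next(_counter)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    return f"{count}-{prefix or ''}-{stamp}-{uuid.uuid4()}"
```

Even when two calls fall within the same second, the counter and the UUID4 keep the names distinct.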
- classmethod clear_buffers(buffers)
Release a list of buffers.
Accepts SharedArray instances (preferred) or legacy str names. Names are only meaningful for the “shm” backend.
Each SharedArray is released through its backend-specific cleanup. Each legacy name is treated as a shared memory segment name and unlinked (deleted) from the operating system.
- Parameters:
buffers (list of SharedArray or str) – The shared arrays to release, or the legacy names of the shm segments to unlink.
- Return type:
None
- classmethod clone(sa, **override)
Build a new SharedArray description from an existing one.
For mmap / memfd, the clone shares the underlying mapping. For shm, the clone targets the same segment; call load() in the target process to attach.
- Return type:
SharedArray
- close()
Detach the local numpy view from the shared buffer.
Releases this process’s reference to the numpy view but leaves the underlying OS resource intact, so other processes can keep using it. Safe to call from worker processes after they are done with the buffer.
After close(), the array property returns None until load() is called again.
- Return type:
None
- create()
Creates the memory buffer and associates a NumPy array view.
This method allocates a memory segment with the specified name and size (derived from shape and dtype), then creates a NumPy array view that points to this memory segment. The array_slice attribute is not applied during creation; it’s used when the array is loaded (e.g., by another process, or via the load() method).
- Return type:
None
- destroy()
Release the underlying OS resource backing this shared array.
Performs the backend-specific cleanup:
shm: closes and unlinks the named POSIX segment from /dev/shm.
mmap: unmaps the anonymous memory region.
memfd: closes the file descriptor and unmaps the region.
Should only be called from the process that created the SharedArray, after all other processes have finished using it. Calling destroy() while workers are still attached results in undefined behaviour for the workers.
The call is idempotent: a second destroy() is a no-op.
- Return type:
None
See also
close: detach the local view without releasing the OS resource.
clear_buffers: release multiple shared arrays in bulk.
- classmethod from_payload(payload)
Reconstruct a SharedArray in a child process.
- Return type:
SharedArray
- get_passing_payload()
Return a serializable dict to reconstruct this SharedArray in another process.
For “memfd”, the payload contains an fd that must be transferred via SCM_RIGHTS / multiprocessing.reduction.send_handle(). For “shm”, only the name is needed (already in the payload). For “mmap”, this is not supported and raises RuntimeError (use fork instead).
- Return type:
Dict[str, Any]
Note
Only needed when using spawn workers. With fork (the default start method on Linux up to Python 3.13), you do not need this: simply pass the SharedArray instance to workers and call load() in the worker.
- load()
Attach this process to the existing shared buffer.
Re-attaches to a buffer previously allocated by create() in another (or the same) process, and rebuilds the local numpy.ndarray view onto the shared memory. The attach mechanism is backend-specific:
shm: reconnect to the named POSIX segment by its name.
mmap: rebuild the numpy view on the mapping already inherited via fork. The mapping is located in a per-process registry under name.
memfd: rebuild the numpy view on the file descriptor inherited via fork or transmitted via SCM_RIGHTS.
For mmap, the buffer must have been created by an ancestor process before the current process was forked, or by the current process itself. The same holds for memfd unless the file descriptor was transmitted explicitly. For shm, any process can attach by name.
- Return type:
None
- Raises:
RuntimeError – If the underlying OS resource cannot be found, typically because create() was not called or because the worker was started with spawn for a backend that requires inheritance.
- gridr.scaling.shared_array.create_and_register(shape, dtype, register, prefix=None)
Create a SharedArray and append it to a tracking list.
- Return type:
SharedArray
- gridr.scaling.shared_array.get_backend()
Return the currently-active concrete backend.
- Return type:
str
- gridr.scaling.shared_array.set_backend(name)
Force the backend for subsequent SharedArray creations.
- Return type:
None
- gridr.scaling.shared_array.shared_array_wrap(func)
Auto-load and auto-close SharedArray arguments around a function call.
This decorator simplifies working with SharedArray objects by automatically handling their load() and close() operations. It is intended for functions that operate on NumPy arrays but might receive SharedArray instances as inputs.
- Parameters:
func (callable) – The function to be wrapped. Its arguments will be inspected for SharedArray instances.
- Returns:
A wrapper function that handles the loading and closing of SharedArray arguments before and after executing the original func.
- Return type:
callable
Notes
This decorator should be used with caution as it modifies the arguments passed to the wrapped function by replacing SharedArray instances with their underlying NumPy arrays. It ensures close() is called on all detected SharedArray instances, even if the wrapped function raises an exception.
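A decorator with the behaviour described above can be sketched generically. The stand-in class only mimics the load()/close()/array interface documented for SharedArray. The names `wrap_shared` and `FakeShared` are hypothetical. Only top-level positional and keyword arguments are inspected, not nested containers, which may differ from the real helper.

```python
import functools

import numpy as np


class FakeShared:
    """Hypothetical stand-in exposing a SharedArray-like load/close/array interface."""

    def __init__(self, data):
        self._data = np.asarray(data)
        self.array = None
        self.closed = False

    def load(self):
        self.array = self._data
        self.closed = False

    def close(self):
        self.array = None
        self.closed = True


def wrap_shared(func, shared_type=FakeShared):
    """Load shared arguments, pass their arrays to func, and always close them."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        opened = []

        def unwrap(value):
            if isinstance(value, shared_type):
                value.load()
                opened.append(value)
                return value.array  # func sees a plain ndarray
            return value

        try:
            new_args = [unwrap(a) for a in args]
            new_kwargs = {k: unwrap(v) for k, v in kwargs.items()}
            return func(*new_args, **new_kwargs)
        finally:
            for sa in opened:
                sa.close()  # runs even if func raises

    return wrapper
```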