Version:

Using the xarray structured data interface within Salvus

Originally, Salvus entirely relied on obspy.Stream objects to internally represent simulated and observed data. While obspy.Stream's are a great representation type for seismological data and come with convenient processing and filtering functions, there are a few limitations to the original data interface used within Salvus.
  • Data could only be retrieved per station. If an event contained multiple receivers, the data for each receiver had to be retrieved separately in a loop, which could become slow in cases where a high number of receivers was used.
  • All data processing, windowing, and the computation of misfits was performed trace-by-trace. Besides the already-mentioned performance aspects, this also meant that more sophisticated time-and-space-dependent processing and misfit computations (e.g. in the fk-domain) were not supported.
  • obspy.Stream objects are mainly used within the seismological community. For non-seismology users, learning how to work with these objects would provide an unnecessary extra challenge.
For these reasons, with salvus release 2025.1.0, we have changed the way that data is represented internally within Salvus. Salvus now uses xarray data structures with a specific format to represent data internally. In this tutorial, you will learn what this format looks like and how you can start using it within Salvus. If you prefer to continue to use the old interface based on obspy.Stream objects, do not worry, all the old functionality is still supported!
This tutorial is structured in the following way:
  1. Introduction to xarray (if you are already familiar with xarray, you can skip this step).
  2. Retrieval of xarray data from a Salvus project.
  3. Custom data processing, data selection, and misfit functions using Salvus' xarray interface.
Copy
import salvus.namespace as sn

import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

import scipy.signal

1. Introduction to xarray

Xarray provides a convenient interface to represent structured data. In a nutshell, an xarray data structure is simply a multidimensional array with a label for each dimension. It can optionally store additional information on the coordinates of each data point. Additionally, xarray comes with some convenient features like data selection, plotting, and sorting methods.
To familiarize ourselves with the xarray interface, let us generate a simple 2D data set analogous to a seismic data set. In this simple example, we model a space- and time-dependent seismic wavefield with the following simple monochromatic plane-wave solution to the wave equation
u(x,t)=sin(ωtkx)u(x, t) = \sin(\omega t - k x),
with xx being the space coordinate, tt the time coordinate, ω\omega the angular frequency, and kk the wavenumber, which can be expressed in terms of the velocity cc, as k=ω/ck=\omega/c.
Let us use this function to generate a discrete seismic data set for a monochromatic plane wavefield at 10 Hz where we sample the wavefield along xx with a linear array of 25 sensors spaced at 10 meters. The time axis is resolved with 20 samples per period.
# Frequency of the plane-harmonic wave in Hz.
f = 10.0
# Velocity in m/s.
velocity = 3000.0

# Wavenumber
k = 2 * np.pi * f / velocity

# Total number of periods included in the data.
number_of_periods = 4
# Temporal resolution of the data
samples_per_period = 20

# Time axis
t = np.linspace(
    0,
    number_of_periods * (1 / f),
    int(number_of_periods * (1 / f) / (1 / (samples_per_period * f))),
)
# Space axis
x = np.arange(25) * 10

# Generate plane-harmonic data at the sampling points
u = np.sin(2 * np.pi * f * t[np.newaxis, :] - k * x[:, np.newaxis])

# Let's visualize the data
plt.imshow(u, aspect="auto")
plt.xlabel("t (# of samples)")
plt.ylabel("x (# of samples)")

# What it would look like to include axis data manually
# plt.imshow(u,extent=[t.min(), t.max(), x.min(), x.max()], aspect="auto")
# The axes are now labeled in whatever units the dimensions are defined.
# plt.xlabel("t")
# plt.ylabel("x")
Text(0, 0.5, 'x (# of samples)')
Note that we obtained our data as a 2D numpy array, which does not contain any information on the time and the space coordinates. Here, xarray can help us to additionally store this information in a single data structure. Let us therefore wrap our data into an xarray.DataArray structure. Upon creation of the xarray object can provide the following information:
  • data: The numpy array containing our data.
  • dims: A list with the labels for each dimension of our dataset. Our data is 2D, so we add two labels, one for the time and one for the space dimension.
  • coords: Here, we can optionally provide the coordinates for each dimension. We pass the time and space axis as a dictionary. The keys of the dictionary have to agree with the dimensions the we set in the previous row.
  • name: An optional name for our data.
u_xr = xr.DataArray(
    data=u,
    dims=["x", "t"],
    coords={"t": t, "x": x},
    name="u(x, t)",
)

u_xr
<xarray.DataArray 'u(x, t)' (x: 25, t: 80)> Size: 16kB
array([[ 0.00000000e+00,  3.12796607e-01,  5.94201029e-01, ...,
        -5.94201029e-01, -3.12796607e-01, -9.79717439e-16],
       [-2.07911691e-01,  1.08482541e-01,  4.13989494e-01, ...,
        -7.48443128e-01, -5.03439960e-01, -2.07911691e-01],
       [-4.06736643e-01, -1.00572732e-01,  2.15684631e-01, ...,
        -8.69974671e-01, -6.72080571e-01, -4.06736643e-01],
       ...,
       [ 9.94521895e-01,  9.11920769e-01,  7.37799515e-01, ...,
         8.62021355e-01,  9.77313066e-01,  9.94521895e-01],
       [ 9.94521895e-01,  9.77313066e-01,  8.62021355e-01, ...,
         7.37799515e-01,  9.11920769e-01,  9.94521895e-01],
       [ 9.51056516e-01,  9.99992093e-01,  9.48568727e-01, ...,
         5.81332295e-01,  8.06673158e-01,  9.51056516e-01]])
Coordinates:
  * t        (t) float64 640B 0.0 0.005063 0.01013 0.01519 ... 0.3899 0.3949 0.4
  * x        (x) int64 200B 0 10 20 30 40 50 60 ... 180 190 200 210 220 230 240
<xarray.DataArray 'u(x, t)' (x: 25, t: 80)> Size: 16kB
array([[ 0.00000000e+00,  3.12796607e-01,  5.94201029e-01, ...,
        -5.94201029e-01, -3.12796607e-01, -9.79717439e-16],
       [-2.07911691e-01,  1.08482541e-01,  4.13989494e-01, ...,
        -7.48443128e-01, -5.03439960e-01, -2.07911691e-01],
       [-4.06736643e-01, -1.00572732e-01,  2.15684631e-01, ...,
        -8.69974671e-01, -6.72080571e-01, -4.06736643e-01],
       ...,
       [ 9.94521895e-01,  9.11920769e-01,  7.37799515e-01, ...,
         8.62021355e-01,  9.77313066e-01,  9.94521895e-01],
       [ 9.94521895e-01,  9.77313066e-01,  8.62021355e-01, ...,
         7.37799515e-01,  9.11920769e-01,  9.94521895e-01],
       [ 9.51056516e-01,  9.99992093e-01,  9.48568727e-01, ...,
         5.81332295e-01,  8.06673158e-01,  9.51056516e-01]])
Coordinates:
  * t        (t) float64 640B 0.0 0.005063 0.01013 0.01519 ... 0.3899 0.3949 0.4
  * x        (x) int64 200B 0 10 20 30 40 50 60 ... 180 190 200 210 220 230 240
Once we have stored our data in an xarray.DataArray, we can use the plot() method for a quick visualization. Note that the time and space coordinates are now represented correctly, as well as adding an appropriate color scale automatically.
# Note how we can perform conventional numpy
# operations on the xarray data arrays, such as
# the applying the transform operator here.
u_xr.T.plot()
<matplotlib.collections.QuadMesh at 0x7dec3f73a7d0>
We can easily retrieve the underlying data as numpy arrays in the following way using the data attribute:
# Get the data as a numpy array
u = u_xr.data
u
array([[ 0.00000000e+00,  3.12796607e-01,  5.94201029e-01, ...,
        -5.94201029e-01, -3.12796607e-01, -9.79717439e-16],
       [-2.07911691e-01,  1.08482541e-01,  4.13989494e-01, ...,
        -7.48443128e-01, -5.03439960e-01, -2.07911691e-01],
       [-4.06736643e-01, -1.00572732e-01,  2.15684631e-01, ...,
        -8.69974671e-01, -6.72080571e-01, -4.06736643e-01],
       ...,
       [ 9.94521895e-01,  9.11920769e-01,  7.37799515e-01, ...,
         8.62021355e-01,  9.77313066e-01,  9.94521895e-01],
       [ 9.94521895e-01,  9.77313066e-01,  8.62021355e-01, ...,
         7.37799515e-01,  9.11920769e-01,  9.94521895e-01],
       [ 9.51056516e-01,  9.99992093e-01,  9.48568727e-01, ...,
         5.81332295e-01,  8.06673158e-01,  9.51056516e-01]])
# Get the time axis as a numpy array
t = u_xr.t.data
t
array([0.        , 0.00506329, 0.01012658, 0.01518987, 0.02025316,
       0.02531646, 0.03037975, 0.03544304, 0.04050633, 0.04556962,
       0.05063291, 0.0556962 , 0.06075949, 0.06582278, 0.07088608,
       0.07594937, 0.08101266, 0.08607595, 0.09113924, 0.09620253,
       0.10126582, 0.10632911, 0.11139241, 0.1164557 , 0.12151899,
       0.12658228, 0.13164557, 0.13670886, 0.14177215, 0.14683544,
       0.15189873, 0.15696203, 0.16202532, 0.16708861, 0.1721519 ,
       0.17721519, 0.18227848, 0.18734177, 0.19240506, 0.19746835,
       0.20253165, 0.20759494, 0.21265823, 0.21772152, 0.22278481,
       0.2278481 , 0.23291139, 0.23797468, 0.24303797, 0.24810127,
       0.25316456, 0.25822785, 0.26329114, 0.26835443, 0.27341772,
       0.27848101, 0.2835443 , 0.28860759, 0.29367089, 0.29873418,
       0.30379747, 0.30886076, 0.31392405, 0.31898734, 0.32405063,
       0.32911392, 0.33417722, 0.33924051, 0.3443038 , 0.34936709,
       0.35443038, 0.35949367, 0.36455696, 0.36962025, 0.37468354,
       0.37974684, 0.38481013, 0.38987342, 0.39493671, 0.4       ])
# Get the space axis as a numpy array
x = u_xr.x.data
x
array([  0,  10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120,
       130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240])
Coordinates do not necessarily need to be numerical values. For example, instead of the x-coordinate, we might want to store something like a receiver name that uniquely identifies the receiver that recorded the data.
# Generate a list of unique receiver identifiers
receiver_id = [f"R{i:02d}" for i in range(u.shape[0])]

print(f"`receiver_id` is a list of strings, of length: {len(receiver_id)}")
print(receiver_id, "\n")

data_array = xr.DataArray(
    data=u,
    dims=["receiver_id", "time"],
    coords={"time": t, "receiver_id": receiver_id},
    name="wave_recording",  # Only used for visualization & labelling
)

data_array.plot()
`receiver_id` is a list of strings, of length: 25
['R00', 'R01', 'R02', 'R03', 'R04', 'R05', 'R06', 'R07', 'R08', 'R09', 'R10', 'R11', 'R12', 'R13', 'R14', 'R15', 'R16', 'R17', 'R18', 'R19', 'R20', 'R21', 'R22', 'R23', 'R24'] 

<matplotlib.collections.QuadMesh at 0x7dec33fec610>
To select data of a specific receiver, we can use the sel() method on the "receiver_id" dimension:
# Select the receiver "R15"
data_array.sel(receiver_id="R15").plot()
[<matplotlib.lines.Line2D at 0x7dec33d46450>]
For index based selection, we can use the isel() method:
# Select the three receivers by their index
data_array.isel(receiver_id=[15, 18, 21]).plot()
<matplotlib.collections.QuadMesh at 0x7dec33d2e490>
Finally, it might be interesting to know that one can construct Datasets from multiple DataArrays. These objects can be useful when dealing with multiple quantities together, such as medium models or multiple recording fields.
ds = xr.Dataset(
    {
        "var_1": data_array,
        "var_2": data_array**2,
        "var_3": data_array - 0.5,
    }
)

ds
<xarray.Dataset> Size: 49kB
Dimensions:      (time: 80, receiver_id: 25)
Coordinates:
  * time         (time) float64 640B 0.0 0.005063 0.01013 ... 0.3899 0.3949 0.4
  * receiver_id  (receiver_id) <U3 300B 'R00' 'R01' 'R02' ... 'R22' 'R23' 'R24'
Data variables:
    var_1        (receiver_id, time) float64 16kB 0.0 0.3128 ... 0.8067 0.9511
    var_2        (receiver_id, time) float64 16kB 0.0 0.09784 ... 0.6507 0.9045
    var_3        (receiver_id, time) float64 16kB -0.5 -0.1872 ... 0.3067 0.4511
<xarray.Dataset> Size: 49kB
Dimensions:      (time: 80, receiver_id: 25)
Coordinates:
  * time         (time) float64 640B 0.0 0.005063 0.01013 ... 0.3899 0.3949 0.4
  * receiver_id  (receiver_id) <U3 300B 'R00' 'R01' 'R02' ... 'R22' 'R23' 'R24'
Data variables:
    var_1        (receiver_id, time) float64 16kB 0.0 0.3128 ... 0.8067 0.9511
    var_2        (receiver_id, time) float64 16kB 0.0 0.09784 ... 0.6507 0.9045
    var_3        (receiver_id, time) float64 16kB -0.5 -0.1872 ... 0.3067 0.4511

Multi-index dimensions

In certain cases, our data will have more dimensions than just the space and time dimension. For example, elastic seismic wavefields are vector fields of which we might record multiple components. Our data will thus have an additional dimension corresponding to the component of the wavefield. For various reasons, data within Salvus are always represented as 2D xarray data structures. In order to keep track of information on additional dimensions such as the component, we make use a so-called multi-index dimension that can store information on both the "receiver_id" and the "component". Let us see how this works in practice.
First, we create a 3D data array, where the first dimension corresponds to the "receiver_id", the second dimension to the "component", and the third dimension to "time".
# Let's mock some multicomponent data
# where each component corresponds to a
# scaled version of our original data.
theta = np.pi / 8
# X component
u_x = np.cos(theta) * u

# Y component
u_y = np.sin(theta) * u

# 3D array with multicomponent data,
# where the first dimension corresponds
# to the receiver id, the second to
# the component, and the third to time.
u_multicomponent = np.stack([u_x, u_y], axis=1)
u_multicomponent.shape
(25, 2, 80)
Now, we set up a 3D xarray.DataArray from that.
components = ["X", "Y"]

# 3D data structure
data_array_multicomponent = xr.DataArray(
    # Data as 3D numpy array
    data=u_multicomponent,
    # Labels for each dimension
    dims=["receiver_id", "component", "time"],
    # Coordinates for each dimension
    coords={
        "receiver_id": receiver_id,
        "component": components,
        "time": t,
    },
)

data_array_multicomponent
<xarray.DataArray (receiver_id: 25, component: 2, time: 80)> Size: 32kB
array([[[ 0.00000000e+00,  2.88986383e-01,  5.48970169e-01, ...,
         -5.48970169e-01, -2.88986383e-01, -9.05140890e-16],
        [ 0.00000000e+00,  1.19702079e-01,  2.27390889e-01, ...,
         -2.27390889e-01, -1.19702079e-01, -3.74921632e-16]],

       [[-1.92085356e-01,  1.00224799e-01,  3.82476420e-01, ...,
         -6.91471287e-01, -4.65117875e-01, -1.92085356e-01],
        [-7.95643595e-02,  4.15144712e-02,  1.58426920e-01, ...,
         -2.86416785e-01, -1.92658132e-01, -7.95643595e-02]],

       [[-3.75775660e-01, -9.29170890e-02,  1.99266616e-01, ...,
         -8.03751792e-01, -6.20921484e-01, -3.75775660e-01],
        [-1.55651375e-01, -3.84875184e-02,  8.25389350e-02, ...,
         -3.32924893e-01, -2.57194100e-01, -1.55651375e-01]],

       ...,

       [[ 9.18818424e-01,  8.42504934e-01,  6.81637871e-01, ...,
          7.96403887e-01,  9.02919539e-01,  9.18818424e-01],
        [ 3.80587052e-01,  3.48976970e-01,  2.82343651e-01, ...,
          3.29881291e-01,  3.74001519e-01,  3.80587052e-01]],

       [[ 9.18818424e-01,  9.02919539e-01,  7.96403887e-01, ...,
          6.81637871e-01,  8.42504934e-01,  9.18818424e-01],
        [ 3.80587052e-01,  3.74001519e-01,  3.29881291e-01, ...,
          2.82343651e-01,  3.48976970e-01,  3.80587052e-01]],

       [[ 8.78661650e-01,  9.23872227e-01,  8.76363232e-01, ...,
          5.37081009e-01,  7.45268820e-01,  8.78661650e-01],
        [ 3.63953572e-01,  3.82680406e-01,  3.63001536e-01, ...,
          2.22466238e-01,  3.08700453e-01,  3.63953572e-01]]])
Coordinates:
  * receiver_id  (receiver_id) <U3 300B 'R00' 'R01' 'R02' ... 'R22' 'R23' 'R24'
  * component    (component) <U1 8B 'X' 'Y'
  * time         (time) float64 640B 0.0 0.005063 0.01013 ... 0.3899 0.3949 0.4
<xarray.DataArray (receiver_id: 25, component: 2, time: 80)> Size: 32kB
array([[[ 0.00000000e+00,  2.88986383e-01,  5.48970169e-01, ...,
         -5.48970169e-01, -2.88986383e-01, -9.05140890e-16],
        [ 0.00000000e+00,  1.19702079e-01,  2.27390889e-01, ...,
         -2.27390889e-01, -1.19702079e-01, -3.74921632e-16]],

       [[-1.92085356e-01,  1.00224799e-01,  3.82476420e-01, ...,
         -6.91471287e-01, -4.65117875e-01, -1.92085356e-01],
        [-7.95643595e-02,  4.15144712e-02,  1.58426920e-01, ...,
         -2.86416785e-01, -1.92658132e-01, -7.95643595e-02]],

       [[-3.75775660e-01, -9.29170890e-02,  1.99266616e-01, ...,
         -8.03751792e-01, -6.20921484e-01, -3.75775660e-01],
        [-1.55651375e-01, -3.84875184e-02,  8.25389350e-02, ...,
         -3.32924893e-01, -2.57194100e-01, -1.55651375e-01]],

       ...,

       [[ 9.18818424e-01,  8.42504934e-01,  6.81637871e-01, ...,
          7.96403887e-01,  9.02919539e-01,  9.18818424e-01],
        [ 3.80587052e-01,  3.48976970e-01,  2.82343651e-01, ...,
          3.29881291e-01,  3.74001519e-01,  3.80587052e-01]],

       [[ 9.18818424e-01,  9.02919539e-01,  7.96403887e-01, ...,
          6.81637871e-01,  8.42504934e-01,  9.18818424e-01],
        [ 3.80587052e-01,  3.74001519e-01,  3.29881291e-01, ...,
          2.82343651e-01,  3.48976970e-01,  3.80587052e-01]],

       [[ 8.78661650e-01,  9.23872227e-01,  8.76363232e-01, ...,
          5.37081009e-01,  7.45268820e-01,  8.78661650e-01],
        [ 3.63953572e-01,  3.82680406e-01,  3.63001536e-01, ...,
          2.22466238e-01,  3.08700453e-01,  3.63953572e-01]]])
Coordinates:
  * receiver_id  (receiver_id) <U3 300B 'R00' 'R01' 'R02' ... 'R22' 'R23' 'R24'
  * component    (component) <U1 8B 'X' 'Y'
  * time         (time) float64 640B 0.0 0.005063 0.01013 ... 0.3899 0.3949 0.4
We now convert this 3D array to a 2D array by combining the "receiver_id" and "component" dimensions into a multi-indexed "trace" dimension. To do so, we use the stack() method. This is exactly how data is represented internally within Salvus: any dimension that is not time is stacked into trace.
da_multi_indexed = data_array_multicomponent.stack(
    trace=("receiver_id", "component")
)
da_multi_indexed
<xarray.DataArray (time: 80, trace: 50)> Size: 32kB
array([[ 0.00000000e+00,  0.00000000e+00, -1.92085356e-01, ...,
         3.80587052e-01,  8.78661650e-01,  3.63953572e-01],
       [ 2.88986383e-01,  1.19702079e-01,  1.00224799e-01, ...,
         3.74001519e-01,  9.23872227e-01,  3.82680406e-01],
       [ 5.48970169e-01,  2.27390889e-01,  3.82476420e-01, ...,
         3.29881291e-01,  8.76363232e-01,  3.63001536e-01],
       ...,
       [-5.48970169e-01, -2.27390889e-01, -6.91471287e-01, ...,
         2.82343651e-01,  5.37081009e-01,  2.22466238e-01],
       [-2.88986383e-01, -1.19702079e-01, -4.65117875e-01, ...,
         3.48976970e-01,  7.45268820e-01,  3.08700453e-01],
       [-9.05140890e-16, -3.74921632e-16, -1.92085356e-01, ...,
         3.80587052e-01,  8.78661650e-01,  3.63953572e-01]])
Coordinates:
  * time         (time) float64 640B 0.0 0.005063 0.01013 ... 0.3899 0.3949 0.4
  * trace        (trace) object 400B MultiIndex
  * receiver_id  (trace) <U3 600B 'R00' 'R00' 'R01' 'R01' ... 'R23' 'R24' 'R24'
  * component    (trace) <U1 200B 'X' 'Y' 'X' 'Y' 'X' ... 'Y' 'X' 'Y' 'X' 'Y'
<xarray.DataArray (time: 80, trace: 50)> Size: 32kB
array([[ 0.00000000e+00,  0.00000000e+00, -1.92085356e-01, ...,
         3.80587052e-01,  8.78661650e-01,  3.63953572e-01],
       [ 2.88986383e-01,  1.19702079e-01,  1.00224799e-01, ...,
         3.74001519e-01,  9.23872227e-01,  3.82680406e-01],
       [ 5.48970169e-01,  2.27390889e-01,  3.82476420e-01, ...,
         3.29881291e-01,  8.76363232e-01,  3.63001536e-01],
       ...,
       [-5.48970169e-01, -2.27390889e-01, -6.91471287e-01, ...,
         2.82343651e-01,  5.37081009e-01,  2.22466238e-01],
       [-2.88986383e-01, -1.19702079e-01, -4.65117875e-01, ...,
         3.48976970e-01,  7.45268820e-01,  3.08700453e-01],
       [-9.05140890e-16, -3.74921632e-16, -1.92085356e-01, ...,
         3.80587052e-01,  8.78661650e-01,  3.63953572e-01]])
Coordinates:
  * time         (time) float64 640B 0.0 0.005063 0.01013 ... 0.3899 0.3949 0.4
  * trace        (trace) object 400B MultiIndex
  * receiver_id  (trace) <U3 600B 'R00' 'R00' 'R01' 'R01' ... 'R23' 'R24' 'R24'
  * component    (trace) <U1 200B 'X' 'Y' 'X' 'Y' 'X' ... 'Y' 'X' 'Y' 'X' 'Y'
# The shape of this new array is now 2D
print(f"Original shape:\t{data_array_multicomponent.shape}")
print(f"Stacked shape:\t{da_multi_indexed.shape}")
Original shape:	(25, 2, 80)
Stacked shape:	(80, 50)
We can still select data of one of the original dimensions, like a single component, the same way:
da_multi_indexed.sel(component="X").T.plot()
<matplotlib.collections.QuadMesh at 0x7dec33ccadd0>
Or, we can directly make a selection on the new combined "trace" dimension for a specific receiver and component pair like this.
# Select the Y component of receiver R11.
da_multi_indexed.sel(trace=("R11", "Y")).plot()
[<matplotlib.lines.Line2D at 0x7dec33c780d0>]
This won't work, as it is not the correct order of coordinates. da_multi_indexed.sel(trace=("Y", "R11")).plot()
To go back to a 3D representation of the data, we can use the unstack method, which gives us our original 3D array.
da_3D = da_multi_indexed.unstack()
da_3D
<xarray.DataArray (time: 80, receiver_id: 25, component: 2)> Size: 32kB
array([[[ 0.00000000e+00,  0.00000000e+00],
        [-1.92085356e-01, -7.95643595e-02],
        [-3.75775660e-01, -1.55651375e-01],
        ...,
        [ 9.18818424e-01,  3.80587052e-01],
        [ 9.18818424e-01,  3.80587052e-01],
        [ 8.78661650e-01,  3.63953572e-01]],

       [[ 2.88986383e-01,  1.19702079e-01],
        [ 1.00224799e-01,  4.15144712e-02],
        [-9.29170890e-02, -3.84875184e-02],
        ...,
        [ 8.42504934e-01,  3.48976970e-01],
        [ 9.02919539e-01,  3.74001519e-01],
        [ 9.23872227e-01,  3.82680406e-01]],

       [[ 5.48970169e-01,  2.27390889e-01],
        [ 3.82476420e-01,  1.58426920e-01],
        [ 1.99266616e-01,  8.25389350e-02],
        ...,
...
        ...,
        [ 7.96403887e-01,  3.29881291e-01],
        [ 6.81637871e-01,  2.82343651e-01],
        [ 5.37081009e-01,  2.22466238e-01]],

       [[-2.88986383e-01, -1.19702079e-01],
        [-4.65117875e-01, -1.92658132e-01],
        [-6.20921484e-01, -2.57194100e-01],
        ...,
        [ 9.02919539e-01,  3.74001519e-01],
        [ 8.42504934e-01,  3.48976970e-01],
        [ 7.45268820e-01,  3.08700453e-01]],

       [[-9.05140890e-16, -3.74921632e-16],
        [-1.92085356e-01, -7.95643595e-02],
        [-3.75775660e-01, -1.55651375e-01],
        ...,
        [ 9.18818424e-01,  3.80587052e-01],
        [ 9.18818424e-01,  3.80587052e-01],
        [ 8.78661650e-01,  3.63953572e-01]]])
Coordinates:
  * receiver_id  (receiver_id) <U3 300B 'R00' 'R01' 'R02' ... 'R22' 'R23' 'R24'
  * component    (component) <U1 8B 'X' 'Y'
  * time         (time) float64 640B 0.0 0.005063 0.01013 ... 0.3899 0.3949 0.4
<xarray.DataArray (time: 80, receiver_id: 25, component: 2)> Size: 32kB
array([[[ 0.00000000e+00,  0.00000000e+00],
        [-1.92085356e-01, -7.95643595e-02],
        [-3.75775660e-01, -1.55651375e-01],
        ...,
        [ 9.18818424e-01,  3.80587052e-01],
        [ 9.18818424e-01,  3.80587052e-01],
        [ 8.78661650e-01,  3.63953572e-01]],

       [[ 2.88986383e-01,  1.19702079e-01],
        [ 1.00224799e-01,  4.15144712e-02],
        [-9.29170890e-02, -3.84875184e-02],
        ...,
        [ 8.42504934e-01,  3.48976970e-01],
        [ 9.02919539e-01,  3.74001519e-01],
        [ 9.23872227e-01,  3.82680406e-01]],

       [[ 5.48970169e-01,  2.27390889e-01],
        [ 3.82476420e-01,  1.58426920e-01],
        [ 1.99266616e-01,  8.25389350e-02],
        ...,
...
        ...,
        [ 7.96403887e-01,  3.29881291e-01],
        [ 6.81637871e-01,  2.82343651e-01],
        [ 5.37081009e-01,  2.22466238e-01]],

       [[-2.88986383e-01, -1.19702079e-01],
        [-4.65117875e-01, -1.92658132e-01],
        [-6.20921484e-01, -2.57194100e-01],
        ...,
        [ 9.02919539e-01,  3.74001519e-01],
        [ 8.42504934e-01,  3.48976970e-01],
        [ 7.45268820e-01,  3.08700453e-01]],

       [[-9.05140890e-16, -3.74921632e-16],
        [-1.92085356e-01, -7.95643595e-02],
        [-3.75775660e-01, -1.55651375e-01],
        ...,
        [ 9.18818424e-01,  3.80587052e-01],
        [ 9.18818424e-01,  3.80587052e-01],
        [ 8.78661650e-01,  3.63953572e-01]]])
Coordinates:
  * receiver_id  (receiver_id) <U3 300B 'R00' 'R01' 'R02' ... 'R22' 'R23' 'R24'
  * component    (component) <U1 8B 'X' 'Y'
  * time         (time) float64 640B 0.0 0.005063 0.01013 ... 0.3899 0.3949 0.4
Now that we know some of the basics of xarray data structures, let us have a closer look at how we can use them within Salvus. In the following example, we use Salvus to simulate some simple 2D seismic line data as it would be recorded by a line of geophones at the Earth's surface. The following cell sets up a simple elastic 2D simulation in a homogeneous domain with the following elastic properties:
  • P-wave velocity = 3000 m/s
  • S-wave velocity = 1500 m/s
  • Density = 2000 kg/m3^3
We set up a single vertically-directed source and a line of receivers spaced at 10 meters. Please refer to the tutorial "My first Salvus simulation" for more details on how to set up a Salvus simulation.
# A 2-D Box.
d = sn.domain.dim2.BoxDomain(x0=0.0, x1=350.0, y0=-100.0, y1=0.0)

# Create a project.
p = sn.Project.from_domain(domain=d, path="project_2d", load_if_exists=True)

# Model.
mc = sn.ModelConfiguration(
    background_model=sn.model.background.homogeneous.IsotropicElastic(
        vp=3000.0, vs=1500.0, rho=2000.0
    )
)

# Event specific configuration.
ec = sn.EventConfiguration(
    # 10 Hz Ricker wavelet
    wavelet=sn.simple_config.stf.Ricker(center_frequency=10.0),
    waveform_simulation_configuration=sn.WaveformSimulationConfiguration(
        end_time_in_seconds=0.3
    ),
)

# Combine everything into a complete configuration.
p += sn.SimulationConfiguration(
    name="simulation",
    elements_per_wavelength=1.2,
    max_frequency_in_hertz=25.0,
    model_configuration=mc,
    event_configuration=ec,
    # Absorbing boundaries everywhere but the top.
    absorbing_boundaries=sn.AbsorbingBoundaryParameters(
        reference_frequency=10.0,
        number_of_wavelengths=0.0,
        reference_velocity=1500.0,
        free_surface=["y1"],
    ),
)

# Add an event.
p += sn.Event(
    event_name="event_a",
    # A single source in the center.
    sources=[
        sn.simple_config.source.cartesian.VectorPoint2D(
            x=50.0, y=0.0, fy=-10.0, fx=0.0
        )
    ],
    # Line of 25 receivers spaced @ 10 meters
    receivers=[
        sn.simple_config.receiver.cartesian.Point2D(
            x=50.0 + (i + 1) * 10.0,
            y=0.0,
            station_code=f"AA_{i}",
            fields=["displacement"],
        )
        for i in range(25)
    ],
)

# Visualize the setup.
p.viz.nb.simulation_setup("simulation", events=p.events.list())
[2025-05-21 03:01:00,193] INFO: Creating mesh. Hang on.
<salvus.flow.simple_config.simulation.waveform.Waveform object at 0x7dec33a22110>
Now we run the simulations.
# Run it.
p.simulations.launch(
    simulation_configuration="simulation",
    events=p.events.list(),
    site_name="local",
    ranks_per_job=1,
)
p.simulations.query(block=True)
[2025-05-21 03:01:00,557] INFO: Submitting job ...
Uploading 1 files...

🚀  Submitted job_2505210301237559_63235d1c27@local
True
So, how do we now access the data? First, let us revisit how we can retrieve data of a single receiver as an obspy stream object. To do so, we first retrieve an EventData object from which we can then get the data using the get_waveform_data() method. For example, to retrieve data for the first receiver with id XX.AA_1., we would do the following.
# Get the EventData object.
ed = p.waveforms.get("simulation", "event_a")[0]
# Retrieve an obspy stream object for receiver AA_1
st = ed.get_waveform_data(
    receiver_name="XX.AA_1.", receiver_field="displacement"
)
st.plot(show=False)
<Figure size 800x500 with 2 Axes>
If we want to avoid retrieving the data for each receiver separately by looping over all receiver names, we can directly access all data of this event as an xarray.DataArray using the function get_waveform_data_xarray() that was introduced with Salvus release 2025.1.0.
da = ed.get_waveform_data_xarray(receiver_field="displacement")
da
<xarray.DataArray 'displacement' (trace: 50, time: 266)> Size: 53kB
array([[ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -4.1478032e-12, -4.4830905e-12, -4.7161615e-12],
       [ 0.0000000e+00,  0.0000000e+00, -2.0031971e-19, ...,
         5.2360377e-12,  5.3063691e-12,  5.2815118e-12],
       [ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -4.4507397e-12, -4.5939342e-12, -4.6322750e-12],
       ...,
       [ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -1.6913059e-11, -1.5586617e-11, -1.4017106e-11],
       [ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
         5.6617484e-12,  5.1259292e-12,  4.5559281e-12],
       [ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -3.0636202e-11, -2.7980450e-11, -2.5016451e-11]], dtype=float32)
Coordinates:
  * time         (time) float64 2kB -0.1559 -0.1542 -0.1525 ... 0.2983 0.3
  * trace        (trace) object 400B MultiIndex
  * receiver_id  (trace) <U9 2kB 'XX.AA_0.' 'XX.AA_0.' ... 'XX.AA_24.'
  * component    (trace) <U1 200B 'X' 'Y' 'X' 'Y' 'X' ... 'Y' 'X' 'Y' 'X' 'Y'
Attributes:
    start_time_in_seconds:               -0.15593936024673521
    sampling_rate_in_hertz:              581.217642312331
    start_time_in_seconds_utc_datetime:  1969-12-31T23:59:59.844061Z
    end_time_in_seconds_utc_datetime:    1970-01-01T00:00:00.300000Z
    reference_time_utc:                  1970-01-01T00:00:00.000000Z
    reference_time_in_seconds:           0.0
    npts:                                266
    location:                            [[ 60.   0.]\n [ 70.   0.]\n [ 80.  ...
<xarray.DataArray 'displacement' (trace: 50, time: 266)> Size: 53kB
array([[ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -4.1478032e-12, -4.4830905e-12, -4.7161615e-12],
       [ 0.0000000e+00,  0.0000000e+00, -2.0031971e-19, ...,
         5.2360377e-12,  5.3063691e-12,  5.2815118e-12],
       [ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -4.4507397e-12, -4.5939342e-12, -4.6322750e-12],
       ...,
       [ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -1.6913059e-11, -1.5586617e-11, -1.4017106e-11],
       [ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
         5.6617484e-12,  5.1259292e-12,  4.5559281e-12],
       [ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -3.0636202e-11, -2.7980450e-11, -2.5016451e-11]], dtype=float32)
Coordinates:
  * time         (time) float64 2kB -0.1559 -0.1542 -0.1525 ... 0.2983 0.3
  * trace        (trace) object 400B MultiIndex
  * receiver_id  (trace) <U9 2kB 'XX.AA_0.' 'XX.AA_0.' ... 'XX.AA_24.'
  * component    (trace) <U1 200B 'X' 'Y' 'X' 'Y' 'X' ... 'Y' 'X' 'Y' 'X' 'Y'
Attributes:
    start_time_in_seconds:               -0.15593936024673521
    sampling_rate_in_hertz:              581.217642312331
    start_time_in_seconds_utc_datetime:  1969-12-31T23:59:59.844061Z
    end_time_in_seconds_utc_datetime:    1970-01-01T00:00:00.300000Z
    reference_time_utc:                  1970-01-01T00:00:00.000000Z
    reference_time_in_seconds:           0.0
    npts:                                266
    location:                            [[ 60.   0.]\n [ 70.   0.]\n [ 80.  ...
Looking at the output, we observe the following structure of the data.
The data is wrapped in a 2D array with the name corresponding to the field of the data, in this case "displacement". The array has the following two dimensions, always in this order:
  • "trace": A multi-index dimension that combines the dimensions "receiver_id" and "component". For details on multi-indices, see section 1 of this tutorial.
  • "time: The time dimension.
The data contains additional attributes. The most important ones are the following:
  • "start_time_in_seconds": The time of the first sample.
  • "sampling_rate_in_hertz": The sampling rate of the data.
  • "npts": The number of time samples.
  • "location": The coordinates of each receiver location.
We can now in turn select the two components "X" and "Y" and plot them.
plt.figure()
# Note that the xarray plotting routines use
# matplotlib under the hood and accept the
# usual input arguments. Here, we make sure
# that both components are plotted with the same
# color range using the vmin and vmax arguments.
da.sel(component="X").plot(vmin=-6e-10, vmax=6e-10)
plt.figure()
da.sel(component="Y").plot(vmin=-6e-10, vmax=6e-10)
# Attributes are stored as a dictionary and can be accessed like this (e.g.,
# for the sampling rate):
<matplotlib.collections.QuadMesh at 0x7dec32a35050>
sr = da.attrs["sampling_rate_in_hertz"]
sr
581.217642312331
Again, we can use the unstack() method to obtain a 3D representation of the data with dimensions ("receiver_id", "component", "time").
da_3D = da.unstack()
print(
    f"The unstacked array has a shape of {da_3D.shape}, with the dimensions "
    f"corresponding to {da_3D.dims}."
)
The unstacked array has a shape of (266, 25, 2), with the dimensions corresponding to ('time', 'receiver_id', 'component').
You can use SalvusProject to apply custom processing, data selection, and misfit functions to you data. To do so, you need to format your custom functions in a specific way. As of Salvus release 2025.1.0, you can now define custom functions that operate on the new xarray data structures directly and add them to a project. Trace-by-trace based custom functions operating on obspy.Stream objects are still supported as well. However, you should observe a significant performance improvement when using the xarray interface for custom processing functions, especially when working with data with a large number of receivers.
In the following, we briefly describe the format of the new custom functions and give some examples on how you can use these functions within a Salvus project.
First, let us consider the case where we want to apply some filtering to our data. Here, we consider a simple bandpass filter that filters the data to a frequency band beteween 10 and 20 Hz. This function will directly operate on an xarray.DataArray object.
If we want Salvus to recognize our custom filtering function, we need to specify it in a specific way. The function needs to take exactly two mandatory input arguments:
  • data_array: The Salvus xarray.DataArray object on which the filtering will be performed.
  • event: The Salvus Event object of the Event that this data corresponds to.
def bandpass_filter(
    data_array: xr.DataArray,
    event: sn.Event,
) -> xr.DataArray:
    """
    A custom bandpass filtering function that operates on an xarray.DataArray
    object.

    Args:
        data_array: The input data array.
        event: The Salvus event that the data belongs to.

    Returns:
        data_array: The filtered output data array.
    """

    # Minimum frequency in Hz
    min_frequency = 10.0
    # Maximum frequency in Hz
    max_frequency = 20.0
    # The order of the filter.
    N = 8
    # If true, the resulting filter will be zero phase.
    zerophase = True

    # Here, we make a copy of the original data
    # to avoid modifying the input data when
    # applying the filter.
    data_array_filtered = data_array.copy()

    sampling_rate_in_hertz = data_array_filtered.attrs[
        "sampling_rate_in_hertz"
    ]
    nyquist = 0.5 * sampling_rate_in_hertz
    sos = scipy.signal.iirfilter(
        # The order of the filter.
        N=N,
        # The critical frequencies.
        Wn=(min_frequency / nyquist, max_frequency / nyquist),
        btype="band",
        # Choose a butterworth filter for the linear phase shift.
        ftype="butter",
        # Directly output the second order section coefficients.
        output="sos",
    )
    if zerophase:
        data_array_filtered.data[:] = scipy.signal.sosfiltfilt(
            sos, data_array_filtered.data, axis=1
        )
    else:
        data_array_filtered.data[:] = scipy.signal.sosfilt(
            sos, data_array_filtered.data, axis=1
        )
    # Apply the filter to the data
    return data_array_filtered
Let us apply this function to our simulated data to see if this works as intended.
# Plot the unfiltered data
plt.figure()
da.sel(component="X").plot()

# Plot the filtered data
plt.figure()
da_filt = bandpass_filter(da, ed)
da_filt.sel(component="X").plot()
<matplotlib.collections.QuadMesh at 0x7ded1046fbd0>
This looks as expected.
We can now directly add this function to our project so that Salvus can use it internally. This is achieved via a ProcessingConfiguration.
# Define a processing configuration
pc = sn.processing.ProcessingConfiguration(
    name="bandpass",
    data_source_name="SYNTHETIC_DATA:simulation",
    processing_function=bandpass_filter,
)
# Add it to the project
p.add_to_project(
    pc,
    overwrite=True,
)
The project is now aware of this function and we can use it, for example, when defining a new misfit configuration.
To retrieve filtered data from the project, we can access it in the following way, adding the function "bandpass" with a : to the data that is being queried:
ed_filt = p.waveforms.get("PROCESSED_DATA:bandpass", "event_a")[0]
da_filt = ed_filt.get_waveform_data_xarray(receiver_field="displacement")
da_filt.sel(component="X").plot()
<matplotlib.collections.QuadMesh at 0x7dec32d53750>
In a similar fashion, we can define custom data selection and windowing functions using the xarray interface. Here, we want to define a data selection function that performs a couple of operations on our input data:
First, the data selection function should only select a subset of the available receivers. This is important if we want to disregard certain receivers in an inversion (e.g., receivers that record data that is too noisy).
Second, the function should only select a single component. In this case, we want to select only the vertical component.
Last, the data selection function should apply a time window to the data.
Again, our data selection function will need to accept two mandatory input arguments:
  • data_array: The Salvus xarray.DataArray object on which the data selection will be performed.
  • event: The Salvus Event object of the event that this data corresponds to.
The output of the function should then again be a DataArray, with the same dimensions as the DataArray on which the selection was performed. However, please note that in this case, not all coordinates of the original DataArray need to be present: the omission of e.g. a receiver_id or component implicitly sets its weights to zero. The data of the weights array itself should correspond to the temporal weighting function (window) that will be applied to the data.
In the following, we define such a data selection function.
def select_data(
    data_array: xr.DataArray,
    event: sn.Event,
):
    """
    A custom data selection function that operates on an xarray.DataArray
    object.

    Args:
        data_array: The input data array.
        event: The Salvus event that the data belongs to.

    Returns:
        weights: A data array with the desired selection applied and data
            corresponding to the temporal weights to be applied to the data.
    """
    # Some stations that we wish to keep.
    selected_receiver_ids = ["XX.AA_1.", "XX.AA_2.", "XX.AA_3."]
    # The component that we wish to keep.
    selected_components = ["Y"]

    # This is an intersection of the selection (selected stations) and the
    # available (station, component) pairs. Important if not every station
    # has every component.
    selection = [
        (receiver_id, component)
        for receiver_id, component in set(data_array.trace.data)
        if receiver_id in selected_receiver_ids
        and component in selected_components
    ]
    weights = data_array.sel(trace=selection)

    # Set the weights to 1, if kept like this, no temporal selection on the
    # data is performed
    weights.data[:] = 1

    # Now we additionally apply a box window on the time dimension,
    # where the first half of the data is disregarded.
    weights.data[:, : weights.data.shape[1] // 2] = 0

    # With the coordinate "trace_weights", we can optionally give a specific
    # scalar weight to each trace that will be considered in the computation
    # of misfits. You can use it by uncommenting the following line.

    # weights.coords["trace_weights"] = ("trace", np.ones((weights.shape[0])))

    return weights
We can now create a custom DataSelectionConfiguration and add it to our project.
dsc = sn.DataSelectionConfiguration(
    name="data_selection",
    receiver_field="displacement",
    data_selection_function=select_data,
)
p.add_to_project(
    dsc,
    overwrite=True,
)
This data selection configuration can now be used when setting up a MisfitConfiguration. To check whether the data selection works as intended, we retrieve data from the project and specify that we want to use our new data selection configuration.
ed_filt_sel = p.waveforms.get(
    "PROCESSED_DATA:bandpass",
    "event_a",
    data_selection_configuration="data_selection",
)[0]
da_filt_sel = ed_filt_sel.get_waveform_data_xarray(
    receiver_field="displacement"
)
da_filt_sel
<xarray.DataArray 'displacement' (trace: 3, time: 266)> Size: 3kB
array([[ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
...
        -2.90627435e-11, -2.62216724e-11, -2.28313167e-11,
        -1.90024246e-11, -1.48555161e-11, -1.05167671e-11,
        -6.11381041e-12, -1.77160532e-12,  2.39147287e-12,
         6.26719466e-12,  9.76056909e-12,  1.27923748e-11,
         1.53010521e-11,  1.72439233e-11,  1.85977067e-11,
         1.93583587e-11,  1.95402444e-11,  1.91747347e-11,
         1.83082664e-11,  1.70000004e-11,  1.53191453e-11,
         1.33420896e-11,  1.11494312e-11,  8.82304299e-12,
         6.44325912e-12,  4.08628730e-12,  1.82192846e-12,
        -2.88330639e-13, -2.19280670e-12, -3.85065851e-12,
        -5.23246680e-12, -6.32036264e-12, -7.10773150e-12,
        -7.59853118e-12, -7.80627559e-12, -7.75275243e-12,
        -7.46653260e-12, -6.98135820e-12, -6.33446334e-12,
        -5.56491441e-12, -4.71202287e-12, -3.81388714e-12,
        -2.90611719e-12, -2.02076637e-12, -1.18550536e-12,
        -4.23043897e-13,  2.49197391e-13,  8.19173288e-13,
         1.28007858e-12,  1.63001708e-12,  1.87150816e-12,
         2.01086673e-12,  2.05749567e-12,  2.02313145e-12,
         1.92107658e-12,  1.76545714e-12,  1.57053331e-12,
         1.35008693e-12,  1.11690626e-12]], dtype=float32)
Coordinates:
  * time         (time) float64 2kB -0.1559 -0.1542 -0.1525 ... 0.2983 0.3
  * trace        (trace) object 24B MultiIndex
  * receiver_id  (trace) <U9 108B 'XX.AA_3.' 'XX.AA_1.' 'XX.AA_2.'
  * component    (trace) <U1 12B 'Y' 'Y' 'Y'
Attributes:
    start_time_in_seconds:               -0.15593936024673521
    sampling_rate_in_hertz:              581.217642312331
    start_time_in_seconds_utc_datetime:  1969-12-31T23:59:59.844061Z
    end_time_in_seconds_utc_datetime:    1970-01-01T00:00:00.300000Z
    reference_time_utc:                  1970-01-01T00:00:00.000000Z
    reference_time_in_seconds:           0.0
    npts:                                266
    location:                            [[ 60.   0.]\n [ 70.   0.]\n [ 80.  ...
    temporal_weights:                    <xarray.DataArray 'displacement' (tr...
<xarray.DataArray 'displacement' (trace: 3, time: 266)> Size: 3kB
array([[ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
...
        -2.90627435e-11, -2.62216724e-11, -2.28313167e-11,
        -1.90024246e-11, -1.48555161e-11, -1.05167671e-11,
        -6.11381041e-12, -1.77160532e-12,  2.39147287e-12,
         6.26719466e-12,  9.76056909e-12,  1.27923748e-11,
         1.53010521e-11,  1.72439233e-11,  1.85977067e-11,
         1.93583587e-11,  1.95402444e-11,  1.91747347e-11,
         1.83082664e-11,  1.70000004e-11,  1.53191453e-11,
         1.33420896e-11,  1.11494312e-11,  8.82304299e-12,
         6.44325912e-12,  4.08628730e-12,  1.82192846e-12,
        -2.88330639e-13, -2.19280670e-12, -3.85065851e-12,
        -5.23246680e-12, -6.32036264e-12, -7.10773150e-12,
        -7.59853118e-12, -7.80627559e-12, -7.75275243e-12,
        -7.46653260e-12, -6.98135820e-12, -6.33446334e-12,
        -5.56491441e-12, -4.71202287e-12, -3.81388714e-12,
        -2.90611719e-12, -2.02076637e-12, -1.18550536e-12,
        -4.23043897e-13,  2.49197391e-13,  8.19173288e-13,
         1.28007858e-12,  1.63001708e-12,  1.87150816e-12,
         2.01086673e-12,  2.05749567e-12,  2.02313145e-12,
         1.92107658e-12,  1.76545714e-12,  1.57053331e-12,
         1.35008693e-12,  1.11690626e-12]], dtype=float32)
Coordinates:
  * time         (time) float64 2kB -0.1559 -0.1542 -0.1525 ... 0.2983 0.3
  * trace        (trace) object 24B MultiIndex
  * receiver_id  (trace) <U9 108B 'XX.AA_3.' 'XX.AA_1.' 'XX.AA_2.'
  * component    (trace) <U1 12B 'Y' 'Y' 'Y'
Attributes:
    start_time_in_seconds:               -0.15593936024673521
    sampling_rate_in_hertz:              581.217642312331
    start_time_in_seconds_utc_datetime:  1969-12-31T23:59:59.844061Z
    end_time_in_seconds_utc_datetime:    1970-01-01T00:00:00.300000Z
    reference_time_utc:                  1970-01-01T00:00:00.000000Z
    reference_time_in_seconds:           0.0
    npts:                                266
    location:                            [[ 60.   0.]\n [ 70.   0.]\n [ 80.  ...
    temporal_weights:                    <xarray.DataArray 'displacement' (tr...
Note that the resulting DataArray has the temporal weights applied, but any station or component that wasn't present in the returned object from select_data is not present at all in the retrieved data.
# Present components and stations
list(da_filt_sel.unstack().component.data), list(
    da_filt_sel.unstack().receiver_id.data
)
(['Y'], ['XX.AA_3.', 'XX.AA_1.', 'XX.AA_2.'])
If we now plot the output, you will see that our data selection works as intended.
da_filt_sel.sel(component="Y").plot()
<matplotlib.collections.QuadMesh at 0x7dec32f6c610>
In earlier versions of Salvus, misfits were always computed on a trace-by-trace basis by looping over all receivers and components. Now, misfits are directly computed on the new xarray structures, which allows us to vectorize the misfit computations. If you want to define a custom misfit function, you can now take advantage of this. Additionally, it is now possible to define time-and-space dependent misfit functions.
Here, we simply review the standard L2 misfit to understand how custom misfit functions can be implemented within Salvus.
A custom misfit function needs to take three mandatory input arguments:
  • data_array_synthetic: The synthetic waveforms as an xarray.DataArray object.
  • data_array_observed: The observed waveforms as an xarray.DataArray object.
  • event: The event object.
The output must be an xarray.Dataset object. A xarray.Dataset object combines multiple xarray.DataArray's. The output Dataset will contain two DataArray's, one for the adjoint sources and one for the per-trace misfits.
The following code snippets demonstrates the computation of the misfits and adjoint sources for a conventional L2 misfit.
def l2_misfit_and_adjoint_source(
    data_array_synthetic: xr.DataArray,
    data_array_observed: xr.DataArray,
    event: sn.Event,
) -> xr.Dataset:
    """
    L2 misfit and adjoint source.

    Using this only serves for demonstration purposes right now as Salvus
    already comes with this misfit implemented.

    Both data traces have been resampled to exactly the same points in time.

    Args:
        data_array_synthetic: synthetic waveforms.
        data_array_observed: observed waveforms.
        event: Event object.
    """
    diff = data_array_observed - data_array_synthetic
    misfits = (
        0.5
        * (diff**2).sum(axis=1)
        / data_array_observed.attrs["sampling_rate_in_hertz"]
    )
    adj_src = diff / data_array_observed.attrs["sampling_rate_in_hertz"]

    return xr.Dataset(
        data_vars={
            "adjoint_sources": adj_src,
            "misfits": misfits,
        },
        attrs={
            "sampling_rate_in_hertz": data_array_synthetic.attrs[
                "sampling_rate_in_hertz"
            ],
            "start_time_in_seconds": data_array_synthetic.attrs.get(
                "start_time_in_seconds", 0.0
            ),
        },
    )
We can now set up a new misfit configuration using all of our custom functions (processing, data_selection, and custom misfit) and add it to our project in the following way.
mc = sn.MisfitConfiguration(
    name="custom_L2",
    observed_data="PROCESSED_DATA:bandpass",
    misfit_function=l2_misfit_and_adjoint_source,
    receiver_field="displacement",
    data_selection_configuration="data_selection",
)

p.add_to_project(mc)
Let us verify that everything was properly added to our project:
<salvus.project.project.Project at 0x7dec33a34c50>
PAGE CONTENTS