VoidFinder API

file_preprocess

vast.voidfinder.preprocessing.file_preprocess(galaxies_filename, in_directory, out_directory, mag_cut=True, rm_isolated=True, dist_metric='comoving', min_z=None, max_z=None, Omega_M=0.3, h=1.0, verbose=0)

Set up output file names, calculate distances, etc.

Parameters
galaxies_filenamestring

File name of galaxy catalog. Should be readable by astropy.table.Table.read as an ascii.commented_header file. Required columns include ‘ra’, ‘dec’, ‘z’, and absolute magnitude (either ‘rabsmag’ or ‘magnitude’).

in_directorystring

Directory path for input files

out_directorystring

Directory path for output files

mag_cutboolean

Determines whether or not to implement a magnitude cut on the galaxy survey. Default is True (remove all galaxies fainter than Mr = -20).

rm_isolatedboolean

Determines whether or not to remove isolated galaxies (defined as those with the distance to their third nearest neighbor greater than the sum of the average third-nearest-neighbor distance and 1.5 times the standard deviation of the third-nearest-neighbor distances).

dist_metricstring

Description of which distance metric to use. Options should include ‘comoving’ (default) and ‘redshift’.

min_z, max_zfloat

Minimum and maximum redshift range for the survey mask. Default values are None (determined from galaxy extent).

Omega_Mfloat

Value of the matter density of the given cosmology. Default is 0.3.

hfloat

Fractional value of the Hubble constant (where H0 = 100h). Default is 1 (so all distances will be in units of Mpc/h).

Returns
galaxy_data_tableastropy table

Table of all galaxies in catalog.

dist_limitsnumpy array of shape (2,)

Minimum and maximum distances to use for void search. Units are Mpc/h, in either comoving or redshift coordinates (depending on dist_metric).

out1_filenamestring

File name of maximal sphere output file.

out2_filenamestring

File name of all void holes

filter_galaxies

vast.voidfinder.filter_galaxies(galaxy_table, survey_name, out_directory, mag_cut=True, dist_limits=None, rm_isolated=True, write_table=True, sep_neighbor=3, dist_metric='comoving', h=1.0, magnitude_limit=-20.09, verbose=0)

A hodge podge of miscellaneous tasks which need to be done to format the data into something the main find_voids() function can use.

  1. Optional magnitude cut

  2. Convert from ra-dec-redshift space into xyz space

  3. Calculate the hole search grid shape

  4. Optionally remove isolated galaxies by partitioning them into wall (non-isolated) and field (isolated) groups

  5. Optionally write out the wall and field galaxies to disk

Parameters
galaxy_tableastropy.table of shape (N,?)

Table with a variable number of required columns. If doing a magnitude cut, it must include the ‘rabsmag’ column. If the distance metric is ‘comoving’, it must include the ‘Rgal’ column; otherwise it must include ‘redshift’. It must always include ‘ra’ and ‘dec’.

survey_namestr

Name of the galaxy catalog, string value to prepend or append to output names

out_directorystring

Directory path for output files

mag_cutbool

whether or not to cut on magnitude, removing galaxies fainter than magnitude_limit

dist_limitslist of length 2

[Minimum distance, maximum distance] of galaxy sample (in units of Mpc/h)

magnitude_limitfloat

value at which to perform magnitude cut

rm_isolatedbool

whether or not to perform Nth neighbor distance calculation, and use it to partition the input galaxies into wall and field galaxies

write_tablebool

use astropy.table.Table.write to write out the wall and field galaxies to file

sep_neighborint, positive

if rm_isolated is True, find the Nth galaxy neighbors based on this value

dist_metricstr

Distance metric to use in calculations. Options are ‘comoving’ (default; distance dependent on cosmology) and ‘redshift’ (distance independent of cosmology).

hfloat

Fractional value of Hubble’s constant. Default value is 1 (where H0 = 100h).

verboseint

values greater than zero indicate to print output

Returns
wall_gals_xyznumpy.ndarray of shape (K,3)

the galaxies which were designated not to be isolated

field_gals_xyznumpy.ndarray of shape (L,3)

the galaxies designated as isolated

hole_grid_shapetuple of 3 integers (i,j,k)

shape of the hole search grid

coords_minnumpy.ndarray of shape (3,)

coordinates of the minimum of the survey used for converting from xyz space into ijk space
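
The wall/field partition can be illustrated with a small, self-contained sketch. It applies the criterion stated for rm_isolated (a galaxy is isolated when the distance to its third nearest neighbor exceeds the mean of those distances plus 1.5 standard deviations). The function name split_wall_field, the brute-force neighbor search, and the use of the population standard deviation are assumptions made for illustration; they are not taken from the VoidFinder source.

```python
import math
import statistics


def split_wall_field(coords, sep_neighbor=3, n_sigma=1.5):
    """Partition points into wall (non-isolated) and field (isolated) lists.

    coords is a list of (x, y, z) tuples. The neighbor search is a
    brute-force O(N^2) loop, which is only suitable for small examples.
    """
    nth_dists = []
    for i, p in enumerate(coords):
        # Sorted distances from p to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(coords) if j != i)
        nth_dists.append(dists[sep_neighbor - 1])

    # Assumption: population standard deviation (ddof = 0).
    limit = statistics.mean(nth_dists) + n_sigma * statistics.pstdev(nth_dists)

    wall = [p for p, d in zip(coords, nth_dists) if d <= limit]
    field = [p for p, d in zip(coords, nth_dists) if d > limit]
    return wall, field


# A compact 3x3x3 block of points plus one far-away point.
grid = [(float(x), float(y), float(z))
        for x in range(3) for y in range(3) for z in range(3)]
galaxies = grid + [(100.0, 100.0, 100.0)]

wall, field = split_wall_field(galaxies)
print(len(wall), "wall galaxies;", len(field), "field galaxies")
```

For catalogs of realistic size, a spatial index (for example a k-d tree) would replace the quadratic loop.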

ra_dec_to_xyz

vast.voidfinder.ra_dec_to_xyz(galaxy_table, distance_metric='comoving', h=1.0)

Convert galaxy coordinates from ra-dec-redshift space into xyz space.

Parameters
galaxy_tableastropy.table of shape (N,?)

Must contain columns ‘ra’ and ‘dec’ in degrees, and either ‘Rgal’ (units not specified) if distance_metric is ‘comoving’, or ‘redshift’ for any other metric.

distance_metricstr

Distance metric to use in calculations. Options are ‘comoving’ (default; distance dependent on cosmology) and ‘redshift’ (distance independent of cosmology).

hfloat

Fractional value of Hubble’s constant. Default value is 1 (where H0 = 100h).

Returns
coords_xyznumpy.ndarray of shape (N,3)

values of the galaxies in xyz space
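
As a rough illustration of this conversion, the sketch below applies the standard spherical-to-Cartesian transformation to a right ascension, declination, and radial distance. The helper names are hypothetical, and the Hubble-law distance c*z/(100*h) used for the ‘redshift’ metric is an assumption about how a redshift could be turned into a distance, not a statement of what the library does.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def redshift_to_distance(z, h=1.0):
    """Assumed Hubble-law distance c*z / (100*h), in Mpc."""
    return C_KM_S * z / (100.0 * h)


def ra_dec_to_xyz_sketch(ra_deg, dec_deg, r):
    """Convert (ra, dec) in degrees and a radial distance r into (x, y, z)."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    x = r * math.cos(dec) * math.cos(ra)
    y = r * math.cos(dec) * math.sin(ra)
    z = r * math.sin(dec)
    return (x, y, z)


# One galaxy at ra = 90 deg, dec = 0 deg, redshift 0.1.
r_gal = redshift_to_distance(0.1)
print(ra_dec_to_xyz_sketch(90.0, 0.0, r_gal))
```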

z_to_comoving_dist

vast.voidfinder.distance.z_to_comoving_dist()

Convert redshift values into comoving distances for the given cosmology by integrating the Robertson-Walker metric.

Parameters
z_inputnumpy.ndarray of shape (N,)

redshift values to compute distances for

omega_Mfloat

Cosmological matter energy density

hfloat

Hubble constant factor

Returns
output_comov_distsnumpy.ndarray of shape (N,)

the comoving distance values in units of Mpc/h
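
The integral involved can be sketched numerically. The example below assumes a flat universe with matter and a cosmological constant only (Omega_Lambda = 1 - omega_M) and integrates c / H(z) with Simpson's rule. The function name, the flatness assumption, and the integration scheme are illustrative choices and may differ from the library's implementation.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance_sketch(z, omega_M=0.3, h=1.0, n_steps=1000):
    """Comoving distance to redshift z for an assumed flat cosmology.

    With h = 1 the result is in Mpc/h. n_steps must be even (Simpson's rule).
    """
    if n_steps % 2:
        raise ValueError("n_steps must be even")
    if z == 0:
        return 0.0

    omega_L = 1.0 - omega_M  # assumption: spatially flat universe

    def inv_E(zp):
        # 1 / E(z), where H(z) = H0 * E(z)
        return 1.0 / math.sqrt(omega_M * (1.0 + zp) ** 3 + omega_L)

    dz = z / n_steps
    total = inv_E(0.0) + inv_E(z)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * inv_E(k * dz)
    integral = total * dz / 3.0

    return C_KM_S / (100.0 * h) * integral


for z in (0.01, 0.05, 0.1):
    print(z, comoving_distance_sketch(z))
```

A convenient sanity check is the matter-only case (omega_M = 1), where the integral has the closed form 2 (c / H0) (1 - 1/sqrt(1 + z)).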

generate_mask

vast.voidfinder.multizmask.generate_mask(gal_data, z_max, dist_metric='comoving', smooth_mask=True, min_maximal_radius=10.0, Omega_M=0.3, h=1.0)

This function creates a grid of shape (N,M) where the N dimension represents increments of the ra space (0 to 360 degrees) and the M dimension represents increments in the dec space (0 to 180 degrees). The value of the mask is a boolean representing whether or not that (ra,dec) position is part of the survey, or outside the survey. For example, if mask[320,17] == True, that indicates the right ascension of 320 degrees and declination of 17 degrees is within the survey.

Note that this mask covers only the ra-dec (right ascension, declination) space of the survey; the radial min/max limits will need to be checked separately.

Parameters
gal_dataastropy table

Table of all galaxies in sample. Ra and Dec must be given in degrees. Ra can be in either -180 to 180 or 0 to 360 format. Dec must be in -90 to 90 format, since the code below subtracts 90 degrees to go to 0 to 180 format.

z_maxfloat

Maximum redshift of the volume-limited catalog.

dist_metricstring

Distance metric to use in calculations. Options are ‘comoving’ (default; distance dependent on cosmology) and ‘redshift’ (distance independent of cosmology).

smooth_maskboolean

If smooth_mask is set to True (default), small holes in the mask (single cells without any galaxy in them that are surrounded by at least 3 cells which have galaxies in them) are unmasked.

min_maximal_radiusfloat

Minimum radius of the maximal spheres. Default is 10 Mpc/h. The mask resolution depends on this value.

Omega_Mfloat

Cosmological matter normalized energy density. Default is 0.3.

hfloat

Fractional value of Hubble’s constant. Default value is 1 (where H0 = 100h).

Returns
masknumpy array of shape (N,M)

Boolean array of the entire sky, with points within the survey limits set to True. N represents the incremental RA; M represents the incremental dec.

mask_resolutioninteger

Scale factor of coordinates in maskfile

find_voids

vast.voidfinder.find_voids(galaxy_coords_xyz, survey_name, mask_type='ra_dec_z', mask=None, mask_resolution=None, dist_limits=None, xyz_limits=None, check_only_empty_cells=True, max_hole_mask_overlap=0.1, hole_grid_edge_length=5.0, grid_origin=None, min_maximal_radius=10.0, galaxy_map_grid_edge_length=None, pts_per_unit_volume=0.01, maximal_spheres_filename='maximal_spheres.txt', void_table_filename='voids_table.txt', potential_voids_filename='potential_voids_list.txt', num_cpus=None, save_after=None, use_start_checkpoint=False, batch_size=10000, verbose=0, print_after=5.0)

Main entry point for VoidFinder.

Using the VoidFinder algorithm, this function grows a sphere in each empty grid cell of a grid imposed over the target galaxy distribution. It then combines these spheres into unique voids, identifying a maximal sphere for each void.

At a high level, this algorithm uses three pieces of data to find voids in the large-scale structure of the universe:

  1. The galaxy coordinates

  2. A survey-limiting mask

  3. A cubic-cell grid of potential void locations

Before running VoidFinder, a preprocessing stage of removing isolated galaxies from the target galaxy survey is performed. Currently this is done by removing galaxies whose distance to their 3rd nearest neighbor is greater than the average 3rd nearest neighbor distance plus 1.5 times the standard deviation of the 3rd nearest neighbor distances for the survey. This step should be performed prior to calling this function.

Next, VoidFinder will impose a grid of cubic cells over the remaining non-isolated, or “wall” galaxies. The cell size of this grid should be small enough to allow a thorough search, but is also the primary consumer of time in this algorithm.

At each grid cell, VoidFinder will evaluate whether that cubic cell is “empty” or “nonempty.” Empty cells contain no galaxies, non-empty cells contain at least 1 galaxy. This makes the removal of isolated galaxies in the preprocessing stage important.

VoidFinder will proceed to grow a sphere (or hole), at every Empty grid cell. These pre-void holes will be filtered such that the potential voids along the edge of the survey will be removed, since any void on the edge of the survey could potentially grow unbounded, and there may be galaxies not present which would have bounded the void. After the filtering, these pre-voids will be combined into the actual voids based on an analysis of their overlap.

This implementation uses a reference point, ‘coords_min’, from xyz space, and the ‘hole_grid_edge_length’ to convert between the x,y,z coordinates of a galaxy, and the i,j,k coordinates of a cell in the search grid such that:

ijk = ((xyz - coords_min)/hole_grid_edge_length).astype(integer)

During the sphere growth, VoidFinder also uses a secondary grid to help find the bounding galaxies for a sphere. This secondary grid facilitates nearest-neighbor and radius queries, and uses a coordinate space referred to in the code as pqr, which uses a similar transformation:

pqr = ((xyz - coords_min)/neighbor_grid_edge_length).astype(integer)
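
Both transformations can be sketched in a few lines. The helper name xyz_to_cell and the sample coordinates are made up for illustration; the secondary grid edge length is taken to be 3*hole_grid_edge_length, the documented default for galaxy_map_grid_edge_length.

```python
def xyz_to_cell(xyz, coords_min, edge_length):
    """Map an (x, y, z) position to integer grid indices.

    int() truncates toward zero, which matches a floor as long as the
    position is not below coords_min.
    """
    return tuple(int((c - c0) / edge_length) for c, c0 in zip(xyz, coords_min))


coords_min = (-50.0, -20.0, 0.0)   # reference point of the grid (xyz space)
hole_grid_edge_length = 5.0        # Mpc/h
neighbor_grid_edge_length = 3 * hole_grid_edge_length

galaxy = (-37.5, 3.0, 12.4)
ijk = xyz_to_cell(galaxy, coords_min, hole_grid_edge_length)
pqr = xyz_to_cell(galaxy, coords_min, neighbor_grid_edge_length)
print("ijk =", ijk, "pqr =", pqr)
```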

In VoidFinder terminology, a Void is a union of spheres, and a single sphere is just a hole. The Voids are found by taking the set of holes, and ordering them based on radius. Starting from the largest found hole, label it a maximal sphere, and continue to the next hole. If the next hole does not overlap with any of the previous maximal spheres by some factor, it is also considered a maximal sphere. This process is repeated until there are no more maximal spheres, and all other spheres are joined to the maximal spheres.
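
The ordering-and-merging procedure can be sketched as follows. Several details here are assumptions made only so the example is concrete: the overlap measure (fraction of the smaller sphere's volume inside the larger one), the threshold overlap_limit and its value, the rule that a non-maximal hole joins the largest maximal sphere it intersects, and the dropping of holes that touch no maximal sphere. The actual criteria used by VoidFinder may differ.

```python
import math


def overlap_fraction(a, b):
    """Fraction of the smaller sphere's volume lying inside the other sphere.

    Spheres are (x, y, z, radius) tuples.
    """
    d = math.dist(a[:3], b[:3])
    big, small = max(a[3], b[3]), min(a[3], b[3])
    if d >= big + small:
        return 0.0          # disjoint
    if d <= big - small:
        return 1.0          # smaller sphere fully contained
    # Volume of the lens formed by two intersecting spheres.
    lens = (math.pi * (big + small - d) ** 2
            * (d * d + 2 * d * small - 3 * small ** 2
               + 2 * d * big + 6 * small * big - 3 * big ** 2)
            / (12 * d))
    return lens / (4.0 / 3.0 * math.pi * small ** 3)


def combine_holes(holes, min_maximal_radius=10.0, overlap_limit=0.1):
    """Group holes into voids keyed by the index of their maximal sphere."""
    ordered = sorted(holes, key=lambda s: s[3], reverse=True)
    maximals, others = [], []
    for hole in ordered:
        big_enough = hole[3] >= min_maximal_radius
        if big_enough and all(overlap_fraction(hole, m) <= overlap_limit
                              for m in maximals):
            maximals.append(hole)
        else:
            others.append(hole)

    voids = {i: [m] for i, m in enumerate(maximals)}
    for hole in others:
        for i, m in enumerate(maximals):
            # Assumed joining rule: attach to the first (largest) maximal
            # sphere that the hole intersects.
            if math.dist(hole[:3], m[:3]) < hole[3] + m[3]:
                voids[i].append(hole)
                break
    return voids


holes = [(0, 0, 0, 15), (5, 0, 0, 12), (40, 0, 0, 11),
         (48, 0, 0, 6), (200, 0, 0, 4)]
voids = combine_holes(holes)
for label, members in voids.items():
    print(label, members)
```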

A note on the purpose of VoidFinder - VoidFinder is intended to find distinct, discrete void locations within the large scale structure of the universe. This is in contrast to finding the large scale void structure. VoidFinder answers the question “Where are the voids?” with a concrete “Here is a list of x,y,z coordinates”, but it does not answer the questions “What do the voids look like? How are they shaped? How much do they overlap?” These questions can be partially answered with additional analysis on the output of VoidFinder, but the main algorithm is intended to find discrete, disjoint x-y-z coordinates of the centers of void regions. If you wanted a local density estimate for a given galaxy, you could just use the distance to Nth nearest neighbor, for example.

To do this, VoidFinder makes the following assumptions:

  1. A Void region can be approximated by a union of spheres. Note: the center of the maximal sphere in that void region will yield the x-y-z coordinate of that void region.

  2. Void regions are distinct/discrete. We are not looking for huge tunneling structures throughout space; if that does happen to be the structure of space (it basically does happen to be that way), we want the locations of the biggest rooms.

Parameters
galaxy_coords_xyznumpy.ndarray of shape (num_galaxies, 3)

coordinates of the galaxies in the survey, units of Mpc/h (xyz space)

survey_namestr

identifier for the survey running, may be prepended or appended to output filenames including the checkpoint filename

mask_typestring, one of [‘ra_dec_z’, ‘xyz’, ‘periodic’]

Determines the mode of mask checking to use and which mask parameters to use.

‘ra_dec_z’ means the mask, mask_resolution, and dist_limits parameters must be provided. The ‘mask’ represents an angular space in Right Ascension and Declination, the corresponding mask_resolution integer represents the scale needed to index into the Right Ascension and Declination of the mask, and the dist_limits represent the min and max redshift values (as radial distances in xyz space).

‘xyz’ means that the xyz_limits parameter must be provided, which directly encodes a bounding box for the survey in xyz space.

‘periodic’ means that the xyz_limits parameter must be provided, which directly encodes a bounding box representing the periodic boundary of the survey, and the survey will be treated as if its bounding box were tiled to infinity in all directions. Spheres will still only be grown starting from within the original bounding box.

masknumpy.ndarray of shape (N,M) type bool

Represents the survey footprint in scaled ra/dec space. Value of True indicates that a location is within the survey (ra/dec space)

mask_resolutioninteger

Scale factor of coordinates needed to index mask

dist_limitsnumpy array of shape (2,)

[min_dist, max_dist] in units of Mpc/h (xyz space)

xyz_limitsnumpy array of shape (2,3)

Format is [[x_min, y_min, z_min], [x_max, y_max, z_max]]. Used for checking against the mask when mask_type == ‘xyz’ or for periodic conditions when mask_type == ‘periodic’.

hole_grid_edge_lengthfloat

Size in Mpc/h of the edge of 1 cube in the search grid, or distance between 2 grid cells (xyz space)

grid_originndarray of shape (3,) or None

The spatial location to use as (0,0,0) in the search grid. If None, will use the numpy.min() function on the provided galaxies as the grid origin.

min_maximal_radiusfloat

The minimum radius in units of distance for a hole to be considered for maximal status. Default value is 10 Mpc/h.

max_hole_mask_overlapfloat in range (0, 0.5)

When the volume of a hole overlaps the mask by this fraction, discard that hole. Maximum value of 0.5 because a value of 0.5 means that the hole center will be outside the mask, but more importantly because the numpy.roots() function used below won’t return a valid polynomial root.

galaxy_map_grid_edge_lengthfloat or None

Edge length in Mpc/h for the secondary grid for finding nearest neighbor galaxies (xyz space). If None, will default to 3*hole_grid_edge_length (which results in a cell volume 3^3 = 27 times larger than that of a hole grid cell). This parameter yields a tradeoff between the number of galaxies in a cell and the number of cells to search when growing a sphere: too large and many redundant galaxies may be searched, too small and too many cells will need to be searched.

hole_center_iter_distfloat

Distance to move the sphere center each iteration while growing a void sphere in units of Mpc/h (xyz space)

pts_per_unit_volumefloat

Number of points per unit volume that are distributed within the holes to calculate the fraction of the hole’s volume that falls outside the survey bounds. Default is 0.01.

maximal_spheres_filenamestr

Location to save maximal spheres file

void_table_filenamestr

Location to save void table to

potential_voids_filenamestr

Location to save potential voids file to

num_cpusint or None

Number of cpus to use while running the main algorithm. None will result in using number of physical cores on the machine. Some speedup benefit may be obtained from using additional logical cores via Intel Hyperthreading but with diminishing returns. This can safely be set above the number of physical cores without issue if desired.

save_afterint or None

Save a VoidFinderCheckpoint.h5 file after approximately every save_after cells have been processed. This will overwrite the checkpoint file every save_after cells, NOT append to it. Saving the checkpoint file also forces the worker processes to pause and synchronize with the master process to ensure the correct values get written, so choose a good balance between saving too often and not often enough if using this parameter. The value is approximate because it depends on the number of worker processes and the provided batch_size value: if your batch size is 10,000 and your save_after is 1,000,000, you might actually get a checkpoint at, say, 1,030,000. If None, disables saving the checkpoint file.

check_only_empty_cellsbool

Whether or not to start growing a hole in a cell which has galaxies in it, aka “non-empty”. If True (default), don’t grow holes in these cells.

use_start_checkpointbool

Whether to attempt looking for a VoidFinderCheckpoint.h5 file which can be used to restart the VF run. If False, VoidFinder will start fresh from 0.

batch_sizeint

Number of potential void cells to evaluate at a time. Lower values may be a bit slower as they involve some memory allocation overhead, and values which are too high may cause the status update printing to take more than print_after seconds. Default value is 10,000.

verboseint or bool

Level of verbosity to print during running. 0 indicates off, 1 indicates to print a status update every ‘print_after’ seconds, and 2 indicates to print all debugging statements.

print_afterfloat

Number of seconds to wait before printing a status update

Returns
All output is currently written to disk:
potential voids table, ascii.commented_header format
combined voids table, ascii.commented_header format
maximal spheres table
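
The interplay of pts_per_unit_volume and max_hole_mask_overlap for the ‘xyz’ mask type can be illustrated with a small Monte Carlo sketch: points are scattered inside a hole, and the fraction that falls outside the xyz_limits bounding box is compared against the overlap threshold. The function name, the use of random rejection sampling, and the fixed seed are assumptions for illustration; the way VoidFinder actually distributes its points is not described here.

```python
import math
import random


def fraction_outside_box(center, radius, xyz_limits,
                         pts_per_unit_volume=0.01, seed=0):
    """Estimate the fraction of a sphere's volume lying outside a box.

    xyz_limits has the form [[x_min, y_min, z_min], [x_max, y_max, z_max]].
    """
    rng = random.Random(seed)
    volume = 4.0 / 3.0 * math.pi * radius ** 3
    n_points = max(1, int(pts_per_unit_volume * volume))
    (x_min, y_min, z_min), (x_max, y_max, z_max) = xyz_limits

    outside = 0
    drawn = 0
    while drawn < n_points:
        # Rejection sampling: draw uniformly in the bounding cube and keep
        # only the points that land inside the sphere.
        dx, dy, dz = (rng.uniform(-radius, radius) for _ in range(3))
        if dx * dx + dy * dy + dz * dz > radius * radius:
            continue
        drawn += 1
        x, y, z = center[0] + dx, center[1] + dy, center[2] + dz
        inside_box = (x_min <= x <= x_max and
                      y_min <= y <= y_max and
                      z_min <= z <= z_max)
        if not inside_box:
            outside += 1
    return outside / n_points


limits = [[0.0, 0.0, 0.0], [100.0, 100.0, 100.0]]
max_hole_mask_overlap = 0.1

for center in [(50.0, 50.0, 50.0), (4.0, 50.0, 50.0)]:
    frac = fraction_outside_box(center, 10.0, limits, pts_per_unit_volume=1.0)
    print(center, round(frac, 3), "discard" if frac > max_hole_mask_overlap else "keep")
```

A higher point density gives a smoother estimate at the cost of more work per hole.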