VoidFinder API
file_preprocess
- vast.voidfinder.preprocessing.file_preprocess(galaxies_filename, in_directory, out_directory, mag_cut=True, rm_isolated=True, dist_metric='comoving', min_z=None, max_z=None, Omega_M=0.3, h=1.0, verbose=0)
Set up output file names, calculate distances, etc.
- Parameters
- galaxies_filename : string
File name of the galaxy catalog. Should be readable by astropy.table.Table.read as an ascii.commented_header file. Required columns include ‘ra’, ‘dec’, ‘z’, and absolute magnitude (either ‘rabsmag’ or ‘magnitude’).
- in_directory : string
Directory path for input files.
- out_directory : string
Directory path for output files.
- mag_cut : boolean
Determines whether or not to implement a magnitude cut on the galaxy survey. Default is True (remove all galaxies fainter than Mr = -20).
- rm_isolated : boolean
Determines whether or not to remove isolated galaxies (defined as those whose distance to their third nearest neighbor is greater than the sum of the average third-nearest-neighbor distance and 1.5 times the standard deviation of the third-nearest-neighbor distances). Default is True.
- dist_metric : string
Which distance metric to use. Options are ‘comoving’ (default) and ‘redshift’.
- min_z, max_z : float
Minimum and maximum redshift range for the survey mask. Default values are None (determined from galaxy extent).
- Omega_M : float
Value of the matter density of the given cosmology. Default is 0.3.
- h : float
Fractional value of Hubble’s constant. Default is 1 (where H0 = 100h), so all distances will be in units of Mpc/h.
- Returns
- galaxy_data_table : astropy table
Table of all galaxies in catalog.
- dist_limits : numpy array of shape (2,)
Minimum and maximum distances to use for void search. Units are Mpc/h, in either comoving or redshift coordinates (depending on dist_metric).
- out1_filename : string
File name of maximal sphere output file.
- out2_filename : string
File name of all void holes.
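The default magnitude cut can be pictured with a few lines of NumPy. This is a minimal sketch rather than the library's implementation: the sample magnitudes are invented, and the array name simply mirrors the ‘rabsmag’ column. Because brighter galaxies have more negative magnitudes, removing everything fainter than Mr = -20 means keeping values below -20.

```python
import numpy as np

# Hypothetical absolute r-band magnitudes for five galaxies.
rabsmag = np.array([-21.3, -19.5, -20.4, -18.9, -22.0])

# Limit used in this sketch; the documented default removes galaxies fainter than -20.
magnitude_limit = -20.0

# Brighter means more negative, so keep values below the limit.
keep = rabsmag < magnitude_limit
bright_sample = rabsmag[keep]

print(bright_sample)
```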
filter_galaxies
- vast.voidfinder.filter_galaxies(galaxy_table, survey_name, out_directory, mag_cut=True, dist_limits=None, rm_isolated=True, write_table=True, sep_neighbor=3, dist_metric='comoving', h=1.0, magnitude_limit=-20.09, verbose=0)
A hodgepodge of miscellaneous tasks needed to format the data into something the main find_voids() function can use:
Optionally apply a magnitude cut
Convert from ra-dec-redshift space into xyz space
Calculate the hole search grid shape
Optionally remove isolated galaxies by partitioning them into wall (non-isolated) and field (isolated) groups
Optionally write out the wall and field galaxies to disk
- Parameters
- galaxy_table : astropy.table of shape (N,?)
Table with a variable number of required columns. If a magnitude cut is applied, it must include a ‘rabsmag’ column. If the distance metric is ‘comoving’, it must include an ‘Rgal’ column; otherwise it must include ‘redshift’. It must always include ‘ra’ and ‘dec’.
- survey_name : str
Name of the galaxy catalog; string value to prepend or append to output names.
- out_directory : string
Directory path for output files.
- mag_cut : bool
Whether or not to cut on magnitude, removing galaxies fainter than magnitude_limit.
- dist_limits : list of length 2
[Minimum distance, maximum distance] of galaxy sample (in units of Mpc/h).
- magnitude_limit : float
Value at which to perform the magnitude cut. Default is -20.09.
- rm_isolated : bool
Whether or not to perform the Nth neighbor distance calculation and use it to partition the input galaxies into wall and field galaxies.
- write_table : bool
Use astropy.table.Table.write to write out the wall and field galaxies to file.
- sep_neighbor : int, positive
If rm_isolated is True, find the Nth nearest neighbor of each galaxy based on this value. Default is 3.
- dist_metric : str
Distance metric to use in calculations. Options are ‘comoving’ (default; distance dependent on cosmology) and ‘redshift’ (distance independent of cosmology).
- h : float
Fractional value of Hubble’s constant. Default value is 1 (where H0 = 100h).
- verbose : int
Values greater than zero enable printed output.
- Returns
- wall_gals_xyz : numpy.ndarray of shape (K,3)
The galaxies which were designated not to be isolated.
- field_gals_xyz : numpy.ndarray of shape (L,3)
The galaxies designated as isolated.
- hole_grid_shape : tuple of 3 integers (i,j,k)
Shape of the hole search grid.
- coords_min : numpy.ndarray of shape (3,)
Coordinates of the minimum of the survey, used for converting from xyz space into ijk space.
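The wall/field partition can be illustrated with a brute-force NumPy sketch. It is not the library's code: the helper name `split_wall_field` and the `n_sigma` argument are hypothetical, and the isolation threshold (mean plus 1.5 standard deviations of the Nth-nearest-neighbor distance) follows the definition given for rm_isolated in file_preprocess. The pairwise distance matrix is only practical for small samples.

```python
import numpy as np

def split_wall_field(coords_xyz, sep_neighbor=3, n_sigma=1.5):
    """Partition galaxies into wall and field samples (illustrative only)."""
    # Pairwise distance matrix; O(N^2) memory, so only for small samples.
    diffs = coords_xyz[:, None, :] - coords_xyz[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the galaxy itself (distance 0),
    # so column sep_neighbor is the distance to the Nth nearest neighbor.
    nth_dist = np.sort(dists, axis=1)[:, sep_neighbor]
    # Isolation threshold: mean plus n_sigma standard deviations.
    limit = nth_dist.mean() + n_sigma * nth_dist.std()
    is_field = nth_dist > limit
    return coords_xyz[~is_field], coords_xyz[is_field]

# A tight 3x3x3 lattice of "wall" galaxies plus one far-away outlier.
axis = np.arange(3.0)
lattice = np.array(np.meshgrid(axis, axis, axis)).reshape(3, -1).T
galaxies = np.vstack([lattice, [[50.0, 50.0, 50.0]]])

wall, field = split_wall_field(galaxies)
print(len(wall), len(field))
```

In this toy sample every lattice point has a third nearest neighbor one unit away, so only the distant outlier exceeds the threshold and lands in the field sample.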
ra_dec_to_xyz
- vast.voidfinder.ra_dec_to_xyz(galaxy_table, distance_metric='comoving', h=1.0)
Convert galaxy coordinates from ra-dec-redshift space into xyz space.
- Parameters
- galaxy_table : astropy.table of shape (N,?)
Must contain columns ‘ra’ and ‘dec’ in degrees, and either ‘Rgal’ (units not specified) if distance_metric is ‘comoving’, or ‘redshift’ otherwise.
- distance_metric : str
Distance metric to use in calculations. Options are ‘comoving’ (default; distance dependent on cosmology) and ‘redshift’ (distance independent of cosmology).
- h : float
Fractional value of Hubble’s constant. Default value is 1 (where H0 = 100h).
- Returns
- coords_xyz : numpy.ndarray of shape (N,3)
Positions of the galaxies in xyz space.
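The conversion itself is a spherical-to-Cartesian transform. The sketch below uses the standard astronomical convention (x toward ra = 0, dec = 0; z toward the north pole); that the library uses exactly this orientation is an assumption, and the helper name `ra_dec_r_to_xyz` is hypothetical. The radial distance `r` stands in for whichever distance the chosen metric provides.

```python
import numpy as np

def ra_dec_r_to_xyz(ra_deg, dec_deg, r):
    """Spherical-to-Cartesian conversion (standard convention, assumed here)."""
    ra = np.radians(ra_deg)
    dec = np.radians(dec_deg)
    x = r * np.cos(dec) * np.cos(ra)
    y = r * np.cos(dec) * np.sin(ra)
    z = r * np.sin(dec)
    return np.column_stack([x, y, z])

# Three galaxies at the same distance along the three coordinate axes.
ra = np.array([0.0, 90.0, 0.0])
dec = np.array([0.0, 0.0, 90.0])
r = np.array([100.0, 100.0, 100.0])

coords_xyz = ra_dec_r_to_xyz(ra, dec, r)
print(np.round(coords_xyz, 6))
```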
z_to_comoving_dist
- vast.voidfinder.distance.z_to_comoving_dist()
Convert redshift values into comoving distances for the given cosmology by integrating the Robertson-Walker metric.
- Parameters
- z_input : numpy.ndarray of shape (N,)
Redshift values to compute distances for.
- omega_M : float
Cosmological matter energy density.
- h : float
Hubble constant factor.
- Returns
- output_comov_dists : numpy.ndarray of shape (N,)
The comoving distance values in units of Mpc/h.
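For orientation, the integral can be sketched numerically. This is not VAST's implementation: it assumes a flat universe with only matter and a cosmological constant (so Omega_Lambda = 1 - Omega_M), and the function name `comoving_distance` and the `n_steps` argument are hypothetical. Because the result is expressed in Mpc/h, the factor h cancels and does not appear in the sketch.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def comoving_distance(z_input, omega_M=0.3, n_steps=2001):
    """Comoving distance in Mpc/h for a flat universe (sketch only)."""
    z_input = np.atleast_1d(np.asarray(z_input, dtype=float))
    omega_L = 1.0 - omega_M  # flatness assumption
    out = np.empty_like(z_input)
    for i, z in enumerate(z_input):
        zs = np.linspace(0.0, z, n_steps)
        # Integrand 1/E(z), with E(z) = sqrt(Omega_M (1+z)^3 + Omega_Lambda).
        inv_E = 1.0 / np.sqrt(omega_M * (1.0 + zs) ** 3 + omega_L)
        # Trapezoidal rule written out explicitly.
        integral = np.sum(0.5 * (inv_E[1:] + inv_E[:-1]) * np.diff(zs))
        # c / H0 with H0 = 100 h km/s/Mpc gives a result in Mpc/h.
        out[i] = (C_KM_S / 100.0) * integral
    return out

dists = comoving_distance([0.0, 0.05, 0.1])
print(dists)
```

A quick sanity check is the matter-only case (omega_M = 1), where the integral has the closed form (2c/H0)(1 - 1/sqrt(1+z)).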
generate_mask
- vast.voidfinder.multizmask.generate_mask(gal_data, z_max, dist_metric='comoving', smooth_mask=True, min_maximal_radius=10.0, Omega_M=0.3, h=1.0)
This function creates a grid of shape (N,M) where the N dimension represents increments of the ra space (0 to 360 degrees) and the M dimension represents increments in the dec space (0 to 180 degrees). The value of the mask is a boolean representing whether or not that (ra,dec) position is part of the survey, or outside the survey. For example, if mask[320,17] == True, that indicates the right ascension of 320 degrees and declination of 17 degrees is within the survey.
Note that this mask covers only the ra-dec (right ascension, declination) space of the survey; the radial min/max limits will need to be checked separately.
- Parameters
- gal_data : astropy table
Table of all galaxies in the sample. Ra and Dec must be given in degrees. Ra can be in either -180 to 180 or 0 to 360 format. Dec must be in -90 to 90 format, since the code subtracts 90 degrees to go to 0 to 180 format.
- z_max : float
Maximum redshift of the volume-limited catalog.
- dist_metric : string
Distance metric to use in calculations. Options are ‘comoving’ (default; distance dependent on cosmology) and ‘redshift’ (distance independent of cosmology).
- smooth_mask : boolean
If smooth_mask is set to True (default), small holes in the mask (single cells without any galaxy in them that are surrounded by at least 3 cells which have galaxies in them) are unmasked.
- min_maximal_radius : float
Minimum radius of the maximal spheres. Default is 10 Mpc/h. The mask resolution depends on this value.
- Omega_M : float
Cosmological matter normalized energy density. Default is 0.3.
- h : float
Fractional value of Hubble’s constant. Default value is 1 (where H0 = 100h).
- Returns
- mask : numpy array of shape (N,M)
Boolean array of the entire sky, with points within the survey limits set to True. N represents the incremental RA; M represents the incremental dec.
- mask_resolution : integer
Scale factor of coordinates in maskfile.
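How a boolean ra-dec grid and a mask_resolution scale factor fit together can be sketched as follows. This is an illustration only: the helper names `build_ra_dec_mask` and `in_mask` are hypothetical, the declination offset of +90 degrees is an assumption of this sketch, and smoothing is omitted. The exact index convention used by generate_mask should be checked against the library source.

```python
import numpy as np

def build_ra_dec_mask(ra_deg, dec_deg, mask_resolution=1):
    """Boolean sky grid marking cells that contain at least one galaxy (sketch)."""
    n_ra = 360 * mask_resolution
    n_dec = 180 * mask_resolution
    mask = np.zeros((n_ra, n_dec), dtype=bool)
    # Wrap RA into [0, 360) so the -180..180 convention also works.
    ra = np.asarray(ra_deg, dtype=float) % 360.0
    # Shift Dec from [-90, 90] to [0, 180] (offset assumed for this sketch).
    dec = np.asarray(dec_deg, dtype=float) + 90.0
    i = np.minimum((ra * mask_resolution).astype(int), n_ra - 1)
    j = np.minimum((dec * mask_resolution).astype(int), n_dec - 1)
    mask[i, j] = True
    return mask

def in_mask(mask, mask_resolution, ra_deg, dec_deg):
    """Look up a single (ra, dec) position using the same indexing."""
    i = min(int((ra_deg % 360.0) * mask_resolution), mask.shape[0] - 1)
    j = min(int((dec_deg + 90.0) * mask_resolution), mask.shape[1] - 1)
    return bool(mask[i, j])

# Two galaxies, one given with a negative right ascension.
mask = build_ra_dec_mask([320.2, -40.0], [17.5, -5.1], mask_resolution=2)
print(mask.shape, in_mask(mask, 2, 320.4, 17.6), in_mask(mask, 2, 10.0, 0.0))
```

With mask_resolution = 2 each cell spans half a degree, so a position 0.2 degrees away from the first galaxy falls in the same cell and is reported as inside the survey.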
find_voids
- vast.voidfinder.find_voids(galaxy_coords_xyz, survey_name, mask_type='ra_dec_z', mask=None, mask_resolution=None, dist_limits=None, xyz_limits=None, check_only_empty_cells=True, max_hole_mask_overlap=0.1, hole_grid_edge_length=5.0, grid_origin=None, min_maximal_radius=10.0, galaxy_map_grid_edge_length=None, pts_per_unit_volume=0.01, maximal_spheres_filename='maximal_spheres.txt', void_table_filename='voids_table.txt', potential_voids_filename='potential_voids_list.txt', num_cpus=None, save_after=None, use_start_checkpoint=False, batch_size=10000, verbose=0, print_after=5.0)
Main entry point for VoidFinder.
Using the VoidFinder algorithm, this function grows a sphere in each empty grid cell of a grid imposed over the target galaxy distribution. It then combines these spheres into unique voids, identifying a maximal sphere for each void.
At a high level, this algorithm uses three inputs to find voids in the large-scale structure of the universe:
The galaxy coordinates
A survey-limiting mask
A cubic-cell grid of potential void locations
Before running VoidFinder, a preprocessing stage of removing isolated galaxies from the target galaxy survey is performed. Currently this is done by removing galaxies whose distance to their 3rd nearest neighbor is greater than the average 3rd-nearest-neighbor distance plus 1.5 times the standard deviation of the 3rd-nearest-neighbor distances for the survey. This step should be performed prior to calling this function.
Next, VoidFinder will impose a grid of cubic cells over the remaining non-isolated, or “wall” galaxies. The cell size of this grid should be small enough to allow a thorough search, but is also the primary consumer of time in this algorithm.
At each grid cell, VoidFinder will evaluate whether that cubic cell is “empty” or “non-empty.” Empty cells contain no galaxies; non-empty cells contain at least 1 galaxy. This makes the removal of isolated galaxies in the preprocessing stage important.
VoidFinder will proceed to grow a sphere (or hole) at every empty grid cell. These pre-void holes will be filtered such that the potential voids along the edge of the survey are removed, since any void on the edge of the survey could potentially grow unbounded, and there may be galaxies not present which would have bounded the void. After the filtering, these pre-voids will be combined into the actual voids based on an analysis of their overlap.
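The edge filtering can be pictured as a Monte Carlo estimate of how much of a hole lies outside the survey, which is what the max_hole_mask_overlap and pts_per_unit_volume parameters described below control. The sketch assumes a simple box-shaped survey (as with xyz_limits); the function name `fraction_outside_box` and the `seed` argument are hypothetical, and the library's actual sampling scheme may differ.

```python
import numpy as np

def fraction_outside_box(center, radius, xyz_limits, pts_per_unit_volume=0.01, seed=0):
    """Monte Carlo estimate of the fraction of a sphere lying outside a box."""
    rng = np.random.default_rng(seed)
    volume = 4.0 / 3.0 * np.pi * radius ** 3
    n_pts = max(int(pts_per_unit_volume * volume), 1)
    # Uniform points in a ball: random direction times radius * u^(1/3).
    directions = rng.normal(size=(n_pts, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.random(n_pts) ** (1.0 / 3.0)
    points = np.asarray(center) + directions * radii[:, None]
    inside = np.all((points >= xyz_limits[0]) & (points <= xyz_limits[1]), axis=1)
    return 1.0 - inside.mean()

xyz_limits = np.array([[0.0, 0.0, 0.0], [200.0, 200.0, 200.0]])
max_hole_mask_overlap = 0.1

# One hole well inside the box, one centered on a face of the box.
interior = fraction_outside_box([100.0, 100.0, 100.0], 20.0, xyz_limits, pts_per_unit_volume=1.0)
edge = fraction_outside_box([0.0, 100.0, 100.0], 20.0, xyz_limits, pts_per_unit_volume=1.0)

print(interior, interior > max_hole_mask_overlap)
print(edge, edge > max_hole_mask_overlap)
```

The interior hole has no volume outside the box and would be kept, while the hole centered on a face has roughly half its volume outside and would be discarded under a 0.1 threshold.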
This implementation uses a reference point, ‘coords_min’, from xyz space, and the ‘hole_grid_edge_length’ to convert between the x,y,z coordinates of a galaxy, and the i,j,k coordinates of a cell in the search grid such that:
ijk = ((xyz - coords_min)/hole_grid_edge_length).astype(integer)
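A small NumPy sketch shows the xyz-to-ijk conversion together with the empty/non-empty cell bookkeeping. The galaxy positions are invented, and the way the grid shape is derived here (ceiling of the extent divided by the edge length, with boundary galaxies clipped into the last cell) is an assumption of the sketch rather than the library's rule.

```python
import numpy as np

# Hypothetical wall galaxy positions in Mpc/h.
wall_gals_xyz = np.array([
    [12.0, 3.0, 7.5],
    [14.9, 3.2, 7.9],
    [27.0, 18.0, 9.0],
])
hole_grid_edge_length = 5.0

coords_min = wall_gals_xyz.min(axis=0)
coords_max = wall_gals_xyz.max(axis=0)

# Number of cells along each axis needed to cover the galaxies (at least 1).
n_cells = np.ceil((coords_max - coords_min) / hole_grid_edge_length).astype(int)
hole_grid_shape = tuple(int(n) for n in np.maximum(n_cells, 1))

# The conversion from the text: shift to the grid origin, scale, truncate.
ijk = ((wall_gals_xyz - coords_min) / hole_grid_edge_length).astype(int)
# Galaxies sitting exactly on the upper boundary go into the last cell.
ijk = np.minimum(ijk, np.array(hole_grid_shape) - 1)

# Mark occupied cells; every other cell is a candidate for growing a hole.
nonempty = np.zeros(hole_grid_shape, dtype=bool)
nonempty[ijk[:, 0], ijk[:, 1], ijk[:, 2]] = True
empty_cells = np.argwhere(~nonempty)

print(hole_grid_shape, ijk.tolist(), len(empty_cells))
```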
During the sphere growth, VoidFinder also uses a secondary grid to help find the bounding galaxies for a sphere. This secondary grid facilitates nearest-neighbor and radius queries, and uses a coordinate space referred to in the code as pqr, which uses a similar transformation:
pqr = ((xyz - coords_min)/neighbor_grid_edge_length).astype(integer)
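The idea behind the secondary grid can be sketched with a dictionary keyed by pqr cell. This is a simplified stand-in, not the library's data structure: the helper names `build_galaxy_map` and `galaxies_within` are hypothetical, and a floor is used instead of a plain integer cast so that query points below coords_min map to negative cells correctly.

```python
import numpy as np
from collections import defaultdict

def build_galaxy_map(coords_xyz, coords_min, edge):
    """Map each occupied pqr cell to the indices of the galaxies inside it."""
    pqr = np.floor((coords_xyz - coords_min) / edge).astype(int)
    galaxy_map = defaultdict(list)
    for idx, cell in enumerate(map(tuple, pqr.tolist())):
        galaxy_map[cell].append(idx)
    return galaxy_map

def galaxies_within(point, radius, coords_xyz, coords_min, edge, galaxy_map):
    """Radius query that only visits cells overlapping the search sphere."""
    lo = np.floor((point - radius - coords_min) / edge).astype(int)
    hi = np.floor((point + radius - coords_min) / edge).astype(int)
    found = []
    for p in range(lo[0], hi[0] + 1):
        for q in range(lo[1], hi[1] + 1):
            for r in range(lo[2], hi[2] + 1):
                for idx in galaxy_map.get((p, q, r), []):
                    if np.linalg.norm(coords_xyz[idx] - point) <= radius:
                        found.append(idx)
    return sorted(found)

coords = np.array([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [14.0, 1.0, 1.0], [29.0, 29.0, 29.0]])
coords_min = coords.min(axis=0)
neighbor_grid_edge_length = 15.0

galaxy_map = build_galaxy_map(coords, coords_min, neighbor_grid_edge_length)
query = np.array([0.0, 0.0, 0.0])
near = galaxies_within(query, 4.0, coords, coords_min, neighbor_grid_edge_length, galaxy_map)
print(near)
```

Larger cells mean fewer cells to visit but more galaxies to test per cell, which is the tradeoff described for galaxy_map_grid_edge_length.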
In VoidFinder terminology, a Void is a union of spheres, and a single sphere is just a hole. The Voids are found by taking the set of holes, and ordering them based on radius. Starting from the largest found hole, label it a maximal sphere, and continue to the next hole. If the next hole does not overlap with any of the previous maximal spheres by some factor, it is also considered a maximal sphere. This process is repeated until there are no more maximal spheres, and all other spheres are joined to the maximal spheres.
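The greedy selection of maximal spheres can be sketched as below. The text leaves the overlap factor unspecified, so this sketch assumes a criterion: a hole is rejected if more than 10% of its own volume overlaps an already accepted maximal sphere. The function names and the `max_overlap_frac` argument are hypothetical; the intersection volume uses the standard sphere-sphere lens formula.

```python
import numpy as np

def overlap_volume(r1, r2, d):
    """Volume of the intersection of two spheres whose centers are d apart."""
    if d >= r1 + r2:
        return 0.0  # spheres do not touch
    if d <= abs(r1 - r2):
        return 4.0 / 3.0 * np.pi * min(r1, r2) ** 3  # one sphere inside the other
    return (np.pi * (r1 + r2 - d) ** 2
            * (d ** 2 + 2 * d * (r1 + r2) - 3 * (r1 - r2) ** 2) / (12 * d))

def select_maximal_spheres(centers, radii, min_maximal_radius=10.0, max_overlap_frac=0.1):
    """Greedy pick of maximal spheres, largest hole first (assumed criterion)."""
    order = np.argsort(radii)[::-1]
    maximal = []
    for idx in order:
        if radii[idx] < min_maximal_radius:
            break  # all remaining holes are smaller still
        own_volume = 4.0 / 3.0 * np.pi * radii[idx] ** 3
        is_maximal = True
        for m in maximal:
            d = np.linalg.norm(centers[idx] - centers[m])
            if overlap_volume(radii[idx], radii[m], d) / own_volume > max_overlap_frac:
                is_maximal = False
                break
        if is_maximal:
            maximal.append(int(idx))
    return maximal

centers = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [60.0, 0.0, 0.0], [0.0, 40.0, 0.0]])
radii = np.array([20.0, 15.0, 12.0, 8.0])

maximal = select_maximal_spheres(centers, radii)
print(maximal)
```

In this toy input the second hole overlaps the largest one heavily and the last hole is below min_maximal_radius, so both would be candidates for joining a maximal sphere rather than becoming one.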
A note on the purpose of VoidFinder - VoidFinder is intended to find distinct, discrete void locations within the large scale structure of the universe. This is in contrast to finding the large scale void structure. VoidFinder answers the question “Where are the voids?” with a concrete “Here is a list of x,y,z coordinates”, but it does not answer the questions “What do the voids look like? How are they shaped? How much do they overlap?” These questions can be partially answered with additional analysis on the output of VoidFinder, but the main algorithm is intended to find discrete, disjoint x-y-z coordinates of the centers of void regions. If you wanted a local density estimate for a given galaxy, you could just use the distance to Nth nearest neighbor, for example.
To do this, VoidFinder makes the following assumptions:
A Void region can be approximated by a union of spheres. Note: the center of the maximal sphere in that void region will yield the x-y-z coordinate of that void region.
Void regions are distinct/discrete. We are not looking for huge tunneling structures throughout space; if that does happen to be the structure of space (it basically does), we want the locations of the biggest rooms.
- Parameters
- galaxy_coords_xyz : numpy.ndarray of shape (num_galaxies, 3)
Coordinates of the galaxies in the survey, in units of Mpc/h (xyz space).
- survey_name : str
Identifier for the survey being run; may be prepended or appended to output filenames, including the checkpoint filename.
- mask_type : string, one of [‘ra_dec_z’, ‘xyz’, ‘periodic’]
Determines the mode of mask checking to use and which mask parameters to use.
- ‘ra_dec_z’ means the mask, mask_resolution, and dist_limits parameters must be provided. The ‘mask’ represents an angular space in Right Ascension and Declination, the corresponding mask_resolution integer represents the scale needed to index into the Right Ascension and Declination of the mask, and the dist_limits represent the min and max redshift values (as radial distances in xyz space).
- ‘xyz’ means that the xyz_limits parameter must be provided, which directly encodes a bounding box for the survey in xyz space.
- ‘periodic’ means that the xyz_limits parameter must be provided, which directly encodes a bounding box representing the periodic boundary of the survey, and the survey will be treated as if its bounding box were tiled to infinity in all directions. Spheres will still only be grown starting from within the original bounding box.
- mask : numpy.ndarray of shape (N,M), type bool
Represents the survey footprint in scaled ra/dec space. A value of True indicates that a location is within the survey (ra/dec space).
- mask_resolution : integer
Scale factor of coordinates needed to index mask.
- dist_limits : numpy array of shape (2,)
[min_dist, max_dist] in units of Mpc/h (xyz space).
- xyz_limits : numpy array of shape (2,3)
Format [[x_min, y_min, z_min], [x_max, y_max, z_max]]. Used for checking against the mask when mask_type == ‘xyz’, or for periodic conditions when mask_type == ‘periodic’.
- hole_grid_edge_length : float
Size in Mpc/h of the edge of 1 cube in the search grid, or distance between 2 grid cells (xyz space).
- grid_origin : ndarray of shape (3,) or None
The spatial location to use as (0,0,0) in the search grid. If None, will use the numpy.min() function on the provided galaxies as the grid origin.
- min_maximal_radius : float
The minimum radius in units of distance for a hole to be considered for maximal status. Default value is 10 Mpc/h.
- max_hole_mask_overlap : float in range (0, 0.5)
When this fraction of a hole’s volume falls outside the survey mask, discard that hole. Default is 0.1. Maximum value of 0.5 because a value of 0.5 means that the hole center will be outside the mask, but more importantly because the numpy.roots() function used in the code won’t return a valid polynomial root.
- galaxy_map_grid_edge_length : float or None
Edge length in Mpc/h for the secondary grid used to find nearest neighbor galaxies (xyz space). If None, will default to 3*hole_grid_edge_length (which results in a cell volume 3^3 = 27 times larger). This parameter yields a tradeoff between the number of galaxies in a cell and the number of cells to search when growing a sphere. Too large and many redundant galaxies may be searched; too small and too many cells will need to be searched.
- hole_center_iter_dist : float
Distance to move the sphere center each iteration while growing a void sphere, in units of Mpc/h (xyz space).
- pts_per_unit_volume : float
Number of points per unit volume that are distributed within the holes to calculate the fraction of the hole’s volume that falls outside the survey bounds. Default is 0.01.
- maximal_spheres_filename : str
Location to save the maximal spheres file.
- void_table_filename : str
Location to save the void table.
- potential_voids_filename : str
Location to save the potential voids file.
- num_cpus : int or None
Number of cpus to use while running the main algorithm. None will result in using the number of physical cores on the machine. Some speedup benefit may be obtained from using additional logical cores via Intel Hyperthreading, but with diminishing returns. This can safely be set above the number of physical cores without issue if desired.
- save_after : int or None
Save a VoidFinderCheckpoint.h5 file after approximately every save_after cells have been processed. This will overwrite the checkpoint file every save_after cells, NOT append to it. Saving the checkpoint file also forces the worker processes to pause and synchronize with the master process to ensure the correct values get written, so choose a good balance between saving too often and not often enough if using this parameter. The value is approximate because it depends on the number of worker processes and the provided batch_size value: if your batch size is 10,000 and your save_after is 1,000,000, you might actually get a checkpoint at, say, 1,030,000. If None, disables saving the checkpoint file.
- check_only_empty_cells : bool
Whether or not to start growing a hole in a cell which has galaxies in it, aka “non-empty”. If True (default), don’t grow holes in these cells.
- use_start_checkpoint : bool
Whether to attempt looking for a VoidFinderCheckpoint.h5 file which can be used to restart the VF run. If False, VoidFinder will start fresh from 0.
- batch_size : int
Number of potential void cells to evaluate at a time. Lower values may be a bit slower as they involve some memory allocation overhead, and values which are too high may cause the status update printing to take more than print_after seconds. Default value is 10,000.
- verbose : int or bool
Level of verbosity to print during running. 0 indicates off, 1 indicates to print after every ‘print_after’ cells have been processed, and 2 indicates to print all debugging statements.
- print_after : float
Number of seconds to wait before printing a status update.
- Returns
- All output is currently written to disk:
- potential voids table, ascii.commented_header format
- combined voids table, ascii.commented_header format
- maximal spheres table