The Building Blocks of Analysis: Foundational Variables for Understanding the Characteristics of GEDI Data
In this section:
- Definitions and considerations for key GEDI waveform characteristics and algorithmic interpretation that may influence GEDI metrics.
- A reference table for important concepts and variables when interpreting characteristics of GEDI data over your study area and time period.
- Additional details on how select variables function, how they are formatted, and how they are used.
- Summary of key considerations when exploring the implications of the parameters on the GEDI data over your study area and time period.
Several parameters influence the waveform interpretation algorithms and both the calculation and quality of the resulting metrics in each GEDI data product. This section introduces these parameters and provides corresponding considerations for choosing quality data, summarized from GEDI’s Algorithm Theoretical Basis Documents (ATBDs), User Guides, and related literature.
The suggested considerations and processing techniques for each variable outlined in this section act as a “recipe” or starting point for exploring and evaluating the performance of GEDI over a particular landscape.
This training will not go into further detail on how the transmitted and received waveforms are processed and interpreted to generate each GEDI level data product, or on how users can investigate the waveform further themselves. This information can be found in the ATBDs.
We will largely ignore many of the datasets found in the product files from L1B and L2A:
- /geolocation
- /ancillary
- /land_cover_data
- /rx_1gaussfit
- /rx_1gaussfit/ancillary
- /rx_assess
- /rx_assess/ancillary
- /rx_processing_aN
- /rx_processing_aN/ancillary
L2B:
- /ancillary
- /geolocation
- /land_cover_data
- /rx_processing
This training additionally omits detailing many of the biomass estimation model datasets found in L4A:
- /geolocation
- /agbd_prediction
- /land_cover_data
- /ANCILLARY/model_data
- /ANCILLARY/pft_lut
- /ANCILLARY/region_lut
We set these variables aside to focus on the information most relevant to an applied researcher or user who wants to evaluate the most recent data already available. This training does not aim to teach you how to re-calculate or improve upon the resulting metrics. Instead, it aims to help you understand the basis of the published metrics and the well-founded considerations for handling the quality of a given dataset over particular regions, without having to re-derive the results from the raw or geolocated waveforms.
A note on language and GEDI’s observations and data samples:
The ‘waveform’ (data collected in its waveform format) is the actual captured data ‘sample’ (25m diameter sample, not an image), or resulting ‘observation’ (referring to the data collected) for each ‘shot’ (location and collected data from the transmitted and received laser energy) corresponding to each GEDI ‘footprint’ (the location where the shot/observation/sample took place).
GEDI’s full-waveform lidar captures vertical structure
The GEDI waveform is the digitally recorded return laser pulse. Fundamental to lidar (also written LiDAR, Light Detection And Ranging) is measuring the time it takes for the laser light to make a round trip from the sensor to the surface target and back. The returned energy is recorded as a function of time, producing a waveform that retains all the detail of the reflected laser pulse, and the elapsed time between transmission and return is used to calculate distance. The raw waveforms undergo several processing steps that account for systematic and atmospheric conditions before the waveform characteristics, energy peaks, and their associated times are analyzed to resolve the distance to the surface features they may represent. GEDI’s “footprint” observations (the received waveforms as a function of receive time) are then combined with the “instrument pointing, position, and observed range to compute the planet referenced location of the first and last samples of each laser receive waveform” (Beck et al., 2020). This process geolocates the waveform on Earth (latitude, longitude, height above the reference ellipsoid), so that the waveform becomes a georeferenced 3-dimensional representation of the surface.
Full waveform lidar is advantageous because it offers more detail from a single laser pulse than discrete return lidar. The outgoing and received waveform shapes are processed to interpret the vertical distribution of the surfaces captured. Each waveform undergoes a rigorous analysis for Gaussian fitting and peak detection, among other processes. The waveform captures the continuous profile of the surface, meaning multiple returns (areas where the laser pulse reflects off objects and back to the sensor) within the 25 m diameter footprint are represented in the waveform’s shape (Hofton et al., 2019).
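As a rough illustration of the ranging principle described above, the sketch below converts a round-trip travel time into a one-way distance using the speed of light. It deliberately ignores the systematic and atmospheric corrections applied in the actual processing, and the function names and the 200 ns example value are hypothetical.

```python
# Speed of light in vacuum (m/s). Atmospheric delay is ignored in this sketch.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def one_way_range_m(round_trip_time_s: float) -> float:
    """Convert a round-trip travel time into a one-way distance.

    The pulse travels to the target and back, so the distance is half of
    (speed of light x elapsed time).
    """
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


def height_difference_m(delta_time_s: float) -> float:
    """Vertical separation implied by the time gap between two returns
    recorded within the same waveform."""
    return one_way_range_m(delta_time_s)


# Hypothetical example: a canopy return and a ground return 200 ns apart.
canopy_to_ground_m = height_difference_m(200e-9)
print(f"Separation between returns: {canopy_to_ground_m:.2f} m")
```

The same relationship is why the time axis of a waveform can be read as a height axis once the waveform has been geolocated.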
Source: (Hofton et al., 2019)
Detected “peaks” correspond to the time of return (used to calculate distance) for the intercepted feature. Waveform shapes vary: they can be simple (single-mode) or complex (multi-modal). Each mode captured in the waveform is a significant concentration of energy that may represent a distinct surface feature. For each mode, the amplitude/intensity corresponds to the surface reflectivity, which is determined by how that surface responds to the given wavelength (1064 nm, near-infrared for GEDI). The signal width, as a measure of time, can provide information about the volume of light scattering on the target surface, or its roughness. For example, the lowest mode (ground) would have a peak at the strongest reflection from the ground. A mode representing a dense canopy could have multiple peaks over its distribution, but would similarly have a peak at the height where most of the energy was reflected. Returns over the ocean or bare ground usually result in simple waveforms, while vegetation or variable terrain can produce more complex ones. Multiple algorithms that interpret the first and last modes, associated with the highest and lowest reflecting surfaces, as well as the intermediate portions of the waveform, are then used to calculate GEDI’s elevation, relative height, and vegetation structure metrics (Hofton et al., 2019; Blair et al., 2021). The waveform undergoes an initial geolocation process in the L1 products, which relates the lidar distances from the waveform to the geodetic reference. An additional geolocation step occurs after the waveform has been fully processed, to locate the waveform and its interpreted modes relative to the estimated ground elevation.
The overarching background of waveform processing provided here helps establish a general reference for the characteristics of certain variables used to interpret surface features from the waveform. Understanding the expected characteristics and behavior of the waveform can also support investigation of the performance of higher level products (metrics and derivations), since they are all based on the waveform. Some applied researchers have found that using the original geolocated waveform product in machine learning, such as deep learning models, improved results (Lahssini et al., 2022; Lang et al., 2022). Additionally, numerous publications have tested the efficacy of GEDI over non-vegetated surfaces, even though the mission design, waveform processing, and published data products are optimized for ecosystem applications (Barbosa et al., 2022; Fayad et al., 2020; Zhongmin, Ma et al., 2024; Ma et al., 2024). Applying GEDI to highly localized ecosystem dynamics (specific vegetated study areas) or to other land cover types will likely require tailoring the variable selection and filtering approach to the application at hand.
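To make the idea of modes more concrete, here is a toy sketch that builds a synthetic two-mode waveform and picks out local maxima above a threshold. This is not GEDI's waveform interpretation algorithm: the Gaussian shapes, the threshold, and the 0.15 m bin size are all assumptions chosen for illustration.

```python
import math

BIN_SIZE_M = 0.15  # assumed vertical extent of one time bin (illustrative)


def gaussian(x, center, width, amplitude):
    """Amplitude of a Gaussian-shaped return at bin x."""
    return amplitude * math.exp(-0.5 * ((x - center) / width) ** 2)


def find_modes(waveform, threshold):
    """Return the indices of local maxima whose amplitude exceeds threshold."""
    modes = []
    for i in range(1, len(waveform) - 1):
        is_peak = waveform[i - 1] < waveform[i] >= waveform[i + 1]
        if is_peak and waveform[i] > threshold:
            modes.append(i)
    return modes


# Synthetic waveform: a broad, weaker canopy return followed later in time
# by a narrow, stronger ground return, on top of a constant baseline.
baseline = 10.0
waveform = [
    baseline + gaussian(b, 60, 8, 40.0) + gaussian(b, 150, 4, 80.0)
    for b in range(200)
]

modes = find_modes(waveform, threshold=baseline + 5.0)

# Earlier bins are closer to the sensor, so the last mode stands in for the
# ground and the first mode for the canopy in this toy example.
canopy_bin, ground_bin = modes[0], modes[-1]
separation_m = (ground_bin - canopy_bin) * BIN_SIZE_M
print(modes, round(separation_m, 2))
```

A single-mode waveform, such as one over bare ground, would yield one index from `find_modes`, while complex canopies would yield several.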
| Knowledge Check #3 |
|---|
| In a few sentences, highlight the overarching workflow for processing a received GEDI waveform. |
Reference table for important concepts and variables to consider when exploring and preparing data for analysis:
| Variable | Definition | How It Works | Common considerations | How It’s Formatted |
|---|---|---|---|---|
| Waveform | The GEDI waveform is the returned energy recorded as a function of receive time. The ‘waveform’ is the actual captured data, or resulting ‘observation’ for each ‘shot’ corresponding to each GEDI ‘footprint.’ | The active sensor transmits 1064 nm near-infrared laser pulse energy from its position on the International Space Station. The energy is reflected and scattered across surface features within the laser footprint as the light pulse travels from the top of the surface feature to the ground. The round-trip time it takes for the energy to reflect off the surface features is converted into distance (range) once properly cleaned and geolocated. | The GEDI lidar system design and processing algorithms are optimized for detecting vegetation vertical structure. The shape of the full waveform is processed with several algorithms to interpret potential distinct layers of the vertical profile. Full waveforms capture continuous, complex information for the entire pulse, rather than only information about the surface interaction at a point of maximum returned energy. This complexity makes waveform processing more challenging than for other types of lidar, especially where valid signals are weak, indistinguishable from background noise, or influenced by certain systematic or environmental conditions. Active areas of research continue to improve GEDI’s algorithms for resolving details over various surface features. | The L1B product houses the processed and geolocated waveforms as ‘rxwaveform’. This is the measured waveform amplitude, in digital numbers (DN), for the relative energy received over a range of height bins. The amplitude values are associated with fixed-size “bins” of time that correspond to vertical distance/height. The prefix ‘tx’ refers to the energy transmitted from GEDI, while ‘rx’ refers to the energy received by or returned to GEDI. |
| Shot number | Unique identifier for each shot/collected waveform at each footprint, with associated orbit and beam information. | Each shot receives a number, which is available within the associated “Group” of the HDF5 footprint level data product. | Be sure to retain the ‘shot_number’ when subsetting the footprint level datasets in case you wish to isolate an individual observation and/or explore relationships between variables from different GEDI products. | ‘shot_number’: OOOOOBBFFFNNNNNNNN, where OOOOO: Orbit number; BB: Beam number; FFF: Minor frame number (0-241); NNNNNNNN: Shot number within orbit (Level 1 User Guide). |
| 📌 Pro tip: strictly enforce the uint64 datatype for `shot_number`, or cast the values to strings, when converting data formats. | | | | |
| Beam Type | Refers to the two beam types built into the sensor. Each type is categorized by the laser energy emitted per pulse (mJ/pulse). Higher or full energy (15 mJ/pulse) generates power beams. Lower energy (4.5 mJ/pulse) generates coverage beams. | Power beams: Two independent lasers “dither” (aka shift position back and forth) to produce two beams each, creating four “full power” beam ground tracks. Coverage beams: One independent laser is designed to “split” into two coverage beams, reducing the emitted energy per beam. Each of the two beams is also dithered, creating four “coverage” beam ground tracks. This strategy increases spatial coverage, with tradeoffs to resolution, especially over particular features. The laser energy output is determined by the mission hardware design. | Higher energy lasers tend to have a stronger ability to detect the ground and penetrate denser canopies and greater canopy cover. “GEDI was designed to achieve ground detection through 98% canopy cover for strong beams and 95% for coverage beams.” (Dubayah et al., 2020). While the coverage beams were designed to penetrate up to 95% cover, this is only expected under optimal conditions. Users are encouraged to use the power beams in denser vegetated regions. Users should evaluate beam type differences against their decision needs, the surface reflectance characteristics expected for their study area and time period, and other quality metrics and influences, in case all the data is indeed valuable. | Within each product level file, each beam sits at the top of the hierarchy as its own Group: BEAMXXXX. The naming convention is: Power: BEAM0101, BEAM0110, BEAM1000, BEAM1011. Coverage: BEAM0000, BEAM0001, BEAM0010, BEAM0011. Each of the eight BEAMXXXX groups houses its respective datasets/variables, which are replicated across each beam. For example, if you want to select the data for plant area index from all beams, you would have to select the ‘pai’ variable eight times within the file, following the individual paths, one for each beam: L2B HDF5 file Path→BEAM0001→‘pai’, Path→BEAM0011→‘pai’, etc. |
| Solar elevation | This parameter indicates the sun’s position at the time of observation. | The system records the solar elevation at the time of observation. | It is recommended to use nighttime observations where possible to reduce the impact of background noise from the sun (Beck et al., 2021). Some studies show conflicting evidence on whether time of day has a significant impact (Li et al., 2023; Fayad et al., 2022), suggesting users should evaluate the impact of time of day on results over their study area. Removing daytime shots tends to greatly reduce the size of the dataset. | The L1B variable ‘solar_elevation’ < 0 indicates data collected at night, while > 0 indicates data collected during the day. |
| Algorithm Setting Group | The receive waveform processing workflow includes a step that applies several waveform interpretation algorithms designed to optimize interpretation over different observing conditions and surface features. The results are compared and a final “default” ASG is chosen for each footprint. | Algorithm Setting Groups (ASGs) 1-6 and 10 hold unique combinations of parameters for waveform smoothing width (noise and signal), the waveform signal start and end thresholds, and whether mode filtering was applied. Under each ASG, latitude, longitude, elevation, highest return, detected modes, sensitivity, quality flag, and the metrics themselves are calculated. | Algorithms with lower signal thresholds improve ground detection by admitting weaker signals into the waveform interpretation process, which could be beneficial in dense tropical forests. However, a low threshold might misinterpret noise as the ground in less dense vegetation cover. The ASGs offer several interpretations of waveforms based on known relationships of signal returns over particular vegetation types; the user can access and compare these results to optimize GEDI over localized contexts. | ‘selected_algorithm’ from L2A and up identifies the algorithm used to produce the derived metrics for each shot. Any variable with the suffix ‘_aN’ holds the value of that metric or parameter as calculated by the specific algorithm ‘aN’. |
| Sensitivity | Refers to the “maximum canopy cover that can be penetrated considering the Signal to Noise Ratio (SNR) of the waveform” (Hofton & Blair, 2020). Sensitivity is a probability metric “offering insight into the percentage of canopy cover through which we expect to be able to detect the ground 90% of the time” (Duncanson et al., 2020). | The sensitivity metric is computed for each waveform. This signal detection performance evaluation measures the probability of that waveform being able to detect the ground. The result is an estimate of the relative minimum percentage of the waveform return that needs to be present within the portion of the waveform representing the ground return for the ground to be detected. Waveform sensitivity is calculated by simulating the minimum detectable ground return pulse energy for a given detection algorithm (algorithm setting group). The area of that minimum detectable ground return is then divided by the total return waveform area. | Beam sensitivity is directly influenced by hardware design (the laser pulse energy/beam type), as well as by factors at the time of acquisition such as atmospheric conditions, background solar illumination (day vs. night), and surface reflectance characteristics for the given study area and time period (topography, canopy cover). GEDI’s waveforms exhibit variable signal strengths, and therefore a variable ability to penetrate through canopies. Given these influences and their potential confounding behavior, sensitivity can be considered a fundamental indicator of GEDI quality, particularly over areas highly influenced by these factors. | The ‘sensitivity’ variable from Level 2 data and up. Alternative calculations of sensitivity for each ASG are found as ‘sensitivity_aN’. Sensitivity will have a range of values, but for the waveform interpretation algorithm to run, sensitivity will need to be >0 and <1. |
| Quality flag | A binary flag that lets users remove outlier or low-quality returns. | A quality_flag value of 1 indicates the laser shot meets the quality criteria. | Each product level deploys its own quality flag or inherits quality checks from previous data products, based on requirements for energy, sensitivity, amplitude, real-time surface tracking quality, and/or land cover masks. | L2A: ‘quality_flag’ (referred to as ‘l2a_quality_flag’ in higher products). L2B: ‘l2b_quality_flag’. L4A: ‘l4_quality_flag’. Alternative calculations of the quality flag for each ASG are found with the suffix ‘_aN’. |
| Elevation highest return | The estimated elevation of the highest detected return of the receive waveform. | Signal and noise are identified in the geolocated waveform. The highest and lowest detected returns become the boundaries between which modes and other waveform characteristics are processed in order to later calculate higher level metrics like canopy height. | This metric represents the canopy top position determined during waveform processing and interpretation. The ASGs can produce different interpretations of the highest return, changing the location attributed to that return and potentially misrepresenting the associated canopy height. The accuracy of the elevation highest return can also be affected by conditions that alter waveform characteristics, such as low-lying clouds being mistaken for canopy, errors in ground detection, or challenging interpretation over complex spatial variation in canopy heights. | ‘elev_highestreturn’ from L2A. Alternative calculations of this variable for each ASG are found as ‘elev_highestreturn_aN’. |
| Elevation lowest mode | The estimated elevation of the center of the lowest identified mode in the received waveform. This mode represents ground elevation. | Elevation lowest mode is the center of the mode the processing algorithm identifies with peak energy distribution over the ground between the highest and lowest returns. | This metric is used to determine the ground elevation. The ASGs can produce different interpretations of the lowest mode, possibly leading to inaccurate identification of the ground elevation and of the associated canopy height or higher derived metrics. The accuracy of the elevation lowest mode can also be affected by conditions that alter waveform characteristics, such as cloud interference, or ground returns that are weak or mixed with other signals such as low canopies. | ‘elev_lowestmode’ from L2A. Alternative calculations of this variable for each ASG are found as ‘elev_lowestmode_aN’. |
| Latitude and longitude of the lowest mode | The final latitude and longitude located for the lowest mode of the waveform. | Once the lowest mode has been interpreted, its latitude and longitude are linearly interpolated between the L1B geolocated waveform’s first and last sample bins (ranging points). | The geolocation accuracy was originally aimed to be within 10 m. Initial versions determined horizontal error up to 23.8 m. This has been improved to 10.2 m. Geolocation can be adjusted by matching the waveforms to simulated GEDI waveforms, which require airborne lidar, or by referencing the elevation lowest mode to high-quality DEMs, which is accomplished during GEDI’s calibration/validation phase. However, if no airborne lidar or reliable DEMs overlap with the data, the user may need to deploy alternative measures for assessing and correcting these areas. | ‘lat_lowestmode’ and ‘lon_lowestmode’ from L2A. Alternative calculations of these variables for each ASG are found as ‘lat_lowestmode_aN’ and ‘lon_lowestmode_aN’. |
| Number of detected modes | Quantifies the number of distinct surfaces reflected within the waveform. | The number of modes detected during waveform processing depends on how the waveform was collected, how it was smoothed/denoised, how each ASG interpreted it, and what mode filtering was applied. | Multi-modal waveforms are a strong indication of complex surface characteristics such as those found in vegetation or irregular topography. Identifying modes along different height ranges can provide intricate information about canopy strata and complexity (Dwiputra et al., 2023). Mode detection can be influenced by the strength of the SNR at both ground and canopy (associated with power and coverage beam tendencies). Cloudy conditions and solar elevation can also affect the number of modes captured. Dense canopies and high slopes can potentially overcomplicate the waveform’s shape and blur the distinguishing characteristics of the modes across the vertical profile. Consider scenarios where canopy and ground modes might be conflated (short vegetation) and how these scenarios might influence the derived metrics. For example, it might be beneficial to remove waveforms with more than two modes where uni-modal returns are expected, such as over water features. | ‘num_detectedmodes’. Alternative calculations of this variable for each ASG are found as ‘num_detectedmodes_aN’. Filtering out shots where num_detectedmodes = 0, or exploring the results from different ASGs, may help reduce the inclusion of noisy or “empty” waveforms. |
| Degrade flag | The degrade flag marks observations where the horizontal geolocation accuracy may be impacted by degradation of the pointing or positioning information. | The GEDI Mission team includes updated tracking in the degrade flag to expand on degrade status conditions due to specific systematic or observation conditions. | Degraded conditions can lead to excessive horizontal geolocation error. Some studies have applied a more detailed selection, removing only observations under specific degraded conditions that were more impactful than others captured by the broad filter (>0). This helped combat | ‘degrade_flag’ = 0 indicates the shot is not degraded by sensor or systematic conditions. Shots with non-zero flags include a range of specific tracking attributes. A non-zero tens digit indicates degraded attitude, and a non-zero units digit indicates degraded trajectory. |
| Local beam elevation | This is the GEDI viewing angle (θ) at which the GEDI laser pulse is received relative to nadir. | The angle is from the local horizontal plane upwards toward GEDI from the ground bounce point (ATBD L2B). For GEDI, the zenith-viewing angle (off-nadir angle) is kept at 6 degrees or less for accurate ranging. As a result, the local beam elevation typically lies between 84-90 degrees. | The GEDI view angle is kept consistently near nadir for accurate measurements, and is therefore expected to have little to no impact on the accuracy of the datasets. Some studies have found, though, that filtering datasets for local_beam_elevation >= 1.5 radians (keeping high angles) can be beneficial in scenarios where ground elevation accuracy may be impacted (Fayad et al., 2020; Tommasso et al., 2023). Examples include areas of highly variable topography that have been found to influence the accuracy of derived metrics. Another example found that the uncertainty of water returns was increased by higher viewing angles, and that some of the inaccuracies differed between power and coverage beams (Fayad et al., 2020). | ‘local_beam_elevation’ from L2B. The viewing angle can differ from one beam to another on a given date. Addressing the potential influence of local beam elevation over particular surface features can include filtering the dataset to keep high angles only or, instead, applying elevation and slope filtering strategies, depending on the characteristics of the land cover. |
| Leaf Status Flag | This is a flag external to GEDI measurements based on ancillary data indicating whether observations were collected over phenological periods classified as leaf-on, leaf-off, or transitional stages. | The classification is based on relative greenness derived from normalized NDVI values, and Julian days (for deciduous forests). | Seasonal filtering is recommended for relevant vegetation covers. The presence of leaves inherently affects the lidar signal characteristics and ability required to calculate metrics capturing canopy complexity (canopy cover, height, plant area index, plant area volume density, foliage height diversity, biomass, structural complexity index) possibly leading to systematic underestimation of these metrics during leaf-off conditions. Non-deciduous strata such as evergreens, woodlands, grasslands, shrublands, etc. by contrast do not exhibit the same limitations across seasons when considering leaf presence. | ‘land_cover_data/leaf_off_flag’ = 0 indicates leaf-on, while = 1 indicates leaf-off status. |
Table References:
Beck, J., Wirt, B., Armston, J., Hofton, M., Luthcke, S., & Tang, H. (2021). Global Ecosystem Dynamics Investigation (GEDI) Level 2 User Guide (Version 2.0). LP DAAC.
Kellner, J. R., Armston, J., & Duncanson, L. (2023). Algorithm theoretical basis document for GEDI footprint aboveground biomass density (AGBD) [Article]. Earth and Space Science, 10, e2022EA002516. https://doi.org/10.1029/2022EA002516
Hofton, M. (2019). Algorithm Theoretical Basis Document for GEDI waveform processing: Transmit and receive waveform interpretation for L1A and L2A products (Version 1.0). USGS / LP DAAC.
Luthcke, S. B. (2019). Algorithm Theoretical Basis Document for GEDI waveform geolocation: L1B geolocation of waveforms (Version 1.0). USGS / LP DAAC.
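Three of the formatting conventions in the reference table can be sketched in a few lines of Python. The example shows why a `shot_number` should never pass through a floating point type, how the identifier could be split following the layout given in the table, and how the tens and units digits of `degrade_flag` could be read. The shot number value and the helper names are hypothetical, and the digit layout is assumed from the table rather than verified against a product file.

```python
import math

# Hypothetical 18-digit shot number following the OOOOOBBFFFNNNNNNNN layout.
shot_number = 123450600300123457

# A 64-bit float cannot represent every integer above 2**53, so a round trip
# through float (as happens in some format conversions) changes the value.
survives_float = int(float(shot_number)) == shot_number
as_text = str(shot_number)  # casting to a string is a lossless alternative


def split_shot_number(value: int) -> dict:
    """Split a shot number into the fields listed in the reference table."""
    digits = str(value).zfill(18)  # restore any leading zeros of the orbit
    return {
        "orbit": int(digits[0:5]),
        "beam": int(digits[5:7]),
        "minor_frame": int(digits[7:10]),
        "shot_in_orbit": int(digits[10:18]),
    }


def decode_degrade_flag(flag: int) -> dict:
    """Read the tens digit (attitude) and units digit (trajectory)."""
    return {
        "degraded": flag != 0,
        "attitude": (flag // 10) % 10 != 0,
        "trajectory": flag % 10 != 0,
    }


# The 1.5 radian local_beam_elevation cutoff from the table, in degrees.
beam_elevation_cutoff_deg = math.degrees(1.5)

print(survives_float, split_shot_number(shot_number))
print(decode_degrade_flag(30), round(beam_elevation_cutoff_deg, 1))
```

Keeping the identifier as an integer or string end to end is what makes it safe to join variables from different product levels on `shot_number`.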
More Background on Selected Variables
Beam Type: Power and Coverage
The GEDI instrument consists of three lasers operating at 1064 nm. Two operate at full power and produce the power beams, while the third laser is split into two coverage beams with reduced power. Each of these four beams is then dithered (its pointing is shifted) every other shot across the ground track to produce the full complement of 8 ground tracks. The power lasers are dithered to produce two ground transects each. GEDI’s ground sampling pattern captures distinct lidar samples rather than continuous imaging or mapping across the swath. Each shot generates a footprint, and the centers of consecutive footprints along a single track are about 60 meters apart. The ground tracks themselves are spaced 600 meters apart across the flight direction within a ~4.2 km swath (Dubayah et al., 2020).
Source: The GEDI Mission.
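The beam layout described above can be captured in a small helper. The sketch below lists the eight beam groups using the names from the reference table, builds the per-beam path for a variable such as ‘pai’, and checks the swath arithmetic. The helper names are hypothetical; with a library such as h5py (an assumption, not part of this training) each returned path could be used to read the dataset from an open product file.

```python
# Beam group names as listed in the reference table.
POWER_BEAMS = ("BEAM0101", "BEAM0110", "BEAM1000", "BEAM1011")
COVERAGE_BEAMS = ("BEAM0000", "BEAM0001", "BEAM0010", "BEAM0011")
ALL_BEAMS = COVERAGE_BEAMS + POWER_BEAMS


def beam_type(beam_name: str) -> str:
    """Classify a beam group name as 'power' or 'coverage'."""
    if beam_name in POWER_BEAMS:
        return "power"
    if beam_name in COVERAGE_BEAMS:
        return "coverage"
    raise ValueError(f"Unknown beam group: {beam_name}")


def dataset_paths(variable: str, beams=ALL_BEAMS) -> list:
    """Build one HDF5-style path per beam for the requested variable."""
    return [f"{beam}/{variable}" for beam in beams]


# Selecting 'pai' from every beam means following eight separate paths.
pai_paths = dataset_paths("pai")

# Eight tracks spaced 600 m apart span seven gaps across the swath.
TRACK_SPACING_M = 600
swath_width_m = (len(ALL_BEAMS) - 1) * TRACK_SPACING_M

print(pai_paths)
print(f"Swath width: {swath_width_m / 1000:.1f} km")
```

Passing `beams=POWER_BEAMS` restricts the paths to the power beams, which is a convenient starting point when comparing results with and without coverage beams.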
Accounting for possible differences between the power and coverage beams
The quality or accuracy of the data may be influenced by beam type, and the GEDI Mission team explicitly recommends selecting power beams over coverage beams when possible. Some studies, however, have found no significant difference between the two, depending on the metric used or the local context.
Penetrative ability and ground detection:
Due to their higher energy output, power beams have a higher likelihood of penetrating denser vegetation cover and dense canopies to detect the ground. GEDI was designed to achieve ground detection through 98% canopy cover for strong beams and 95% for coverage beams (Dubayah et al., 2020). Power beams therefore tend to be associated with higher SNRs, leading to better interpretation of the waveforms. Typically, with greater ability to detect the ground surface, canopy height or vegetation structure metrics (and therefore aboveground biomass density, AGBD) may exhibit improved accuracies (Lahssini et al., 2022).
Susceptibility to solar noise:
Full-power lasers are less affected by background solar illumination/noise during daytime acquisitions, which helps maintain consistent canopy top detection. Time of acquisition therefore deserves careful consideration, especially when working with coverage beams, since their lower SNRs can be exacerbated by increased solar background noise (Dubayah et al., 2020; Lahssini et al., 2022).
Influence over certain metrics:
When estimating canopy height, plant area index, and above ground biomass, especially in higher canopy cover or tropical forests, power beams outperformed coverage beams (Lahssini et al., 2022; Fayad et al., 2022; Hoffren et al., 2023; Jia et al., 2025; McClure et al., 2024; Wang et al., 2023). GEDI’s current biomass models generally do not use coverage beam data collected during the day for areas with over 70% canopy cover, where canopy height uncertainties may contribute substantially to biomass estimation errors (Lang et al., 2022; Duncanson et al., 2020). Several of GEDI’s higher level derived products may already incorporate beam type and sensitivity selections to optimize results.
The benefit of using coverage beams:
Using both beam types can maximize sampling density. Due to a combination of factors, it is recommended to use nighttime observations and power beams, since a greater number of good quality footprints tend to be collected under these conditions (Blair et al., 2021). However, this trend may vary over particular regions.
Conflicting evidence?
Proper exploration and quality filtering across beam types will allow the user to assess how much additional good data becomes available depending on beam type selection. Several studies report conflicting evidence on whether the beam types differ substantially. One study over water surfaces found differences in performance between specific beams irrespective of their power output (Fayad et al., 2020). Another study over African savannas compared how each beam type performed under day and night conditions and found little to no difference, which was largely attributed to canopy cover seldom exceeding 70% (Li et al., 2023). Another study, after evaluation, decided to use all available beams when classifying fuel types (Hoffren et al., 2023). Additionally, the effects of high slopes or overall degraded samples may be more responsible for errors in ground estimates than the expected differences between beam types over these same areas (Urbazaev et al., 2022). Further, forest type may play a larger role in inherent differences, as found when comparing the beams in flat areas (Wan et al., 2022). Lastly, some deep learning approaches were able to absorb potential variance between the power and coverage beams when estimating canopy height, as overall model performance was not affected by beam type (Lang et al., 2022).
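One simple way to start the exploration described above is to count how many shots of each beam type survive a quality filter. The sketch below is a minimal example on made-up records: the field names mirror the variables in the reference table, while the 0.9 sensitivity cutoff and the sample values are purely hypothetical and should be replaced with choices suited to the study area.

```python
from collections import Counter

POWER_BEAMS = {"BEAM0101", "BEAM0110", "BEAM1000", "BEAM1011"}


def retention_by_beam_type(shots, min_sensitivity=0.9):
    """Fraction of shots per beam type that pass a basic quality filter.

    A shot is kept when quality_flag == 1 and its sensitivity is at or
    above min_sensitivity (a hypothetical, user-chosen cutoff).
    """
    total, kept = Counter(), Counter()
    for shot in shots:
        kind = "power" if shot["beam"] in POWER_BEAMS else "coverage"
        total[kind] += 1
        if shot["quality_flag"] == 1 and shot["sensitivity"] >= min_sensitivity:
            kept[kind] += 1
    return {kind: kept[kind] / total[kind] for kind in total}


# Made-up records standing in for shots extracted from a product file.
shots = [
    {"beam": "BEAM0101", "quality_flag": 1, "sensitivity": 0.97},
    {"beam": "BEAM1000", "quality_flag": 1, "sensitivity": 0.93},
    {"beam": "BEAM0110", "quality_flag": 0, "sensitivity": 0.50},
    {"beam": "BEAM0000", "quality_flag": 1, "sensitivity": 0.92},
    {"beam": "BEAM0010", "quality_flag": 1, "sensitivity": 0.85},
    {"beam": "BEAM0011", "quality_flag": 0, "sensitivity": 0.40},
]

retention = retention_by_beam_type(shots)
print(retention)
```

Repeating the count with different cutoffs, or per land cover class, gives a first impression of how much usable data the coverage beams add in a given region.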
Solar Elevation - Time of Day
The effect of acquisition time varies across beam types: power beam observations are generally more accurate than coverage beam observations, though nighttime observations overall yield better accuracy than daytime observations for both beam types. During the day, solar background noise in the near-infrared can decrease the SNR in GEDI observations, making it difficult to distinguish valid surface signals from ambient sunlight. Weaker SNRs in daytime conditions can reduce the ability to penetrate through denser forest to the ground.
According to Duncanson et al., 2020:
“power beams are expected to return a reliable ground signal under 99.5% canopy cover at night and 94% during the day. With maintained design margins, these figures can improve to 99.75% at night and 97% during the day” and,
“coverage beams, due to their lower power, are expected to penetrate 96% canopy cover at night and 92% during the day. With maintained margins, this increases to 98% and 96% respectively.”
The biomass estimation products remove many of the daytime acquisitions. Elsewhere, the evidence conflicts on whether acquisition time alone significantly affects the bias of canopy height estimates across vegetation types (Li et al., 2023; Fayad et al., 2022).
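One way to explore this over your own study area is to compare the share of good quality shots across beam type and acquisition time. The sketch below is a minimal illustration, not a GEDI mission tool. It assumes each shot has already been read into a Python dictionary with the hypothetical keys `beam`, `solar_elevation`, and `quality_flag`, and that the power beams carry the names listed in `POWER_BEAMS` (verify this against your product files). The sample shots are invented.

```python
from collections import defaultdict

# Assumed names of the four power beams; confirm against your product files.
POWER_BEAMS = {"BEAM0101", "BEAM0110", "BEAM1000", "BEAM1011"}


def stratum(shot):
    """Label a shot by beam type and acquisition time."""
    beam_type = "power" if shot["beam"] in POWER_BEAMS else "coverage"
    # A negative solar elevation means the sun is below the horizon (night).
    time_of_day = "night" if shot["solar_elevation"] < 0 else "day"
    return beam_type, time_of_day


def good_fraction_by_stratum(shots):
    """Fraction of shots with quality_flag == 1 in each (beam type, time) stratum."""
    totals = defaultdict(int)
    good = defaultdict(int)
    for shot in shots:
        key = stratum(shot)
        totals[key] += 1
        good[key] += int(shot["quality_flag"] == 1)
    return {key: good[key] / totals[key] for key in totals}


# Hypothetical shots, for illustration only.
shots = [
    {"beam": "BEAM0101", "solar_elevation": -12.0, "quality_flag": 1},
    {"beam": "BEAM0101", "solar_elevation": 35.0, "quality_flag": 0},
    {"beam": "BEAM0000", "solar_elevation": -5.0, "quality_flag": 1},
    {"beam": "BEAM0000", "solar_elevation": -8.0, "quality_flag": 0},
]
fractions = good_fraction_by_stratum(shots)
for key, value in sorted(fractions.items()):
    print(key, f"{value:.2f}")
```

If the fractions are similar across strata for your region, restricting the analysis to nighttime power beams may cost more in sampling density than it gains in quality.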
| Knowledge Check #4 |
|---|
| Brainstorm methods for how you might determine whether solar elevation impacts the performance of GEDI metrics over your study area. |
Algorithm Setting Groups
After initial analysis of the received waveform, the GEDI mission follows an outlined process for waveform interpretation that involves smoothing the waveform, applying algorithms for valid signal and mode detection, calculating waveform sensitivity, and ranging the highest and lowest detected returns. The processing workflow includes several waveform interpretation algorithms designed to optimize the extraction of information from waveforms impacted by different observing conditions.
For example, various conditions could include:
- Daytime versus nighttime observations
- The lowest selected mode was actually noise, generating an elevation lowest mode below the ground
- The lowest selected mode falls above the actual ground because there was weak energy returned from the ground
- The highest detected return is below the top of canopy
- The highest detected return is above the top of the canopy (possibly due to interference from clouds or the like).
Each of these conditions may be corrected using a different setting in the interpretation algorithm, where the thresholds for the signal search start or end points can be increased or decreased, or additional mode filtering can be applied. Both smoothing and thresholding serve to distinguish signal from noise. Smoothing suppresses noise so that valid signals can be identified, which is difficult when the background is noisy or when the signal is weak and conflated with noise. The thresholds limit false signal detections above the canopy or below the ground, which would bias the ground or height estimates. Setting the signal search window therefore has to balance the risk of interpreting noise as signal against the risk of excluding valid signal from the derived metrics.
Table from Blair et al. (2021). The algorithms are categorized by setting group as a1, a2, etc. An algorithm setting group value of 10 indicates that setting group 5 was used, but with a higher mode selected to calculate elevation, height, and other metrics.
| Algorithm | Rx_processing Subgroup | Smoothwidth | Smoothwidth_zcross | Front_threshold | Back_threshold |
|---|---|---|---|---|---|
| 1 | a1 | 6.5 | 6.5 | 3𝛔 | 6𝛔 |
| 2 | a2 | 6.5 | 3.5 | 3𝛔 | 3𝛔 |
| 3 | a3 | 6.5 | 3.5 | 3𝛔 | 6𝛔 |
| 4 | a4 | 6.5 | 6.5 | 6𝛔 | 6𝛔 |
| 5 | a5 | 6.5 | 3.5 | 3𝛔 | 2𝛔 |
| 6 | a6 | 6.5 | 3.5 | 3𝛔 | 4𝛔 |
When each algorithm is applied, the resulting latitude, longitude, elevation, highest return, detected modes, sensitivity, quality flags, and product level metrics (L2A: RH, L2B: pgap_theta, L4A: AGBD, AGBD prediction intervals, standard error, and AGBD predictions in the transform space) are calculated and stored for each algorithm (aN). The final selected results are stored in the root BEAMXXXX group, where the variable `selected_algorithm` identifies the setting group that was chosen.
Updated data product versions select the most appropriate algorithm setting group (SG) for individual laser shots based on plant functional type, geographic region, and laser return energy. Some SGs are better suited to dense vegetation (e.g., SG 5) or to specific canopy structures; for example, a1-a3 tend to perform better for estimating canopy height and a2-a6 for estimating canopy cover (Li et al., 2023; Wang et al., 2025). Users are encouraged to compare the results of the alternative setting groups under conditions, such as high canopy cover, where the selected algorithm could be improved upon.
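When comparing setting groups, it can help to keep the table above at hand as a lookup. The sketch below transcribes the table values into a dictionary and resolves a selected algorithm value to its subgroup name and settings, treating a value of 10 as setting group 5 as described above. The names `SettingGroup`, `SETTING_GROUPS`, and `settings_for` are hypothetical helpers for illustration, not part of any GEDI library.

```python
from typing import NamedTuple


class SettingGroup(NamedTuple):
    smoothwidth: float
    smoothwidth_zcross: float
    front_threshold_sigma: int  # front threshold, in multiples of sigma
    back_threshold_sigma: int   # back threshold, in multiples of sigma


# Values transcribed from the table above (Blair et al., 2021).
SETTING_GROUPS = {
    1: SettingGroup(6.5, 6.5, 3, 6),
    2: SettingGroup(6.5, 3.5, 3, 3),
    3: SettingGroup(6.5, 3.5, 3, 6),
    4: SettingGroup(6.5, 6.5, 6, 6),
    5: SettingGroup(6.5, 3.5, 3, 2),
    6: SettingGroup(6.5, 3.5, 3, 4),
}


def settings_for(selected_algorithm):
    """Return (subgroup name, settings) for a selected algorithm value."""
    # A value of 10 means group 5 settings were used with a higher mode selected.
    group = 5 if selected_algorithm == 10 else selected_algorithm
    return f"a{group}", SETTING_GROUPS[group]


name, settings = settings_for(10)
print(name, settings)
```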
| Knowledge Check #5 |
|---|
| Think of a study area (vegetated or non-vegetated) and compare and contrast the potential expected results between each of the algorithm setting groups. |
Beam Sensitivity
Given that GEDI return signals can vary greatly, beam sensitivity is a core calculation in the waveform interpretation process that helps determine the effective performance of each waveform. Beam sensitivity estimates “the percentage of canopy cover through which we expect to be able to detect the ground 90% of the time” (Duncanson et al., 2020). The calculation determines the maximum canopy cover that can be penetrated given the relative minimum percentage of the waveform returned for the given SNR (Hofton & Blair, 2020). Higher sensitivity (closer to 1) indicates a higher SNR. A low SNR can make the ground mode of the waveform hard to identify, directly impacting the accuracy of the derived metrics for elevation, relative heights, canopy cover, plant area index, foliage height diversity, and plant area volume density, and of the products derived from these metrics. GEDI was designed to detect ground signals through 98% canopy cover for strong (power) beams and 95% for coverage beams (Dubayah et al., 2020).
The quality flag for each waveform, created for convenient general pre-processing, incorporates a sensitivity threshold. The L2A quality flag, for example, keeps shots with sensitivity > 0.9, meaning it keeps observations expected to penetrate canopy cover of 90% or less. Quality flags for higher level products apply higher thresholds found to improve the calculation of those product metrics. Even when using the quality flag for the respective product, consider applying thresholds beyond those built into the flag. This recommendation applies especially to study areas in dense forests, with steep slopes, or in cloudy regions, where sensitivity is likely to be impacted.
Here are some characteristics to consider that may influence the detected energy and SNR interpretation used to estimate a waveform’s sensitivity:
- Beam type:
  - Power beams generally exhibit higher sensitivity than coverage beams.
  - Coverage beams generally exhibit lower SNRs, which can make it challenging to detect weak returns.
- Sun elevation: similar to coverage beams, daytime acquisitions have lower signal-to-noise ratios due to higher background noise from solar energy compared to lower background noise at night.
- Sensitivity thresholds can have variable reliability depending on the metric. Forest types and densities not only affect the probability of the energy reaching the ground, but also create waveforms of varying complexity that require specific interpretation to separate valid signal from background noise.
  - One study reviewing PAI and PAVD found that sensitivities < 0.9 or 0.95 are associated with less reliable data, particularly for coverage beams and daytime observations (Xi et al., 2022).
  - The footprint and gridded biomass products depend heavily on beam sensitivity for quality flagging, given that the biomass models were designed for forest and plant functional types whose characteristics are related to GEDI’s sensitivity parameters (Kellner et al., 2022). For tropical Evergreen Broadleaf Tree (EBT) prediction strata, a beam sensitivity > 0.98 is typically required, while > 0.95 is used elsewhere (Dubayah et al., 2022; Kellner et al., 2023). More stringent sensitivity thresholds (e.g., > 0.98 in EBT forests of Africa, South America, and South Asia) further reduce the retained coverage observations (e.g., to 10.4%) (Kellner et al., 2023).
- Topography and slopes have been found to influence ground detection and waveform interpretation (e.g., steeper slopes may impact geolocation accuracy or create complex waveforms that conflate terrain with vegetation heights) (Hancock et al., 2012).
While selecting higher sensitivity shots is a common recommendation, especially over dense canopies and for daytime observations, some studies found that filtering for higher sensitivity led to greater residual error (Urbazaev et al., 2022). Users should therefore evaluate the impact of sensitivity on product accuracy against other quality metrics and influences, in case sensitivity is not a determining factor for accuracy over the given study area. Although not commonly cited, small or negative values in `rx_assess/rx_energy` can cause sensitivity to fall outside the 0-1 domain. Shots with sensitivities outside 0-1 should be considered invalid, so filtering by sensitivity alone requires upper and/or lower bounds (Sager, A., personal communication, 2025).
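The bounds check described above is easy to overlook when filtering with a single comparison. Here is a minimal sketch of a sensitivity filter that rejects values outside 0-1 before applying a threshold. The function name `keep_by_sensitivity` and the default threshold of 0.95 are illustrative choices, and the sample values are invented.

```python
def keep_by_sensitivity(sensitivity, threshold=0.95):
    """True when sensitivity lies within the valid 0-1 domain and meets the threshold."""
    in_valid_domain = 0.0 <= sensitivity <= 1.0
    return in_valid_domain and sensitivity >= threshold


# Hypothetical sensitivities; 1.8 and -3.2 stand in for invalid out-of-domain values.
sensitivities = [0.97, 0.91, 1.8, -3.2, 0.99]
kept = [s for s in sensitivities if keep_by_sensitivity(s)]
print(kept)
```

Note that a plain `sensitivity >= 0.95` test would have kept the invalid value 1.8.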
| Knowledge Check #6 |
|---|
| Can coverage beams have high sensitivity? Why, or why not? |
Quality Flag
The quality flag is a general recommendation for qualifying the data as usable. A `quality_flag = 1` means the data met the criteria for energy, sensitivity, amplitude, and real-time surface tracking quality. If `quality_flag = 0`, it is likely the waveform signal was poor or the ground was not detected or distinguishable. Quality control is a large part of the development of each product. Gridded and higher level derived products may apply flags similar to those of the footprint data they are based on, in addition to quality considerations for their gridding, modeling, or data fusion methods. The details of these quality considerations are found in their respective publications. Here we list the quality control checks used for the footprint product levels.
The L2A quality flag applies five checks at the footprint level regarding energy, sensitivity, amplitude, real-time surface tracking quality and difference to a DEM.
| Check | Description |
|---|---|
| `algorithm_run_flag = 1` | No error when running the algorithm |
| `surface_flag = 1` | When elev_lowestmode is within 300 m of the TanDEM-X 90 m digital elevation model (DEM) or mean sea surface. |
| `stale_return_flag = 0` | When the pulse detection algorithm detects a return signal > the detection threshold within the search window. |
| `sensitivity > 0.9` | Expect to detect the ground at 90% canopy cover or less |
| `rx_maxamp > 8 × sd_corrected` | maximum amplitude of the received waveform relative to mean noise level |
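To see which of the five checks removes the most shots over a study area, the checks can be evaluated one at a time. The sketch below mirrors the logic of the table above; it is not the mission’s processing code. It assumes each shot is a dictionary keyed by the variable names in the table, and the helper names `l2a_checks`, `l2a_quality`, and `failed_checks` are hypothetical.

```python
def l2a_checks(shot):
    """Evaluate each L2A quality check from the table separately."""
    return {
        "algorithm_run_flag = 1": shot["algorithm_run_flag"] == 1,
        "surface_flag = 1": shot["surface_flag"] == 1,
        "stale_return_flag = 0": shot["stale_return_flag"] == 0,
        "sensitivity > 0.9": shot["sensitivity"] > 0.9,
        "rx_maxamp > 8 * sd_corrected": shot["rx_maxamp"] > 8 * shot["sd_corrected"],
    }


def l2a_quality(shot):
    """Return 1 only when every check passes, otherwise 0."""
    return int(all(l2a_checks(shot).values()))


def failed_checks(shot):
    """Names of the checks that a shot fails, useful for diagnosing data loss."""
    return [name for name, passed in l2a_checks(shot).items() if not passed]


# Hypothetical shots, for illustration only.
good_shot = {"algorithm_run_flag": 1, "surface_flag": 1, "stale_return_flag": 0,
             "sensitivity": 0.93, "rx_maxamp": 410.0, "sd_corrected": 12.0}
weak_shot = {"algorithm_run_flag": 1, "surface_flag": 1, "stale_return_flag": 0,
             "sensitivity": 0.88, "rx_maxamp": 80.0, "sd_corrected": 12.0}

print(l2a_quality(good_shot), failed_checks(good_shot))
print(l2a_quality(weak_shot), failed_checks(weak_shot))
```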
The L2B quality flag applies these checks at the footprint level. A quality_flag value of 1 indicates the cover and vertical profile metrics are over land and meet criteria for energy, sensitivity, amplitude, and real-time surface tracking quality, and the quality of extended Gaussian fitting to the lowest mode in addition to urban and water landcover filtering.
| Check | Description |
|---|---|
| L2A checks | Applies the same five filtering checks as L2A for each algorithm setting group (ASG) run. |
| `urban_proportion < 50` | From a 25 m global urban mask developed by the GEDI Science Team using the TerraSAR-X and TanDEM-X urban data product (Esch et al., 2013). |
| `Landsat_water_persistence < 10` | Exclude permanent open water bodies |
| Knowledge Check #7 |
|---|
| How do the L2A and L2B quality flags differ? |
The L4A quality flag applies several checks at the footprint level based on plant functional type:
| Deciduous strata | DBT and DNT |
|---|---|
| `L2_quality_flag = 1` | |
| `Landsat_water_persistence < 10` | Exclude permanent open water bodies |
| `sensitivity > 0.95` | Expect to detect the ground at 95% canopy cover or less |
| `leaf_off_flag = 0` | Only include leaf-on states derived for a 1 km grid using the VIIRS land surface phenology product VNP22Q2 (Zhang et al., 2016). |
| `urban_proportion < 50` | From a 25 m global urban mask developed by the GEDI Science Team using the TerraSAR-X and TanDEM-X urban data product (Esch et al., 2013). |
| Evergreen strata | EBT, ENT, GSW |
| Uses the same criteria as the deciduous strata, excluding the `leaf_off_flag`. |
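The stratum-dependent logic in this table can be sketched as a single function. This is an illustration of the table only: it assumes a hypothetical `pft` key holding the stratum abbreviation, uses the 0.95 sensitivity threshold listed in the table for every stratum, and does not include any stricter region-specific thresholds.

```python
DECIDUOUS = {"DBT", "DNT"}
EVERGREEN = {"EBT", "ENT", "GSW"}


def l4a_quality(shot):
    """Return 1 when a shot passes the L4A checks listed in the table, otherwise 0."""
    pft = shot["pft"]  # hypothetical key holding the stratum abbreviation
    if pft not in DECIDUOUS | EVERGREEN:
        raise ValueError(f"Unknown stratum: {pft}")

    passed = (
        shot["l2_quality_flag"] == 1
        and shot["landsat_water_persistence"] < 10
        and shot["sensitivity"] > 0.95
        and shot["urban_proportion"] < 50
    )
    # The leaf-on requirement applies to the deciduous strata only.
    if pft in DECIDUOUS:
        passed = passed and shot["leaf_off_flag"] == 0
    return int(passed)


# Hypothetical leaf-off shot; only the stratum differs between the two calls below.
base = {"l2_quality_flag": 1, "landsat_water_persistence": 0,
        "sensitivity": 0.97, "urban_proportion": 5, "leaf_off_flag": 1}
print(l4a_quality({**base, "pft": "DBT"}))  # deciduous, leaf-off
print(l4a_quality({**base, "pft": "ENT"}))  # evergreen, leaf status ignored
```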
The L4C quality flag adopts the `l2a_quality_flag` in addition to applying stricter filters for water and urban land cover and prioritizing tree cover and higher sensitivity thresholds.
Using the quality flag is a general recommendation because its effectiveness may vary depending on the application area (tropical forests versus sparse savannahs). In these cases, other pre-processing techniques such as investigating the differences between the results of each ASG (each producing its own resulting quality flag dataset) may help prioritize results that are more optimized for the given study area.
Applying the quality flag may significantly reduce the number of available observations, leading to sparse coverage that is exacerbated by GEDI’s sampling nature (versus wall-to-wall coverage). Some studies found that data flagged as ‘0’ by the quality flag were still suitable for water applications (Fayad et al., 2020). When choosing not to use the quality flag, it is still recommended to filter shots by `algorithm_run_flag = 1`, which produces results similar to removing no-data values. It is also important to set a threshold on the difference between `elev_lowestmode` and `/digital_elevation_model`, especially when not using the quality flag.
| Knowledge Check #8 |
|---|
| Under which scenarios could the quality flag misrepresent valid data? Which scenarios could the quality flag “miss” flagging poor quality data? |
Degrade Flag
The degrade flag marks observations where the horizontal geolocation accuracy may be impacted by degradation of the pointing or positioning information. Degraded states can potentially lead to errors up to 60m. While the horizontal error for earlier data versions was found to be between 15-20m, with noise contributing ~8m and systematic errors ~8-10m, updates have reduced the error to about 10.2m.
A non-zero degrade flag indicates a degraded state, while selecting `degrade_flag = 0` keeps higher quality data. The flag is applied in developing L2B, L4B, and other derived products. A non-zero tens digit indicates degraded attitude, and a non-zero units digit indicates a degraded trajectory. Details are in the table below, copied from the L2 User Guide V2 (Blair et al., 2021).
| Flag | Degrade Condition |
|---|---|
| 3X | ADF CHU solution unavailable (ST-2) |
| 4X | Platform attitude |
| 5X | Poor solution (filter covariance large) |
| 6X | Data outage (platform attitude gap also) |
| 7X | ST 1+2 unavailable (similar boresight FOV) |
| 8X | ST 1+2+3 unavailable |
| 9X | ST 1+2+3 and ISS unavailable |
| X1 | Maneuver |
| X2 | GPS data gap |
| X3 | ST blinding |
| X4 | Other |
| X5 | GPS receiver clock drift |
| X6 | Maneuver & GPS receiver clock drift |
| X7 | GPS data gap & GPS receiver clock drift |
| X8 | ST blinding & GPS receiver clock drift |
| X9 | Other & GPS receiver clock drift |
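Rather than discarding every non-zero value, it can be informative to see which degrade conditions dominate over a study area. The sketch below splits a flag into its tens and units digits and looks each up in the table above. The function name `decode_degrade_flag` is hypothetical, and digits not listed in the table are reported as "unlisted" rather than guessed.

```python
ATTITUDE_CONDITIONS = {  # tens digit
    3: "ADF CHU solution unavailable (ST-2)",
    4: "Platform attitude",
    5: "Poor solution (filter covariance large)",
    6: "Data outage (platform attitude gap also)",
    7: "ST 1+2 unavailable (similar boresight FOV)",
    8: "ST 1+2+3 unavailable",
    9: "ST 1+2+3 and ISS unavailable",
}
TRAJECTORY_CONDITIONS = {  # units digit
    1: "Maneuver",
    2: "GPS data gap",
    3: "ST blinding",
    4: "Other",
    5: "GPS receiver clock drift",
    6: "Maneuver & GPS receiver clock drift",
    7: "GPS data gap & GPS receiver clock drift",
    8: "ST blinding & GPS receiver clock drift",
    9: "Other & GPS receiver clock drift",
}


def decode_degrade_flag(flag):
    """Return (attitude condition, trajectory condition); None means not degraded."""
    if not 0 <= flag <= 99:
        raise ValueError(f"Expected a two-digit flag, got {flag}")
    tens, units = divmod(flag, 10)
    attitude = ATTITUDE_CONDITIONS.get(tens, "unlisted") if tens else None
    trajectory = TRAJECTORY_CONDITIONS.get(units, "unlisted") if units else None
    return attitude, trajectory


for flag in (0, 5, 35, 80):
    print(flag, decode_degrade_flag(flag))
```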
Using the degrade flag can further ensure quality data selection, especially in combination with other quality metrics (quality flag, higher sensitivity thresholds, nighttime acquisition, mode detection checks, ground elevation outlier removal, land cover filtering, and high slope removal). The degrade flag only represents conditions affecting geolocation, not the validity of the waveform. Note that applying the degrade flag may significantly reduce the number of available observations, leading to sparse coverage that is exacerbated by GEDI’s sampling nature (versus wall-to-wall coverage).
Leaf Status
The presence of leaves inherently affects the lidar signal characteristics and the ability to calculate metrics capturing canopy complexity (canopy cover, height, plant area index, plant area volume density, foliage height diversity, biomass, structural complexity index), possibly leading to systematic underestimation of these metrics during leaf-off conditions. Plant area index performance, for example, was lower during leaf-off conditions (Wang et al., 2025). Specifying the phenological phase under which the data were collected is likely to greatly reduce the size of the dataset, especially in combination with other filters. GEDI’s biomass products make explicit decisions about leaf status depending on the plant functional type. The level 4 quality flag applies leaf-on status only for deciduous trees and does not select for leaf status over non-deciduous strata. When generating the L4A biomass models based on RH98, the leaf status flag is not used because leaf status is assumed to have minimal impact on this metric, though this assumption may vary across study areas (Li et al., 2023; Potapov et al., 2021).
Key Considerations For Foundational Variables When Exploring and Selecting GEDI Data Ready for Analysis and Application
Across applications relevant to decision-making plug-ins, which metrics and variables are most commonly used?
Around 50 applications were reviewed. Most concerned landscape or vegetation area, structure, type, biomass, or carbon mapping and monitoring, with a few examples integrating GEDI into agriculture, urban, waterbody, and archaeological studies. Many of these examples improved upon existing methods and products, or extended analyses with vertical information for the first time, as input to decision-making processes. Others generated novel information through either vertical data or extent mapping. While not a finalized systematic literature review, the takeaways compare how effectively the processing techniques recommended by the GEDI mission team were used across a wide range of application areas.
L2A elevation and height metrics, L2B vegetation structure and vertical profile metrics, and geolocation waveform datasets were largely used across all the reviewed applications. Techniques to process the data included using the quality and degrade flags, sensitivity, beam type, time of observation and removing shots with no modes detected. Other strategies may have been successful, but were not as frequently deployed. These results are not comprehensive, and should not be regarded as rules but seen as offering a broad viewpoint of what has been deployed for developing results based on GEDI over the last few years.
Key starting points for deciding how to process GEDI data
Your goal, whether pure research or applied research, may require you to fill a gap in testing GEDI data capabilities across analysis-readiness preparation workflows and contexts. Or, you may choose to directly use established techniques. The next section reviews key considerations drawn from the GEDI mission ATBDs, user guides, and literature on accuracy assessment and application-focused product development.
Note: these recommendations are not all-encompassing, and may change as the lifetime of the mission continues and literature grows. Reflecting on these existing strategies can help support real-world and localized use of GEDI. The hope is to prepare and encourage testing and operational adoption of this technology for land based practitioners. These considerations when deployed will aid in establishing those trusted and efficient outcomes.
Geolocation Uncertainty
Accuracy considerations
- Horizontal errors of about 10m may matter more over edges or fragmented forests than over a continuous surface.
- Mitigation could include:
- Apply `degrade_flag = 0` to exclude shots with systematic degradation, which can compound challenging environmental or topographical conditions.
- Buffer footprints, e.g., by 10-15m.
- Match GEDI results with airborne, UAV, or terrestrial lidar (optionally using the GEDI simulator to do so).
- Aggregate the footprints to coarser scales (e.g. 1km grids) or use GEDI gridded or other derived products.
Topography & Terrain
Use TanDEM-X or SRTM ancillary data provided within the GEDI product for each footprint, or utilize a user defined DEM.
Elevation Consistency
The elevation consistency may depend on the study area. Some differences are expected due to the difference in remote sensing capabilities (e.g. TanDEM-X is at 90m and does not penetrate the canopy the same as 25m GEDI shots do). Large differences can indicate waveform degradation, atmospheric or outlier interference, or high geolocation error.
- Exclude shots where GEDI’s estimated ground elevation differs from a chosen reference DEM by > 30-300m (range depends on the sensitivity of the application and method).
- The `surface_flag` applies a 300m threshold; however, excluding differences > 150m is advised.
- Apply more conservative thresholds (e.g., > 30m) over rugged terrain.
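Because the appropriate threshold depends on the application, it is worth checking how many shots each candidate threshold retains before committing to one. Below is a minimal sketch that compares the GEDI ground elevation with a reference DEM at the 30m, 150m, and 300m thresholds mentioned above. The function names and the sample elevation pairs are hypothetical.

```python
def within_dem_tolerance(elev_lowestmode, dem_elevation, max_diff_m):
    """True when the GEDI ground elevation is within max_diff_m of the reference DEM."""
    return abs(elev_lowestmode - dem_elevation) <= max_diff_m


def retained_counts(pairs, thresholds=(30, 150, 300)):
    """Count the shots kept at each candidate threshold (metres)."""
    return {
        threshold: sum(within_dem_tolerance(gedi, dem, threshold) for gedi, dem in pairs)
        for threshold in thresholds
    }


# Hypothetical (GEDI ground elevation, reference DEM elevation) pairs in metres.
pairs = [(512, 505), (498, 540), (730, 520), (150, 600)]
counts = retained_counts(pairs)
print(counts)
```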
Slope filtering
- For flat/low relief, minimal slope filtering is needed.
- Common thresholds for high slope filtering are > 15-30° over forests, but this can depend on the study area (e.g., > 6° for croplands).
Environmental & Acquisition Conditions
ISS orbit and sampling patterns
- Beams may have uneven spacing due to changes in altitude or systematic challenges.
- Sampling densities may greatly vary across geographic locations or time periods which could bias results.
- Prioritize regions with overlapping shots, which are rare but increasing over time.
Day vs night
- Prefer nighttime observations (`solar_elevation < 0`) to minimize background noise.
- Daytime may still be usable in bright, open areas, if sensitivity is high enough.
Phenology (seasonality)
- When studying canopy structure, use `leaf_off_flag = 0` to keep leaf-on data only.
- Leaf-on conditions only apply to vegetation that follows those seasonal differences.
- Optical data may additionally help determine seasonal changes with color analysis.
Over water surfaces
- GEDI can estimate water levels, however waves, depth, or inundation can complicate the waveform interpretation.
Signal Strength & Sensitivity
Waveform validity
- Exclude `num_detectedmodes = 0` if other quality filtering tested does not already do so.
- Use the “optimal” algorithm setting group after comparing to the default established in a given product level or to what was found in literature over similar study areas.
Beam type
- Full-power beams are preferred in dense forests or complex forests.
- Coverage beams are often excluded in daylight or dense canopy conditions yet are more likely to perform well over sparse vegetation or open canopies.
Sensitivity thresholds
- Commonly applied thresholds include ≥ 0.90, ≥ 0.95, ≥ 0.97, ≥ 0.98.
- Tropical evergreen broadleaf forests likely require a threshold ≥ 0.98 (as exemplified in the L4B product).
- Over open and sparse vegetation ≥ 0.90 is common.
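Since each step up in threshold removes more shots, a quick retention summary can show what a stricter threshold would cost over your study area. The sketch below computes the fraction of valid shots kept at each of the commonly applied thresholds listed above. The function name and the sample sensitivities are hypothetical, and values outside 0-1 are treated as invalid and excluded first.

```python
COMMON_THRESHOLDS = (0.90, 0.95, 0.97, 0.98)


def retention_by_threshold(sensitivities, thresholds=COMMON_THRESHOLDS):
    """Fraction of valid shots (sensitivity within 0-1) kept at each threshold."""
    valid = [s for s in sensitivities if 0.0 <= s <= 1.0]
    if not valid:
        return {t: 0.0 for t in thresholds}
    return {t: sum(s >= t for s in valid) / len(valid) for t in thresholds}


# Hypothetical sensitivities, for illustration only.
sensitivities = [0.99, 0.985, 0.96, 0.93, 0.91, 0.85, 0.97, 0.95]
retention = retention_by_threshold(sensitivities)
for threshold, fraction in retention.items():
    print(f">= {threshold:.2f}: {fraction:.1%} retained")
```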
Core Quality Filtering
Quality flags
- Select high quality waveforms with `l2a_quality_flag = 1`, `l2b_quality_flag = 1`, `rx_assess/quality_flag = 1`.
- `l4_quality_flag = 1` is found within the L4A AGBD products.
- Quality flags are highly recommended and can be used as a baseline filter. Additional contextual filters may be essential.
Land Cover Filtering
Filter using user-defined ancillary data, or using what is provided in GEDI’s `/land_cover_data`.
Exclude non-vegetated surfaces
- Mask permanent water with `landsat_water_persistence` < 10%; additional evaluation over floodplains may be necessary.
- Reduce `urban_proportion` < 50% (or a stricter threshold of `urban_proportion = 0`).
- Mask snow/ice, bare ground, and deserts.
Vegetation Filtering
- Keep forests or PFT-specific classes only
- Avoid artificial RH98 inflation from urban structures.
Visual interpretation or manual masking may be necessary in fragmented/heterogeneous landscapes.
Interdependencies & Product Model Assumptions
Geolocation & heterogeneity & topography:
~10m error may be negligible in uniform canopies while severe in fragmented landscapes or complex topography where ground elevation estimates are poor compared to reference DEMs.
Beam sensitivity & canopy cover & algorithm choice:
Low-SNR beams are not suitable for certain canopies (with additional possible seasonal dependence), especially for observations collected during the day.
Hybrid Inference (L4B):
Relies on sampling assumptions where filtering prevents bias from artifacts, clouds, and poor geolocation.
Typical Workflows
These highlighted combinations generalize what has been deployed across the several published GEDI products, and also reflect commonalities across applications literature. These are not concrete rules, but rather starting points.
Baseline QC:
- `l2_quality_flag = 1`, `degrade_flag = 0` (while still additionally evaluating the improvements these offer when applied).
Biomass (L4A/L4B):
- Add `l4_quality_flag = 1`
- AGBD output error < 50%, stratify by PFT & geographic region, mask land cover types, water, urban areas, etc.
Canopy height:
- RH98 metric (avoid RH100)
- Full-power beams
- Sensitivity ≥ 0.95
- Remove leaf-off observations.
PAI:
- Mask land cover types, water, urban areas etc.
- Leaf-on
- Biome-specific ρv/ρg calibration
Forest change or recovery:
- Assess temporal alignment across data inputs
- Exclude edges/gaps
- Prioritize sensitivity ≥ 0.9
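As one example of chaining these combinations, the sketch below applies the baseline QC and the canopy height starting points listed above in a single pass. It is a starting point under stated assumptions, not a prescribed workflow: each shot is a dictionary with hypothetical keys, the power beam names in `POWER_BEAMS` should be verified against your product files, and the sample shots are invented.

```python
# Assumed names of the four power beams; confirm against your product files.
POWER_BEAMS = {"BEAM0101", "BEAM0110", "BEAM1000", "BEAM1011"}


def canopy_height_candidates(shots, min_sensitivity=0.95):
    """Return RH98 values for shots passing baseline QC and canopy height filters."""
    heights = []
    for shot in shots:
        passes = (
            shot["l2_quality_flag"] == 1                          # baseline QC
            and shot["degrade_flag"] == 0                         # baseline QC
            and shot["beam"] in POWER_BEAMS                       # full-power beams
            and min_sensitivity <= shot["sensitivity"] <= 1.0     # valid and sensitive
            and shot["leaf_off_flag"] == 0                        # leaf-on only
        )
        if passes:
            heights.append(shot["rh98"])
    return heights


def make_shot(**overrides):
    """Build a hypothetical shot that passes every filter unless overridden."""
    shot = {"beam": "BEAM0101", "l2_quality_flag": 1, "degrade_flag": 0,
            "sensitivity": 0.97, "leaf_off_flag": 0, "rh98": 24.3}
    shot.update(overrides)
    return shot


shots = [
    make_shot(),                                  # kept
    make_shot(beam="BEAM0000", rh98=18.1),        # coverage beam
    make_shot(sensitivity=0.92, rh98=30.0),       # sensitivity too low
    make_shot(leaf_off_flag=1, rh98=12.5),        # leaf-off
    make_shot(degrade_flag=35, rh98=21.0),        # degraded geolocation
]
heights = canopy_height_candidates(shots)
print(heights)
```

Counting how many shots each condition removes, rather than only the final total, helps show which filter drives data loss over a given study area.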
Other Techniques Found Across Literature
Exploring beyond the techniques mentioned above can provide deeper insight into the behavior of GEDI over your study area while helping to retain as many ‘good’ shots as possible. Here are some techniques from research, analysis, and application-focused literature that were less commonly applied. A technique being less common says nothing about its effectiveness. It simply reflects the state of the research, and is shared here to help you optimize your own investigation of GEDI quality.
More rigorous spatial filtering for handling geolocation uncertainty and land cover heterogeneity:
- Investigate accuracies across view angles for each observation where low view angles may increase uncertainty near water or steep terrain (keep `local_beam_elevation ≥ 1.5`) (Tommaso et al., 2023).
- Applying a buffer (e.g., 10m or 30m) around the GEDI footprint centroid to account for geolocation error when overlaying with other data sources like land cover maps or higher-resolution ALS data (Adrah et al., 2025; Hoffren et al., 2023).
- Removing shots that overlap with more than one land cover type within their buffered area (Adrah et al., 2025).
- Removing shots that were within a certain distance (e.g. 50m) of the boundary between two land cover classes (Adrah et al., 2025).
- Calculating the standard deviation of a spatial metric (like median NDVI from intersecting Sentinel-2 pixels) within a buffered GEDI shot and retaining only shots below a certain threshold (e.g., standard deviation < 1) to ensure homogeneity (Adrah et al., 2025).
- Removing shots not completely contained within specific study polygons (like forest management units) to avoid edge effects (Guerra Hernandez et al., 2021).
- Remove sparse/isolated sequences of shots where a single track has < 30 consecutive valid shots after filtering (Keller et al., NASA ROSES 2022).
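The last item in the list above, removing short isolated runs of valid shots, can be sketched with `itertools.groupby`. This minimal example assumes the validity flags for one beam track are already in along-track order and that "consecutive" means adjacent in that sequence; how consecutiveness was defined in the cited work is not stated here, so treat this as one possible interpretation. The function name `drop_short_runs` is hypothetical.

```python
from itertools import groupby


def drop_short_runs(valid, min_run=30):
    """Mark valid shots as invalid when their run of consecutive valid shots is too short."""
    result = []
    for is_valid, group in groupby(valid):
        run_length = len(list(group))
        keep = is_valid and run_length >= min_run
        result.extend([keep] * run_length)
    return result


# Small illustration with min_run=3 so the behavior is easy to follow.
valid = [True, True, False, True, True, True, False, True]
cleaned = drop_short_runs(valid, min_run=3)
print(cleaned)
```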
Statistical outlier detection based on error distribution:
- Use statistical tests to define outlier removal thresholds based on the distribution of errors rather than applying fixed thresholds to keep as many values with low RMSE (e.g. KS non-parametric tests or interquartile range thresholds) (Barbosa et al., 2022).
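As an illustration of a distribution-based threshold, the sketch below derives outlier bounds from the interquartile range of the errors instead of using a fixed cutoff. The multiplier `k=1.5` is the conventional Tukey fence and is an assumption here, not a value taken from the cited study. The errors are invented differences between a GEDI estimate and a reference value.

```python
from statistics import quantiles


def iqr_bounds(errors, k=1.5):
    """Lower and upper outlier bounds from the interquartile range of the errors."""
    q1, _, q3 = quantiles(errors, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(errors, k=1.5):
    """Keep only the errors that fall within the IQR-based bounds."""
    lower, upper = iqr_bounds(errors, k)
    return [e for e in errors if lower <= e <= upper]


# Hypothetical errors (GEDI minus reference) in metres.
errors = [-2, -1, 0, 1, 2, 3, 40]
kept_errors = remove_outliers(errors)
print(kept_errors)
```

Because the bounds follow the data, the same code adapts to study areas with wider or narrower error distributions.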
Applying quality control to the metrics or classifications derived from GEDI data or even model predictions based on GEDI data, rather than just using the raw GEDI flags:
- Removing shots with unrealistic derived height metrics (e.g., RH98 > 10m for low vegetation studies or RH98 < 15m when inconsistent with high AGBD) (Li et al., 2024).
- Filtering shots used as training data based on confidence scores in their predicted class label (e.g. confidence < 0.8) (Adrah et al., 2025).
- Filtering based on peak vegetation index values (a proxy for biomass) to identify where GEDI shots are less likely to be reliable, such as in low biomass conditions (Tommaso et al., 2023).
- Removing shots with very high relative standard error on derived AGBD (>50%) (Sandamali et al., 2025).
Removing unrealistic values or low-confidence predictions helps generate more accurate training samples for wall-to-wall mapping. This is especially important when GEDI performance is known to be lower in certain conditions, such as low biomass areas, or when using GEDI products that include estimates of uncertainty.
Using the results of unsupervised learning methods to inform filtering decisions:
- After identifying clusters with distinct characteristics from clustering algorithm outputs, further filtering the GEDI shots by cluster membership, or sampling the points nearest the cluster centroids, can help refine the data.
- This process may help address measurement errors in clustering, stemming from issues like poor ground detection, terrain slope, geolocation, and spectral similarity (McClure et al., 2024).
| Knowledge Check #9 |
|---|
| Why is it important to quantify and check a GEDI shot’s ability to detect the ground? Which factors during observation may influence GEDI’s ability to detect the ground? |
Summary
This section provides guidance on understanding and processing key GEDI waveform and data quality parameters that influence the interpretation and usability of GEDI-derived metrics. It outlines how factors such as geolocation accuracy, topography, environmental conditions, signal strength, and sensitivity affect data quality, and presents best practices for filtering and preprocessing GEDI data for reliable analysis. Summarizing findings from GEDI documentation and around 50 applied studies, it highlights common workflows using Level 2A, 2B, and 4 products for applications in forest structure, biomass, and carbon mapping, among others. The section emphasizes using quality flags, sensitivity thresholds, and contextual filters (e.g., slope, seasonality, land cover) as a baseline for analysis while encouraging experimentation with additional filtering and statistical or spatial techniques to enhance data reliability. Overall, it serves as a practical reference and “recipe” for researchers and practitioners to prepare, evaluate, and apply GEDI data effectively in diverse ecological and land-based studies.