ViduraRedshift Tools

Storage Analysis Tools

The storage analysis tools give Vidura the ability to diagnose disk usage, vacuum health, and data distribution across the Redshift cluster. These 7 tools form the storage subset of the 19 Redshift tools.

Tool Reference

Tool NameData SourceDescription
vacuum_stats()svv_table_info + stl_vacuumReturns the top 5 tables by unsorted_pct. Includes reclaimable space estimate, last vacuum timestamp, and ghost row percentage. Use when asked about vacuum scheduling priorities.
reclaimed_space()stl_vacuumReturns top 5 tables by reclaimable disk space. Formula: reclaimable_gb = impacted_blocks / 1024.0. Use when asked about disk space reclamation opportunities.
table_storage_summary()svv_table_info + stl_vacuumLarge table storage analysis for tables with more than 1 million rows. Shows total size, distribution skew ratio, and vacuum history. Useful for identifying candidates for redistribution or archiving.
column_stats()quper.column_statsReturns column-level statistics including NULL percentage, distinct value count, and data type distribution per column. Sourced from Quper's pre-computed statistics table rather than direct system views.
volume_stats()stl_insert + stl_delete (CTE chain)Returns DML volume since last ANALYZE per table: rows inserted, rows deleted, and net row change. Built as a CTE chain that joins stl_insert and stl_delete with the last analyze timestamp.
sort_key_and_encoding_details()svv_table_info + pg_attribute + svv_columnsReturns column-level encoding type and sort key position for all columns in a table. Use when diagnosing compression efficiency or verifying sort key configuration.
table_scan_stats()quper.table_scan_statsReturns table scan frequency and total bytes scanned from Quper-maintained scan statistics. Identifies hot tables (frequently scanned) and cold tables (rarely accessed). Use for storage tiering decisions.

Key Formulas

Ghost Row Percentage Formula

Ghost rows are rows marked for deletion but not yet reclaimed by VACUUM. They consume storage and inflate scan costs.

Ghost Row % = (tbl_rows − estimated_visible_rows) / NULLIF(tbl_rows, 1) × 100

Tables with Ghost Row % above 30% should have VACUUM DELETE ONLY run as soon as possible.
MetricFormula
Reclaimable Space (GB)impacted_blocks / 1024.0
Ghost Row %(tbl_rows - estimated_visible_rows) / NULLIF(tbl_rows, 1) * 100
Distribution SkewMAX(slice_rows) / NULLIF(AVG(slice_rows), 0)

Vacuum Priority Guidance

When Vidura recommends vacuum operations, it uses the following priority order based on output from vacuum_stats():

  1. VACUUM SORT ONLY — Tables with unsorted_pct > 20%. Restores sort key order for better zone map effectiveness. Does not reclaim space.
  2. VACUUM DELETE ONLY — Tables with ghost_rows_pct > 30%. Reclaims space without resorting. Faster than a full VACUUM.
  3. VACUUM FULL — Tables with both high unsorted_pct and high ghost_rows_pct. Reclaims space and restores sort order in one pass. Most expensive — schedule during low-traffic windows.
  4. ANALYZE — Tables with stale statistics (last ANALYZE > 7 days and significant DML since last run). Updates optimizer statistics without a full vacuum.