ViduraRedshift Tools
Storage Analysis Tools
The storage analysis tools give Vidura the ability to diagnose disk usage, vacuum health, and data distribution across the Redshift cluster. These 7 tools form the storage subset of the 19 Redshift tools.
Tool Reference
| Tool Name | Data Source | Description |
|---|---|---|
| vacuum_stats() | svv_table_info + stl_vacuum | Returns the top 5 tables by unsorted_pct. Includes reclaimable space estimate, last vacuum timestamp, and ghost row percentage. Use when asked about vacuum scheduling priorities. |
| reclaimed_space() | stl_vacuum | Returns top 5 tables by reclaimable disk space. Formula: reclaimable_gb = impacted_blocks / 1024.0. Use when asked about disk space reclamation opportunities. |
| table_storage_summary() | svv_table_info + stl_vacuum | Large table storage analysis for tables with more than 1 million rows. Shows total size, distribution skew ratio, and vacuum history. Useful for identifying candidates for redistribution or archiving. |
| column_stats() | quper.column_stats | Returns column-level statistics including NULL percentage, distinct value count, and data type distribution per column. Sourced from Quper's pre-computed statistics table rather than direct system views. |
| volume_stats() | stl_insert + stl_delete (CTE chain) | Returns DML volume since last ANALYZE per table: rows inserted, rows deleted, and net row change. Built as a CTE chain that joins stl_insert and stl_delete with the last analyze timestamp. |
| sort_key_and_encoding_details() | svv_table_info + pg_attribute + svv_columns | Returns column-level encoding type and sort key position for all columns in a table. Use when diagnosing compression efficiency or verifying sort key configuration. |
| table_scan_stats() | quper.table_scan_stats | Returns table scan frequency and total bytes scanned from Quper-maintained scan statistics. Identifies hot tables (frequently scanned) and cold tables (rarely accessed). Use for storage tiering decisions. |
Key Formulas
Ghost Row Percentage Formula
Ghost rows are rows marked for deletion but not yet reclaimed by VACUUM. They consume storage and inflate scan costs.
Ghost Row % = (tbl_rows − estimated_visible_rows) / NULLIF(tbl_rows, 1) × 100
Tables with Ghost Row % above 30% should have
Ghost Row % = (tbl_rows − estimated_visible_rows) / NULLIF(tbl_rows, 1) × 100
Tables with Ghost Row % above 30% should have
VACUUM DELETE ONLY run as soon as possible.| Metric | Formula |
|---|---|
| Reclaimable Space (GB) | impacted_blocks / 1024.0 |
| Ghost Row % | (tbl_rows - estimated_visible_rows) / NULLIF(tbl_rows, 1) * 100 |
| Distribution Skew | MAX(slice_rows) / NULLIF(AVG(slice_rows), 0) |
Vacuum Priority Guidance
When Vidura recommends vacuum operations, it uses the following priority order based on output from vacuum_stats():
- VACUUM SORT ONLY — Tables with unsorted_pct > 20%. Restores sort key order for better zone map effectiveness. Does not reclaim space.
- VACUUM DELETE ONLY — Tables with ghost_rows_pct > 30%. Reclaims space without resorting. Faster than a full VACUUM.
- VACUUM FULL — Tables with both high unsorted_pct and high ghost_rows_pct. Reclaims space and restores sort order in one pass. Most expensive — schedule during low-traffic windows.
- ANALYZE — Tables with stale statistics (last ANALYZE > 7 days and significant DML since last run). Updates optimizer statistics without a full vacuum.