ANALYZE 中文man頁面
NAME
ANALYZE - 收集與數據庫有關的統計
SYNOPSIS
ANALYZE [ VERBOSE ] [ table [ (column [, ...] ) ] ]
DESCRIPTION 描述
ANALYZE 收集有關 PostgreSQL 表的內容的統計,然后把結果保存在系統表 pg_statistic 里。隨后,查詢規劃器就可以使用這些統計幫助判斷查詢的最有效的規劃。
如果沒有參數,ANALYZE 檢查在當前數據庫里的所有表。 如果有參數,ANALYZE 只檢查那個表。 你還可以給出一列字段名字,這個時候只收集那些字段的統計信息。
PARAMETERS 參數
- VERBOSE
打開處理過程信息的顯示。- table
要分析的特定表(可能用模式名修飾)的名字。缺省是當前數據庫里所有表。- column
要分析的特定字段的名字。缺省是所有字段。
OUTPUTS 輸出
如果聲明了 VERBOSE,ANALYZE 發出進度信息,表明當前正在處理的是哪行。 同時打印有關改表的很多其它信息。
NOTES 注意
周期性地運行 ANALYZE,或者在對表的大部分內容做了更改之后馬上運行它是個好習慣, 準確的統計信息將幫助規劃器選擇最合適的查詢規劃,并因此而改善查詢處理的速度。 一種比較經常采用的策略是每天在低負荷的時候運行一次 VACUUM [vacuum(7)] 和 ANALYZE。
和 VACUUM FULL 不同的是, ANALYZE 只需要在目標表上有一個讀取鎖, 因此它可以和表上的其它活動并行地運行。
收集的統計信息通常包括一個每字段最常用數值的列表以及一個包線圖,顯示每個字段里數據的近似分布。 如果 ANALYZE 認為它們都沒有什么用, (比如,在一個唯一鍵字的字段上沒有公共的數值)或者是該字段數據類型不支持相關的操作符, 那么它們都可以忽略。在 Chapter 21 ``Routine Database Maintenance'' 中有關于統計的更多信息。
對于大表,ANALYZE 采集表內容的一個隨機的抽樣做統計,而不是檢查每一行。 這樣就保證了即使是在很大的表上,我們也只需要很少的一些時間就可以完成分析。 不過,要注意的是統計只是近似的結果,而且每次運行ANALYZE都會導致 EXPLAIN 顯示的規劃器的預期開銷有一些小變化, 即使表內容實際上沒有改變也這樣。在很小的概率的情況下,這個不確定的行為會導致查詢優化器在不同 ANALYZE 之間選擇不同的查詢規劃。為了避免這個問題,可以提高 ANALYZE 收集的統計數量,像下面描述的那樣。
分析的廣度可以通過用調整 default_statistics_target 參變量, 或者是以每字段為基礎通過用 ALTER TABLE ... ALTER COLUMN ... SET STATISTICS (參閱 ALTER TABLE [alter_table(7)]) 設置每字段的統計目標來控制。目標數值設置最常用數值列表中的記錄的***數目以及包線圖中的***塊數。 缺省的目標數值是 10,不過我們可以調節這個數值獲取規劃器計算精度和 ANALYZE 運行所需要的時間以及 pg_statistic 里面占據的空間數目之間的平衡。 特別是,把統計目標設置為零就關閉了該字段的統計收集。 這么做對那些從來不參與到查詢的 WHERE,GROUP BY,或者 ORDER BY 子句里的字段是很有用的,因為規劃器不會使用到這樣的字段上的統計。
在被分析的字段中***的統計目標決定為統計采樣的表中的行的數目。 增大目標會導致做 ANALYZE 的時候成比例地增大對時間和空間的需求。
#p#
NAME
ANALYZE - collect statistics about a database
SYNOPSIS
ANALYZE [ VERBOSE ] [ table [ (column [, ...] ) ] ]
DESCRIPTION
ANALYZE collects statistics about the contents of tables in the database, and stores the results in the system table pg_statistic. Subsequently, the query planner uses these statistics to help determine the most efficient execution plans for queries.
With no parameter, ANALYZE examines every table in the current database. With a parameter, ANALYZE examines only that table. It is further possible to give a list of column names, in which case only the statistics for those columns are collected.
PARAMETERS
- VERBOSE
- Enables display of progress messages.
- table
- The name (possibly schema-qualified) of a specific table to analyze. Defaults to all tables in the current database.
- column
- The name of a specific column to analyze. Defaults to all columns.
OUTPUTS
When VERBOSE is specified, ANALYZE emits progress messages to indicate which table is currently being processed. Various statistics about the tables are printed as well.
NOTES
It is a good idea to run ANALYZE periodically, or just after making major changes in the contents of a table. Accurate statistics will help the planner to choose the most appropriate query plan, and thereby improve the speed of query processing. A common strategy is to run VACUUM [vacuum(7)] and ANALYZE once a day during a low-usage time of day.
Unlike VACUUM FULL, ANALYZE requires only a read lock on the target table, so it can run in parallel with other activity on the table.
The statistics collected by ANALYZE usually include a list of some of the most common values in each column and a histogram showing the approximate data distribution in each column. One or both of these may be omitted if ANALYZE deems them uninteresting (for example, in a unique-key column, there are no common values) or if the column data type does not support the appropriate operators. There is more information about the statistics in the chapter called ``Routine Database Maintenance'' in the documentation.
For large tables, ANALYZE takes a random sample of the table contents, rather than examining every row. This allows even very large tables to be analyzed in a small amount of time. Note, however, that the statistics are only approximate, and will change slightly each time ANALYZE is run, even if the actual table contents did not change. This may result in small changes in the planner's estimated costs shown by EXPLAIN. In rare situations, this non-determinism will cause the query optimizer to choose a different query plan between runs of ANALYZE. To avoid this, raise the amount of statistics collected by ANALYZE, as described below.
The extent of analysis can be controlled by adjusting the DEFAULT_STATISTICS_TARGET parameter variable, or on a column-by-column basis by setting the per-column statistics target with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS (see ALTER TABLE [alter_table(7)]). The target value sets the maximum number of entries in the most-common-value list and the maximum number of bins in the histogram. The default target value is 10, but this can be adjusted up or down to trade off accuracy of planner estimates against the time taken for ANALYZE and the amount of space occupied in pg_statistic. In particular, setting the statistics target to zero disables collection of statistics for that column. It may be useful to do that for columns that are never used as part of the WHERE, GROUP BY, or ORDER BY clauses of queries, since the planner will have no use for statistics on such columns.
The largest statistics target among the columns being analyzed determines the number of table rows sampled to prepare the statistics. Increasing the target causes a proportional increase in the time and space needed to do ANALYZE.