The database software SAP HANA is best known for extremely fast data processing based on in-memory technology, but SAP holds that its strengths are not limited to that feature. They are said to stem from the design concept itself, which yields advantages that competing products do not have. Even so, high-speed processing through in-memory technology remains the headline feature of SAP HANA.
Secondary Effects Brought by HANA Column Store
Compared with row stores, column stores are well suited to vector processing, which applies the same operation repeatedly across the elements of an array, and they also lend themselves to compression. Beyond those well-known points, however, there are further effects worth examining from an architectural point of view.
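The contrast between the two layouts can be sketched in a few lines. This is a minimal illustration, not how any real engine stores data; the table and column names are invented for the example.

```python
# Hypothetical table stored two ways, to show why a column layout
# suits vectorized aggregation. Data and names are made up.
rows = [
    {"id": 1, "price": 10.0, "qty": 3},
    {"id": 2, "price": 4.5,  "qty": 7},
    {"id": 3, "price": 9.9,  "qty": 1},
]

# Row store: aggregating one column still touches every whole record.
total_row = sum(r["price"] for r in rows)

# Column store: each column is a contiguous array, so an aggregate is a
# tight loop over one array -- the access pattern SIMD hardware likes,
# and one that compresses well because similar values sit side by side.
columns = {
    "id":    [1, 2, 3],
    "price": [10.0, 4.5, 9.9],
    "qty":   [3, 7, 1],
}
total_col = sum(columns["price"])

assert total_row == total_col  # same answer, very different access pattern
```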
Structure that Does Not Cause Data Fragmentation
The Oracle database stores data in fixed-size data blocks (hereinafter referred to as "blocks") within tablespaces. Storing, updating, and deleting variable-length data in fixed-length blocks causes various problems. For example, when an update grows a record beyond the free space in its block, the record can no longer be kept in that block, so the data is moved to another block and a pointer is left behind in the original one ("row migration"). Likewise, if a single record is longer than a block from the outset, "row chaining" occurs, in which the data is stored across two or more blocks.
Alternatively, if data is deleted row by row with the DELETE statement, reusable free space opens up in the block; but if the length of a record stored by a subsequent INSERT statement exceeds that free space, small unusable fragments of space are left behind (segment-level fragmentation).
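The fragmentation mechanism described above can be modeled with a toy first-fit allocator over fixed-size blocks. Block size, record names, and sizes here are all invented for illustration.

```python
# Toy model of fixed-size blocks to illustrate segment-level fragmentation.
# All sizes and records are illustrative only.
BLOCK_SIZE = 100

blocks = [[] for _ in range(2)]          # each block holds (name, size) records

def insert(rec_name, size):
    """First-fit insert; returns the chosen block index, or None if no block fits."""
    for i, blk in enumerate(blocks):
        used = sum(s for _, s in blk)
        if BLOCK_SIZE - used >= size:
            blk.append((rec_name, size))
            return i
    return None

insert("r1", 60); insert("r2", 40)       # block 0 is now full
insert("r3", 60); insert("r4", 40)       # block 1 is now full
blocks[0].remove(("r2", 40))             # DELETE frees 40 bytes in block 0
blocks[1].remove(("r4", 40))             # DELETE frees 40 bytes in block 1

# 80 bytes are free in total, but split 40/40 across two blocks:
# a 70-byte record fits nowhere, even though enough space "exists".
print(insert("r5", 70))                  # None
```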
SAP HANA's column store, on the other hand, has no mechanism that stores row data in data blocks, so there is no room for the kind of fragmentation seen in the Oracle database and similar products. In addition, Delta Merge, the very important SAP HANA function introduced in the second installment of this series, periodically sorts and compresses the data in the main store, maintaining a data structure that is optimal for reading. In this respect, SAP HANA can be said to be a database that ages less than a row-store database.
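A heavily simplified sketch of the delta-merge idea follows. Real HANA works on dictionary-encoded, compressed structures; here the "main store" is just a sorted list and the "delta store" an append-only list, which is enough to show why the merge leaves no fragmentation behind.

```python
# Conceptual sketch of a delta merge (simplified; real HANA merges
# dictionary-encoded, compressed column structures, not plain lists).
main_store = [3, 8, 15, 42]      # read-optimized: kept sorted (and compressed)
delta_store = [27, 5]            # write-optimized: recent inserts in arrival order

def delta_merge(main, delta):
    """Rebuild the main store from scratch: merge, sort, (conceptually) recompress.
    Because the structure is rebuilt rather than patched in place,
    no free-space fragments can accumulate over time."""
    return sorted(main + delta)

main_store = delta_merge(main_store, delta_store)
delta_store = []                 # a fresh, empty delta receives new writes

print(main_store)                # [3, 5, 8, 15, 27, 42]
```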
Indexes Free from the Clustering Factor
As introduced in the fourth installment of this series, the value arrays in SAP HANA's main store can be searched at high speed by binary search without creating an index. In effect, this is equivalent to indexing every column of a row-store database (although the value ID array does need an index to avoid a full scan). In a row-store database, however, tables and indexes exist as separate objects, so even if you create indexes on all columns, there is no guarantee that they will be good indexes.
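The binary search described above can be sketched with a simplified model of a main-store column: a sorted dictionary of distinct values plus a value ID array holding one entry per row. The data is invented for the example.

```python
from bisect import bisect_left

# Simplified model of a HANA main-store column: a sorted dictionary of
# distinct values plus a value-ID array (one entry per row). Data invented.
dictionary = ["Berlin", "London", "Paris", "Tokyo"]   # sorted distinct values
value_ids  = [3, 1, 0, 3, 2, 1]                       # row i holds dictionary[value_ids[i]]

def lookup_rows(value):
    """Binary-search the dictionary for the value ID, then find matching rows."""
    vid = bisect_left(dictionary, value)
    if vid == len(dictionary) or dictionary[vid] != value:
        return []                                     # value not in the column
    # Without an inverted index on value_ids this step is a full scan --
    # which is exactly why an index on the value ID array is still needed.
    return [row for row, v in enumerate(value_ids) if v == vid]

print(lookup_rows("Tokyo"))    # rows 0 and 3 -> [0, 3]
```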
Performance is good if the table rows for the multiple rowids obtained by an INDEX RANGE SCAN are gathered in the same block, and it degrades if they are scattered across separate blocks.
The clustering factor (CF: CLUSTERING FACTOR), which indicates how concentrated or dispersed the data is, depends on how the data happens to be stored: the CF of an index on a key that increases over time is generally good, while that of other indexes is worse. This is the fate of a row store, where data is stored as a row image; to improve the CF for a given column, the entire table would have to be re-stored in the order of that key.
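The clustering factor can be approximated by walking the index in key order and counting how often consecutive rowids land in different blocks. The block layouts below are invented to show the two extremes.

```python
# Rough illustration of the clustering factor: walk the index in key order
# and count block changes between consecutive rowids. Layouts are invented.
def clustering_factor(block_of_row):
    """block_of_row: the block number of each row, listed in index-key order."""
    changes = 1                              # the first row always opens a block
    for prev, cur in zip(block_of_row, block_of_row[1:]):
        if cur != prev:
            changes += 1
    return changes

# Key grows with insertion time: neighbouring keys share blocks -> CF near
# the number of blocks (good).
good = clustering_factor([1, 1, 1, 2, 2, 2, 3, 3])   # -> 3
# Keys scattered over blocks: almost every step changes block -> CF near
# the number of rows (bad).
bad = clustering_factor([1, 3, 2, 1, 3, 2, 1, 3])    # -> 8
print(good, bad)
```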
In the HANA column store, the data of each column is independent, so the index used to speed up value ID array searches or non-unique key lookups has no concept of CF in the first place. This means that, in effect, the best possible index exists for every column. This is an advantage unique to the purpose-built, in-memory-only HANA, and one that sets it apart from the Oracle Database 12c In-Memory option.