Designing performant DB2 databases
I recently have been working with the IBM DB2 database and wanted to gain some familiarity with the performance aspects of the product. First, the disclaimer – I am not an expert in DB2 performance! But in the true spirit of blogging: “standing on the shoulders of giants” and all that, I’ve scoured the internet for useful resources and summarized them here.
Now for the second disclaimer! I don’t yet have the experience of applying this to real databases, any comments from experienced DB2 DBAs would be welcome!
All of the information on tablespaces and bufferpools was derived from:
DB2 Basics: Tablespaces and BufferPools
Tablespace design is an important part of the overall database design. Creating more than one user tablespace can enhance performance. There are three ways that multiple tablespaces can be useful:
- Controlling I/O, if the tablespaces can be located on different drives
- Using different pagesizes
- Controlling bufferpools
[Source: Initial Tuning and Design Considerations]
- Page size
- Choices: 4k-32k. Considerations:
- Utilisation: DB2 won’t store more than 255 rows in a single page. If your tables have small rows, then a large page size may waste lots of space and increase IO.
- Row sizing: DB2 won’t split a row across multiple pages. Thus, the page size limits the maximum row size.
- This is essentially the stripe size between the various containers in a tablespace (in units of the page size).
- RAID: Extentsize should be equal to, or a multiple of, the RAID stripe size.
- (in units of the page size) Considerations:
- Relationship to Extentsize: Should be a multiple of Extentsize * Number of containers
- RAID: Should be the RAID stripe size * number of RAID parallel devices (or a multiple of this)
- Automatic: In recent versions of DB2, this can be set automatically.
Note: Tables that are often read via large table scans would benefit from prefetching. It is also advisable to allocate space in the associated bufferpool for block based transfers.
SMS, DMS (File), DMS (Raw)
Db2 offers a reasonable amount of choice as to how you store data on disc. For a good discussion of the various merits of each see:
Here is my potted summary:
- System Managed Tablespace (SMS)
- This is plain old files on the filesystem. The files automatically grow and shrink as the tablespace requirements change. This makes them simple to manage. Performance is ok on OSs with good filesystem layers – and can also make use of the OS cache and prefetching.
- Use for tables where convenience is more important than performance: e.g. System catalog
- Tablespaces with rather unknown or variable space requirements: e.g. Temporary Tablespaces
- Database Managed Tablespace – Files (DMS)
- Here DB2 is given a big file, which it manages itself. The downside is that you must know how much space to allocate in advance. The upside, is potentially better performance than SMS as you don’t get fragmentation and you can take advantage of splitting LOBs, Indexes and data into different tablespaces. With this type of storage it is often possible to choose whether or not the OS filesystem layer will perform caching or not.
- LOBs. DB2 does not use bufferpools to access LOBs, it is useful to use file based storage and allow the OS to do caching and prefetching.
- Database Managed Storage – Raw (DMS)
- DB2 gets a whole logical volume to itself to store data however it chooses. Potentially best performance.
Use for the main data tables. But prefer file-based storage for LOBs so that the OS can do caching and prefetching.
There is many:one relationship between tables:bufferpools. The page size of tablespace must match that of its bufferpool. Generally, fewer is better – DB2 is quite good at managing the contents of a bufferpool.
However, there may be specific scenarios where the DBA can do a better job. A typical example is a database with a handful of small, frequently access tables and one huge table. If they shared a bufferpool, large queries on the big table would purge the frequently accessed small tables from the bufferpool. Using a seperate bufferpool for the small tables would keep them in memory.
The DB2 Infocenter has a good explanation of the types of scenario when you may want multiple bufferpools:
Ok, so you have a really big dahttp://www.ibm.com/developerworks/data/library/techarticle/adamache/0621_adamache.htmltabase – you might need to split this across several servers. In DB2 speak – these are partitions. Databases with tables approaching 1 billion rows, or 100GB data will need to consider partitions. If you have a database this big, you probably aren’t reading this! But in any case, here is an article that discusses what DB2 considered really big.
General DB Parameters
At this level, use the DB2 Configuration Assistant to specify reasonable start values.
Or see the DB2 Infocenter for the command line equivilent: Db2 V9.7 Infocenter: AUTOCONFIGURE