- Minimize system resource use, especially I/O. Perform as many operations as possible in internal memory.
- Process data as close as possible to where it physically resides, or vice versa (i.e., keep your data where your processing occurs).
- Review application I/O (analyze SQL statements) to ensure that unnecessary physical I/O is avoided. In particular, look for the following four common flaws:
  - Reading data for every transaction when the data could be read once and kept cached or in working storage;
  - Rereading data for a transaction where the data was read earlier in the same transaction;
  - Causing unnecessary table or index scans;
  - Not specifying key values in the WHERE clause of an SQL statement.
- Do not do things twice. For instance, if you need data summarization for reporting purposes, increment stored totals if possible when data is being initially processed, so your reporting application does not have to reprocess the same data.
- Allocate enough memory at the beginning of an application to avoid time-consuming reallocation during the process.
- Always assume the worst with regard to data integrity. Insert adequate checks and record validation to maintain data integrity.
- Simplify as much as possible and avoid building complex logical structures.
- Implement checksums for internal validation where possible. For example, flat files should have a trailer record stating the total number of records in the file and an aggregate of the key fields.
- Plan and execute stress tests as early as possible in a production-like environment with realistic data volumes.
- In large systems, backups can be challenging, especially if the batch system runs concurrently with the on-line system on a 24-7 basis. Database backups are typically well taken care of in the on-line design, but file backups should be considered just as important. If the system depends on flat files, file backup procedures should not only be in place and documented, but regularly tested as well.
- A batch architecture typically affects on-line architecture and vice versa. Design with both architectures and environments in mind using common building blocks when possible.
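
The trailer-record check mentioned in the list above can be sketched roughly as follows. This is only an illustration under assumptions of my own: the pipe-delimited layout, the `TRAILER|<count>|<key_sum>` line, and the function names `write_with_trailer` and `verify_trailer` are hypothetical, and keys are assumed to be integers. Note that the totals are accumulated while the records are first written, so nothing is processed twice.

```python
import io


def write_with_trailer(out, records):
    """Write (key, payload) records, then a trailer with count and key sum.

    The 'TRAILER|<count>|<key_sum>' layout is a hypothetical format
    chosen for this sketch, not a standard.
    """
    count = 0
    key_sum = 0
    for key, payload in records:
        out.write(f"{key}|{payload}\n")
        # Accumulate totals during the initial pass instead of re-reading.
        count += 1
        key_sum += key
    out.write(f"TRAILER|{count}|{key_sum}\n")


def verify_trailer(lines):
    """Return True only if a trailer exists and matches the data records."""
    count = 0
    key_sum = 0
    trailer = None
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if fields[0] == "TRAILER":
            trailer = (int(fields[1]), int(fields[2]))
            break
        count += 1
        key_sum += int(fields[0])  # assumes integer keys in the first field
    # A missing trailer (None) never equals the computed totals.
    return trailer == (count, key_sum)


if __name__ == "__main__":
    buf = io.StringIO()
    write_with_trailer(buf, [(101, "a"), (202, "b"), (303, "c")])
    print(verify_trailer(buf.getvalue().splitlines()))
```

A reader of the file would call `verify_trailer` before processing and reject the file if it returns `False`, which catches both truncated files (no trailer) and files with lost or altered records.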
Friday, January 25, 2013
Performance General Principles
The following are key principles, guidelines, and general considerations to keep in mind when building any solution that needs to perform well. They are based on batch solutions, but I find them relevant to all other kinds of applications, especially big online web services.