Friday, January 25, 2013

Performance General Principles

The following are key principles, guidelines, and general considerations to keep in mind when building any solution that needs to perform well. They are based on batch solutions, but I find them relevant to all other kinds of applications, especially big online web services.

  • Minimize system resource use, especially I/O. Perform as many operations as possible in internal memory.
  • Process data as close to where the data physically resides as possible or vice versa (i.e., keep your data where your processing occurs).
  • Review application I/O (analyze SQL statements) to ensure that unnecessary physical I/O is avoided. In particular, look for the following four common flaws:
    • Reading data for every transaction when the data could be read once and kept cached or in the working storage;
    • Rereading data for a transaction where the data was read earlier in the same transaction;
    • Causing unnecessary table or index scans;
    • Not specifying key values in the WHERE clause of an SQL statement.
  • Do not do things twice. For instance, if you need data summarization for reporting purposes, increment stored totals if possible when data is being initially processed, so your reporting application does not have to reprocess the same data.
  • Allocate enough memory at the beginning of an application to avoid time-consuming reallocation during the process.
  • Always assume the worst with regard to data integrity. Insert adequate checks and record validation to maintain data integrity.
  • Simplify as much as possible and avoid building complex logical structures.
  • Implement checksums for internal validation where possible. For example, flat files should have a trailer record stating the total number of records in the file and an aggregate of the key fields.
  • Plan and execute stress tests as early as possible in a production-like environment with realistic data volumes.
  • In large systems, backups can be challenging, especially if the batch system is running concurrently with on-line processing on a 24-7 basis. Database backups are typically well taken care of in the on-line design, but file backups should be considered just as important. If the system depends on flat files, file backup procedures should not only be in place and documented, but regularly tested as well.
  • A batch architecture typically affects on-line architecture and vice versa. Design with both architectures and environments in mind using common building blocks when possible.
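
The trailer-record principle above can be illustrated with a minimal Python sketch. The file layout is an assumption made only for this example: one `key|payload` line per record, integer keys, and a final `TRAILER|<count>|<key sum>` line. The names `write_with_trailer`, `read_and_verify`, and `TRAILER_PREFIX` are hypothetical, not taken from any particular batch framework.

```python
import io

# Hypothetical marker for the trailer line; data lines start with an integer key.
TRAILER_PREFIX = "TRAILER"


def write_with_trailer(records, out):
    """Write (key, payload) records, then a trailer with count and key total."""
    count = 0
    key_sum = 0
    for key, payload in records:
        out.write(f"{key}|{payload}\n")
        count += 1
        key_sum += key
    # Trailer: total number of records and an aggregate of the key field.
    out.write(f"{TRAILER_PREFIX}|{count}|{key_sum}\n")


def read_and_verify(inp):
    """Read records back and check them against the trailer.

    Raises ValueError if the trailer is missing, is followed by more data,
    or does not match the records that were actually read.
    """
    records = []
    trailer = None
    for raw in inp:
        line = raw.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        if trailer is not None:
            raise ValueError("data found after trailer record")
        if line.startswith(TRAILER_PREFIX + "|"):
            _, count_text, sum_text = line.split("|")
            trailer = (int(count_text), int(sum_text))
        else:
            key_text, payload = line.split("|", 1)
            records.append((int(key_text), payload))
    if trailer is None:
        raise ValueError("missing trailer record (file may be truncated)")
    actual = (len(records), sum(key for key, _ in records))
    if actual != trailer:
        raise ValueError(f"trailer mismatch: expected {trailer}, got {actual}")
    return records


if __name__ == "__main__":
    buffer = io.StringIO()
    write_with_trailer([(1, "a"), (2, "b"), (3, "c")], buffer)
    print(read_and_verify(io.StringIO(buffer.getvalue())))
```

A file that was cut off during transfer loses its trailer, and a file with a dropped or duplicated record no longer matches its count or key total, so in both cases the reader fails loudly instead of processing incomplete data. A simple sum of keys is only one possible aggregate; a stronger checksum could be used in the same position.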
