Friday, January 25, 2013

Performance General Principles

The following are key principles, guidelines, and general considerations to keep in mind when building any solution that needs to perform well. They come from batch processing, but I find them relevant to all other kinds of applications, especially big online web services.

  • Minimize system resource use, especially I/O. Perform as many operations as possible in internal memory.
  • Process data as close to where the data physically resides as possible or vice versa (i.e., keep your data where your processing occurs).
  • Review application I/O (analyze SQL statements) to ensure that unnecessary physical I/O is avoided. In particular, the following four common flaws need to be looked for:
    • Reading data for every transaction when the data could be read once and kept cached or in the working storage;
    • Rereading data for a transaction where the data was read earlier in the same transaction;
    • Causing unnecessary table or index scans;
    • Not specifying key values in the WHERE clause of an SQL statement.
  • Do not do things twice. For instance, if you need data summarization for reporting purposes, increment stored totals if possible when data is being initially processed, so your reporting application does not have to reprocess the same data.
  • Allocate enough memory at the beginning of an application to avoid time-consuming reallocation during the process.
  • Always assume the worst with regard to data integrity. Insert adequate checks and record validation to maintain data integrity.
  • Simplify as much as possible and avoid building complex logical structures.
  • Implement checksums for internal validation where possible. For example, flat files should have a trailer record stating the total number of records in the file and an aggregate of the key fields.
  • Plan and execute stress tests as early as possible in a production-like environment with realistic data volumes.
  • In large systems backups can be challenging, especially if the batch system runs concurrently with on-line processing on a 24-7 basis. Database backups are typically well taken care of in the on-line design, but file backups should be considered just as important. If the system depends on flat files, file backup procedures should not only be in place and documented, but regularly tested as well.
  • A batch architecture typically affects on-line architecture and vice versa. Design with both architectures and environments in mind using common building blocks when possible.
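
To make the trailer-record principle above concrete, here is a minimal sketch in Java. The file layout is my assumption, not a standard: data lines look like "D|<numeric key>|<payload>" and the last line is a trailer "T|<record count>|<sum of keys>". The class name TrailerCheck is hypothetical as well.

```java
import java.util.Arrays;
import java.util.List;

/** Validates a flat file against its trailer record (assumed layout, see the text above). */
final class TrailerCheck {

    /**
     * Returns true when the trailer's record count and key total match the data lines.
     * Assumed layout: data lines "D|<numeric key>|<payload>", last line "T|<count>|<key sum>".
     */
    static boolean isConsistent(List<String> lines) {
        if (lines.isEmpty()) {
            return false; // no trailer at all
        }
        String[] trailer = lines.get(lines.size() - 1).split("\\|");
        if (trailer.length != 3 || !"T".equals(trailer[0])) {
            return false; // assume the worst: a missing or malformed trailer is a failure
        }
        long count = 0;
        long keySum = 0;
        try {
            for (String line : lines.subList(0, lines.size() - 1)) {
                String[] fields = line.split("\\|", 3);
                if (fields.length < 2 || !"D".equals(fields[0])) {
                    return false; // unexpected record type
                }
                count++;
                keySum += Long.parseLong(fields[1]);
            }
            return count == Long.parseLong(trailer[1])
                    && keySum == Long.parseLong(trailer[2]);
        } catch (NumberFormatException e) {
            return false; // non-numeric key or trailer value
        }
    }

    public static void main(String[] args) {
        List<String> good = Arrays.asList("D|100|alpha", "D|250|beta", "T|2|350");
        List<String> truncated = Arrays.asList("D|100|alpha", "T|2|350");
        System.out.println(isConsistent(good));      // true
        System.out.println(isConsistent(truncated)); // false: one record is missing
    }
}
```

The check is deliberately strict: anything it cannot interpret counts as a failure, in line with "always assume the worst with regard to data integrity".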

Thursday, January 10, 2013

SimpleDateFormat - thread safety

If there is an issue with dates in a multi-threaded application, the first thing to check is whether it uses SimpleDateFormat.

Why should you never share a SimpleDateFormat between threads? It's not thread-safe!

... and here is the proof.

Run this test class and you will see it (the failure is not deterministic, so run it a few times):

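import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import junit.framework.Assert;
import org.junit.Test;

/**
 * Please feel free to experiment: you get not only wrong dates but sometimes NumberFormatExceptions...
 */
public class SimpleDateTest {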

 static SimpleDateFormat df = new SimpleDateFormat("dd-MMM-yyyy");
 static String testdata[] = { "01-Jan-1999", "14-Feb-2001", "31-Dec-2007" };

 /**
  * Test method for SDF.
  */
 @Test
 public void testParse() {
  Runnable r[] = new Runnable[testdata.length];
  for (int i = 0; i < r.length; i++) {
   final int i2 = i;
   r[i] = new Runnable() {
    public void run() {
     try {
      for (int j = 0; j < 1000; j++) {
       String str = testdata[i2];
       String str2 = null;
//         synchronized (df)   <- uncomment to serialize access to the shared formatter
       {
        Date d = df.parse(str);
        str2 = df.format(d);
       }

       Assert.assertEquals("date conversion failed after "
         + j + " iterations.", str, str2);
      }
     } catch (ParseException e) {
      throw new RuntimeException("parse failed");
     }
    }
   };
   new Thread(r[i]).start();
  }
 }
}

 

 

Possible outputs are:

Exception in thread "Thread-0" junit.framework.ComparisonFailure: date conversion failed after 0 iterations. expected:<[01-Jan-1999]> but was:<[14-Feb-2001]>
 at junit.framework.Assert.assertEquals(Assert.java:85)

 

Exception in thread "Thread-0" Exception in thread "Thread-1" java.lang.NumberFormatException: For input string: "19992001.E199920014E"

 

Exception in thread "Thread-0" java.lang.NumberFormatException: multiple points
 at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source)

 

Exception in thread "Thread-0" java.lang.NumberFormatException: For input string: ""
 at java.lang.NumberFormatException.forInputString(Unknown Source)

 

The solution is either to synchronize every use of the shared SimpleDateFormat or to give each thread its own instance through a ThreadLocal, as ThreadSafeSimpleDateFormat does.

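The ThreadLocal variant can be sketched roughly like this. It is a minimal illustration, not the ThreadSafeSimpleDateFormat class mentioned above: the class name PerThreadDateFormat, the helper roundTripFailures and the fixed Locale.ENGLISH are my assumptions. It repeats the round trip from the test, but joins the threads and counts mismatches instead of asserting inside them.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

/** Gives every thread its own SimpleDateFormat for one fixed pattern. */
final class PerThreadDateFormat {

    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    // Created lazily, once per thread. Locale.ENGLISH is an assumption
                    // so that "Jan" parses regardless of the default locale.
                    return new SimpleDateFormat("dd-MMM-yyyy", Locale.ENGLISH);
                }
            };

    static Date parse(String text) throws ParseException {
        return FORMAT.get().parse(text);
    }

    static String format(Date date) {
        return FORMAT.get().format(date);
    }

    /** Round-trips each string 1000 times on its own thread; returns the number of mismatches. */
    static int roundTripFailures(String[] testdata) {
        final AtomicInteger failures = new AtomicInteger();
        Thread[] threads = new Thread[testdata.length];
        for (int i = 0; i < threads.length; i++) {
            final String str = testdata[i];
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < 1000; j++) {
                        try {
                            if (!str.equals(format(parse(str)))) {
                                failures.incrementAndGet();
                            }
                        } catch (Exception e) {
                            failures.incrementAndGet(); // parse errors count as failures too
                        }
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait, so that failures in worker threads are not lost
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return failures.get();
    }

    public static void main(String[] args) {
        String[] testdata = { "01-Jan-1999", "14-Feb-2001", "31-Dec-2007" };
        System.out.println("failures: " + roundTripFailures(testdata));
    }
}
```

Because no formatter is ever visible to more than one thread, no locking is needed. The price is one SimpleDateFormat instance per thread that touches the class.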

 

Or, my latest finding: an improved ThreadLocal variant that keeps a HashMap of SimpleDateFormat instances per thread, SafeSimpleDateFormat:

import java.text.DateFormatSymbols;
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.TimeZone;

/**
 * This class implements a Thread-Safe (re-entrant) SimpleDateFormat
 * class.  It does this by using a ThreadLocal that holds a Map, instead
 * of the traditional approach to hold the SimpleDateFormat in a ThreadLocal.
 *
 * Each ThreadLocal holds a single HashMap containing SimpleDateFormats, keyed
 * by a String format (e.g. "yyyy/M/d", etc.), for each new SimpleDateFormat
 * instance that was created within the threads execution context.
 *
 * @author John DeRegnaucourt (jdereg@gmail.com)
 *         <br/>
 *         Copyright (c) John DeRegnaucourt
 *         <br/><br/>
 *         Licensed under the Apache License, Version 2.0 (the "License");
 *         you may not use this file except in compliance with the License.
 *         You may obtain a copy of the License at
 *         <br/><br/>
 *         http://www.apache.org/licenses/LICENSE-2.0
 *         <br/><br/>
 *         Unless required by applicable law or agreed to in writing, software
 *         distributed under the License is distributed on an "AS IS" BASIS,
 *         WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *         See the License for the specific language governing permissions and
 *         limitations under the License. */
public class SafeSimpleDateFormat
{
    private final String _format;
    private static final ThreadLocal<Map<String, SimpleDateFormat>> _dateFormats = new ThreadLocal<Map<String, SimpleDateFormat>>()
    {
        public Map<String, SimpleDateFormat> initialValue()
        {
            return new HashMap<String, SimpleDateFormat>();
        }
    };

    private SimpleDateFormat getDateFormat(String format)
    {
        Map<String, SimpleDateFormat> formatters = _dateFormats.get();
        SimpleDateFormat formatter = formatters.get(format);
        if (formatter == null)
        {
            formatter = new SimpleDateFormat(format);
            formatters.put(format, formatter);
        }
        return formatter;
    }

    public SafeSimpleDateFormat(String format)
    {
        _format = format;
    }

    public String format(Date date)
    {
        return getDateFormat(_format).format(date);
    }

    public String format(Object date)
    {
        return getDateFormat(_format).format(date);
    }

    public Date parse(String day) throws ParseException
    {
        return getDateFormat(_format).parse(day);
    }

    public void setTimeZone(TimeZone tz)
    {
        getDateFormat(_format).setTimeZone(tz);
    }

    public void setCalendar(Calendar cal)
    {
        getDateFormat(_format).setCalendar(cal);
    }

    public void setNumberFormat(NumberFormat format)
    {
        getDateFormat(_format).setNumberFormat(format);
    }

    public void setLenient(boolean lenient)
    {
        getDateFormat(_format).setLenient(lenient);
    }

    public void setDateFormatSymbols(DateFormatSymbols symbols)
    {
        getDateFormat(_format).setDateFormatSymbols(symbols);
    }

    public void set2DigitYearStart(Date date)
    {
        getDateFormat(_format).set2DigitYearStart(date);
    }
} 

 

 

Tuesday, January 8, 2013

Log4J vs. own ApplicationLogger

Very often I find that projects put their own logging layer on top of Log4J. I understand that everyone wants to be independent, but sometimes we need to ask ourselves a simple question: why?
Here is a simple example and test showing how much you can gain by not introducing that layer.
I'll compare two loggers, Log4J and ApplicationLogger, in a "production-like" scenario, meaning logging is switched on at INFO level. The calculation is quite predictable: how much time can we save by not calling separate methods...
Test:
 @Test
 public void loggerTest(){
  int TIMES = 1000000;
 [...]
  System.out.println("x");
  s1=System.currentTimeMillis();
  for (int i=0; i<TIMES; i++) {
   Log4JStandardMethod("666");
  }
  e1=System.currentTimeMillis();

  System.out.println("x");
  s2=System.currentTimeMillis();
  for (int i=0; i<TIMES; i++) {
   ApplicationLoggerStandardMethod("666");
  }
  e2=System.currentTimeMillis();
[...]
 }

 void Log4JStandardMethod(String id) {
  if (log4J.isDebugEnabled()) log4J.debug("Just some addition: " + 13);
  
  if (log4J.isDebugEnabled()) log4J.debug("Details are:  " + id + 
    " System: " + id + "  date: " + 14);
  
  if (log4J.isDebugEnabled()) log4J.debug("Size of messages for: " + id + " is: " + 15);
  if (log4J.isDebugEnabled()) log4J.debug("Size of messages for: " + id + " is: " + 16);

 }

 void ApplicationLoggerStandardMethod(String id) {
  String strMethodName = "AppLoggerSimpleMethod";
  appLogger.logDebug("Just some addition: " + 13, strClassName, strMethodName);
  
  appLogger.logDebug("Details are:  " + id + 
    " System: " + id + "  date: " + 14, strClassName, strMethodName);
  
  appLogger.logDebug("Size of messages for: " + id + " is: " + 15, strClassName, strMethodName);
  appLogger.logDebug("Size of messages for: " + id + " is: " + 16, strClassName, strMethodName);
 }
  
I will skip the full definition of ApplicationLogger and show just the one method we are interested in:
    /**
     * Method to log DEBUG level messages
     *
     * @param strMsg the message to log
     * @param strClassName name of the calling class
     * @param strMethodName name of the calling method
     */
    public void logDebug(final String strMsg, final String strClassName,
        final String strMethodName) {

     if (!this.logger.isEnabledFor (Level.DEBUG))
      return;

        this.logger.log (Level.DEBUG, new StringBuffer ().append (strClassName)
                .append (strMethodName).append (strMsg));

    }

Output:
Log4J : 161 ms
ApplicationLogger: 1050 ms
The difference in the code is very small:
  • Log4J - there is an isDebugEnabled() check before each debug line;
  • ApplicationLogger - we must pass some additional parameters (information about the method/class), and the check for the log level is done inside the logger method.
Of course, if you introduce an isDebugEnabled method in your ApplicationLogger, that will speed up the solution; however, you will end up rewriting all the Log4J methods.
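
A likely reason for the gap is that Java evaluates method arguments before the call, so with the wrapper every message string is concatenated even though DEBUG is off, while the isDebugEnabled() guard skips the concatenation entirely. The sketch below illustrates only this effect. GuardDemo is a hypothetical stand-in, not the Log4J API or the real ApplicationLogger, and it counts built messages instead of measuring time.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a logging wrapper; it is not the Log4J API. */
final class GuardDemo {

    static final AtomicInteger messagesBuilt = new AtomicInteger();
    static boolean debugEnabled = false; // "production-like": DEBUG is switched off

    /** Stands for the string concatenation at the call site; counts every message built. */
    static String buildMessage(String id) {
        messagesBuilt.incrementAndGet();
        return "Size of messages for: " + id + " is: " + 15;
    }

    static boolean isDebugEnabled() {
        return debugEnabled;
    }

    /** Wrapper style: the level check runs only after the caller has built the message. */
    static void logDebug(String msg) {
        if (!debugEnabled) {
            return;
        }
        System.out.println(msg);
    }

    /** Calls the wrapper directly; returns how many messages were built. */
    static int unguardedCalls(int times) {
        messagesBuilt.set(0);
        for (int i = 0; i < times; i++) {
            logDebug(buildMessage("666")); // argument is evaluated before logDebug runs
        }
        return messagesBuilt.get();
    }

    /** Checks the level first; returns how many messages were built. */
    static int guardedCalls(int times) {
        messagesBuilt.set(0);
        for (int i = 0; i < times; i++) {
            if (isDebugEnabled()) {
                logDebug(buildMessage("666")); // never reached while DEBUG is off
            }
        }
        return messagesBuilt.get();
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + unguardedCalls(1000)); // unguarded: 1000
        System.out.println("guarded: " + guardedCalls(1000));     // guarded: 0
    }
}
```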
A second thought: if you want different formatting for your log, I'm pretty sure you can get it through proper configuration of your logger.
See: http://logging.apache.org/log4j/
Also have a look at the new performance improvements in Log4J 2.x.
