Thursday, December 19, 2013

3 easy steps to test your website with Selenium from Java.

In this post I want to show how to quickly test a website with Selenium.
In my scenario, a production server (.NET on IIS) ran out of memory (OOM, OutOfMemory) after some time. After a few quick manual tests we were not able to reproduce the issue in any of our test environments. This is where Selenium comes into play.

Step one: download.

Selenium IDE is built on top of Firefox: it is a Firefox plugin. Download it from:
You need 2 pieces to play: the IDE and the server. When you install the IDE, you'll be asked to restart the browser; after that you'll see "Selenium IDE" in the Tools menu.

Step two: record & replay.

You're ready to play: just create your test case, click on "Record" and open the page that you want to play with.
The result is a script containing the steps that you performed on the page. Replay them to make sure that everything is working. If you get an error that the server is not ready, go to the command line and start the Selenium server: java -jar selenium-server-standalone-{version}.jar
My advice is to repeat this step a couple of times to get familiar with how Selenium works: how it records steps and clicks, and how the replay actually goes through your website. It also looks cool ;-).
When you finish, save it. You can also export it; in my case I exported it to jUnit4 format.

Step three: jUnit.

Run it from your development IDE (eclipse/netbeans/intelliJ/..).
Create a new project in eclipse, then copy in the body of the generated jUnit test from the previous step. To run the test in eclipse you need the Selenium libraries; you can download them from the same page: http://docs.seleniumhq.org/download/. Just add them to your project dependencies... and you're ready to rock.

You can also use this example maven project:

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.yarenty.uitest</groupId>
  <artifactId>selenium</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.seleniumhq.selenium</groupId>
      <artifactId>selenium-firefox-driver</artifactId>
      <version>2.38.0</version>
    </dependency>
  </dependencies>
</project>


How to make concurrent calls? Use Executors, as described in my post: http://yarenty.blogspot.com/2012/01/template-multithreading-solution-using.html

    public void start() {
        ExecutorService executor = Executors.newFixedThreadPool(CONCURRENT_USERS);
        List<Future<Long>> list = new ArrayList<Future<Long>>();
        for (int i = 0; i < NUMBER_OF_USER_SESSIONS * CONCURRENT_USERS; i++) {
            Callable<Long> worker = new FirefoxWorker(i);
            Future<Long> submit = executor.submit(worker);
            list.add(submit);
        }
        // now retrieve the results; get() blocks until each worker has finished
        for (Future<Long> future : list) {
            try {
                future.get();
            } catch (InterruptedException e) {
                e.printStackTrace();
            } catch (ExecutionException e) {
                e.printStackTrace();
            }
        }
        executor.shutdown();
    }

CONCURRENT_USERS is the number of users accessing web pages at the same time.
NUMBER_OF_USER_SESSIONS is how many interactions you want each of them to have.
The overall number of "visits" is the product of the two.

In my solution I created the user_id by appending "i", the execution number, to a standard user name. You may think about a different solution. Hopefully I can soon show you how to incorporate Disruptor with defined handlers ;-)
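The executor pattern above can be tried without a browser. The following is only a sketch: StubWorker is a hypothetical stand-in for FirefoxWorker (which is not shown in this post), the base user name "user" is an assumption, and LoadRun is an illustrative name. It shows how the number of submitted tasks follows from the two constants and how a user id can be derived from the execution number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for FirefoxWorker: no browser is started here.
class StubWorker implements Callable<Long> {
    private final String userId;

    StubWorker(String baseUser, int executionNumber) {
        // user id = standard user name + execution number
        this.userId = baseUser + executionNumber;
    }

    String userId() {
        return userId;
    }

    @Override
    public Long call() {
        long start = System.nanoTime();
        // A real worker would replay the recorded Selenium steps for userId here.
        return System.nanoTime() - start;
    }
}

class LoadRun {
    // Submits sessions * concurrentUsers workers to a pool of concurrentUsers
    // threads and returns the elapsed time reported by each worker.
    static List<Long> run(int concurrentUsers, int sessions) {
        ExecutorService executor = Executors.newFixedThreadPool(concurrentUsers);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < sessions * concurrentUsers; i++) {
                futures.add(executor.submit(new StubWorker("user", i)));
            }
            List<Long> elapsed = new ArrayList<>();
            for (Future<Long> future : futures) {
                elapsed.add(future.get()); // blocks until that worker is done
            }
            return elapsed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```

Calling LoadRun.run(3, 4) submits 12 workers, but at most 3 of them run at any moment.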

Friday, December 6, 2013

Java: Code review tools

Rietveld - code review tool running on Google App Engine, for use with Subversion.

Gerrit2 - Java solution for use with Git.

TODO:

http://phabricator.org/

http://www.reviewboard.org/


Rietveld - code review tool running on Google App Engine, for use with Subversion.

Rietveld is in common use by many open source projects, facilitating their peer reviews much as Mondrian does for Google employees. Unlike Mondrian and the Google Perforce triggers, Rietveld is strictly advisory and does not enforce peer-review prior to submission.

Git is a distributed version control system, wherein each repository is assumed to be owned/maintained by a single user. There are no inherent security controls built into Git, so the ability to read from or write to a repository is controlled entirely by the host's filesystem access controls. When multiple maintainers collaborate on a single shared repository a high degree of trust is required, as any collaborator with write access can alter the repository.

Gitosis provides tools to secure centralized Git repositories, permitting multiple maintainers to manage the same project at once, by restricting access to a secure network protocol only, much like Perforce secures a repository by only permitting access over its network port.

Gerrit Code Review started as a simple set of patches to Rietveld, and was originally built to service AOSP. This quickly turned into a fork as we added access control features that Guido van Rossum did not want to see complicating the Rietveld code base. As the functionality and code were starting to become drastically different, a different name was needed.

Gerrit2 is a complete rewrite of the Gerrit fork, completely changing the implementation from Python on Google App Engine, to Java on a J2EE servlet container and a SQL database.


Tuesday, November 19, 2013

Design documents how-to

The hardest part of writing a design document has nothing to do with the writing.
The difficult part is working through a logical design before you get to coding.
Once you have a vision of how the objects and entities are arranged, writing the details is easy.
The positive difference that spending a week on this task can make is unbelievably rewarding in the end.

As the adage goes, “If you fail to plan, then you plan to fail.”


Friday, October 18, 2013

Batch processing strategies

To help design and implement batch systems, basic batch application building blocks and patterns should be provided to the designers and programmers in form of sample structure charts and code shells. When starting to design a batch job, the business logic should be decomposed into a series of steps which can be implemented using the following standard building blocks:

  • Conversion Applications: For each type of file supplied by or generated to an external system, a conversion application will need to be created to convert the transaction records supplied into a standard format required for processing. This type of batch application can partly or entirely consist of translation utility modules (see Basic Batch Services).
  • Validation Applications: Validation applications ensure that all input/output records are correct and consistent. Validation is typically based on file headers and trailers, checksums and validation algorithms as well as record level cross-checks.
  • Extract Applications: An application that reads a set of records from a database or input file, selects records based on predefined rules, and writes the records to an output file.
  • Extract/Update Applications: An application that reads records from a database or an input file, and makes changes to a database or an output file driven by the data found in each input record.
  • Processing and Updating Applications: An application that performs processing on input transactions from an extract or a validation application. The processing will usually involve reading a database to obtain data required for processing, potentially updating the database and creating records for output processing.
  • Output/Format Applications: Applications that read an input file, restructure data from this record according to a standard format, and produce an output file for printing or transmission to another program or system.
Additionally a basic application shell should be provided for business logic that cannot be built using the previously mentioned building blocks.
In addition to the main building blocks, each application may use one or more of standard utility steps, such as:
  • Sort - A program that reads an input file and produces an output file where records have been re-sequenced according to a sort key field in the records. Sorts are usually performed by standard system utilities.
  • Split - A program that reads a single input file, and writes each record to one of several output files based on a field value. Splits can be tailored or performed by parameter-driven standard system utilities.
  • Merge - A program that reads records from multiple input files and produces one output file with combined data from the input files. Merges can be tailored or performed by parameter-driven standard system utilities.
Batch applications can additionally be categorized by their input source:
  • Database-driven applications are driven by rows or values retrieved from the database.
  • File-driven applications are driven by records or values retrieved from a file.
  • Message-driven applications are driven by messages retrieved from a message queue.
The foundation of any batch system is the processing strategy. Factors affecting the selection of the strategy include: estimated batch system volume, concurrency with on-line or with other batch systems, and available batch windows (with more enterprises wanting to be up and running 24x7, there may be no obvious batch windows left).
Typical processing options for batch are:
  • Normal processing in a batch window during off-line hours
  • Concurrent batch / on-line processing
  • Parallel processing of many different batch runs or jobs at the same time
  • Partitioning (i.e. processing of many instances of the same job at the same time)
  • A combination of these
The order in the list above reflects the implementation complexity, processing in a batch window being the easiest and partitioning the most complex to implement.
Some or all of these options may be supported by a commercial scheduler.
In the following section these processing options are discussed in more detail. It is important to notice that the commit and locking strategy adopted by batch processes will be dependent on the type of processing performed, and as a rule of thumb the on-line locking strategy should also use the same principles. Therefore, the batch architecture cannot be simply an afterthought when designing an overall architecture.
The locking strategy can use only normal database locks, or an additional custom locking service can be implemented in the architecture. The locking service would track database locking (for example by storing the necessary information in a dedicated db-table) and give or deny permissions to the application programs requesting a db operation. Retry logic could also be implemented by this architecture to avoid aborting a batch job in case of a lock situation.

1. Normal processing in a batch window
For simple batch processes running in a separate batch window, where the data being updated is not required by on-line users or other batch processes, concurrency is not an issue and a single commit can be done at the end of the batch run.
In most cases a more robust approach is more appropriate. A thing to keep in mind is that batch systems have a tendency to grow as time goes by, both in terms of complexity and the data volumes they will handle. If no locking strategy is in place and the system still relies on a single commit point, modifying the batch programs can be painful. Therefore, even with the simplest batch systems, consider the need for commit logic for restart-recovery options as well as the information concerning the more complex cases below.
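As an illustration of commit logic with a restart point, here is a minimal sketch. IntervalCommitter and its members are hypothetical names, the "commit" is just a callback, and a real batch job would persist the restart index in the database together with the committed work; none of this comes from a specific framework.

```java
import java.util.List;
import java.util.function.Consumer;

// Commits after every commitInterval items and remembers the index of the
// first item that has not been committed yet (the restart point).
class IntervalCommitter<T> {
    private final int commitInterval;
    private final Consumer<T> processor;
    private final Runnable commit;
    private int commits = 0;
    private int restartIndex = 0;

    IntervalCommitter(int commitInterval, Consumer<T> processor, Runnable commit) {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be positive");
        }
        this.commitInterval = commitInterval;
        this.processor = processor;
        this.commit = commit;
    }

    // Processes items from index `from` onward. If the processor throws,
    // restartIndex() still points at the first uncommitted item.
    void process(List<T> items, int from) {
        restartIndex = from;
        int pending = 0;
        for (int i = from; i < items.size(); i++) {
            processor.accept(items.get(i));
            pending++;
            if (pending == commitInterval) {
                doCommit(i + 1);
                pending = 0;
            }
        }
        if (pending > 0) {
            doCommit(items.size()); // final partial chunk
        }
    }

    private void doCommit(int nextIndex) {
        commit.run();
        commits++;
        restartIndex = nextIndex;
    }

    int commits() {
        return commits;
    }

    int restartIndex() {
        return restartIndex;
    }
}
```

After a failure, a rerun can call process(items, restartIndex()) so that only uncommitted items are processed again.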

2. Concurrent batch / on-line processing
Batch applications processing data that can simultaneously be updated by on-line users should not lock any data (either in the database or in files) which could be required by on-line users for more than a few seconds. Also, updates should be committed to the database at the end of every few transactions. This minimizes the portion of data that is unavailable to other processes and the elapsed time the data is unavailable.
Another option to minimize physical locking is to have a logical row-level locking implemented using either an Optimistic Locking Pattern or a Pessimistic Locking Pattern.
  • Optimistic locking assumes a low likelihood of record contention. It typically means inserting a timestamp column in each database table used concurrently by both batch and on-line processing. When an application fetches a row for processing, it also fetches the timestamp. As the application then tries to update the processed row, the update uses the original timestamp in the WHERE clause. If the timestamp matches, the data and the timestamp will be updated successfully. If the timestamp does not match, this indicates that another application has updated the same row between the fetch and the update attempt and therefore the update cannot be performed.
  • Pessimistic locking is any locking strategy that assumes there is a high likelihood of record contention and therefore either a physical or logical lock needs to be obtained at retrieval time. One type of pessimistic logical locking uses a dedicated lock-column in the database table. When an application retrieves the row for update, it sets a flag in the lock column. With the flag in place, other applications attempting to retrieve the same row will logically fail. When the application that set the flag updates the row, it also clears the flag, enabling the row to be retrieved by other applications. Please note, that the integrity of data must be maintained also between the initial fetch and the setting of the flag, for example by using db locks (e.g., SELECT FOR UPDATE). Note also that this method suffers from the same downside as physical locking except that it is somewhat easier to manage building a time-out mechanism that will get the lock released if the user goes to lunch while the record is locked.
These patterns are not necessarily suitable for batch processing, but they might be used for concurrent batch and on-line processing (e.g. in cases where the database doesn't support row-level locking). As a general rule, optimistic locking is more suitable for on-line applications, while pessimistic locking is more suitable for batch applications. Whenever logical locking is used, the same scheme must be used for all applications accessing data entities protected by logical locks.
Note that both of these solutions only address locking a single record. Often we may need to lock a logically related group of records. With physical locks, you have to manage these very carefully in order to avoid potential deadlocks. With logical locks, it is usually best to build a logical lock manager that understands the logical record groups you want to protect and can ensure that locks are coherent and non-deadlocking. This logical lock manager usually uses its own tables for lock management, contention reporting, time-out mechanism, etc.
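The optimistic pattern described above can be sketched in memory. This is an illustration only: OptimisticTable and VersionedRow are hypothetical names, a version counter plays the role of the timestamp column, and the map's atomic replace stands in for an UPDATE whose WHERE clause includes the originally fetched timestamp.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// One immutable snapshot of a row; `version` plays the role of the timestamp column.
class VersionedRow {
    final String data;
    final long version;

    VersionedRow(String data, long version) {
        this.data = data;
        this.version = version;
    }
}

class OptimisticTable {
    private final ConcurrentMap<Integer, VersionedRow> rows = new ConcurrentHashMap<>();

    void insert(int id, String data) {
        rows.put(id, new VersionedRow(data, 1L));
    }

    // Fetches the row together with its version.
    VersionedRow fetch(int id) {
        return rows.get(id);
    }

    // In spirit: UPDATE t SET data = ?, version = version + 1
    //            WHERE id = ? AND version = <version fetched earlier>
    // replace() only succeeds if the stored snapshot is still the one that was
    // fetched (VersionedRow uses identity equality), so a concurrent update
    // makes this call return false.
    boolean update(int id, VersionedRow fetched, String newData) {
        return rows.replace(id, fetched, new VersionedRow(newData, fetched.version + 1));
    }
}
```

A caller whose update returns false would typically re-fetch the row and decide whether to retry or report a conflict.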

3. Parallel Processing
Parallel processing allows multiple batch runs / jobs to run in parallel to minimize the total elapsed batch processing time. This is not a problem as long as the jobs are not sharing the same files, db-tables or index spaces. If they do, this service should be implemented using partitioned data. Another option is to build an architecture module for maintaining interdependencies using a control table. A control table should contain a row for each shared resource and whether it is in use by an application or not. The batch architecture or the application in a parallel job would then retrieve information from that table to determine if it can get access to the resource it needs or not.
If the data access is not a problem, parallel processing can be implemented through the use of additional threads to process in parallel. In the mainframe environment, parallel job classes have traditionally been used, in order to ensure adequate CPU time for all the processes. Regardless, the solution has to be robust enough to ensure time slices for all the running processes.
Other key issues in parallel processing include load balancing and the availability of general system resources such as files, database buffer pools etc. Also note that the control table itself can easily become a critical resource.
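A control table for shared resources could look roughly like the sketch below. It is an in-memory illustration with hypothetical names (ControlTable, acquire, release); a real implementation would keep these rows in a database table and would have to deal with jobs that crash while holding a resource.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// One entry per shared resource that is currently in use, mapped to the job using it.
class ControlTable {
    private final ConcurrentMap<String, String> inUse = new ConcurrentHashMap<>();

    // Marks the resource as used by jobId; returns false if another job holds it.
    boolean acquire(String resource, String jobId) {
        return inUse.putIfAbsent(resource, jobId) == null;
    }

    // Frees the resource, but only if jobId is the current holder.
    boolean release(String resource, String jobId) {
        return inUse.remove(resource, jobId);
    }

    // Returns the job currently using the resource, or null if it is free.
    String holder(String resource) {
        return inUse.get(resource);
    }
}
```

Every parallel job goes through this one structure, which is exactly why the control table itself can become a critical resource.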

4. Partitioning
Using partitioning allows multiple versions of large batch applications to run concurrently. The purpose of this is to reduce the elapsed time required to process long batch jobs. Processes which can be successfully partitioned are those where the input file can be split and/or the main database tables partitioned to allow the application to run against different sets of data.
In addition, processes which are partitioned must be designed to only process their assigned data set. A partitioning architecture has to be closely tied to the database design and the database partitioning strategy. Please note, that the database partitioning doesn't necessarily mean physical partitioning of the database, although in most cases this is advisable.
The architecture should be flexible enough to allow dynamic configuration of the number of partitions. Both automatic and user controlled configuration should be considered. Automatic configuration may be based on parameters such as the input file size and/or the number of input records.

4.1 Partitioning Approaches
The following lists some of the possible partitioning approaches. Selecting a partitioning approach has to be done on a case-by-case basis.

1. Fixed and Even Break-Up of Record Set
This involves breaking the input record set into an even number of portions (e.g. 10, where each portion will have exactly 1/10th of the entire record set). Each portion is then processed by one instance of the batch/extract application.
In order to use this approach, preprocessing will be required to split the recordset up. The result of this split will be a lower and upper bound placement number which can be used as input to the batch/extract application in order to restrict its processing to its portion alone.
Preprocessing could be a large overhead as it has to calculate and determine the bounds of each portion of the record set.
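The bounds calculation itself is simple arithmetic once the record count is known. A small sketch, assuming 1-based placement numbers and inclusive bounds (EvenSplit is a hypothetical name); when the count does not divide evenly, the first portions receive one extra record:

```java
import java.util.ArrayList;
import java.util.List;

class EvenSplit {
    // Returns one {lower, upper} pair of inclusive, 1-based placement numbers
    // per portion. Portions that would be empty are omitted.
    static List<long[]> bounds(long recordCount, int portions) {
        if (recordCount < 0 || portions <= 0) {
            throw new IllegalArgumentException("recordCount must be >= 0 and portions > 0");
        }
        List<long[]> result = new ArrayList<>();
        long base = recordCount / portions;
        long extra = recordCount % portions; // the first `extra` portions get one more record
        long lower = 1;
        for (int p = 0; p < portions; p++) {
            long size = base + (p < extra ? 1 : 0);
            if (size == 0) {
                break; // fewer records than portions
            }
            result.add(new long[] {lower, lower + size - 1});
            lower += size;
        }
        return result;
    }
}
```

Each pair would then be passed to one instance of the batch/extract application as its lower and upper bound.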

2. Breakup by a Key Column
This involves breaking up the input record set by a key column such as a location code, and assigning data from each key to a batch instance. In order to achieve this, column values can either be:
  1. Assigned to a batch instance via a partitioning table (see below for details).
  2. Assigned to a batch instance by a portion of the value (e.g. values 0000-0999, 1000-1999, etc.)
Under option 1, addition of new values will mean a manual reconfiguration of the batch/extract to ensure that the new value is added to a particular instance.
Under option 2, this will ensure that all values are covered via an instance of the batch job. However, the number of values processed by one instance is dependent on the distribution of column values (i.e. there may be a large number of locations in the 0000-0999 range, and few in the 1000-1999 range). Under this option, the data range should be designed with partitioning in mind.
Under both options, the optimal even distribution of records to batch instances cannot be realized. There is no dynamic configuration of the number of batch instances used.

3. Breakup by Views
This approach is basically breakup by a key column, but on the database level. It involves breaking up the recordset into views. These views will be used by each instance of the batch application during its processing. The breakup will be done by grouping the data.
With this option, each instance of a batch application will have to be configured to hit a particular view (instead of the master table). Also, with the addition of new data values, this new group of data will have to be included into a view. There is no dynamic configuration capability, as a change in the number of instances will result in a change to the views.

4. Addition of a Processing Indicator
This involves the addition of a new column to the input table, which acts as an indicator. As a preprocessing step, all indicators would be marked to non-processed. During the record fetch stage of the batch application, records are read on the condition that that record is marked non-processed, and once they are read (with lock), they are marked processing. When that record is completed, the indicator is updated to either complete or error. Many instances of a batch application can be started without a change, as the additional column ensures that a record is only processed once.
With this option, I/O on the table increases dynamically. In the case of an updating batch application, this impact is reduced, as a write will have to occur anyway.
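The indicator life cycle can be sketched in memory as follows. IndicatorTable and its methods are hypothetical names, and the atomic replace stands in for reading a record with a lock and marking it in the same step; it is not how any particular database does it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class IndicatorTable {
    enum Status { NON_PROCESSED, PROCESSING, COMPLETE, ERROR }

    // record id -> value of the indicator column
    private final ConcurrentMap<Integer, Status> indicator = new ConcurrentHashMap<>();

    // Preprocessing step: every record starts as non-processed.
    void add(int recordId) {
        indicator.put(recordId, Status.NON_PROCESSED);
    }

    // Claims one non-processed record by atomically marking it as processing.
    // Returns null when nothing is left to claim.
    Integer claimNext() {
        for (Integer id : indicator.keySet()) {
            if (indicator.replace(id, Status.NON_PROCESSED, Status.PROCESSING)) {
                return id;
            }
        }
        return null;
    }

    // Marks a claimed record as complete or error.
    void finish(int recordId, boolean success) {
        indicator.replace(recordId, Status.PROCESSING, success ? Status.COMPLETE : Status.ERROR);
    }

    Status status(int recordId) {
        return indicator.get(recordId);
    }
}
```

Any number of batch instances can call claimNext() concurrently; each record is handed to only one of them.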

5. Extract Table to a Flat File
This involves the extraction of the table into a file. This file can then be split into multiple segments and used as input to the batch instances.
With this option, the additional overhead of extracting the table into a file, and splitting it, may cancel out the effect of multi-partitioning. Dynamic configuration can be achieved via changing the file splitting script.

6. Use of a Hashing Column
This scheme involves the addition of a hash column (key/index) to the database tables used to retrieve the driver record. This hash column will have an indicator to determine which instance of the batch application will process this particular row. For example, if there are three batch instances to be started, then an indicator of 'A' will mark that row for processing by instance 1, an indicator of 'B' will mark that row for processing by instance 2, etc.
The procedure used to retrieve the records would then have an additional WHERE clause to select all rows marked by a particular indicator. The inserts in this table would involve the addition of the marker field, which would be defaulted to one of the instances (e.g. 'A').
A simple batch application would be used to update the indicators such as to redistribute the load between the different instances. When a sufficiently large number of new rows have been added, this batch can be run (anytime, except in the batch window) to redistribute the new rows to other instances.
Additional instances of the batch application only require the running of the batch application as above to redistribute the indicators to cater for a new number of instances.
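The redistribution batch can be pictured with a small sketch. It assumes a simple round-robin assignment of the letters 'A', 'B', ... to row ids, which is only one possible rule; IndicatorRebalancer and its methods are hypothetical names, and rowsFor mimics the additional WHERE clause on the indicator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IndicatorRebalancer {
    // Assigns indicators 'A', 'B', ... round-robin over the row ids so that
    // every instance gets a near-equal share of rows.
    static Map<Integer, Character> redistribute(List<Integer> rowIds, int instances) {
        if (instances < 1 || instances > 26) {
            throw new IllegalArgumentException("instances must be between 1 and 26");
        }
        Map<Integer, Character> indicators = new LinkedHashMap<>();
        for (int i = 0; i < rowIds.size(); i++) {
            indicators.put(rowIds.get(i), (char) ('A' + i % instances));
        }
        return indicators;
    }

    // In spirit: SELECT id FROM driver_table WHERE indicator = ?
    static List<Integer> rowsFor(Map<Integer, Character> indicators, char indicator) {
        List<Integer> rows = new ArrayList<>();
        for (Map.Entry<Integer, Character> entry : indicators.entrySet()) {
            if (entry.getValue() == indicator) {
                rows.add(entry.getKey());
            }
        }
        return rows;
    }
}
```

Changing the number of instances only requires running redistribute again with the new count.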

4.2 Database and Application Design Principles
An architecture that supports multi-partitioned applications which run against partitioned database tables using the key column approach, should include a central partition repository for storing partition parameters. This provides flexibility and ensures maintainability. The repository will generally consist of a single table known as the partition table.
Information stored in the partition table will be static and in general should be maintained by the DBA. The table should consist of one row of information for each partition of a multi-partitioned application. The table should have columns for: Program ID Code, Partition Number (Logical ID of the partition), Low Value of the db key column for this partition, High Value of the db key column for this partition.
On program start-up the program id and partition number should be passed to the application from the architecture (Control Processing Tasklet). These variables are used to read the partition table, to determine what range of data the application is to process (if a key column approach is used). In addition the partition number must be used throughout the processing to:
  • Add to the output files/database updates in order for the merge process to work properly
  • Report normal processing to the batch log and any errors that occur during execution to the architecture error handler

4.3 Minimizing Deadlocks
When applications run in parallel or partitioned, contention in database resources and deadlocks may occur. It is critical that the database design team eliminates potential contention situations as far as possible as part of the database design.
Also ensure that the database index tables are designed with deadlock prevention and performance in mind.
Deadlocks or hot spots often occur in administration or architecture tables such as log tables, control tables, and lock tables. The implications of these should be taken into account as well. A realistic stress test is crucial for identifying the possible bottlenecks in the architecture.
To minimize the impact of conflicts on data, the architecture should provide services such as wait-and-retry intervals when attaching to a database or when encountering a deadlock. This means a built-in mechanism to react to certain database return codes and instead of issuing an immediate error handling, waiting a predetermined amount of time and retrying the database operation.
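A wait-and-retry service might be sketched like this. DeadlockException is a hypothetical stand-in for whatever exception or return code the database driver uses to signal a deadlock, and Retry.withRetry is an illustrative name, not an existing API.

```java
import java.util.function.Supplier;

// Hypothetical marker for "the database reported a deadlock or lock timeout".
class DeadlockException extends RuntimeException {
    DeadlockException(String message) {
        super(message);
    }
}

class Retry {
    // Runs the operation; on DeadlockException waits waitMillis and tries again,
    // up to maxAttempts attempts in total. Other exceptions are not retried.
    static <T> T withRetry(Supplier<T> operation, int maxAttempts, long waitMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (DeadlockException e) {
                if (attempt == maxAttempts) {
                    throw e; // give up and let normal error handling take over
                }
                try {
                    Thread.sleep(waitMillis);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", interrupted);
                }
            }
        }
    }
}
```

A fixed wait is the simplest choice; whether a growing or randomized wait works better depends on the contention pattern.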

4.4 Parameter Passing and Validation
The partition architecture should be relatively transparent to application developers. The architecture should perform all tasks associated with running the application in a partitioned mode including:
  • Retrieve partition parameters before application start-up
  • Validate partition parameters before application start-up
  • Pass parameters to application at start-up
The validation should include checks to ensure that:
  • the application has sufficient partitions to cover the whole data range
  • there are no gaps between partitions
If the database is partitioned, some additional validation may be necessary to ensure that a single partition does not span database partitions.
Also the architecture should take into consideration the consolidation of partitions. Key questions include:
  • Must all the partitions be finished before going into the next job step?
  • What happens if one of the partitions aborts?
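The two validation checks listed above can be sketched as one pass over the partition table rows. Partition mirrors the columns described in 4.2 (program id, partition number, low value, high value); the class names, the use of numeric keys, and the extra rejection of overlapping partitions are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// One row of the partition table, assuming numeric key values.
class Partition {
    final String programId;
    final int number;
    final long low;
    final long high;

    Partition(String programId, int number, long low, long high) {
        this.programId = programId;
        this.number = number;
        this.low = low;
        this.high = high;
    }
}

class PartitionValidator {
    // True when the partitions cover [rangeLow, rangeHigh] completely,
    // with no gaps and (as an added assumption) no overlaps.
    static boolean covers(List<Partition> partitions, long rangeLow, long rangeHigh) {
        List<Partition> sorted = new ArrayList<>(partitions);
        sorted.sort(Comparator.comparingLong((Partition p) -> p.low));
        long expectedLow = rangeLow;
        for (Partition p : sorted) {
            if (p.high < p.low || p.low != expectedLow) {
                return false; // empty partition, gap, or overlap
            }
            expectedLow = p.high + 1;
        }
        return expectedLow == rangeHigh + 1;
    }
}
```

The architecture would run a check like this before start-up and refuse to launch any partition if it fails.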

Wednesday, October 2, 2013

Java: Best Practices for Exception Handling

We as programmers want to write quality code that solves problems. Unfortunately, exceptions come as side effects of our code. No one likes side effects, so we soon find our own ways to get around them.


  • Throw exceptions when the method cannot handle the problem itself and, more importantly, when it should be handled by the caller. A good example of this is present in the Servlet API: doGet() and doPost() throw ServletException or IOException in certain circumstances where the request could not be read correctly. Neither of these methods is in a position to handle the exception, but the container is (which results in the 50x error page in most cases).
  • Bubble the exception if the method cannot handle it. This is a corollary of the above, but applicable to methods that must catch the exception. If the caught exception cannot be handled correctly by the method, then it is preferable to bubble it.
  • Throw the exception right away. This might sound vague, but if an exception scenario is encountered, then it is a good practice to throw an exception indicating the original point of failure, instead of attempting to handle the failure via error codes until a point deemed suitable for throwing the exception. In other words, attempt to minimize mixing exception handling with error handling.
  • Either log the exception or bubble it, but don't do both. Logging an exception often indicates that the exception stack has been completely unwound, indicating that no further bubbling of the exception has occurred. Hence, it is not recommended to do both at the same time, as it often leads to a frustrating experience in debugging.
  • Use subclasses of java.lang.Exception (checked exceptions) when you expect the caller to handle the exception. This results in the compiler reporting an error if the caller does not handle the exception. Beware though, this usually results in developers "swallowing" exceptions in code.
  • Use subclasses of java.lang.RuntimeException (unchecked exceptions) to signal programming errors. The exception classes that are recommended here include IllegalStateException, IllegalArgumentException, UnsupportedOperationException etc. Again, one must be careful about using exception classes like NullPointerException (almost always a bad practice to throw one).
  • Use exception class hierarchies for communicating information about exceptions across various tiers. By implementing a hierarchy, you could generalize the exception handling behavior in the caller. For example, you could use a root exception like DomainException which has several subclasses like InvalidCustomerException, InvalidProductException etc. The caveat here is that your exception hierarchy can explode very quickly if you represent each separate exceptional scenario as a separate exception.
  • Avoid catching exceptions you cannot handle. Pretty obvious, but a lot of developers attempt to catch java.lang.Exception or java.lang.Throwable. Since all subclassed exceptions can be caught, the runtime behavior of the application can often be vague when "global" exception classes are caught. After all, one wouldn't want to catch OutOfMemoryError - how should one handle such an exception?
  • Wrap exceptions with care. Throwing a new exception in place of a caught one starts a fresh stack trace. Unless the original exception has been provided to the new exception object as its cause, it is lost forever. In order to preserve the exception stack, one will have to provide the original exception object to the new exception's constructor.
  • Convert checked exceptions into unchecked ones only when required. When wrapping an exception, it is possible to wrap a checked exception and throw an unchecked one. This is useful in certain cases, especially when the intention is to abort the currently executing thread. However, in other scenarios this can cause a bit of pain, for the compiler checks are not performed. Therefore, adapting a checked exception as an unchecked one is not meant to be done blindly.
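Two of the points above, exception hierarchies and wrapping with the original cause, can be shown together in a short sketch. DomainException and InvalidCustomerException are the example names used in the list; CustomerIds.parse is a hypothetical method invented for this illustration.

```java
// Root of a small domain exception hierarchy (checked).
class DomainException extends Exception {
    DomainException(String message, Throwable cause) {
        super(message, cause);
    }
}

class InvalidCustomerException extends DomainException {
    InvalidCustomerException(String message, Throwable cause) {
        super(message, cause);
    }
}

class CustomerIds {
    // Wraps the low-level failure in a domain exception and keeps the
    // original exception as the cause, so its stack trace is not lost.
    static int parse(String raw) throws InvalidCustomerException {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new InvalidCustomerException("Invalid customer id: " + raw, e);
        }
    }
}
```

A caller can catch DomainException to treat all domain failures alike, or InvalidCustomerException for this specific case, and getCause() still leads back to the NumberFormatException.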




I have seen some smart programmers deal with exceptions in the following way:

public void consumeAndForgetAllExceptions(){
    try {
        // ...some code that throws exceptions
    } catch (Exception ex){
        ex.printStackTrace();
    }
}

What is wrong with the code above?
Once an exception is thrown, normal program execution is suspended and control is transferred to the catch block. The catch block catches the exception and just suppresses it. Execution of the program continues after the catch block, as if nothing had happened.
How about the following?
public void someMethod() throws Exception{
}

This method is a blank one; it does not have any code in it. How can a blank method throw exceptions? Java does not stop you from doing this. Recently, I came across similar code where the method was declared to throw exceptions, but there was no code that actually generated that exception. When I asked the programmer, he replied "I know, it is corrupting the API, but I am used to doing it and it works."
It took the C++ community several years to decide on how to use exceptions. This debate has just started in the Java community. I have seen several Java programmers struggle with the use of exceptions. If not used correctly, exceptions can slow down your program, as it takes memory and CPU power to create, throw, and catch exceptions. If overused, they make the code difficult to read and frustrating for the programmers using the API. We all know frustrations lead to hacks and code smells. The client code may circumvent the issue by just ignoring exceptions or throwing them, as in the previous two examples.

The Nature of Exceptions

Broadly speaking, there are three different situations that cause exceptions to be thrown:

  • Exceptions due to programming errors: In this category, exceptions are generated due to programming errors (e.g., NullPointerException and IllegalArgumentException). The client code usually cannot do anything about programming errors.
  •  Exceptions due to client code errors: Client code attempts something not allowed by the API, and thereby violates its contract. The client can take some alternative course of action, if there is useful information provided in the exception. For example: an exception is thrown while parsing an XML document that is not well-formed. The exception contains useful information about the location in the XML document that causes the problem. The client can use this information to take recovery steps.
  • Exceptions due to resource failures: Exceptions that get generated when resources fail. For example: the system runs out of memory or a network connection fails. The client's response to resource failures is context-driven. The client can retry the operation after some time or just log the resource failure and bring the application to a halt.

Types of Exceptions in Java

Java defines two kinds of exceptions:

  • Checked exceptions: Exceptions that inherit from the Exception class, other than RuntimeException and its subclasses, are checked exceptions. Client code has to handle the checked exceptions thrown by the API, either in a catch clause or by forwarding them outward with the throws clause.
  • Unchecked exceptions: RuntimeException also extends Exception. However, all of the exceptions that inherit from RuntimeException get special treatment. There is no requirement for the client code to deal with them, and hence they are called unchecked exceptions.

I have seen heavy use of checked exceptions and minimal use of unchecked exceptions. Recently, there has been a hot debate in the Java community regarding checked exceptions and their true value. The debate stems from the fact that Java seems to be the first mainstream OO language with checked exceptions. C++ and C# do not have checked exceptions at all; all exceptions in these languages are unchecked.
A checked exception thrown by a lower layer is a forced contract on the invoking layer to catch or throw it. The checked exception contract between the API and its client soon changes into an unwanted burden if the client code is unable to deal with the exception effectively. Programmers of the client code may start taking shortcuts by suppressing the exception in an empty catch block or just throwing it and, in effect, placing the burden on the client's invoker.

Checked exceptions are also accused of breaking encapsulation. Consider the following:

public List getAllAccounts() throws
    FileNotFoundException, SQLException{
    ...
}

The method getAllAccounts() throws two checked exceptions. The client of this method has to explicitly deal with the implementation-specific exceptions, even if it has no idea what file or database call has failed within getAllAccounts(), or has no business providing filesystem or database logic. Thus, the exception handling forces an inappropriately tight coupling between the method and its callers.

Best Practices for Designing the API

Having said all of this, let us now talk about how to design an API that throws exceptions properly.

1. When deciding on checked exceptions vs. unchecked exceptions, ask yourself, "What action can the client code take when the exception occurs?"


If the client can take some alternate action to recover from the exception, make it a checked exception. If the client cannot do anything useful, then make the exception unchecked. By useful, I mean taking steps to recover from the exception and not just logging the exception. To summarize:
Client's reaction when exception happens                                              Exception type
Client code cannot do anything                                                        Make it an unchecked exception
Client code will take some useful recovery action based on information in exception   Make it a checked exception
Moreover, prefer unchecked exceptions for all programming errors: unchecked exceptions have the benefit of not forcing the client API to explicitly deal with them. They propagate to where you want to catch them, or they go all the way out and get reported. The Java API has many unchecked exceptions, such as NullPointerException, IllegalArgumentException, and IllegalStateException. I prefer working with standard exceptions provided in Java rather than creating my own. They make my code easy to understand and avoid increasing the memory footprint of code.

2. Preserve encapsulation.


Never let implementation-specific checked exceptions escalate to the higher layers. For example, do not propagate SQLException from data access code to the business objects layer. The business objects layer does not need to know about SQLException. You have two options:
  • Convert SQLException into another checked exception, if the client code is expected to recuperate from the exception.
  • Convert SQLException into an unchecked exception, if the client code cannot do anything about it.
Most of the time, client code cannot do anything about SQLExceptions. Do not hesitate to convert them into unchecked exceptions. Consider the following piece of code:

public void dataAccessCode(){
    try{
        // ...some code that throws SQLException
    }catch(SQLException ex){
        ex.printStackTrace();
    }
}
This catch block just suppresses the exception and does nothing. The justification is that there is nothing my client could do about an SQLException. How about dealing with it in the following manner?
public void dataAccessCode(){
    try{
        // ...some code that throws SQLException
    }catch(SQLException ex){
        throw new RuntimeException(ex);
    }
}

This converts SQLException to RuntimeException. If SQLException occurs, the catch clause throws a new RuntimeException. The execution thread is suspended and the exception gets reported. However, I am not corrupting my business object layer with unnecessary exception handling, especially since it cannot do anything about an SQLException. If my catch needs the root exception cause, I can make use of the getCause() method available in all exception classes as of JDK1.4.
If you are confident that the business layer can take some recovery action when SQLException occurs, you can convert it into a more meaningful checked exception. But I have found that just throwing RuntimeException suffices most of the time.
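As a minimal sketch of the conversion described above, the data access layer can wrap the SQLException in a dedicated unchecked exception instead of a bare RuntimeException, so that a caller who does want to react can still catch one specific type. The names DataAccessException, AccountDao, and loadAccountNames are hypothetical, and the SQLException is thrown by hand to stand in for a real JDBC call.

```java
import java.sql.SQLException;

// Hypothetical unchecked wrapper; it is not part of the JDK.
class DataAccessException extends RuntimeException {
    DataAccessException(String message, Throwable cause) {
        super(message, cause);
    }
}

class AccountDao {
    // Stand-in for a JDBC call that fails.
    private void queryDatabase() throws SQLException {
        throw new SQLException("connection refused");
    }

    // No "throws" clause: callers are not forced to handle SQLException.
    void loadAccountNames() {
        try {
            queryDatabase();
        } catch (SQLException ex) {
            // Keep the original exception as the cause so no detail is lost.
            throw new DataAccessException("Could not load account names", ex);
        }
    }
}

class AccountDaoDemo {
    public static void main(String[] args) {
        try {
            new AccountDao().loadAccountNames();
        } catch (DataAccessException ex) {
            // getCause() returns the wrapped SQLException.
            System.out.println(ex.getMessage() + " / cause: " + ex.getCause().getMessage());
        }
    }
}
```

The business layer stays free of SQL-specific handling, yet the root cause is still reachable through getCause() for logging.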

3. Try not to create new custom exceptions if they do not have useful information for client code.


What is wrong with the following code?
public class DuplicateUsernameException
    extends Exception {}

It is not giving any useful information to the client code, other than an indicative exception name. Do not forget that Java Exception classes are like other classes, wherein you can add methods that you think the client code will invoke to get more information.
We could add useful methods to DuplicateUsernameException, such as:
public class DuplicateUsernameException
    extends Exception {
    public DuplicateUsernameException
        (String username){....}
    public String requestedUsername(){...}
    public String[] availableNames(){...}
}

The new version provides two useful methods: requestedUsername(), which returns the requested name, and availableNames(), which returns an array of available usernames similar to the one requested. The client could use these methods to inform that the requested username is not available and that other usernames are available. But if you are not going to add extra information, then just throw a standard exception:

throw new Exception("Username already taken");

Even better, if you think the client code is not going to take any action other than logging if the username is already taken, throw an unchecked exception:

throw new RuntimeException("Username already taken");
Alternatively, you can even provide a method that checks if the username is already taken.
It is worth repeating that checked exceptions are to be used in situations where the client API can take some productive action based on the information in the exception. Prefer unchecked exceptions for all programmatic errors. They make your code more readable.
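To make the DuplicateUsernameException idea above concrete, here is one possible filled-in version. It is only a sketch: the two-argument constructor, the UserRegistry class, and the way suggestions are generated (appending a number) are assumptions made for illustration, not part of the original design.

```java
import java.util.ArrayList;
import java.util.List;

// Checked exception that carries data the caller can act on.
class DuplicateUsernameException extends Exception {
    private final String requestedUsername;
    private final String[] availableNames;

    // Assumed constructor shape; the original only shows a username argument.
    DuplicateUsernameException(String requestedUsername, String[] availableNames) {
        super("Username already taken: " + requestedUsername);
        this.requestedUsername = requestedUsername;
        this.availableNames = availableNames.clone(); // defensive copy
    }

    String requestedUsername() { return requestedUsername; }

    String[] availableNames() { return availableNames.clone(); }
}

// Hypothetical registry, used only to trigger the exception.
class UserRegistry {
    private final List<String> taken = new ArrayList<>();

    void register(String username) throws DuplicateUsernameException {
        if (taken.contains(username)) {
            // Suggest the first two free names of the form username1, username2, ...
            List<String> suggestions = new ArrayList<>();
            for (int i = 1; suggestions.size() < 2; i++) {
                String candidate = username + i;
                if (!taken.contains(candidate)) {
                    suggestions.add(candidate);
                }
            }
            throw new DuplicateUsernameException(username, suggestions.toArray(new String[0]));
        }
        taken.add(username);
    }
}
```

A caller can catch the exception and offer availableNames() to the user, which is exactly the kind of recovery that justifies making it checked.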

4. Document exceptions.


You can use Javadoc's @throws tag to document both checked and unchecked exceptions that your API throws. However, I prefer to write unit tests to document exceptions. Tests allow me to see the exceptions in action and hence serve as documentation that can be executed. Whatever you do, have some way by which the client code can learn of the exceptions that your API throws. Here is a sample unit test that tests for IndexOutOfBoundsException:

public void testIndexOutOfBoundsException() {
    ArrayList blankList = new ArrayList();
    try {
        blankList.get(10);
        fail("Should raise an IndexOutOfBoundsException");
    } catch (IndexOutOfBoundsException success) {}
}

The code above should throw an IndexOutOfBoundsException when blankList.get(10) is invoked. If it does not, the fail("Should raise an IndexOutOfBoundsException") statement explicitly fails the test. By writing unit tests for exceptions, you not only document how the exceptions work, but also make your code robust by testing for exceptional scenarios.
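The try/fail/catch idiom above can be folded into a small helper. Newer JUnit versions ship an assertThrows method for this purpose; the plain-Java sketch below shows the same idea without any test framework. The ExceptionDocs class and its expectThrows helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class ExceptionDocs {
    // Runs the action and returns the thrown exception if it has the expected type.
    // Fails with an AssertionError if nothing, or something else, is thrown.
    static <T extends Throwable> T expectThrows(Class<T> expected, Runnable action) {
        try {
            action.run();
        } catch (Throwable actual) {
            if (expected.isInstance(actual)) {
                return expected.cast(actual);
            }
            throw new AssertionError("Expected " + expected.getName() + " but got " + actual, actual);
        }
        throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
    }

    public static void main(String[] args) {
        List<String> blankList = new ArrayList<>();
        IndexOutOfBoundsException ex =
            expectThrows(IndexOutOfBoundsException.class, () -> blankList.get(10));
        System.out.println("Documented behaviour: " + ex.getClass().getSimpleName());
    }
}
```

Returning the caught exception lets the test go on to check its message or cause, which documents even more of the contract.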


Monday, September 30, 2013

Disruptor

The Google Code project does reference a technical paper on the implementation of the ring buffer; however, it is a bit dry, academic and tough going for someone wanting to learn how it works. There are, however, some blog posts that have started to explain the internals in a more readable way. There is an explanation of the ring buffer that is the core of the disruptor pattern, a description of the consumer barriers (the part related to reading from the disruptor) and some information on handling multiple producers.


The simplest description of the Disruptor is: It is a way of sending messages between threads in the most efficient manner possible. It can be used as an alternative to a queue, but it also shares a number of features with SEDA and Actors.

Compared to Queues:

The Disruptor provides the ability to pass a message on to another thread, waking it up if required (similar to a BlockingQueue). However, there are 3 distinct differences.
  1. The user of the Disruptor defines how messages are stored by extending Entry class and providing a factory to do the preallocation. This allows for either memory reuse (copying) or the Entry could contain a reference to another object. 
  2. Putting messages into the Disruptor is a 2-phase process. First, a slot is claimed in the ring buffer, which provides the user with the Entry that can be filled with the appropriate data. Then the entry must be committed. This 2-phase approach is necessary to allow for the flexible use of memory mentioned above. It is the commit that makes the message visible to the consumer threads.
  3. It is the responsibility of the consumer to keep track of the messages that have been consumed from the ring buffer. Moving this responsibility away from the ring buffer itself helped reduce the amount of write contention as each thread maintains its own counter.
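The three differences above can be sketched in a few lines of Java. This is a deliberately simplified, single-threaded illustration and not the LMAX API: ValueEntry, SimpleRing, and RingReader are hypothetical names, and the sketch leaves out thread safety and the back-pressure that stops a writer from overwriting unread entries.

```java
// Hypothetical entry type; with the real Disruptor the user defines this.
class ValueEntry {
    long value;
}

// Single-writer ring with pre-allocated entries. Not the LMAX API.
class SimpleRing {
    private final ValueEntry[] entries;
    private long claimed = -1;    // highest sequence handed to the writer
    private long committed = -1;  // highest sequence visible to readers

    SimpleRing(int size) {
        entries = new ValueEntry[size];
        for (int i = 0; i < size; i++) {
            entries[i] = new ValueEntry(); // allocate once, reuse forever
        }
    }

    // Phase 1: claim the next slot and return its sequence number.
    long claim() {
        return ++claimed;
    }

    // Maps a sequence number onto its slot in the ring.
    ValueEntry get(long sequence) {
        return entries[(int) (sequence % entries.length)];
    }

    // Phase 2: publish the slot so readers may see it.
    void commit(long sequence) {
        committed = sequence;
    }

    long committed() {
        return committed;
    }
}

// Each reader tracks its own position; the ring does not.
class RingReader {
    private long next = 0;

    // Returns the sum of all values committed since the last call.
    long drain(SimpleRing ring) {
        long sum = 0;
        while (next <= ring.committed()) {
            sum += ring.get(next).value;
            next++;
        }
        return sum;
    }
}
```

A writer calls claim(), fills the entry returned by get(sequence), and then calls commit(sequence). Until the commit, a reader's drain() sees nothing new.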

Compared to Actors


The Actor model is closer to the Disruptor than most other programming models, especially if you use the BatchConsumer/BatchHandler classes that are provided. These classes hide all of the complexities of maintaining the consumed sequence numbers and provide a set of simple callbacks when important events occur. However, there are a couple of subtle differences.
  1. The Disruptor uses a 1 thread - 1 consumer model, whereas Actors use an N:M model, i.e. you can have as many actors as you like and they will be distributed across a fixed number of threads (generally 1 per core).
  2. The BatchHandler interface provides an additional (and very important) callback onEndOfBatch(). This allows slow consumers, e.g. those doing I/O, to batch events together to improve throughput. It is possible to do batching in other Actor frameworks; however, as nearly all other frameworks don't provide a callback at the end of the batch, you need to use a timeout to determine the end of the batch, resulting in poor latency.


Compared to SEDA

LMAX built the Disruptor pattern to replace a SEDA based approach.
  1. The main improvement that it provided over SEDA was the ability to do work in parallel. To do this the Disruptor supports multicasting the same messages (in the same order) to multiple consumers. This avoids the need for fork stages in the pipeline.
  2. We also allow consumers to wait on the results of other consumers without having to put another queuing stage between them. A consumer can simply watch the sequence number of a consumer that it is dependent on. This avoids the need for join stages in the pipeline.


Compared to Memory Barriers

Another way to think about it is as a structured, ordered memory barrier, where the producer barrier forms the write barrier and the consumer barrier forms the read barrier.
There are one or more writers. There are one or more readers. There is a line of entries, totally ordered from old to new (pictured as left to right). Writers can add new entries on the right end. Every reader reads entries sequentially from left to right. Readers can't read past writers, obviously.
There is no concept of entry deletion. I use "reader" instead of "consumer" to avoid the image of entries being consumed. However we understand that entries on the left of the last reader become useless.
Generally readers can read concurrently and independently. However, we can declare dependencies among readers. Reader dependencies can form an arbitrary acyclic graph. If reader B depends on reader A, reader B can't read past reader A.
Reader dependency arises because reader A can annotate an entry, and reader B depends on that annotation. For example, A does some calculation on an entry, and stores the result in field a in the entry. A then moves on, and now B can read the entry and the value of a that A stored. If reader C does not depend on A, C should not attempt to read a.
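The dependency rule can be sketched as a simple minimum over sequence numbers: a reader may go no further than the writer and no further than any reader it depends on. The code below is a single-threaded illustration under that assumption; AnnotatedEntry, Position, and DependencyGate are hypothetical names, not Disruptor classes.

```java
// Hypothetical entry: the writer fills "input", reader A annotates "a".
class AnnotatedEntry {
    long input;
    long a;
}

// Last sequence a reader has finished; -1 means nothing yet.
class Position {
    long value = -1;
}

class DependencyGate {
    // Highest sequence a reader may process, given the writer cursor
    // and the positions of the readers it depends on.
    static long highestReadable(long writerCursor, Position... dependsOn) {
        long limit = writerCursor;
        for (Position dependency : dependsOn) {
            limit = Math.min(limit, dependency.value);
        }
        return limit;
    }
}

class DependencyDemo {
    public static void main(String[] args) {
        AnnotatedEntry[] ring = new AnnotatedEntry[4];
        for (int i = 0; i < ring.length; i++) {
            ring[i] = new AnnotatedEntry();
        }

        // The writer has published sequences 0..2.
        for (int seq = 0; seq <= 2; seq++) {
            ring[seq].input = seq + 1;
        }
        long writerCursor = 2;

        Position readerA = new Position();
        Position readerB = new Position();

        // B depends on A, and A has processed nothing, so B may not read yet.
        System.out.println(DependencyGate.highestReadable(writerCursor, readerA)); // prints -1

        // A annotates entries 0 and 1; B may then follow up to sequence 1.
        for (long seq = 0; seq <= 1; seq++) {
            ring[(int) seq].a = ring[(int) seq].input * 10;
            readerA.value = seq;
        }
        long limitForB = DependencyGate.highestReadable(writerCursor, readerA);
        for (long seq = readerB.value + 1; seq <= limitForB; seq++) {
            System.out.println("B sees a=" + ring[(int) seq].a);
            readerB.value = seq;
        }
    }
}
```

A reader C with no dependencies would call highestReadable(writerCursor) and could run ahead of A, which is why it must not touch field a.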
This is indeed an interesting programming model. Regardless of the performance, the model alone can benefit lots of applications.
Of course, LMAX's main goal is performance. It uses a pre-allocated ring of entries. The ring is large enough, but it's bounded so that the system will not be loaded beyond design capacity. If the ring is full, writer(s) will wait until the slowest readers advance and make room.
Entry objects are pre-allocated and live forever, to reduce garbage collection cost. We don't insert new entry objects or delete old entry objects; instead, a writer asks for a pre-existing entry, populates its fields, and notifies readers. This apparent 2-phase action is really simply an atomic action:
setNewEntry(EntryPopulator);

interface EntryPopulator{ void populate(Entry existingEntry); }

Pre-allocating entries also means adjacent entries are (very likely) located in adjacent memory cells, and because readers read entries sequentially, this is important for making good use of CPU caches.

A lot of effort also goes into avoiding locks, CAS operations, and even memory barriers (e.g. using a non-volatile sequence variable if there's only one writer).

For developers of readers: Different annotating readers should write to different fields, to avoid write contention. (Actually they should write to different cache lines.) An annotating reader should not touch anything that other non-dependent readers may read. This is why I say these readers annotate entries, instead of modify entries.

Martin Fowler has written an article about LMAX and the disruptor pattern, The LMAX Architecture, which may clarify it further.

Friday, August 30, 2013

Hibernate reverse engineering tool adjustments

Hibernate reverse engineering tool adjustments - how to add your own activities.


If you use a Hibernate reveng file to generate your entities, this could be helpful for you. I spent some time finding out how hib-tool.jar is built, and here is an easy-peasy example of how to adjust it: how to create some additional parameters automatically at the entity mapping level.

Basically, hib-tool internally uses FTL, the FreeMarker Template Language. You can find a short introduction to it here: http://viralpatel.net/blogs/introduction-to-freemarker-template-ftl/

This post is not about FTL but about how simply you can adjust the reveng tool. Let's start with an easy, funny change to give you a heads-up:
let's change the default comment for auto-generated entities to something that will tell everybody that you were there ;-)

As the tool is responsible for generating POJO classes from tables existing in the DB, the file that we want to change is:
PojoTypeDeclaration.ftl

/**
${pojo.getClassJavaDoc(pojo.getDeclarationName() + " generated by hbm2java a little bit modified by yarenty.", 0)}
*/
<#include "Ejb3TypeDeclaration.ftl"/>
${pojo.getClassModifiers()} ${pojo.getDeclarationType()} ${pojo.getDeclarationName()} ${pojo.getExtendsDeclaration()} ${pojo.getImplementsDeclaration()}

Example output: ;-)

/**
* Model generated by hbm2java a little bit modified by yarenty. */

Real life example: 

We want to add a property at the column level to make sure that this column will not be updatable by Hibernate [immutable on DB level].
So if you look into the reveng.xml file, this is where we would like to add an attribute; let's introduce it as "update-disabled".

<table name="out_model" class="com.yarenty.core.persistence.entities.out.Model" schema="dev" catalog="out_model_dev">
<meta attribute="extra-import">javax.persistence.EntityListeners</meta>
<meta attribute="class-description">
Example Model.
@author Automatic Seam Generator (updated by yarenty)
</meta>
<meta attribute="scope-class">
@EntityListeners(com.yarenty.core.intercept.EntityListener.class)
public
</meta>
<primary-key>
<generator class="com.yarenty.core.persistence.hibernate.id.InformixSequenceGenerator">
<param name="sequence_name">market_sequence</param>
<param name="increment_size">10</param>
</generator>
</primary-key>
<column name="result">
<meta attribute="update-disabled"/>
</column>
</table> 

OK, we wrote it, but the real question is: how do we make it work?
We need to change two files: one responsible for processing XML, the second for processing DB mappings (hbm).
In the file pojo/Ejb3PropertyGetAnnotation.ftl we need to add another property check:


<#if ejb3>
<#if pojo.hasIdentifierProperty()>
<#if property.equals(clazz.identifierProperty)>
${pojo.generateAnnIdGenerator()}
<#-- if this is the id property (getter)-->
<#-- explicitly set the column name for this property-->
</#if>
</#if>
<#if pojo.hasMetaAttribute(property, "update-disabled")>
${property.setUpdateable(false)}
</#if>
[...]

And now in the file hbm/property.hbm.ftl, used when the output is created, we need to add the output text that we want to see:


<property
name="${property.name}"
type="${property.value.typeName}"
<#if !property.updateable> update="false"</#if>
<#if !property.insertable> insert="false"</#if>
<#if !property.basicPropertyAccessor>
access="${property.propertyAccessorName}"
</#if>

So our output will be:
  

@Column(name = "result", updatable = false, length = 1)
public Character getResult() {
    return this.result;
}

TIP: As you can see in the property template above, there is a quite similar attribute, insert="false", for preventing insertion of a column.



Friday, June 21, 2013

Effective Javadoc

Effective Javadoc Documentation Illustrated in Familiar Projects
Projects which provide good examples of effective Javadoc documentation practices

1. Advertising Ultimate Demise of Deprecated Method (Guava)

The current version of Guava (Release 10) provides some good examples of more informative statements of deprecation. The next example shows the @deprecated text for methods Files.deleteDirectoryContents(File) and Files.deleteRecursively(File). In both methods' cases, the documentation states why the method is deprecated and states when it is envisioned that the method will be removed (Release 11 in these cases). It is an extremely good idea to state in the deprecation statement when the deprecated thing is going away. It is easy to learn to ignore @deprecated and @Deprecated if one believes they are really never going to go away. Stating a planned removal version or date implies more urgency in not using deprecated features and provides fair warning to users.

deleteDirectoryContents

@Deprecated
public static void deleteDirectoryContents(File directory)
                                    throws IOException
Deprecated. 
Deprecated. This method suffers from poor symlink detection and race conditions. This functionality can be supported suitably only by shelling out to an operating system command such as rm -rf or del /s. This method is scheduled to be removed from Guava in Guava release 11.0.
Deletes all the files within a directory. Does not delete the directory itself.

If the file argument is a symbolic link or there is a symbolic link in the path leading to the directory, this method will do nothing. Symbolic links within the directory are not followed.

Parameters:
directory - the directory to delete the contents of
Throws:
IllegalArgumentException - if the argument is not a directory
IOException - if an I/O error occurs


deleteRecursively

@Deprecated
public static void deleteRecursively(File file)
                              throws IOException
Deprecated. 
Deprecated. This method suffers from poor symlink detection and race conditions. This functionality can be supported suitably only by shelling out to an operating system command such as rm -rf or del /s. This method is scheduled to be removed from Guava in Guava release 11.0.
Deletes a file or directory and all contents recursively.

If the file argument is a symbolic link the link will be deleted but not the target of the link. If the argument is a directory, symbolic links within the directory will not be followed.

Parameters:
file - the file to delete
Throws:
IOException - if an I/O error occurs

See more: http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/index.html
Although the source code for each of these methods employs the @Deprecated annotation, the code from both cases does not specify this text with Javadoc's @deprecated, but instead simply specifies the deprecation details as part of the normal method description text with bold tags around the word "Deprecated."
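For comparison, here is a small sketch of how the same information could be expressed with Javadoc's @deprecated tag alongside the @Deprecated annotation, including a planned removal version. FileUtil, its methods, and the "release 2.0" target are hypothetical and only illustrate the documentation style; this is not Guava's code.

```java
import java.io.File;

// Hypothetical utility class used only to show the documentation style.
class FileUtil {
    /**
     * Deletes all files within a directory, but not the directory itself.
     *
     * @param directory the directory whose contents are deleted
     * @deprecated This method cannot detect symbolic links reliably.
     *     It is scheduled to be removed in release 2.0. Use
     *     {@link #listContents(File)} and delete the entries explicitly.
     */
    @Deprecated
    static void deleteDirectoryContents(File directory) {
        for (File child : listContents(directory)) {
            child.delete();
        }
    }

    /**
     * Returns the entries of a directory.
     *
     * @param directory the directory to list
     * @return the entries, or an empty array if the directory cannot be listed
     */
    static File[] listContents(File directory) {
        File[] children = directory.listFiles();
        return children != null ? children : new File[0];
    }
}
```

The annotation makes the compiler warn callers, while the tag carries the reason, the removal plan, and the suggested replacement into the generated documentation.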

2. Documenting Use of an API (Java SE, Java EE, Guava, Joda Time)

When learning how to use a new API, it is helpful when the Javadoc documentation provides examples of using that API. A good example is learning how to marshal and unmarshal JAXB objects by reading the Javadoc documentation for Marshaller and Unmarshaller, respectively. Both of these interfaces take advantage of class-level documentation to describe how to use their APIs.

javax.xml.bind 
Interface Marshaller

All Known Implementing Classes:
AbstractMarshallerImpl
public interface Marshaller
The Marshaller class is responsible for governing the process of serializing Java content trees back into XML data. It provides the basic marshalling methods:

Assume the following setup code for all following code fragments:

       JAXBContext jc = JAXBContext.newInstance( "com.example.foo" );
       Unmarshaller u = jc.createUnmarshaller();
       Object element = u.unmarshal( new File( "foo.xml" ) );
       Marshaller m = jc.createMarshaller();
    
Marshalling to a File:

       OutputStream os = new FileOutputStream( "nosferatu.xml" );
       m.marshal( element, os );
    
Marshalling to a SAX ContentHandler:

       // assume MyContentHandler instanceof ContentHandler
       m.marshal( element, new MyContentHandler() );  
    
Marshalling to a DOM Node:

       DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
       dbf.setNamespaceAware(true);
       DocumentBuilder db = dbf.newDocumentBuilder();
       Document doc = db.newDocument();

       m.marshal( element, doc );
    
Marshalling to a java.io.OutputStream:

       m.marshal( element, System.out );
    
Marshalling to a java.io.Writer:

       m.marshal( element, new PrintWriter( System.out ) );
    
Marshalling to a javax.xml.transform.SAXResult:

       // assume MyContentHandler instanceof ContentHandler
       SAXResult result = new SAXResult( new MyContentHandler() );

       m.marshal( element, result );
    
Marshalling to a javax.xml.transform.DOMResult:

       DOMResult result = new DOMResult();
       
       m.marshal( element, result );
    
Marshalling to a javax.xml.transform.StreamResult:

       StreamResult result = new StreamResult( System.out );
 
       m.marshal( element, result );
    
Marshalling to a javax.xml.stream.XMLStreamWriter:

       XMLStreamWriter xmlStreamWriter = 
           XMLOutputFactory.newInstance().createXMLStreamWriter( ... );
 
       m.marshal( element, xmlStreamWriter );
    
Marshalling to a javax.xml.stream.XMLEventWriter:

       XMLEventWriter xmlEventWriter = 
           XMLOutputFactory.newInstance().createXMLEventWriter( ... );
 
       m.marshal( element, xmlEventWriter );
    
Marshalling content tree rooted by a JAXB element
The first parameter of the overloaded Marshaller.marshal(java.lang.Object, ...) methods must be a JAXB element as computed by JAXBIntrospector#isElement(java.lang.Object); otherwise, a Marshaller.marshal method must throw a MarshalException. There exist two mechanisms to enable marshalling an instance that is not a JAXB element. One method is to wrap the instance as a value of a JAXBElement, and pass the wrapper element as the first parameter to a Marshaller.marshal method. For java to schema binding, it is also possible to simply annotate the instance's class with @XmlRootElement.
Encoding
By default, the Marshaller will use UTF-8 encoding when generating XML data to a java.io.OutputStream, or a java.io.Writer. Use the setProperty API to change the output encoding used during these marshal operations. Client applications are expected to supply a valid character encoding name as defined in the W3C XML 1.0 Recommendation and supported by your Java Platform.
Validation and Well-Formedness
Client applications are not required to validate the Java content tree prior to calling any of the marshal API's. Furthermore, there is no requirement that the Java content tree be valid with respect to its original schema in order to marshal it back into XML data. Different JAXB Providers will support marshalling invalid Java content trees at varying levels, however all JAXB Providers must be able to marshal a valid content tree back to XML data. A JAXB Provider must throw a MarshalException when it is unable to complete the marshal operation due to invalid content. Some JAXB Providers will fully allow marshalling invalid content, others will fail on the first validation error.

Even when schema validation is not explicitly enabled for the marshal operation, it is possible that certain types of validation events will be detected during the operation. Validation events will be reported to the registered event handler. If the client application has not registered an event handler prior to invoking one of the marshal API's, then events will be delivered to a default event handler which will terminate the marshal operation after encountering the first error or fatal error. Note that for JAXB 2.0 and later versions, DefaultValidationEventHandler is no longer used.
[...]

 See more: http://docs.oracle.com/javaee/6/api/
Guava's class-level description for Stopwatch shows how to use most of that class's features in a concise and easily understandable class usage description.

com.google.common.base
Class Stopwatch
 java.lang.Object

 com.google.common.base.Stopwatch
________________________________________

@Beta
@GwtCompatible(emulated=true)
public final class Stopwatch
extends Object
An object that measures elapsed time in nanoseconds. It is useful to measure elapsed time using this class instead of direct calls to System.nanoTime() for a few reasons:
• An alternate time source can be substituted, for testing or performance reasons.
• As documented by nanoTime, the value returned has no absolute meaning, and can only be interpreted as relative to another timestamp returned by nanoTime at a different time. Stopwatch is a more effective abstraction because it exposes only these relative values, not the absolute ones.
Basic usage:
   Stopwatch stopwatch = new Stopwatch().start();
   doSomething();
   stopwatch.stop(); // optional

   long millis = stopwatch.elapsed(MILLISECONDS);

   log.info("that took: " + stopwatch); // formatted string like "12.3 ms"
 
Stopwatch methods are not idempotent; it is an error to start or stop a stopwatch that is already in the desired state.
When testing code that uses this class, use the alternate constructor to supply a fake or mock ticker. This allows you to simulate any valid behavior of the stopwatch.
Note: This class is not thread-safe.
Since:
10.0
Author:
Kevin Bourrillion

 See more: http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/index.html
Use of an API can be documented at the method level as well as at the class level. Examples of this are Guava's Throwables.propagateIfInstanceOf method and the overloaded Throwables.propagateIfPossible methods. The Javadoc documentation for these methods shows "example usage" for each.


propagateIfInstanceOf
public static <X extends Throwable> void propagateIfInstanceOf(@Nullable
                                               Throwable throwable,
                                               Class<X> declaredType)
                                  throws X extends Throwable
Propagates throwable exactly as-is, if and only if it is an instance of declaredType. Example usage:
   try {
     someMethodThatCouldThrowAnything();
   } catch (IKnowWhatToDoWithThisException e) {
     handle(e);
   } catch (Throwable t) {
     Throwables.propagateIfInstanceOf(t, IOException.class);
     Throwables.propagateIfInstanceOf(t, SQLException.class);
     throw Throwables.propagate(t);
   }
 
Throws:
X extends Throwable
propagateIfPossible
public static void propagateIfPossible(@Nullable
                       Throwable throwable)
Propagates throwable exactly as-is, if and only if it is an instance of RuntimeException or Error. Example usage:
   try {
     someMethodThatCouldThrowAnything();
   } catch (IKnowWhatToDoWithThisException e) {
     handle(e);
   } catch (Throwable t) {
     Throwables.propagateIfPossible(t);
     throw new RuntimeException("unexpected", t);
   }
 
propagateIfPossible
public static <X extends Throwable> void propagateIfPossible(@Nullable
                                             Throwable throwable,
                                             Class<X> declaredType)
                                throws X extends Throwable
Propagates throwable exactly as-is, if and only if it is an instance of RuntimeException, Error, or declaredType. Example usage:
   try {
     someMethodThatCouldThrowAnything();
   } catch (IKnowWhatToDoWithThisException e) {
     handle(e);
   } catch (Throwable t) {
     Throwables.propagateIfPossible(t, OtherException.class);
     throw new RuntimeException("unexpected", t);
   }
 
Parameters:
throwable - the Throwable to possibly propagate
declaredType - the single checked exception type declared by the calling method
Throws:
X extends Throwable

 See more: http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/index.html
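Writing this kind of method-level "example usage" in your own code is straightforward with a pre-formatted {@code ...} block inside the Javadoc comment. The sketch below assumes a hypothetical Retry helper; the class, method, and fetchPage() call in the comment are invented for illustration.

```java
import java.util.function.Supplier;

// Hypothetical helper, used only to show a method-level usage example in Javadoc.
class Retry {
    /**
     * Runs the task until it succeeds or the attempts are exhausted.
     *
     * <p>Example usage:
     * <pre>{@code
     * String body = Retry.withAttempts(3, () -> fetchPage());
     * }</pre>
     *
     * @param attempts maximum number of tries; must be at least 1
     * @param task the work to run
     * @return the first successful result
     * @throws IllegalArgumentException if attempts is less than 1
     * @throws RuntimeException the last failure, if every attempt fails
     */
    static <T> T withAttempts(int attempts, Supplier<T> task) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return task.get();
            } catch (RuntimeException ex) {
                last = ex; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

The usage snippet sits right next to the @param and @throws tags, so a reader sees both how to call the method and what can go wrong.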
API documentation is not limited to the class level or method level. The javax.management package-level documentation provides a nice overview of Java Management Extensions (JMX). The first sentence of the package description (which is what's always shown at top) is simple enough: "Provides the core classes for the Java Management Extensions." However, there are far more details in the rest of the package description. The next example shows a small portion of that package documentation.

Package javax.management Description

Provides the core classes for the Java Management Extensions.

The Java Management Extensions (JMX™) API is a standard API for management and monitoring. Typical uses include:

consulting and changing application configuration
accumulating statistics about application behavior and making them available
notifying of state changes and erroneous conditions.
The JMX API can also be used as part of a solution for managing systems, networks, and so on.

The API includes remote access, so a remote management program can interact with a running application for these purposes.

MBeans

The fundamental notion of the JMX API is the MBean. An MBean is a named managed object representing a resource. It has a management interface consisting of:

named and typed attributes that can be read and/or written
named and typed operations that can be invoked
typed notifications that can be emitted by the MBean.
For example, an MBean representing an application's configuration could have attributes representing the different configuration items. Reading the CacheSize attribute would return the current value of that item. Writing it would update the item, potentially changing the behavior of the running application. An operation such as save could store the current configuration persistently. A notification such as ConfigurationChangedNotification could be sent every time the configuration is changed.

In the standard usage of the JMX API, MBeans are implemented as Java objects. However, as explained below, these objects are not usually referenced directly.

Standard MBeans

To make MBean implementation simple, the JMX API includes the notion of Standard MBeans. A Standard MBean is one whose attributes and operations are deduced from a Java interface using certain naming patterns, similar to those used by JavaBeans™. For example, consider an interface like this:

    public interface ConfigurationMBean {
         public int getCacheSize();
         public void setCacheSize(int size);
         public long getLastChangedTime();
         public void save();
    }
[...]
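
To see how the naming pattern from the excerpt turns into a readable and writable CacheSize attribute, here is a minimal sketch that registers such an MBean with the platform MBean server. The `Configuration` implementation, the wrapper class `StandardMBeanSketch`, the helper `writeThenReadCacheSize`, and the object name `com.example:type=Configuration` are assumptions made for illustration; only the interface comes from the quoted documentation.

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class StandardMBeanSketch {

    private StandardMBeanSketch() {}

    /** Management interface from the excerpt; its name must be the class name plus "MBean". */
    public interface ConfigurationMBean {
        int getCacheSize();
        void setCacheSize(int size);
        long getLastChangedTime();
        void save();
    }

    /** Hypothetical implementation; the initial values are illustrative only. */
    public static class Configuration implements ConfigurationMBean {
        private int cacheSize = 128;
        private long lastChangedTime = System.currentTimeMillis();

        @Override public synchronized int getCacheSize() { return cacheSize; }
        @Override public synchronized void setCacheSize(int size) {
            cacheSize = size;
            lastChangedTime = System.currentTimeMillis();
        }
        @Override public synchronized long getLastChangedTime() { return lastChangedTime; }
        @Override public void save() { /* persistence is out of scope for this sketch */ }
    }

    /** Registers the MBean, writes CacheSize through the server, and reads it back. */
    static int writeThenReadCacheSize(int newSize) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("com.example:type=Configuration");
            server.registerMBean(new Configuration(), name);
            try {
                // The getter/setter pair is exposed as one attribute named "CacheSize".
                server.setAttribute(name, new Attribute("CacheSize", newSize));
                return (Integer) server.getAttribute(name, "CacheSize");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException("JMX round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenReadCacheSize(256));
    }
}
```

Note that the code never touches the `Configuration` object after registration; it goes through the MBean server by name, which is what the package description means by objects not usually being referenced directly.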


Another example of a useful package-level description is the package description for Joda Time package org.joda.time. This core package describes many of the concepts applicable to the entire project in one location.

Package org.joda.time Description

Provides support for dates, times, time zones, durations, intervals, and partials. This package aims to fully replace the Java Date, Calendar, and TimeZone classes. This implementation covers both the Gregorian/Julian calendar system and the ISO8601 standard. Additional calendar systems and extensions can be created as well.

The ISO8601 standard is the international standard for dates, times, durations, and intervals. It defines text representations, the first day of the week as Monday, and the first week in a year as having a Thursday in it. This standard is being increasingly used in computer interchange and is the agreed format for XML. For most uses, the ISO standard is the same as Gregorian, and is thus the preferred format.

Interfaces

The main API concepts are defined by interfaces:

ReadableInstant - an instant in time
ReadableDateTime - an instant in time with field accessors such as dayOfWeek
ReadablePartial - a definition for local times that are not defined to the millisecond, such as the time of day
ReadableDuration - a duration defined in milliseconds
ReadablePeriod - a time period defined in fields such as hours and minutes
ReadableInterval - a period of time between two instants
ReadWritableInstant - an instant that can be modified
ReadWritableDateTime - a datetime that can be modified
ReadWritablePeriod - a time period that can be modified
ReadWritableInterval - an interval that can be modified
These define the public interface to dates, times, periods, intervals and durations. As with java.util.Date and Calendar, the design is millisecond based with an epoch of 1970-01-01. This should enable easy conversions.

Implementations

The basic implementation of the ReadableInstant interface is Instant. This is a simple immutable class that stores the millisecond value and integrates with Java Date and Calendar. The class follows the definition of the millisecond instant fully, thus it references the ISO-8601 calendar system and UTC time zone. If you are dealing with an instant in time but do not know, or do not want to specify, which calendar system it refers to, then you should use this class.

The main implementation class for datetimes is the DateTime class. This implements the ReadableDateTime interface, providing convenient methods to access the fields of the datetime. Conversion methods allow integration with the Java Date and Calendar classes.
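
The ISO8601 week rule quoted above (weeks start on Monday, and week 1 is the week containing the year's first Thursday) is easy to check in code. The sketch below deliberately uses the JDK's `java.time` classes instead of Joda-Time so that it runs without extra libraries; the class `IsoWeekSketch` and its helper methods are hypothetical names for illustration.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.time.temporal.WeekFields;

final class IsoWeekSketch {

    private IsoWeekSketch() {}

    /** ISO week number of the given calendar date. */
    static int isoWeek(int year, int month, int day) {
        return LocalDate.of(year, month, day).get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
    }

    /** ISO week-based year of the given calendar date; may differ from the calendar year. */
    static int isoWeekYear(int year, int month, int day) {
        return LocalDate.of(year, month, day).get(IsoFields.WEEK_BASED_YEAR);
    }

    /** True if the JDK's ISO week definition starts on Monday and needs 4 days in week 1. */
    static boolean isoRuleHolds() {
        return WeekFields.ISO.getFirstDayOfWeek() == DayOfWeek.MONDAY
                && WeekFields.ISO.getMinimalDaysInFirstWeek() == 4;
    }

    public static void main(String[] args) {
        // 2016-01-01 was a Friday, so it still belongs to the last week of 2015.
        System.out.println(isoWeekYear(2016, 1, 1) + "-W" + isoWeek(2016, 1, 1));
        // The first Thursday of 2016 was January 7, so week 1 starts on Monday, January 4.
        System.out.println(isoWeekYear(2016, 1, 4) + "-W" + isoWeek(2016, 1, 4));
    }
}
```

Requiring at least four days in the first week is the same rule as "the first week has a Thursday in it", just stated differently.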


3. Explicitly Declaring Throws Clause for Unchecked Exceptions (Guava)

It is best to "document all thrown exceptions" whether they are checked or unchecked. Guava's InetAddresses.forString(String) method's documentation does this, specifying that it throws the runtime exception IllegalArgumentException.

forString
public static InetAddress forString(String ipString)
Returns the InetAddress having the given string representation.
This deliberately avoids all nameservice lookups (e.g. no DNS).

Parameters:
ipString - String containing an IPv4 or IPv6 string literal, e.g. "192.168.0.1" or "2001:db8::1"
Returns:
InetAddress representing the argument
Throws:
IllegalArgumentException - if the argument is not a valid IP string literal
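
In your own code the same effect comes from adding `@throws` tags for unchecked exceptions, even though the compiler does not require a `throws` clause for them. A minimal sketch, assuming a hypothetical utility class `PortParser` that is not part of Guava:

```java
import java.util.Objects;

final class PortParser {

    private PortParser() {}

    /**
     * Returns the TCP port number represented by the given string.
     *
     * @param text the port in decimal notation, e.g. {@code "8080"}
     * @return the parsed port, between 0 and 65535 inclusive
     * @throws NullPointerException if {@code text} is null
     * @throws IllegalArgumentException if {@code text} is not a decimal number
     *     or is outside the valid port range
     */
    static int parsePort(String text) {
        Objects.requireNonNull(text, "text");
        final int port;
        try {
            port = Integer.parseInt(text);
        } catch (NumberFormatException e) {
            // Rethrow with a message that names the offending input.
            throw new IllegalArgumentException("not a decimal number: " + text, e);
        }
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }

    public static void main(String[] args) {
        System.out.println(parsePort("8080"));
    }
}
```

Both unchecked exceptions then show up in the generated "Throws:" section, exactly like the IllegalArgumentException entry in the Guava excerpt.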


4. Using -linksource (JFreeChart, Guava)

For an open source project, a nice benefit for developers using it is Javadoc that links to the underlying source code, which the javadoc tool's -linksource option generates. The two examples below show, first, the Javadoc for a class whose name links to its source and, second, the source code displayed when that class name is clicked in the Javadoc.


com.google.common.base
Class Strings

java.lang.Object
com.google.common.base.Strings

@GwtCompatible
public final class Strings
extends Object
Static utility methods pertaining to String or CharSequence instances.
Since:
3.0
Author:
Kevin Bourrillion



public final class Strings {
  private Strings() {}

  /**
   * Returns the given string if it is non-null; the empty string otherwise.
   *
   * @param string the string to test and possibly return
   * @return {@code string} itself if it is non-null; {@code ""} if it is null
   */
  public static String nullToEmpty(@Nullable String string) {
    return (string == null) ? "" : string;
  }

  /**
   * Returns the given string if it is nonempty; {@code null} otherwise.
   *
   * @param string the string to test and possibly return
   * @return {@code string} itself if it is nonempty; {@code null} if it is
   *     empty or null
   */
  public static @Nullable String emptyToNull(@Nullable String string) {
    return isNullOrEmpty(string) ? null : string;
  }


It is very convenient to be able to move easily between the Javadoc documentation and the source code. Of course, this can also be done in an IDE that supports Javadoc presentation in conjunction with code.

Ultimate example

The ultimate example of Javadoc is Mockito, where the whole documentation is concisely embedded in the Javadoc itself.

Conclusion

This post has highlighted several projects whose Javadoc provides examples of more effective Javadoc-based documentation.

