This note serves to document things I have learnt using Kettle (3.2).
What Java? Kettle does not appear to function correctly with a 64-bit JVM. It does happily run with Java 8. If you do want to run Kettle (with a 32-bit JVM) and Tomcat (with a 64-bit JVM) on the same system, this is perfectly possible. My experience is on a windows system so I configured Tomcat using C:\Program Files\Apache Software Foundation\Tomcat 7.0\bin\Tomcat7w.exe to explicitly tell it where to get its JVM (in my case C:\Program Files\Java\jre1.8.0_45\bin\server\jvm.dll) and I set the path environment variable to have C:\Program Files (x86)\Java\jre1.8.0_45\bin at the beginning.
Installation To expand on the recipe given elsewhere:
Download and unzip Kettle. From version 1.4 we are compatible with version 3 of the PDI so you can download version 3.2 - BUT not yet (at version 1.6) compatible with version 4 of the PDI.
Unzip the OpenVPMSLoader.zip file found in the import/plugin folder in the openvpms distribution. This should be unzipped into plugins/steps folder in Kettle.
NOTE: Kettle 3.2 ships with version 2.0.6 of the Spring framework, whereas the OpenVPMS Loader plugin requires Spring 3.x. Hence you need to:
1. backup your Kettle installation directory
2. remove libext/spring/spring-core.jar
3. copy the OpenVPMS jars to libext/spring:
cp plugins/steps/OpenVPMSLoader/*.jar libext/spring/
4. Start Kettle
Failure to do the above will result in a "No application context specified" error when the OpenVPMS Loader plugin starts.
Kettle is a really neat powerful tool. You may find the following will save you some time.
Log size The internal log is not of infinite size. If something goes wrong and you have an error on every row, the earlier records will be discarded. Also after an overflow situation like this, Kettle seems to be unable to fully clear the log. That is, you can clear it, and it appears to empty, but when you start a new transform/job the old stuff re-appears. To fully clear the log after an overflow, stop and restart Kettle.
External log You can specify the log file for a job or transform. If you specify it for a job, then the settings apply to all the transforms in the job (unless you provide log settings for them). Note that the records are batched out into the log - the internal log runs ahead of the external one, and in the event of a crash (most commonly a java heap overflow), you will not get all records in the external log.
Minimum logging = no errors If you set the logging level to Minimum, then if your transform has errors, these will not show in the log.
Can detach and park If you have a job that consists of a number of jobs and/or transforms done in sequence, you may wish to re-run the transform omitting some of the transforms/jobs. The easiest way to do this is to delete the hop at each end of the string you don't to run, then move this out of the way, and rejoin things so as to omit the transform/jobs you don't want to run. See below.
Always turn off Lazy Conversion Some of the input steps have a 'Lazy Conversion' checkbox. Do not tick it - using this leads to peculiar error messages.
Trim memo fields Some databases support a long text field called 'memo'. If you are processing this data, and you want to log it via the Text File Output step, make sure that you trim back its lenght. Left unaltered the field lenght is over a million characters and this will cause a heap overflow.
No registry start It is possible to have Kettle crash leaving its repository containing a screwed up transform. This may lead to Kettle crashing on start. The fix is to tick the 'No Repository' box on the startup screen. Then you can can connect to the repository once Kettle has started.
Performance Herewith tips to get the maximum performance out of Kettle:
The following points all apply to the 'Modified Java Script Value' transform step
Var's not required You only need to use "var" to define a variable if you need it in the next step. If its just an internal work variable, then don't use var.
Always turn off compatibility mode Always disable the compatibility mode. Compatibility should only be set if you have transforms from the earlier versions of Kettle.
Replace may need escape If you use the replace function, be aware that the second argument (string to look for) is in fact a regular expression, and thus that any regex meta characters will need escaping. Futhermore, since in javascript \ needs escaping, you need to double the backslash. Hence, to replace periods (.) by spaces you need replace(str,"\\."," ") and to replace backslashes themselves you need to quadruple them, ie replace(str,"\\\\","x") will replace \ by x. However, since the 'look for' string is a regex expression, you can use replace() to replace multiple characters at the same time.
Hence replace(str,"[\\-\\* ]","x") will replace every minus (-), asterisk (*), and space character by 'x'.
Regex is peculiar If you want to use the str2RegExp function, be aware that the required format of the regex string is not obvious. Firstly you MUST specify string start (^) and string end ($), and you MUST double every backslash. For example the string to look for any-chars9999 9999any-chars OR any-chars99999999any-chars (ie phone numbers) is: "^.*(\\d{4} \\d{4}).*|.*(\\d{8}).*$".
Be careful of plus Javascript uses the + operator to both add numbers and concatenate strings. It also has no specific variable typing facility, and can treat things that you think are numbers as strings. The expression (v1 + v2) may concatenate the two variables. If you need these to be treated as numbers either force the second to be converted to a number by using the Number() function [ie write the expression as (v1 + Number(v2))], or use the fact that the minus operator forces the variable to be treated as a number [ie write the expression as (-(-v1 - v2))].