Domesticating Talend

We’ve started working with Talend, and specifically with the ‘big data’ point-and-drag IDE. I’m reasonably happy with it: it does pretty much what it says on the box. Integrating its output with our product and approach, however, is not great. The product appears to be intended mainly for running ETL jobs from within the IDE, but there’s an ‘export job’ facility that dumps a ZIP file containing shell and batch scripts, some generated JARs, and all the dependencies, bundled up for execution from the command line.

The trouble is that our use case does not match up well with this approach. We need to embed the Talend-generated code inside our service, which for us means getting the generated JARs into our service project using Maven. The nasty bit is then immediately obvious: how do we version and deploy the Talend-generated JAR files?

My first tentative approach is as follows. The first step is to use the Talend job export facility to export the ZIP to a standard location with a standard name (temp/newExportFolder.zip under the project directory, in the POM that follows). The second step is to use Maven with the following pom.xml, and invoke a standard mvn release:prepare release:perform to get a single unified JAR into our Maven repository:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>com.somoglobal</groupId>
    <artifactId>Apptimiser</artifactId>
    <version>1.14.2</version>
    <relativePath></relativePath>
  </parent>

  <groupId>com.somoglobal.talend</groupId>
  <artifactId>PostAttribution</artifactId>
  <packaging>jar</packaging>
  <version>1.0.7-SNAPSHOT</version>
  <name>PostAttribution</name>
  <description>
    The Talend PostAttribution project packaged as a jar.
  </description>

  <scm>
    <connection>scm:svn:https://svn.somodigital.com/mobfusion/talend/PostAttribution/trunk</connection>
    <developerConnection>scm:svn:https://svn.somodigital.com/mobfusion/talend/PostAttribution/trunk</developerConnection>
    <url>https://svn.somodigital.com/mobfusion/talend/PostAttribution/trunk</url>
  </scm>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <build>
    <plugins>
      <plugin>
        <artifactId>maven-clean-plugin</artifactId>
        <version>2.5</version>
        <configuration>
          <filesets>
            <fileset>
              <directory>temp/unzip</directory>
              <includes>
                <include>**</include>
              </includes>
              <followSymlinks>false</followSymlinks>
            </fileset>
          </filesets>
        </configuration>
      </plugin>

      <plugin>
        <!-- http://evgeny-goldin.com/wiki/Copy-maven-plugin -->
        <groupId>com.github.goldin</groupId>
        <artifactId>copy-maven-plugin</artifactId>
        <version>0.2.5</version>
        <executions>
          <execution>
            <id>obtain-jars</id>
            <phase>prepare-package</phase>
            <goals>
              <goal>copy</goal>
            </goals>
            <configuration>
              <resources>
                <!-- normal build: temp/ sits under the project directory, so unpack the zip from there -->
                <resource>
                  <runIf>{{ new File( project.basedir, 'temp' ).isDirectory() }}</runIf>
                  <description>Unpacking Talend export</description>
                  <targetPath>${project.build.outputDirectory}</targetPath>
                  <file>temp/newExportFolder.zip</file>
                  <zipEntries>
                    <zipEntry>**/*_0_1.jar</zipEntry>
                  </zipEntries>
                  <unpack>true</unpack>
                </resource>

                <!-- release:perform: the build runs from a checkout (by default target/checkout) with no temp/, so reach back up to the project's temp/ -->
                <resource>
                  <runIf>{{ !(new File( project.basedir, 'temp' ).isDirectory()) }}</runIf>
                  <description>Unpacking Talend export</description>
                  <targetPath>${project.build.outputDirectory}</targetPath>
                  <file>../../temp/newExportFolder.zip</file>
                  <zipEntries>
                    <zipEntry>**/*_0_1.jar</zipEntry>
                  </zipEntries>
                  <unpack>true</unpack>
                </resource>

                <!-- unpack the jars -->
                <resource>
                  <description>Unpacking jar files</description>
                  <targetPath>${project.build.outputDirectory}</targetPath>
                  <directory>${project.build.outputDirectory}</directory>
                  <includes>
                    <include>*.jar</include>
                  </includes>
                  <unpack>true</unpack>
                </resource>

                <!-- discard the jars -->
                <resource>
                  <description>cleaning jar files</description>
                  <targetPath>${project.build.outputDirectory}</targetPath>
                  <directory>${project.build.outputDirectory}</directory>
                  <includes>
                    <include>*.jar</include>
                  </includes>
                  <clean>true</clean>
                </resource>
              </resources>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
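
Once a release is in the repository, the service project should be able to pull the Talend code in like any other dependency. A minimal sketch of the declaration for the service's pom.xml, assuming the release was cut from the 1.0.7-SNAPSHOT above and so was published as 1.0.7 (the release plugin's default proposal; substitute whatever version was actually released):

```xml
<!-- Goes in the <dependencies> section of the service's pom.xml.
     The version is an assumption: 1.0.7 is what release:prepare
     proposes by default for 1.0.7-SNAPSHOT. -->
<dependency>
  <groupId>com.somoglobal.talend</groupId>
  <artifactId>PostAttribution</artifactId>
  <version>1.0.7</version>
</dependency>
```

Note that the POM above only picks up the *_0_1.jar entries from the export, so any third-party libraries bundled in the Talend ZIP would presumably still need declaring as ordinary dependencies in the service.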

Big shout out to Evgeny Goldin for his copy-maven-plugin, which makes this easier.
