Assistance with Open Source adoption

Open Source News

Finding Bundle Dependencies

Liferay - Sun, 08/19/2018 - 15:23

So many times I have answered the question, "How do I find out what import packages my bundle needs?" with the tired and unsatisfactory response that uses the following process:

  1. Build your module.
  2. Deploy your module.
  3. Use Gogo to see why your bundle doesn't start, often from an Unresolved Requirement on an Import Package.
  4. Either include the lib in your jar or use the Import-Package bnd.bnd property to exclude the package.
  5. Go to step 1 and repeat until no further Unresolved Requirements are found.

Yeah, so this is really a pain, but it was the only way I knew to find out which imports a module needs.

Until Today.

Introducing Bnd

The Bnd tool is at the heart of building an OSGi bundle. Whether you're using Blade, Gradle, Maven, etc., it doesn't matter; under the covers you are likely invoking Bnd to build the module.

Most of us don't know about Bnd only because we're lazy developers. Well okay, maybe not lazy, but we're only going to learn new tools if we have to. But if we can do our jobs without knowing every detail of the process, we're generally fine with it.

It is Bnd which is responsible for applying the directives in your bnd.bnd file and generating the bundle full of the OSGi details necessary for your bundle to deploy and run.

As it turns out, the Bnd tool knows a heck of a lot more about our bundles than we do.

To find this out, though, we need the Bnd tool installed.

Follow the instructions in section 3.1 to install the command line client.

Bnd Printing

The command line tool actually gives you a full basket of tools to play with; you can find the whole list here:

Honestly I have not really played with many of them yet.  I surely need to because there are definitely useful nuggets of gold in them there hills.

For example, the one nugget I've found so far is the print command.

I'm talking specifically about using bnd print --impexp to list imports and exports as described in section 20.3.

Turns out this command will list the imports and exports Bnd has identified for your module.

I turned this on one of my own modules to see what I would get:

bnd print --impexp build/libs/com.example.hero.rules.engine.simple-1.1.0.jar
[IMPEXP]
Import-Package
                                    {version=[1.0,2)}
  com.example.hero.rules.engine     {version=[1.1,2)}
  com.example.hero.rules.model      {version=[1.0,2)}
  com.example.hero.rules.service    {version=[1.0,2)}
  com.liferay.portal.kernel.log     {version=[7.0,8)}
  com.liferay.portal.kernel.util    {version=[7.3,8)}
  com.liferay.portal.kernel.uuid    {version=[6.2,7)}
  javax.xml.datatype
  javax.xml.namespace
  javax.xml.parsers
  org.w3c.dom
  org.w3c.dom.bootstrap
  org.xml.sax

Cool, huh? I can see that my module wants to import stuff that I have defined off in the API module, but I can also see that I'm leveraging portal-kernel as well as XML processing.
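The {version=[...]} attributes in that output are OSGi version ranges: a square bracket marks an inclusive bound and a parenthesis an exclusive one, so [7.3,8) accepts 7.3.0 up to, but not including, 8.0.0. As a rough illustration, here is a minimal Python sketch of how such a range can be evaluated, assuming the standard major.minor.micro.qualifier version syntax. It is not Bnd's or the OSGi framework's own code, and the function names are hypothetical.

```python
def parse_version(text):
    """Parse an OSGi version "major[.minor[.micro[.qualifier]]]" into a comparable tuple."""
    parts = text.strip().split(".", 3)
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    micro = int(parts[2]) if len(parts) > 2 else 0
    # An empty qualifier sorts before any non-empty one, as in OSGi.
    qualifier = parts[3] if len(parts) > 3 else ""
    return (major, minor, micro, qualifier)


def in_range(version, range_text):
    """Check a version against an OSGi range such as "[7.3,8)" or a bare "1.0"."""
    v = parse_version(version)
    r = range_text.strip()
    if r[0] not in "[(":
        # A bare version is only a lower bound: "this version or newer".
        return v >= parse_version(r)
    low_text, high_text = r[1:-1].split(",")
    low, high = parse_version(low_text), parse_version(high_text)
    # "[" and "]" are inclusive bounds; "(" and ")" are exclusive bounds.
    low_ok = v >= low if r[0] == "[" else v > low
    high_ok = v <= high if r[-1] == "]" else v < high
    return low_ok and high_ok
```

For example, in_range("7.3.0", "[7.3,8)") is True while in_range("8.0.0", "[7.3,8)") is False, which is why a bundle built against portal-kernel 7.x will not wire to an 8.x export of the same package.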

POI Portlet

A frequent question is what it takes to use POI in a portlet. Let's find out together, shall we?

So I created a simple module using Blade and used the following build.gradle file:

dependencies {
    compileOnly group: "com.liferay.portal", name: "com.liferay.portal.kernel", version: "2.0.0"
    compileOnly group: "com.liferay.portal", name: "com.liferay.util.taglib", version: "2.0.0"
    compileOnly group: "javax.portlet", name: "portlet-api", version: "2.0"
    compileOnly group: "javax.servlet", name: "javax.servlet-api", version: "3.0.1"
    compileOnly group: "jstl", name: "jstl", version: "1.2"
    compileOnly group: "org.osgi", name: "osgi.cmpn", version: "6.0.0"
    compileInclude group: 'org.apache.poi', name: 'poi-ooxml', version: '3.17'
}

The only thing I added here was the compileInclude directive. As we all know, this automagically includes the declared dependency and some of its transitive dependencies in the bundle. But, as many of us have seen, if you deploy this module you still get Unresolved Requirement messages.

Well, using Bnd we can now see why that is:

bnd print --impexp build/libs/com.example.poi-1.0.0.jar
[IMPEXP]
Import-Package
  com.liferay.portal.kernel.portlet.bridges.mvc  {version=[1.0,2)}
  com.sun.javadoc
  javax.crypto
  javax.crypto.spec
  javax.imageio
  javax.imageio.metadata
  javax.portlet                                  {version=[2.0,3)}
  javax.servlet                                  {version=[3.0,4)}
  javax.servlet.http                             {version=[3.0,4)}
  javax.swing
  javax.xml.bind
  javax.xml.bind.annotation
  javax.xml.bind.annotation.adapters
  javax.xml.crypto
  javax.xml.crypto.dom
  javax.xml.crypto.dsig
  javax.xml.crypto.dsig.dom
  javax.xml.crypto.dsig.keyinfo
  javax.xml.crypto.dsig.spec
  javax.xml.parsers
  javax.xml.transform
  javax.xml.transform.dom
  javax.xml.validation
  javax.xml.xpath
  junit.framework
  org.apache.commons.logging
  org.apache.crimson.jaxp
  org.apache.poi.hsmf
  org.apache.poi.hsmf.datatypes
  org.apache.poi.hsmf.extractor
  org.apache.poi.hwpf.extractor
  org.apache.xml.resolver
  org.bouncycastle.asn1
  org.bouncycastle.asn1.cmp
  org.bouncycastle.asn1.nist
  org.bouncycastle.asn1.ocsp
  org.bouncycastle.asn1.x500
  org.bouncycastle.asn1.x509
  org.bouncycastle.cert
  org.bouncycastle.cert.jcajce
  org.bouncycastle.cert.ocsp
  org.bouncycastle.cms
  org.bouncycastle.cms.bc
  org.bouncycastle.operator
  org.bouncycastle.operator.bc
  org.bouncycastle.tsp
  org.bouncycastle.util
  org.etsi.uri.x01903.v14
  org.junit
  org.junit.internal
  org.junit.runner
  org.junit.runner.notification
  org.openxmlformats.schemas.officeDocument.x2006.math
  org.openxmlformats.schemas.schemaLibrary.x2006.main
  org.w3c.dom
  org.xml.sax
  org.xml.sax.ext
  org.xml.sax.helpers
Export-Package
  com.example.poi.constants                      {version=1.0.0}
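A list this long is easier to act on when it is grouped by library. The following is a minimal Python sketch, not part of Bnd, that tokenizes bnd print --impexp output (whether printed one package per line or flattened onto a single line) and groups the imports by their first two package segments. The function names and the two-segment grouping are assumptions made for illustration.

```python
import re
from collections import defaultdict

# A "{...}" attribute block (which may contain spaces) or any other whitespace-free token.
TOKEN = re.compile(r"\{[^}]*\}|\S+")
SECTIONS = ("Import-Package", "Export-Package")


def parse_impexp(text):
    """Map each section to {package: attributes or None} from `bnd print --impexp` text."""
    result = {name: {} for name in SECTIONS}
    section = None
    last_pkg = None
    for token in TOKEN.findall(text):
        if token in SECTIONS:
            section, last_pkg = token, None
        elif token.startswith("{"):
            # Attributes belong to the package printed just before them.
            if section is not None and last_pkg is not None:
                result[section][last_pkg] = token[1:-1]
        elif section is not None and not token.startswith("["):
            # Anything before the first section header (the command, "[IMPEXP]") is skipped.
            result[section][token] = None
            last_pkg = token
    return result


def group_by_prefix(packages, depth=2):
    """Group package names by their first `depth` segments, e.g. "org.bouncycastle"."""
    groups = defaultdict(list)
    for pkg in packages:
        groups[".".join(pkg.split(".")[:depth])].append(pkg)
    return dict(groups)
```

Piping the output into a small script (say, a hypothetical group_imports.py that calls parse_impexp(sys.stdin.read()) and prints group_by_prefix(...) for the Import-Package section) turns the POI list into a handful of lines: org.bouncycastle, org.junit, junit.framework and so on, each one a decision to include, exclude or mark optional.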


Now we can see exactly what OSGi is going to want us to deal with. We can add compileInclude directives for these artifacts if we want to include them, mask them using the ! syntax in the bnd.bnd Import-Package directive, or mark them as optional by listing them in the Import-Package directive with the resolution:=optional instruction, like so:

Import-Package:\
    ...\
    *;resolution:=optional,\
    ...

Conclusion

During my testing, I found that this is not a perfect solution: even the Bnd tool does not follow transitive dependencies of libraries that aren't in the bundle yet.

For example, we can see above that BouncyCastle is imported, but we can't see any transitive dependencies of BouncyCastle that might be lurking if we decide to include BouncyCastle in the bundle. We would have to re-run the bnd print --impexp command, but that is still better than the old, tired answer I used to give.
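Since you will likely go around this loop a few times, it can help to generate the Import-Package instruction rather than hand-edit it. Here is a minimal Python sketch, assuming you have already decided which packages to exclude with ! and which to mark optional; the helper name is hypothetical. Bnd applies these instructions in order and the first match wins, so the exclusions and optional entries are placed before the catch-all *. Keep in mind that masking only removes the requirement: any code path that actually touches an excluded package will still fail at runtime.

```python
def build_import_package(excluded=(), optional=()):
    """Render a bnd.bnd Import-Package instruction using line continuations.

    excluded -> "!package" entries, so Bnd does not generate an import for them
    optional -> "package;resolution:=optional" entries
    A trailing "*" keeps importing everything else Bnd detects, as usual.
    """
    entries = [f"!{pkg}" for pkg in excluded]
    entries += [f"{pkg};resolution:=optional" for pkg in optional]
    entries.append("*")
    # Each entry on its own indented line, joined with ",\" continuations.
    body = ",\\\n".join(f"    {entry}" for entry in entries)
    return "Import-Package:\\\n" + body
```

For the POI module, build_import_package(excluded=["junit.framework", "org.junit.*"], optional=["org.bouncycastle.*"]) produces an instruction that drops the JUnit imports, makes BouncyCastle optional and leaves everything else untouched.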


David H Nebinger 2018-08-19T20:23:00Z
Categories: CMS, ECM

New Sources, New Data, New Tools: Large Health Insurance Group Confronts the Future of Financial Data Management in Healthcare

Talend - Fri, 08/17/2018 - 15:42

The world of healthcare never rests and one of the largest health insurance groups in the U.S. is not alone in wanting to provide, “quality care, better patient outcomes, and more engaged consumers.”   They achieve this through careful consideration as to how they manage their financial data.  This is the backbone of providing that quality care while also managing cost.  The future of healthcare data integration involves fully employing cloud technology to manage financial information while recognizing traditional patterns. For this, Talend is this insurance company’s ETL tool of choice.

In shifting from PeopleSoft to Oracle Cloud, the company realized that they needed a stronger infrastructure when it came to managing their data and also meeting the demands of a cost-benefit analysis.  They needed the right tools and the ability to use them in a cost-efficient manner.  This is where they made the investment in upgrading from using Talend’s Open Studio to Talend’s Big Data Platform.

The Question of Integration

The company’s financial cloud team receives a large quantity of data in over six different data types that come from a variety of sources (including in-house and external vendors). They are handling XML, CSV, text, and mainframe files, destined for the Oracle Cloud or the mainframe in a variety of formats. These files must be integrated and transferred between the cloud and their mainframe as inbound and outbound jobs. The ability to blend and merge this data for sorting is necessary for report creation.

It is imperative that the company be able to sort and merge files, blending the data into a canonical format that can be handled in batch and in real time. Sources and targets vary, as do the file types they require, and the data must be drawable from either the mainframe or Oracle Cloud. This is an around-the-clock unidirectional process of communication involving a multitude of user groups in disparate locations.

Amid all of this, they must also anticipate the demands of the future, which will bring more new types of data, sources, and destinations as the company grows. A seamless integration of new data sources and destinations will reduce, if not eliminate, downtime and loss of revenue. Ultimately, it is a departmental build-out with a global impact.


From Open Studio to Big Data

The company started off like many Talend users, employing Open Studio at the design stage.  It is important to note that they did not have to train in an entirely new skillset to move their infrastructure to Oracle Cloud.  They used the same skillset in the same way that the on-premises integrations had always been accomplished since Talend natively works with any cloud architecture or data landscape.  This helps companies with creating an effective prototype.  However, while Talend’s Open Studio is the most flexible tool in the open-source market, the company ultimately needed something for design execution, hence their switch to Talend’s Big Data Platform.

It was also critical that the financial cloud team was able to run their testing locally.  Fostering a DevOps culture has been critical to many IT teams because they can do their development locally.  Talend allows for local development as well as remote management.  Project development can be physically separated from the web, and project management can be handled remotely for global execution.  It can also be used anywhere and does not require a web browser, negating the need for an internet connection at this stage, and adding to the level of flexibility.

It is vital for continued development that developers do not have to depend on internet access; they should be able to work independently and from anywhere.  When the team gets together after the development stage is concluded, they can migrate the data and then upload it to another system easily.  Even managing all their tools can be done remotely.  There is limited need for an extensive infrastructure team to manage operations and this leads to further IT efficiency.


Utilizing User Groups

Within three weeks of being introduced to Talend, the team at this health insurance group was competent in its use. Zero-to-sixty learning was achieved through CBT web-based training bolstered by in-person instruction. The benefits of migrating to Talend are many, but the company most values the Talend User Groups as a source of continuing education.

The company realized that User Groups offer a source of practical experience that no manual could ever fully embrace.  The local (Chicago) user group meetups offer in-person assistance and a wealth of practical information and best-practices.  According to the team at this company, taking advantage of the Talend User Groups is the prescription for success.

The post New Sources, New Data, New Tools: Large Health Insurance Group Confronts the Future of Financial Data Management in Healthcare appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

New lessons on Liferay University

Liferay - Fri, 08/17/2018 - 03:00

As promised less than a month ago, we're working on more content for Liferay University. Meet your new professors Charles Cohick and Dimple Koticha.

The new lessons are

As with all lessons on Liferay University, they're completely free and available after logging in with your account.

But wait, there's more...

Learn as much as you can

For a limited time, Liferay University Passport, our flat rate for each and every course on Liferay University, is available at an introductory price of almost 30% off. And the catalog isn't finished yet: there are more courses to come. So get it while you can save that much. With the one-time payment for a personal passport, even the paid courses are free, and you have a full year to take them all.

Prefer a live trainer in the room?

Of course, if you prefer to have a live trainer in the room, the regular trainings are still available and have been updated to contain all of the courses that you find on Liferay University and Passport. And this way (with a trainer, online or offline) you can book courses for all of the previous versions of Liferay as well.

And, of course, the fine documentation is still available and has already been updated with information about the new version.

(Photo: CC by 2.0 Hamza Butt)

Olaf Kock 2018-08-17T08:00:00Z
Categories: CMS, ECM

CI/CD support with SnapLogic’s GitHub Cloud Integration feature

SnapLogic - Thu, 08/16/2018 - 12:32

In my previous blog post, “How to practice CI/CD the SnapLogic way,” I provided three approaches that pipeline developers and DevOps engineers can implement to support their organization’s continuous integration and continuous delivery (CI/CD) methodologies. Several approaches that are widely used today include Project Import/Export, Project Import/Export via the SnapLogic public API, and CI/CD through[...] Read the full article here.

The post CI/CD support with SnapLogic’s GitHub Cloud Integration feature appeared first on SnapLogic.

Categories: ETL

Liferay CE 7.x / Liferay DXP 7.x Java Agents

Liferay - Wed, 08/15/2018 - 10:38

A SysAdmin came up to me and said he was having issues starting Liferay DXP 7.0: a bunch of CNFEs (ClassNotFoundExceptions) were coming up at startup.

I found that they were set up to use Wily for monitoring their JVMs, and it was those classes that were generating CNFEs.

In general, when you add the -javaagent:XXX parameter to your app server's startup command, you're enabling an agent that has full access inside the JVM, but only as long as the class loader hierarchy is available. The classes in the agent are injected at the highest point of the class loader hierarchy, so they are normally visible across the entire app server.

Except, of course, for Liferay's OSGi container.

Liferay takes great care when creating the OSGi container to limit the "pollution" of the OSGi container's class loader to prevent classes from the app server leaking in as global classes in the OSGi container.

For monitoring agents, though, the agent packages will not be available within OSGi even though the agent is still going to try to inject itself into object instantiation.

This leads to all of the ClassNotFoundExceptions for missing packages/classes during startup.

Enabling Agent Monitoring in OSGi

We can actually enable the agent inside of the OSGi container, but it takes an additional configuration step.

In our properties file, we need to add the packages from the agent to the OSGi boot delegation property.

Note that I said "add". You can't just set the property to * and expect it to work out, because that strips out all of the other packages Liferay normally passes through the boot delegation.

To find your current list, you need your properties file or access to the portal properties panel in the Server Administration section of the Control Panel (or GitHub, or your copy of the DXP source, or ...). Using the existing value as a guide, you'll end up with a value like:

\ __redirected,\ com.liferay.aspectj,\ com.liferay.aspectj.*,\ com.liferay.portal.servlet.delegate,\ com.liferay.portal.servlet.delegate*,\ com.sun.ccpp,\ com.sun.ccpp.*,\ com.sun.crypto.*,\ com.sun.image.*,\ com.sun.jmx.*,\ com.sun.jna,\ com.sun.jndi.*,\ com.sun.mail.*,\*,\*,\ com.sun.msv.*,\*,\ com.sun.syndication,\*,\ com.sun.xml.*,\ com.yourkit.*,\ sun.*,\ com.agent.*

See how I tacked on the "com.agent.*" at the end of the list?

You'll of course change the "com.agent" stuff to match whatever package your particular agent is using, but hopefully you get the idea.
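If you script your environment setup, the "add, don't replace" rule is easy to encode. Here is a minimal Python sketch, assuming you have copied the existing boot delegation value as a comma-separated string (with or without the .properties line continuations). com.agent.* is the same placeholder used above, and both function names are hypothetical.

```python
def add_boot_delegation(existing, *agent_packages):
    """Append agent packages to an existing comma-separated boot delegation value."""
    # Drop ".properties" continuation backslashes and surrounding whitespace.
    entries = [e.strip() for e in existing.replace("\\", "").split(",") if e.strip()]
    for pkg in agent_packages:
        if pkg.strip() == "*":
            # A bare "*" is exactly the shortcut the post warns against.
            raise ValueError("add specific agent packages instead of '*'")
        if pkg not in entries:
            entries.append(pkg)
    return entries


def as_property_value(entries):
    """Render entries one per line, ready to follow "<property>=" in a .properties file."""
    return "\\\n" + ",\\\n".join(f"    {e}" for e in entries)
```

Keeping Liferay's existing entries first and appending the agent's packages at the end mirrors what the value above does by hand.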

David H Nebinger 2018-08-15T15:38:00Z
Categories: CMS, ECM

Learn how to create and use Custom Fields and Profiles - Online training on August 22nd

CiviCRM - Wed, 08/15/2018 - 10:06

Adding custom fields in CiviCRM and understanding how to edit profiles allow you to customize your database to collect and store the data relevant to your non-profit.

Learn best practices for creating custom fields so your data is stored in the most logical place in CiviCRM and how to create online forms and registration pages by adding those custom fields to a profile. Find out more about different uses of profiles in CiviCRM and how you can maximize this powerful feature!

Categories: CRM

SnapLogic August 2018 Release: Intelligent connectivity meets continuous integration

SnapLogic - Wed, 08/15/2018 - 07:45

We are pleased to announce the general availability of the SnapLogic Enterprise Integration Cloud (EIC) August 2018 4.14 Release. This release brings new levels of integration and added benefits for DevOps processes, including continuous integration and continuous delivery support, new support for container management, added API integration functionalities, and expanded intelligent connectivity – all critical[...] Read the full article here.

The post SnapLogic August 2018 Release: Intelligent connectivity meets continuous integration appeared first on SnapLogic.

Categories: ETL

Data silos are the greatest stumbling block to an effective use of firms’ data

SnapLogic - Mon, 08/13/2018 - 12:43

Originally published on Greater access to data has given business leaders real, valuable insights into the inner workings of their organizations. Those who have been ahead of the curve in utilizing the right kinds of data for the right purposes have reaped the rewards of better customer engagement, improved decision-making, and a more productive[...] Read the full article here.

The post Data silos are the greatest stumbling block to an effective use of firms’ data appeared first on SnapLogic.

Categories: ETL

Going Serverless with Talend through CI/CD and Containers

Talend - Mon, 08/13/2018 - 04:20
Why should we care about CI/CD and Containers?

Continuous integration, delivery and deployment, known as CI/CD, has become such a critical piece of every successful software project that its benefits are hard to deny. At the same time, containers are everywhere right now and are very popular among developers. In practice, CI/CD allows teams to gain confidence in the applications they are building by continuously testing and validating them. Meanwhile, containerization gives you the agility to distribute and run your software by building once and deploying “anywhere”, thanks to a standard format that companies adopt. These common DevOps practices avoid the “it works on my machine!” effect. The direct consequences are a faster time to market and more frequent deliveries.

How does it fit with Talend environment?

At Talend, we want to give you the possibility to be part of this revolution, giving you access to the best of both worlds. Since the release of Talend 7.0, you can build your Talend Jobs within Docker images thanks to a standard Maven build. In addition, we also help you smoothly plug this process into your CI/CD pipeline.

What about serverless?

The serverless piece comes at the end: it is the way we will deploy our Talend jobs. By shipping our jobs in containers, we now have the freedom to deploy integration jobs anywhere. Among all the possibilities, a new category of services defined as serverless is emerging. Most of the major cloud providers are starting to offer serverless services for containers, such as AWS Fargate or Azure Container Instances, to name a few. They allow you to run containers without the need to manage any infrastructure (servers or clusters). You are only billed for your container usage.

These new features have been presented at Talend Connect US 2018 during the main keynote and have been illustrated with a live demo of a whole pipeline from the build to the run of the job in AWS Fargate as well as Azure ACI. In this blog post, we are going to take advantage of Jenkins to create a CI/CD pipeline. It consists of building our jobs within Docker images, making the images available in a Docker registry, and eventually calling AWS Fargate and Azure ACI to run our images.

Let’s see how to reproduce this process. If you would like to follow along, please make sure you fulfill the following requirements.

  • Talend Studio 7.0 or higher
  • Talend Cloud Spring 18’ or higher
  • Have a Jenkins server available
  • Your Jenkins server needs to have access to a Docker daemon installed on the same machine
  • Talend CommandLine installed on the Jenkins’ machine
  • Nexus version 2 or 3
  • Talend CI Builder installed and configured along with your Talend CommandLine

* All Talend components here are available in the Talend Cloud 30 Day Trial

Talend Environment

If you are new to Talend, let me walk you through a high-level overview of the different components. You need to start out with Talend Cloud and create a project in the Talend Management Console (TMC). You will then need to configure your project to use a Git repository to store your job artifacts. You also need to configure the Nexus settings in TMC to point to your hosted Nexus server, which stores any third-party libraries your job may need. The Talend Cloud account provides the overall project and artifact management in the TMC.

Next, you need to install the Studio and connect it to the Talend Cloud account. The Studio is the main Talend design environment where you build integration jobs. When you log in to the cloud account from the Studio following these steps, you will see the projects from the TMC; use the project that is configured with the Git repository. Follow the steps below to add the needed plugins to the Studio.

The last two components you need are the CI Builder and the Talend CommandLine, or cmdline (if you are using the cloud trial, the CommandLine tool is included with the Studio install files). When installing or using the CommandLine tool for the first time, you will need to give it a license as well; you can use the same license as your Studio. The CommandLine tool and the CI Builder are the components that allow you to take the code of a job from the Studio (really from Git), build it, and deploy fully executable processes to environments via scripts. The CI Builder, along with the profile in the Studio, determines whether the target is, say, a Talend Cloud runtime environment or a container. Here are the steps to get started!

1) Create a project in Talend Cloud

First, you need to create a project in your Talend Cloud account and link it to a GitHub repository. Please refer to the documentation to perform this operation.

Don’t forget to configure your Nexus in your Talend Cloud account; please follow the documentation for configuring your Nexus with Talend Cloud. As a reminder, your Nexus needs to have the following repositories:

  • releases
  • snapshots
  • talend-custom-libs-release
  • talend-custom-libs-snapshot
  • talend-updates
  • thirdparty
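Before wiring Nexus into Talend Cloud, a quick check that all six repositories exist can save a failed build later. Here is a minimal Python sketch: missing_repositories is plain list logic, while fetch_repository_names assumes a Nexus 3 server exposing its REST endpoint /service/rest/v1/repositories with basic authentication (Nexus 2 uses a different API). Both function names are hypothetical.

```python
import base64
import json
import urllib.request

# The repositories listed above, which the Talend setup expects to find in Nexus.
REQUIRED = [
    "releases",
    "snapshots",
    "talend-custom-libs-release",
    "talend-custom-libs-snapshot",
    "talend-updates",
    "thirdparty",
]


def missing_repositories(existing_names):
    """Return the required repositories that are not present, in the order listed."""
    present = set(existing_names)
    return [name for name in REQUIRED if name not in present]


def fetch_repository_names(base_url, user, password):
    """List repository names from a Nexus 3 server (assumed REST endpoint, see lead-in)."""
    request = urllib.request.Request(base_url.rstrip("/") + "/service/rest/v1/repositories")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return [repo["name"] for repo in json.load(response)]
```

For example, missing_repositories(fetch_repository_names("http://nexus:8081", user, password)) should come back empty once Nexus is ready; the host and port here are only the example values used later in the pipeline's Maven options.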

2) Add the Maven Docker profiles to the Studio

We are going to configure the Studio by adding the Maven Docker profiles to the configuration of our project and job pom files. Please find the two files you will need here under project.xml and standalone-job.xml.

You can do so in your studio, under the menu Project Properties -> Build -> Project / Standalone Job.

You just need to replace them with the ones you had copied above. No changes are needed.

In fact, what we are really doing here is adding a new profile called “docker” using the fabric8 Maven plugin. When building our jobs with this Maven profile, openjdk:8-jre-slim is used as the base image; the jars of our jobs are then added to this image along with a small script that indicates how to run the job. Please be aware that Talend does not support OpenJDK or Alpine Linux. For testing purposes only, you can keep the openjdk:8-jre-slim image, but for production you will have to build your own Oracle Java base image. For more information, please refer to our supported platforms documentation.

3) Set up Jenkins

The third step is to set up our Jenkins server. In this blog post, the initial configuration of Jenkins will not be covered. If you have never used it before please follow the Jenkins Pipeline getting started guide. Once the initial configuration is completed, we will be using the Maven, Git, Docker, Pipeline and Blue Ocean plugins to achieve our CI/CD pipeline.

We are going to store our Maven settings file in Jenkins. In the Jenkins settings (Manage Jenkins), go to “Managed files” and create a file with the ID “maven-file”. Copy this file into it as shown in the screenshot below. Make sure to modify the CommandLine path according to your own settings and to specify your own Nexus credentials and URL.

Before going into the details of the pipeline, you also need to define some credentials. To do so, go to “Manage Jenkins” and “Configure credentials”, then “Credentials” on the left. Look at the screenshot below:

Create four credentials: GitHub, Docker Hub, AWS and Azure. If you only plan to use AWS, you don’t need to specify your Azure credentials, and vice versa. Make sure you set your ACCESS KEY as the username and your SECRET ACCESS KEY as the password for the AWS credentials.

Finally, before going through the pipeline, we must make two CLI Docker images available on the Jenkins machine. Jenkins will use Docker images containing the AWS and Azure CLIs to run CLI commands against the different services. This is an easy way to use these CLIs without installing them on the machine. Here are the images we will use:

  • vfarcic/aws-cli (docker pull vfarcic/aws-cli:latest; docker tag vfarcic/aws-cli:latest aws-cli)
  • microsoft/azure-cli:latest (docker pull microsoft/azure-cli:latest; docker tag microsoft/azure-cli:latest azure-cli)

You can of course use different images at your convenience.

These Docker images need to be pulled on the Jenkins machine so that, in the pipeline, we can use the Jenkins Docker plugin’s “withDockerContainer(‘image’)” function to execute the CLI commands, as you will see later. You can find more information about running build steps inside a Docker container in the Jenkins documentation here.

Now that all the pre-requisites have been fulfilled let’s create a “New item” on the main page and choose “Pipeline”.

Once created you can configure your pipeline. This is where you will define your pipeline script (Groovy language).

You can find the script here.

Let’s go through this file and I will highlight the main steps.

At the top of the file you can set your own settings through environment variables. You have an example that you can follow with a project called “TALEND_JOB_PIPELINE” and a job “test”. The project git name should match the one in your GitHub repository. That is why the name is uppercase. Please be aware that in this script we use the job name as the Docker image name, so you cannot use underscores in your job name. If you want to use an underscore you need to define another name for your Docker image. The following environment variables must be set:

env.PROJECT_GIT_NAME = 'TALEND_JOB_PIPELINE'
env.PROJECT_NAME = env.PROJECT_GIT_NAME.toLowerCase()
env.JOB = 'test'
env.VERSION = '0.1'
env.GIT_URL = ''
env.TYPE = "" // if big data = _mr
env.DOCKERHUB_USER = "talendinc"
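These variables drive the image names used later in the script, which makes them easy to get wrong. Here is a minimal Python sketch that mirrors the naming, assuming the convention visible in the push stage: the project name is the lowercased Git name, the locally built image is PROJECT_NAME/JOB:VERSION and the Docker Hub image is DOCKERHUB_USER/JOB:VERSION. It rejects job names containing underscores, per the restriction described above, and uppercase letters, since Docker repository names must be lowercase. The function is a hypothetical helper, not part of the pipeline script.

```python
def image_names(project_git_name, job, version, dockerhub_user):
    """Return (local_image, dockerhub_image) following the pipeline's naming scheme."""
    if "_" in job:
        # The pipeline reuses the job name as the image name, which rules out underscores.
        raise ValueError(f"job name {job!r} contains an underscore")
    if job != job.lower():
        # Docker repository names must be lowercase.
        raise ValueError(f"job name {job!r} must be lowercase")
    project_name = project_git_name.lower()  # same as env.PROJECT_NAME
    local_image = f"{project_name}/{job}:{version}"
    dockerhub_image = f"{dockerhub_user}/{job}:{version}"
    return local_image, dockerhub_image
```

With the defaults above, image_names("TALEND_JOB_PIPELINE", "test", "0.1", "talendinc") gives ("talend_job_pipeline/test:0.1", "talendinc/test:0.1"), the second being the image the Fargate task definition should reference.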

In this file, each step is defined by a “stage”. The first two stages are here for pulling the latest version of the job using the Git plugin.

Then comes the build of the job itself. As you can see we are utilizing the Maven plugin. The settings are in a Jenkins Config file. This is the file we added earlier in the Jenkins configuration with the maven-file ID.

In the stages “Build, Test and Publish to Nexus” and “Package Jobs as Container” the line to change is:

-Dproduct.path=/cmdline -DgenerationType=local -DaltDeploymentRepository=snapshots::default::http://nexus:8081/repository/snapshots/ -Xms1024m -Xmx3096m

Here you need to specify your own path to the CommandLine directory (relative to the Jenkins server) and your Nexus URL.

After building the job into a Docker image, we push the image to the Docker Hub registry. For this step and the next one, we will use CLIs to talk to the different third-party services. As the Docker daemon should be running on the Jenkins machine, you can use the docker CLI directly. We use the withCredentials() function to get your Docker Hub username and password:

stage ('Push to a Registry') {
    withCredentials([usernamePassword(credentialsId: 'dockerhub', passwordVariable: 'dockerhubPassword', usernameVariable: 'dockerhubUser')]) {
        // Tag the locally built image with the Docker Hub repository name
        sh 'docker tag $PROJECT_NAME/$JOB:$VERSION $DOCKERHUB_USER/$JOB:$VERSION'
        // Read the password from stdin so it is not interpolated into the command line
        sh 'echo "$dockerhubPassword" | docker login -u "$dockerhubUser" --password-stdin'
        sh 'docker push $DOCKERHUB_USER/$JOB:$VERSION'
    }
}

The stage “Deployment environment” is simply an interaction with the user when running the pipeline. It asks whether you want to deploy your container in AWS Fargate or Azure ACI. You can remove this step if you want to have a continuous build until the deployment. This step is for demo purposes.

The next two stages are the deployment itself to AWS Fargate or Azure ACI. In each of the two stages you need to modify with your own settings. For example, in the AWS Fargate deployment stage you need to modify this line:

aws ecs run-task --cluster TalendDeployedPipeline --task-definition TalendContainerizedJob --network-configuration awsvpcConfiguration={subnets=[subnet-6b30d745],securityGroups=[],assignPublicIp=ENABLED} --launch-type FARGATE

You need to modify the name of your Fargate cluster and your task definition. Note that you need to create them in your AWS console first; you can read the documentation to do so. At the time of writing, AWS Fargate is only available in the N. Virginia region, but other regions will follow. The container defined in your task definition is the image that will be created in your Docker Hub account, with the name of your job as the image name. For example, it would be talendinc/test:0.1 with the default configuration in the pipeline script.
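If you would rather trigger the Fargate task from Python than from the AWS CLI, the same settings map onto the run_task call of boto3's ECS client. Here is a minimal sketch that only builds the keyword arguments; the cluster, task definition and subnet in the usage note are the example values from the CLI command above, and the helper name is hypothetical.

```python
def fargate_run_task_kwargs(cluster, task_definition, subnets, security_groups=(), public_ip=True):
    """Build keyword arguments for boto3's ecs run_task, mirroring `aws ecs run-task`."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                # Corresponds to assignPublicIp=ENABLED in the CLI example.
                "assignPublicIp": "ENABLED" if public_ip else "DISABLED",
            }
        },
    }
```

With boto3 installed and AWS credentials configured, something like boto3.client("ecs", region_name="us-east-1").run_task(**fargate_run_task_kwargs("TalendDeployedPipeline", "TalendContainerizedJob", ["subnet-6b30d745"])) would submit the same task; us-east-1 is the N. Virginia region mentioned above.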

The same applies to Azure ACI, you need to specify your own resource group and container instance.

4) Configure the Command line

As a matter of fact, Maven will use the CommandLine to build your job. The CommandLine can be used in two modes: script mode and server mode. Here we will use the CommandLine in server mode. First, you need to indicate the workspace of your CommandLine (which in our case will be the Jenkins workspace). Modify the file as follows with your own path to the Jenkins workspace (it depends on the pipeline name you chose in the previous step):

./Talend-Studio-linux-gtk-x86_64 -nosplash -application org.talend.commandline.CommandLine -consoleLog -data /var/jenkins_home/workspace/talend_job_pipeline startServer -p 8002

Change the Jenkins home path according to your own settings. The last thing to do is to modify the /configuration/maven_user_settings.xml file: copy and paste this file, replacing the Nexus URL and login information with your own.

Then launch the CommandLine in background:

$ /cmdline/ &

5) Run the pipeline

Once all the necessary configuration has been done, you can run your pipeline. To do so, go to the Open Blue Ocean view and click the “run” button. This triggers the pipeline, and you should see its progress:

Jenkins Pipeline to build Talend Jobs into Docker Containers

The pipeline used in this blog asks you where you want to deploy your container: choose either AWS Fargate or Azure ACI. Let’s take the Fargate example.

After the deployment, your Fargate cluster should have one pending task:

Once the task has run, you can open its details to access the logs of your job:

You can now run your Talend integration job packaged in a Docker container anywhere such as:

  • AWS Fargate
  • Azure ACI
  • Google Container Engine
  • Kubernetes or OpenShift
  • And more …

Thanks to Talend CI/CD capabilities you can automate the whole process from the build to the run of your jobs. 

If you want to become cloud agnostic and take advantage of the portability of containers, this example shows how you can use a CI/CD tool (Jenkins in our case) to automate the build and run in different cloud container services. This is only one example among others, but being able to build your jobs as containers opens up a whole new world for your integration jobs. Depending on your use cases, you could find yourself spending far less money thanks to these new serverless services (such as Fargate or Azure ACI). You could also spend less time configuring your infrastructure and focus on designing your jobs.

If you want to learn more about how to take advantage of containers and serverless technology, join us at Talend Connect 2018 in London and Paris. We will have dedicated break-out sessions on serverless to help you go hands on with these demos. See you there!

The post Going Serverless with Talend through CI/CD and Containers appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

Extending Liferay DXP - User Registration (Part 2)

Liferay - Mon, 08/13/2018 - 02:24

This is the second part of the "Extending Liferay DXP - User Registration" blog. In this blog we explore ways of implementing the registration process for a portal with multiple sites.

Portal Sites Configuration

Let’s presume we have a portal with the following sites configured:

  • "Liferay", default site, will not be listed
  • "Site 1", site with open membership
  • "Site 2", site with restricted membership
  • "Site 3", site with restricted membership
  • "Site 4", private site, will not be listed

Each of the sites has its own description which we want to display to the user:


User Registration Process Flow

The main steps of the user registration process that we are going to implement here are:

  1. Check if a user already has an account 
  2. If user already has an account but is not signed in, ask to sign in
  3. If user is already signed in, show current read-only details of a user
  4. Show the sites which the user is not a member of, displaying the description of a site when it is selected
  5. Collect additional information from the user if the user has selected a 'restricted' site
  6. User reviews the request, with the ability to save request as PDF file, and submits the form
  7. On form submission:
    1. Automatically create user account if user does not exist
    2. If the site selected is 'open', user is automatically enrolled into this site
    3. If the site selected is 'restricted', site membership request is created for this user
    4. If the site selected is 'restricted', notification email is sent to site owners with user details attached as PDF
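The decisions in steps 1 to 3 and step 5 can be modelled in a few lines of plain Java. This is a minimal sketch under assumptions: the enum and method names are hypothetical and are not SmartForms or Liferay APIs; in the actual form, SmartForms evaluates these rules itself.

```java
// Hypothetical model of steps 1-3 and 5; none of these names are SmartForms or Liferay APIs.
class RegistrationFlow {

    enum SiteType { OPEN, RESTRICTED }

    enum EntryStep { ASK_TO_SIGN_IN, SHOW_READ_ONLY_DETAILS, CONTINUE_AS_NEW_USER }

    // Steps 1-3: what the form shows once it knows the sign-in state and the email check result.
    static EntryStep entryStep(boolean signedIn, boolean emailRegistered) {
        if (signedIn) {
            return EntryStep.SHOW_READ_ONLY_DETAILS;   // step 3: read-only user details
        }
        if (emailRegistered) {
            return EntryStep.ASK_TO_SIGN_IN;           // step 2: existing account, not signed in
        }
        return EntryStep.CONTINUE_AS_NEW_USER;         // account is created on submission (step 7)
    }

    // Step 5: only restricted sites require the extra company/organisation details.
    static boolean needsCompanyDetails(SiteType selectedSite) {
        return selectedSite == SiteType.RESTRICTED;
    }

    public static void main(String[] args) {
        System.out.println(entryStep(false, true));              // ASK_TO_SIGN_IN
        System.out.println(needsCompanyDetails(SiteType.OPEN));  // false
    }
}
```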


For the implementation of this process we use SmartForms.

Here we show only the essential screenshots of the User Registration Form; the entire form definition can be downloaded (see the link at the end of this blog). Once it is imported, you can use the Visual Designer to see how the business rules are implemented.


User Flow

1. The user is asked to enter an email address, and SmartForms connects to the portal webservices to check whether this email is already registered (the source code for all webservices is at the end of the blog).

2. If the user already has an account, a 'must log in' message is displayed.

3. If the user is already signed in, the form is automatically populated with the user's details (SmartForms brings the user data from the Liferay environment automatically).

4. On the next page, the user is asked to select a site from the site list obtained via a webservice. When the user selects a site, the site's description is displayed (again via a webservice). You can add your own Terms & Conditions together with an 'I agree' checkbox.


5. If the site the user selected is of 'restricted' type, the user is asked to provide additional information.

SmartForms will automatically handle this rule once you add the following state condition to the 'Company/Organisation Details' page. Visual Designer screenshot:

6. The membership request summary is displayed on a final page, allowing the user to save the membership request details as a PDF file and to submit the form when ready.

7. Processing the submitted form. There are two form submission handlers:

  1. Emailer, which sends a 'site membership request' email to the Site Owner(s) if the selected site is 'restricted', attaching a PDF that contains all data submitted by the user (implemented using a SmartForms workflow, but of course you can use your own).
  2. Webhook, which creates a user account if the user is not registered yet and submits this user's membership request for the selected site (the source code for this webhook is at the end of the blog).
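What a single submission triggers across the two handlers can be summarised as a list of actions. This sketch models the flow as described in step 7 of the process list, not the webhook code itself; the `SiteType` enum and the action labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of step 7; the action strings are illustrative labels only.
class SubmissionHandlers {

    enum SiteType { OPEN, RESTRICTED }

    static List<String> actions(SiteType siteType, boolean userExists) {
        List<String> actions = new ArrayList<>();
        // Emailer handler: only restricted sites notify the site owners, with the PDF attached.
        if (siteType == SiteType.RESTRICTED) {
            actions.add("email site owners with PDF");
        }
        // Webhook handler: create the account first when the user is not registered yet.
        if (!userExists) {
            actions.add("create user account");
        }
        // Webhook handler: open sites enrol the user, restricted sites get a membership request.
        actions.add(siteType == SiteType.OPEN ? "enrol in site" : "create membership request");
        return actions;
    }

    public static void main(String[] args) {
        System.out.println(actions(SiteType.RESTRICTED, false));
        // [email site owners with PDF, create user account, create membership request]
    }
}
```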

That's it. That is the user flow, which is probably longer than the implementation notes listed below.

Implementation Notes

Data Flow

Below is the generic data flow, in which the Form Controller could be implemented either inside or outside the portal.

In our implementation we use SmartForms to build and run the User Registration Form, with SmartForms Cloud acting as the Form Controller.


User Registration Handler Code

User registration handler implements:

  1. SOAP webservices to query portal data and populate form fields
  2. JSON REST endpoint to execute all relevant actions on form submission: creation of portal users and site membership requests

Here we provide the source code for the essential functions; a link to the full source code is at the end of the blog.
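Every webservice below starts with `fieldArrayToMap(fieldList)`, a helper the post does not show. A minimal sketch of what it might look like follows, assuming the SmartForms `Field` object is a simple name/value pair; the `Field` class here is a hypothetical stand-in, not the SmartForms type, and the field names in `main` are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the SmartForms Field value object: a name/value pair.
class Field {
    private final String name;
    private final String value;

    Field(String name, String value) {
        this.name = name;
        this.value = value;
    }

    String getName() { return name; }

    String getValue() { return value; }
}

class FieldMaps {

    // Index the submitted fields by name; if a name repeats, the last occurrence wins.
    static Map<String, Field> fieldArrayToMap(Field[] fieldList) {
        Map<String, Field> map = new HashMap<>();
        if (fieldList != null) {
            for (Field field : fieldList) {
                map.put(field.getName(), field);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Hypothetical field names; the real ones come from the form definition.
        Field[] submitted = { new Field("email", "jane@example.com"), new Field("siteId", "20143") };
        Map<String, Field> fields = fieldArrayToMap(submitted);
        System.out.println(fields.get("email").getValue()); // jane@example.com
    }
}
```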


Checking if User is Registered

@WebMethod(action = namespace + "getMemberStatusByEmail")
public String getMemberStatusByEmail(
        @WebParam(name = "fieldName") String fieldname,
        @WebParam(name = "fields") Field[] fieldList) {
    try {
        Map<String, Field> fieldMap = fieldArrayToMap(fieldList);
        Field emailAddressVO = fieldMap.get(FIELDNAME_USER_EMAIL);
        if (emailAddressVO == null) {
            logger.warn("Call to getMemberStatusByEmail() is misconfigured, cannot find field '" + FIELDNAME_USER_EMAIL + "'");
            return "false";
        }

        String emailAddress = emailAddressVO.getValue();
        if (emailAddress.trim().length() == 0) {
            return "false";
        }

        try {
            UserLocalServiceUtil.getUserByEmailAddress(this.getCompanyId(fieldMap), emailAddress);
            // no exception, user exists
            return "true";
        } catch (Exception e) {}

        // user is not registered
        return "false";
    } catch (Throwable e) {
        logger.error("System error ", e);
        return "false";
    }
}




Getting List of Open and Protected Sites

@WebMethod(action = namespace + "getSites")
public Option[] getSites(
        @WebParam(name = "fieldName") String fieldname,
        @WebParam(name = "fields") Field[] fieldList) {
    Option[] blankOptions = new Option[0];
    try {
        Map<String, Field> fieldMap = fieldArrayToMap(fieldList);
        long companyId = this.getCompanyId(fieldMap);
        User user = null;
        Field userScreennameField = fieldMap.get(FIELDNAME_USER_SCREENNAME);
        if (userScreennameField != null) {
            if (userScreennameField.getValue().trim().length() > 0) {
                try {
                    user = UserLocalServiceUtil.getUserByScreenName(companyId, userScreennameField.getValue());
                } catch (Exception e) {}
            }
        }

        List<Option> validGroups = new ArrayList<Option>();
        LinkedHashMap<String, Object> params = new LinkedHashMap<String, Object>();
        // limit selection by sites only
        params.put("site", Boolean.TRUE);
        // limit selection by active groups only
        params.put("active", Boolean.TRUE);
        List<Group> allGroupsList =, params , QueryUtil.ALL_POS, QueryUtil.ALL_POS);
        Iterator<Group> allActiveGroups = allGroupsList.iterator();
        while (allActiveGroups.hasNext()) {
            Group group = allActiveGroups.next();
            boolean isAlreadyAMember = false;
            // check if user is already a member of it
            if (user != null) {
                if (group.isGuest()) {
                    // is a member anyway
                    isAlreadyAMember = true;
                } else {
                    isAlreadyAMember = UserLocalServiceUtil.hasGroupUser(group.getGroupId(), user.getUserId());
                }
            }

            // add the site to the selection list if this is a regular community site and the user is not already a member of it
            if (group.isRegularSite() && !group.isUser() && !isAlreadyAMember && !group.isGuest()) {
                // include Open (GroupConstants.TYPE_SITE_OPEN = 1) and Restricted (GroupConstants.TYPE_SITE_RESTRICTED = 2) sites only
                if (group.getType() == 1 || group.getType() == 2) {
                    validGroups.add(new Option(group.getName(group.getDefaultLanguageId()), String.valueOf(group.getGroupId())));
                }
            }
        }

        return validGroups.toArray(new Option[validGroups.size()]);
    } catch (Throwable e) {
        logger.error("System error ", e);
        return blankOptions;
    }
}



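To see which of the example sites `getSites()` would offer, the filter can be modelled without Liferay. In this sketch the `Site` class is hypothetical, and the numeric types assume Liferay's `GroupConstants` values (1 open, 2 restricted, 3 private), which the `getType() == 1 || getType() == 2` test relies on; the `isRegularSite()` and `isUser()` checks are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Liferay-free model of the site filter in getSites().
// Type codes assumed to mirror GroupConstants.TYPE_SITE_OPEN/RESTRICTED/PRIVATE.
class SiteFilter {

    static final int TYPE_OPEN = 1;
    static final int TYPE_RESTRICTED = 2;
    static final int TYPE_PRIVATE = 3;

    static final class Site {
        final String name;
        final int type;
        final boolean guest;   // the default (guest) site, never listed
        final boolean member;  // the current user already belongs to this site

        Site(String name, int type, boolean guest, boolean member) {
            this.name = name;
            this.type = type;
            this.guest = guest;
            this.member = member;
        }
    }

    // Keep open and restricted sites the user has not joined yet, skipping the guest site.
    static List<String> selectable(List<Site> sites) {
        List<String> names = new ArrayList<>();
        for (Site site : sites) {
            boolean joinable = site.type == TYPE_OPEN || site.type == TYPE_RESTRICTED;
            if (joinable && !site.guest && !site.member) {
                names.add(site.name);
            }
        }
        return names;
    }

    // The example portal from earlier in this post, for an anonymous user.
    // The guest site's type (open) is an assumption; it is excluded by the guest flag anyway.
    static List<Site> exampleSites() {
        return List.of(
                new Site("Liferay", TYPE_OPEN, true, false),
                new Site("Site 1", TYPE_OPEN, false, false),
                new Site("Site 2", TYPE_RESTRICTED, false, false),
                new Site("Site 3", TYPE_RESTRICTED, false, false),
                new Site("Site 4", TYPE_PRIVATE, false, false));
    }

    public static void main(String[] args) {
        System.out.println(selectable(exampleSites())); // [Site 1, Site 2, Site 3]
    }
}
```

For an anonymous user the example portal yields Site 1, Site 2 and Site 3, which matches the sites the form is meant to list.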

Getting Email Addresses of Owners of a Site

@WebMethod(action = namespace + "getSiteOwnerEmails")
public String getSiteOwnerEmails(
        @WebParam(name = "fieldName") String fieldname,
        @WebParam(name = "fields") Field[] fieldList) {
    Map<String, Field> fieldMap = fieldArrayToMap(fieldList);
    Group group = this.getSelectedGroup(fieldMap);
    if (group == null) {
        // no group selected yet
        return "";
    } else if (group.getType() != 2) {
        // this is not a restricted site (GroupConstants.TYPE_SITE_RESTRICTED = 2)
        return "";
    } else {
        // check if Terms and Conditions acknowledge is checked, otherwise no point of fetching email addresses
        Field termsAndConditionsField = fieldMap.get(FIELDNAME_TnC_CHECKBOX);
        if (termsAndConditionsField == null) {
            logger.warn("Call to getSiteOwnerEmails() is misconfigured, cannot find field '" + FIELDNAME_TnC_CHECKBOX + "'");
            return "";
        }
        if (termsAndConditionsField.getValue().length() == 0) {
            // not checked
            return "";
        }

        // make a list of email addresses of site owners for a restricted site
        // this will be used to send 'site membership request' email
        StringBuilder response = new StringBuilder();
        Role siteOwnerRole;
        try {
            siteOwnerRole = RoleLocalServiceUtil.getRole(group.getCompanyId(), RoleConstants.SITE_OWNER);
        } catch (PortalException e) {
            logger.error("Unexpected error", e);
            return "";
        }
        List<User> groupUsers = UserLocalServiceUtil.getGroupUsers(group.getGroupId());
        for (int i = 0; i < groupUsers.size(); i++) {
            User user = groupUsers.get(i);
            if (UserGroupRoleLocalServiceUtil.hasUserGroupRole(user.getUserId(), group.getGroupId(), siteOwnerRole.getRoleId())) {
                if (response.length() > 0) {
                    response.append(';');
                }
                // add this site owner's address to the ';'-separated list
                response.append(user.getEmailAddress());
            }
        }
        logger.info("compiled site admin emails " + response.toString());
        return response.toString();
    }
}



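The owner-email loop in `getSiteOwnerEmails()` boils down to filtering members by role and joining their addresses with `;`. A hedged stream-based sketch of that step follows, with a hypothetical `Member` class standing in for a Liferay `User` plus the result of the `hasUserGroupRole` check.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: Member stands in for a Liferay User plus the result of the site owner role check.
class OwnerEmails {

    static final class Member {
        final String email;
        final boolean siteOwner;

        Member(String email, boolean siteOwner) {
            this.email = email;
            this.siteOwner = siteOwner;
        }
    }

    // Keep site owners only and join their addresses with ';'; no owners yields "".
    static String ownerEmails(List<Member> members) {
        return members.stream()
                .filter(m -> m.siteOwner)
                .map(m -> m.email)
                .collect(Collectors.joining(";"));
    }

    public static void main(String[] args) {
        List<Member> members = List.of(
                new Member("owner1@example.com", true),
                new Member("member@example.com", false),
                new Member("owner2@example.com", true));
        System.out.println(ownerEmails(members)); // owner1@example.com;owner2@example.com
    }
}
```

`Collectors.joining` places the separator only between elements, so the manual `response.length() > 0` check is not needed.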

Creating User Account and Site Membership Request

// method to process JSON webhook call
@POST
@Path("/user-registration")
public void newFormSubmittedAsJsonFormat(String input) {
    logger.info("In /webhook/user-registration");

    /* check authorization */
    String handShakeKey = request.getHeader("X-SmartForms-Handshake-Key");
    if (handShakeKey == null || !handShakeKey.equals(SMARTFORMS_HADSHAKE_KEY)) {
        throw new WebApplicationException(Response.Status.UNAUTHORIZED);
    }

    JSONObject data;
    try {
        data = JSONFactoryUtil.createJSONObject(input);
        Map<String, String> fields = this.jsonFormDataToFields(data);
        logger.info("Have received fields " + fields.size());
        User user = null;
        ServiceContext serviceContext = new ServiceContext();
        long groupId = Long.parseLong(fields.get(FIELD_SITE_ID));
        long companyId = Long.parseLong(fields.get(FIELD_COMPANY_ID));
        if (fields.get(FIELD_USER_ID).length() > 0) {
            logger.info("User is already registered");
            try {
                user = UserLocalServiceUtil.getUser(Long.parseLong(fields.get(FIELD_USER_ID)));
            } catch (Exception e) {
                logger.error("Unable to fetch user", e);
                throw new WebApplicationException(Response.Status.NOT_FOUND);
            }
        } else {
            // create user
            String firstName = fields.get(FIELD_USER_FIRST_NAME);
            String lastName = fields.get(FIELD_USER_LAST_NAME);
            String email = fields.get(FIELD_USER_EMAIL);
            logger.info("Creating user " + firstName + " " + lastName + " " + email);
            try {
                // the following data could come from the form, but we just provide some hard-coded values
                long groups[] = new long[0];
                if (!fields.get(FIELD_SITE_TYPE).equals("restricted")) {
                    // this is an open group, add it to the list
                    groups = new long[1];
                    groups[0] = groupId;
                }
                long blanks[] = new long[0];
                boolean sendEmail = false;
                Locale locale = PortalUtil.getSiteDefaultLocale(groupId);
                boolean male = true;
                String jobTitle = "";
                long suffixId = 0;
                long prefixId = 0;
                String openId = null;
                long facebookId = 0;
                String screenName = null;
                boolean autoScreenName = true;
                boolean autoPassword = true;
                long creatorUserId = 0;
                user = UserLocalServiceUtil.addUser(
                        creatorUserId, companyId,
                        autoPassword, null, null,
                        autoScreenName, screenName, email,
                        facebookId, openId, locale,
                        firstName, "", lastName,
                        prefixId, suffixId, male,
                        1, 1, 2000,
                        jobTitle,
                        groups, blanks, blanks, blanks,
                        sendEmail,
                        serviceContext);
            } catch (Exception e) {
                logger.error("Unable to create user", e);
                throw new WebApplicationException(Response.Status.INTERNAL_SERVER_ERROR);
            }
        }

        if (fields.get(FIELD_SITE_TYPE).equals("restricted")) {
            try {
                MembershipRequestLocalServiceUtil.addMembershipRequest(user.getUserId(), groupId,
                        "User has requested membership via User Registration Form, you should have received an email", serviceContext);
            } catch (PortalException e) {
                logger.error("Unable to add membership request", e);
            }
        }
    } catch (JSONException e) {
        logger.error("Unable to create json object from data", e);
        throw new WebApplicationException(Response.Status.BAD_REQUEST);
    }
}

private Map<String, String> jsonFormDataToFields(JSONObject data) {
    Map<String, String> map = new HashMap<String, String>();
    JSONArray fields = data.getJSONArray("fields");
    for (int i = 0; i < fields.length(); i++) {
        JSONObject field = fields.getJSONObject(i);
        map.put(field.getString("name"), field.getString("value"));
    }
    return map;
}


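The webhook authorises calls by comparing the `X-SmartForms-Handshake-Key` header with a shared key using `String.equals`, which can return as soon as one character differs. A common hardening, swapped in here and not part of the original handler, is a comparison that does not stop at the first mismatch, such as `java.security.MessageDigest.isEqual`. A minimal sketch with a hypothetical class name and placeholder key:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical helper; the original handler compares the header with String.equals instead.
class HandshakeCheck {

    // Reject a missing header, otherwise compare the bytes without stopping at the first mismatch.
    static boolean isAuthorized(String headerValue, String expectedKey) {
        if (headerValue == null || expectedKey == null) {
            return false;
        }
        return MessageDigest.isEqual(
                headerValue.getBytes(StandardCharsets.UTF_8),
                expectedKey.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String expected = "change-me"; // placeholder for the configured handshake key
        System.out.println(isAuthorized("change-me", expected)); // true
        System.out.println(isAuthorized(null, expected));        // false
    }
}
```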

The full source code for the Form Handler project can be downloaded from here:


To make it work on your Liferay installation using SmartForms you will need to:

One more thing: in the SmartForms webhook configuration, please change localhost to the URL of your portal:

Feel free to ask questions if you run into problems ...

Victor Zorin 2018-08-13T07:24:00Z
Categories: CMS, ECM

Electronic Signing with Drupal Webform and CiviCRM

CiviCRM - Sat, 08/11/2018 - 11:38

Do you need to collect a "real" signature from your contacts and store it on their contact record?

If you use Drupal Webform you can embed an electronic signature directly using the new webform_civicrm_esign module developed by MJW Consulting and Northbridge Digital.

Categories: CRM

Talend Connect Europe 2018: Liberate your Data. Become a Data Hero

Talend - Fri, 08/10/2018 - 11:03

Save the date! Talend Connect will be back in London and Paris in October

Talend will welcome customers, partners, and influencers to its annual company conference, Talend Connect, taking place in two cities, London and Paris, in October. A must-attend event for business decision makers, CIOs, data scientists, chief architects, and developers, Talend Connect will share innovative approaches to modern cloud and big data challenges, such as streaming data, microservices, serverless, API, containers and data processing in the cloud.

<< Reserve your spot for Talend Connect 2018: Coming to London and Paris >>

Talend customers from different industries including AstraZeneca, Air France KLM, BMW Group, Greenpeace and Euronext will go on stage to explain how they are using Talend’s solutions to put more data to work, faster. Our customers now see making faster decisions and monetizing data as a strategic competitive advantage. They face both the opportunity and the challenge of having more data than ever, spread across a growing range of environments that change at an increasing speed, combined with the pressure to manage this growing complexity whilst simultaneously reducing operational costs. At Talend Connect you can learn how Talend customers leverage more data across more environments to make faster decisions and support faster innovation, whilst significantly reducing operational costs compared with traditional approaches. Here’s what to expect at this year’s show.


Day 1: Training Day and Partner Summit

Partners play a critical role in our go-to-market plan for cloud and big data delivery and scale. Through their expertise in Talend technology and their industry knowledge, they can support organisations’ digital transformation strategies. During this first day, attendees will learn about our partner strategy and enablement, our Cloud-First strategy, as well as customer use cases.

The first day will also be an opportunity for training. Designed for developers, the two training sessions will enable attendees to get started with Talend Cloud through hands-on practice. Attendees will also be able to get certified on the Talend Data Integration solution. An experienced Talend developer will lead the participants through a review of relevant topics, with hands-on practice in a pre-configured environment.

The training day and partner summit will be organised in both London and Paris.

Day 2: User Conference

Talend Connect is a forum in which customers, partners, company executives and industry analysts can exchange ideas and best approaches for tackling the challenges presented by big data and cloud integration. This year’s conference will offer attendees a chance to gain practical hands-on knowledge of Talend’s latest cloud integration innovations, including a new product that will be unveiled during the conference.

Talend Connect provides an ideal opportunity to discover the leading innovations and best practices of Cloud integration. With cloud subscription growing over 100% year-over-year, Talend continues to invest in this area, including serverless integration, DevOps and API capabilities.

Talend Data Master Awards

The winners of the Talend Data Master Awards will be announced at Talend Connect London. Talend Data Master Awards is a program designed to highlight and reward the most innovative uses of Talend solutions. The winners will be selected based on a range of criteria including market impact and innovation, project scale and complexity as well as the overall business value achieved.

Special Thanks to Our Sponsors

Talend Connect benefits from the support of partners including Datalytyx, Microsoft, Snowflake, Bitwise, Business&Decision, CIMT AG, Keyrus, Virtusa, VO2, Jems Group, Ysance, Smile and SQLI.

I am looking forward to welcoming users, customers, partners and all the members of the community to our next Talend Connect Europe!

The post Talend Connect Europe 2018: Liberate your Data. Become a Data Hero appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

How to prospect and reach out to leads in the post-GDPR era

VTiger - Fri, 08/10/2018 - 06:06
Ever since the enactment of GDPR, the way sales professionals prospect and reach out to their potential customers has changed drastically. Your sales activities can no longer include sending out a bulk promotional email or any unsolicited email to contacts just because you have their email addresses. The new regulation has made prospecting and reaching […]
Categories: CRM

How to upgrade my sharded environment to Liferay 7.x?

Liferay - Fri, 08/10/2018 - 02:35

Hi Liferay Community,

Before answering this question, I would like to explain what sharding is: to overcome the horizontal scalability concerns of open source databases at the time (circa 2008), Liferay implemented physical partitioning support. The solution allowed administrators to configure portal instances to be stored in different database instances and database server processes.

This feature was originally named "sharding" although "data partitioning" is more accurate since it requires a small amount of information sharing to coordinate partitions.

Thus, beginning in 7.0, Liferay removed its own physical partitioning implementation in favor of the capabilities provided natively by database vendors. Please note that logical partitioning via the "portal instance" concept (a logical set of data grouped by the companyId column, with data security at portal level) is not affected by this change and is still available in current Liferay versions.

Having explained this, the answer to this question is simple: just follow the official procedure:

Liferay 7.x provides a process which converts all shards into independent database schemas after the upgrade. This can be suitable for those cases where you need to keep information separated for legal reasons. However, if you cannot afford to maintain one complete environment for each of those independent databases, you could try another approach: disable sharding by merging all shards into a single database schema before performing the upgrade to Liferay 7.x.

Merging all shard schemas into the default one is feasible because sharding generates unique IDs for every row across all databases. These are the steps you should follow:

  1. Create a backup for the shard database schemas in the production environment.
  2. Copy the content of every table in the non-default shards into the default shard. It's recommended to create an SQL script to automate this process.
  3. If a unique index is violated, analyze the data for the two records which cause the issue and remove the one that is no longer necessary (different reasons could have caused data to be created in the wrong shard in the past, such as a wrong configuration, a bug, issues with custom developments, etc.).
  4. Resume this process from the last point of failure.
  5. Repeat 3 and 4 until the default shard database contains all data from the other shards.
  6. Clean up the Shard table except for the default shard record.
  7. Start up a Liferay server using this database without sharding:
    1. Remove all database connections except for the default one.
    2. Comment out the line META-INF/shard-data-source-spring.xml in the spring.configs property.
  8. Ensure that everything works well and that you can access the different portal instances.
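Step 2 recommends scripting the table copy. A hedged sketch of a generator for such a script follows. It assumes all shard schemas are reachable from one database connection with `schema.table` naming (as with MySQL databases on the same server), that every shard has identical table definitions, and that you supply the real schema and table names (the ones below are placeholders). It only prints statements; resolving unique-index violations (step 3) remains a manual task.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator for step 2: one INSERT ... SELECT per table and non-default shard.
// Schema and table names are placeholders; the statements are printed, not executed.
class ShardMergeScript {

    static List<String> copyStatements(String defaultSchema, List<String> shardSchemas, List<String> tables) {
        List<String> statements = new ArrayList<>();
        for (String shard : shardSchemas) {
            for (String table : tables) {
                // Assumes identical table definitions in every shard (same Liferay version).
                statements.add("INSERT INTO " + defaultSchema + "." + table
                        + " SELECT * FROM " + shard + "." + table + ";");
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> script = copyStatements("lportal",
                List.of("lportal_shard1", "lportal_shard2"),
                List.of("User_", "Group_"));
        // One statement per line, so a unique-index failure (step 3) points at a single table and shard.
        script.forEach(System.out::println);
    }
}
```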

It is recommended that you keep a record of the changes made in steps 3 and 6, since you will need to repeat this process when you decide to go live after merging all databases into the default shard. It is also advisable to do this as a separate project before performing the upgrade to Liferay 7.x. Once you have completed this process, you will just need to execute the upgrade as for a regular non-sharded environment:

This alternative for upgrading sharded environments is not officially supported, but it has been executed successfully in a couple of installations. For that reason, if you have any questions about it, please write a comment on this blog entry or open a new thread in the community forums; other members of the community and I will try to assist you during this process.

Alberto Chaparro 2018-08-10T07:35:00Z
Categories: CMS, ECM

Contact Layout Editor - Help put on the finishing touches

CiviCRM - Thu, 08/09/2018 - 16:22

One of the most requested features in CiviCRM over the years has been to tweak the Contact Summary page. That's the main Contact view, the "heart" of CiviCRM. And now, thanks to 12 donors contributing a total of $12,860.00 in the past month, we've almost reached our goal. Extension development is underway and beta testers have been wowed with the power and ease of designing their own Contact Summary layout with only a few clicks. Some key features of the extension:

Categories: CRM

Why I hired a university professor to join our tech startup

SnapLogic - Thu, 08/09/2018 - 13:19

Originally published on It’s universally accepted that AI is a game-changer and is having a huge impact on organizations across all industries. In the years to come, this isn’t something that is going to change. Indeed, AI and machine learning will become so pervasive, making companies more innovative and allowing employees to offload[...] Read the full article here.

The post Why I hired a university professor to join our tech startup appeared first on SnapLogic.

Categories: ETL

3 Common Pitfalls in Building Your Data Lake and How to Overcome Them

Talend - Wed, 08/08/2018 - 11:36

Recently I had the chance to talk to an SVP of IT at one of the largest banks in North America about their digital transformation strategy. As we spoke, their approach to big data and digital transformation struck me: they described it as ever evolving. New technologies would come to market, requiring new pivots and approaches to leverage these capabilities for the business. It is more important than ever to have an agile architecture that can sustain and scale with your data and analytics growth. Here are three common pitfalls we often see when building a data lake and our thoughts on how to overcome them:

“All I need is an ingestion tool”

Ah yes, the development of a data lake is often seen as the holy grail of everything. After all, now you have a place to dump all of your data. The first issue most people run into is data ingestion: how could they collect and ingest the sheer variety and volume of data coming into a data lake? Any success in data collection is a quick win for them. So they bought a solution for data ingestion, and all the data can now be captured and collected like never before. Well, problem solved, right? Temporarily, maybe, but the real battle has just begun.

Soon enough you will realize that simply getting your data into the lake is just the start. Most data lake projects fail because they turn into a big data swamp with no structure, no quality, a lack of talent and no trace of where the data actually came from. Raw data is rarely useful on its own, since it still needs to be processed, cleansed, and transformed in order to provide quality analytics. This often leads to the second pitfall.

Hand coding for data lake

We have had many blogs in the past on this, but you can’t emphasize this topic enough. It’s strikingly true that hand coding may look promising based on initial deployment costs, but maintenance costs can increase by upwards of 200%. The lack of big data skills, on both the engineering and analytics sides, as well as the move to the cloud, adds even more complexity to hand coding. Run through the checklist here to help you determine when and where to use custom coding for your data lake project.


With the rising demand for faster analytics, companies today are looking for more self-service capabilities when it comes to integration. But self-service can easily become perilous without proper governance and metadata management in place. As many basic integration tasks may go to citizen integrators, it’s more important than ever to ask: is there governance in place to track that? Is access to your data given to the right people at the right time? Is your data lake enabled with proper metadata management so that your self-service data catalog is meaningful?

Don’t look for an avocado slicer.

As the data lake market matures, everyone is looking for more, yet struggling with each phase as they go through the filling, processing and managing of data lake projects. To put this in perspective, here is a snapshot of the big data landscape from VC firm FirstMark from 2012:

And this is how it looks in 2017:

The big data market landscape is growing like never before as companies are now clearer on what they need. From these three pitfalls, the biggest piece of advice I can offer is to avoid what I like to call “an avocado slicer”. Yes, it might be interesting, fancy, and work perfectly for what you are looking for, but you will soon realize it’s a purpose-built point solution that might only work for ingestion, only be compatible with one processing framework, or only work for one department’s particular needs. Instead, take a holistic approach to your data lake strategy: what you really need is a well-rounded culinary knife! Otherwise, you may end up with an unnecessary number of technologies and vendors to manage in your technology stack.

In my next post, I’ll be sharing some of the best questions to ask for a successful data management strategy.

The post 3 Common Pitfalls in Building Your Data Lake and How to Overcome Them appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

Using CiviCRM form processor extension to handle form submissions from an external website

CiviCRM - Wed, 08/08/2018 - 08:03

In this blog post I want to show how you could use the new form processor extension to handle form submissions from an external website.

My (imaginary) organisation provides buddies for young people and the form on our website is submitted when somebody is interested in becoming a buddy for a teenager. We ask for the name, address, e-mail, telephone number, birth date and gender.

Categories: CRM

How to practice CI/CD the SnapLogic way

SnapLogic - Tue, 08/07/2018 - 14:27

With the SnapLogic Enterprise Integration Cloud’s (EIC) superior handling of different types of integrations for organizations, forward-looking companies are leveraging DevOps methodologies for their own data and application integration workflows and initiatives. CI/CD – continuous integration and continuous delivery – is a practice where code is built, integrated, and delivered in a frequent manner. This[...] Read the full article here.

The post How to practice CI/CD the SnapLogic way appeared first on SnapLogic.

Categories: ETL

How to Develop a Data Processing Job Using Apache Beam – Streaming Pipelines

Talend - Tue, 08/07/2018 - 12:21

In our last blog, we talked about developing data processing jobs using Apache Beam. This time we are going to talk about one of the most in-demand tasks in the modern Big Data world: processing streaming data.

The principal difference between batch and streaming is the type of input data source. When your data set is limited (even if it’s huge in terms of size) and is not updated during processing, you would likely use a batching pipeline. The input source, in this case, can be anything from files and database tables to objects in object storage. I want to underline once more that, with batching, we assume the data is immutable for the whole processing time and the number of input records is constant. Why should we pay attention to this? Because even with files we can have an unlimited data stream when files are constantly being added or changed. In that case, we have to apply a streaming approach to work with the data. So, if we know that our data is limited and immutable, we can develop a batching pipeline.

Things get more complicated when our data set is unlimited (continuously arriving) and/or mutable. Examples of such sources include message systems (like Apache Kafka), new files in a directory (web server logs) or other systems collecting real-time data (like IoT sensors). The common theme among all of these sources is that we always have to wait for new data. Of course, we can split our data into batches (by time or by data size) and process every split in a batching way, but it would be quite difficult to apply some functions across all consumed datasets and to build the whole pipeline around this. Luckily, there are several streaming engines that let us cope with this type of data processing easily: Apache Spark, Apache Flink, Apache Apex and Google Cloud Dataflow. All of them are supported by Apache Beam, and we can run the same pipeline on different engines without any code changes. Moreover, we can use the same pipeline in batching or in streaming mode with minimal changes: one just needs to set the input source properly and, voilà, everything works out of the box! Just like magic! I could only have dreamed of this a while ago, when I was rewriting my batch jobs as streaming ones.
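The naive alternative mentioned above, cutting an unbounded stream into time-based batches, can be sketched in plain Java with no Beam involved. The Event record, the one-minute bucket size and the helper names are hypothetical; the sketch only shows how each event is assigned to a fixed, non-overlapping time bucket. Beam’s windowing support performs this kind of assignment for you and also deals with concerns such as late-arriving data, which a hand-rolled version would have to handle itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FixedBuckets {
    // Hypothetical event: a payload plus its event-time timestamp in milliseconds.
    record Event(String payload, long timestampMillis) {}

    // Start of the fixed-size bucket containing the timestamp.
    // Math.floorMod keeps the result correct for negative timestamps too.
    static long bucketStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    // Groups events by bucket start; TreeMap keeps the buckets in time order.
    static Map<Long, List<Event>> groupIntoBuckets(List<Event> events, long sizeMillis) {
        Map<Long, List<Event>> buckets = new TreeMap<>();
        for (Event e : events) {
            buckets.computeIfAbsent(bucketStart(e.timestampMillis(), sizeMillis), k -> new ArrayList<>())
                   .add(e);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("a", 1_000), new Event("b", 59_999),
            new Event("c", 60_000), new Event("d", 125_000));
        // With one-minute buckets this prints:
        // 0 -> [a, b]
        // 60000 -> [c]
        // 120000 -> [d]
        groupIntoBuckets(events, 60_000).forEach((start, evs) ->
            System.out.println(start + " -> " + evs.stream().map(Event::payload).toList()));
    }
}
```

Each bucket could then be processed as a small batch, which is exactly the bookkeeping that becomes awkward once functions must span all buckets.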

So, enough theory – it’s time to take an example and write our first streaming code. We are going to read some data from Kafka (unbounded source), perform some simple data processing and write results back to Kafka as well.

Let’s suppose we have an unlimited stream of geo-coordinates (X and Y) of some objects on a map (for this example, let’s say the objects are cars) that arrives in real time, and we want to select only those located inside a specified area. In other words, we have to consume text data from a Kafka topic, parse it, filter it by the specified limits and write it back into another Kafka topic. Let’s see how we can do this with the help of Apache Beam.

Every Kafka message contains text data in the following format:

  id – unique id of the object,
  x, y – coordinates on the map (integers).

Messages that don’t follow this format are invalid, and we need to skip them.

Creating a pipeline

As in our previous blog, where we did batch processing, we create the pipeline in the same way:

Pipeline pipeline = Pipeline.create(options);

We can extend the Options object to pass command-line options into the pipeline. Please see the whole example on GitHub for more details.
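For illustration only, here is a plain-Java sketch of the idea behind passing command-line options: each --name=value argument is mapped to a named option. This is a hypothetical helper, not Beam’s implementation; in the real pipeline, Beam’s PipelineOptionsFactory parses the arguments and binds them to the getters on the Options interface (for example, --inputTopic would back getInputTopic()). The sample server address and topic name are made up.

```java
import java.util.HashMap;
import java.util.Map;

public class ArgsSketch {
    // Hypothetical helper: turns "--name=value" arguments into a name -> value map.
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("--")) {
                throw new IllegalArgumentException("Expected --name=value but got: " + arg);
            }
            int eq = arg.indexOf('=');
            if (eq < 0) {
                // In this sketch, a bare flag such as --streaming is treated as "true".
                options.put(arg.substring(2), "true");
            } else {
                // Split on the first '=' only, so values may themselves contain '='.
                options.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> opts = parseArgs(new String[] {
            "--runner=DirectRunner", "--bootstrap=localhost:9092", "--inputTopic=cars"});
        // Each value would back a getter such as getBootstrap() or getInputTopic().
        System.out.println(opts.get("bootstrap") + " " + opts.get("inputTopic"));
    }
}
```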

Then, we have to read data from the Kafka input topic. As stated before, Apache Beam already provides a number of IO connectors, and KafkaIO is one of them. Therefore, we create a new unbounded PTransform that consumes arriving messages from the specified Kafka topic and passes them on to the next step:

pipeline.apply(
    KafkaIO.<Long, String>read()
        .withBootstrapServers(options.getBootstrap())
        .withTopic(options.getInputTopic())
        .withKeyDeserializer(LongDeserializer.class)
        .withValueDeserializer(StringDeserializer.class))

By default, KafkaIO wraps every consumed message in a KafkaRecord object. The next transform simply retrieves the payload (the string value) with a newly created DoFn:

.apply(
    ParDo.of(
        new DoFn<KafkaRecord<Long, String>, String>() {
            @ProcessElement
            public void processElement(ProcessContext processContext) {
                KafkaRecord<Long, String> record = processContext.element();
                processContext.output(record.getKV().getValue());
            }
        }
    )
)

After this step, it is time to filter the records (see the task stated above), but before we do that, we have to parse each string value according to the defined format. Both the parsing and the filtering can be encapsulated in one function object, which is then used by Beam’s built-in Filter transform:

.apply(
    "FilterValidCoords",
    Filter.by(new FilterObjectsByCoordinates(
        options.getCoordX(), options.getCoordY())))
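FilterObjectsByCoordinates itself lives in the example repository and isn’t shown here. As a rough plain-Java sketch of what such a function object could do, the predicate below parses each message and rejects malformed ones, which also takes care of skipping invalid records. It assumes comma-separated id,x,y messages and an area spanning 0..maxX by 0..maxY; both are assumptions, as is the class name InsideArea. In the real pipeline, a class like this would implement Beam’s SerializableFunction<String, Boolean> rather than java.util.function.Predicate, so that Filter.by can accept it.

```java
import java.util.function.Predicate;

public class CoordinateFilterSketch {
    // Hypothetical stand-in for FilterObjectsByCoordinates: keeps messages "id,x,y"
    // whose coordinates fall inside the rectangle [0, maxX] x [0, maxY] (assumed area).
    static final class InsideArea implements Predicate<String> {
        private final int maxX;
        private final int maxY;

        InsideArea(int maxX, int maxY) {
            this.maxX = maxX;
            this.maxY = maxY;
        }

        @Override
        public boolean test(String message) {
            if (message == null) {
                return false;
            }
            String[] parts = message.split(",");
            if (parts.length != 3 || parts[0].trim().isEmpty()) {
                return false; // wrong number of fields or empty id: skip the record
            }
            try {
                int x = Integer.parseInt(parts[1].trim());
                int y = Integer.parseInt(parts[2].trim());
                return x >= 0 && x <= maxX && y >= 0 && y <= maxY;
            } catch (NumberFormatException e) {
                return false; // non-integer coordinates: skip the record
            }
        }
    }

    public static void main(String[] args) {
        Predicate<String> inside = new InsideArea(100, 100);
        for (String msg : new String[] {"car1,10,20", "car2,150,20", "garbage", "car3,x,5"}) {
            System.out.println(msg + " -> " + inside.test(msg));
        }
    }
}
```

Returning false for malformed input means parsing errors never stop the stream; they are simply filtered out along with the out-of-area objects.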

Then, we have to prepare the filtered messages for writing back to Kafka by creating key/value pairs with Beam’s built-in KV class, which can be used across different IO connectors, including KafkaIO.

.apply(
    "ExtractPayload",
    ParDo.of(
        new DoFn<String, KV<String, String>>() {
            @ProcessElement
            public void processElement(ProcessContext c) throws Exception {
                c.output(KV.of("filtered", c.element()));
            }
        }
    )
)

The final transform writes the messages into Kafka, so we simply use KafkaIO.write(), the sink implementation, for this purpose. As with reading, we have to configure this transform with some required options, such as the Kafka bootstrap servers, the output topic name and the key/value serializers.

.apply(
    "WriteToKafka",
    KafkaIO.<String, String>write()
        .withBootstrapServers(options.getBootstrap())
        .withTopic(options.getOutputTopic())
        .withKeySerializer(org.apache.kafka.common.serialization.StringSerializer.class)
        .withValueSerializer(org.apache.kafka.common.serialization.StringSerializer.class)
);

In the end, we just run our pipeline as usual:

pipeline.run();

This time it may seem a bit more complicated than in the previous blog but, as one can easily notice, we didn’t do anything specific to make our pipeline streaming-compatible. That is entirely the responsibility of Apache Beam’s data model implementation, which makes it very easy for Beam users to switch between batch and streaming processing.

Building and running a pipeline

Let’s add the required dependencies to make it possible to use Beam KafkaIO:



Then, just build a jar and run it with DirectRunner to test how it works:

# mvn clean package
# mvn exec:java -Dexec.mainClass=org.apache.beam.tutorial.analytic.FilterObjects -Pdirect-runner -Dexec.args="--runner=DirectRunner"

If needed, we can pass other arguments used by the pipeline through the “exec.args” option. Also, make sure that your Kafka servers are available and properly specified before running the Beam pipeline. Lastly, the Maven command launches the pipeline and runs it until it is stopped manually (optionally, a maximum running time can be specified). This means the data is processed continuously, in streaming mode.

As usual, all the code for this example is published in this GitHub repository.

Happy streaming!

The post How to Develop a Data Processing Job Using Apache Beam – Streaming Pipelines appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL