AdoptOS

Assistance with Open Source adoption

Open Source News

Display CiviCRM forms on any web site anywhere

CiviCRM - Wed, 10/24/2018 - 10:19

Powerbase (the Progressive Technology Project's hosted CiviCRM program) provides a Drupal site for your CiviCRM database, but even though it runs on Drupal, you can't use it as your web site; your web site must be hosted elsewhere. This approach is becoming increasingly popular as organizations move from Drupal, WordPress and Joomla to new web site platforms, or simply don't want their web site vulnerabilities to lead to a compromise of their database.

Categories: CRM

Meet Toky, Ambassador of the month | October 2018

PrestaShop - Wed, 10/24/2018 - 04:30
Please meet our Ambassador Toky from Antananarivo, Madagascar. He joined the Ambassador Program in 2016 and has been very active ever since, animating the community.
Categories: E-commerce

Continuous Integration Best Practices – Part 1

Talend - Tue, 10/23/2018 - 18:00

In this blog, I want to highlight some of the best practices I’ve come across as I’ve implemented continuous integration with Talend. If you are new to CI/CD, please go through part 1 and part 2 of my previous blogs on ‘Continuous Integration and workflows with Talend and Jenkins’. This blog also offers some basic guidance on how to implement and maintain a CI/CD system, and these recommendations will help improve its effectiveness.

Without any further delay – let’s jump right in!

Best Practice 1 – Use Talend Recommended Architectures

For every product Talend offers, there is a recommended architecture. For details on the recommended architectures for our different products, please refer to our Git repo: https://talendpnp.github.io/. This repository covers every product Talend offers and is truly a great resource; however, in this blog I am focusing only on the CI/CD aspect of the Data Integration platform.

In the architecture for Talend Data Integration, it’s recommended to have a separate Software Development Life Cycle (SDLC) environment. This SDLC environment typically consists of an artifact repository like Nexus, a version control system like Git, Talend Commandline, Maven, and the Talend CI Builder plugin. The SDLC server would typically look as follows (optional components are marked in yellow):

As the picture shows, the recommendation is to have separate servers for Nexus, the version control system, and the CI server. One Nexus instance is shared across all environments, and all environments access the binaries stored in Nexus for job execution. The version control system is needed only in the development environment and is not accessed from the other environments.

Best Practice 2 –  Version Control System

Continuous integration requires working with a version control system, so it is important to keep that system healthy and clean for CI to run fast. Talend recommends following a few best practices while working with Git.
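As a purely illustrative sketch (the branch name and remote are hypothetical, not an official Talend recommendation), a short-lived feature-branch workflow keeps the shared history clean so CI stays fast:

git checkout master
git pull origin master
git checkout -b feature/TDI-1234-new-job        # work on an isolated, short-lived branch
git add .
git commit -m "Add new data integration job"    # small, focused commits
git push -u origin feature/TDI-1234-new-job     # push early so the CI server can build it
# merge back through a reviewed pull/merge request, then delete the branch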

Best Practice 3 –  Continuous Integration Environment

Talend recommends four environments with a continuous integration setup (see below). The best practice is illustrated with Git, Jenkins and Nexus as an example. The Git server in the SDLC environment is accessible only from the development environment; the other environments cannot access it.

All coding activities take place in the development environment, and the code is pushed to the version control system, Git. The CI/CD process takes the code from Git, converts it into binaries and publishes them to the Nexus snapshot repository. All non-production environments have access to both the Nexus release and snapshot repositories; the production environment has access only to the release repository.

It is important to note that one Nexus instance is shared among all environments.

Best Practice 4 –  Maintaining Nexus

The Nexus artifact repository plays a vital role in continuous integration. Talend uses Nexus not only to provide software updates and patches but also as an artifact repository to hold the job binaries.

These binaries are then invoked via the Talend Administration Center or Talend Cloud to execute the jobs. If your Talend Studio is connected to the Talend Administration Center, all the Nexus artifact repository settings are retrieved from it automatically. You can use the retrieved settings to publish your Jobs or configure your own artifact repositories.

If your Studio is working on a local connection, all the fields are pre-filled with the locally-stored default settings.

Now, with respect to CI/CD, it is a best practice to upload the CI Builder plugin and all the external JAR files used by the project to the third-party repository in Nexus. The third-party folder will look as shown below once the Talend CI Builder is uploaded.
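As a hedged illustration (the coordinates, file name, repository id and URL below are placeholders, not your actual Nexus settings), a third-party JAR can be pushed to a hosted Nexus repository with the standard Maven deploy plugin:

mvn deploy:deploy-file \
  -DgroupId=org.example \
  -DartifactId=external-library \
  -Dversion=1.0.0 \
  -Dpackaging=jar \
  -Dfile=external-library-1.0.0.jar \
  -DrepositoryId=nexus-thirdparty \
  -Durl=http://NEXUS_HOST:8081/repository/thirdparty/

The repositoryId value is expected to match a server entry with credentials in your Maven settings.xml.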

Best Practice 5 –  Release and Snapshot Artifacts

To implement a CI/CD pipeline, it is important to understand the difference between release and snapshot artifacts/repositories. Release artifacts are stable and immutable in order to guarantee that builds which depend on them are repeatable over time; by default, release repositories do not allow redeployment. Snapshots capture work in progress and are used during development; by default, snapshot repositories allow redeployment.

As a best practice, development teams should learn the difference between the two repository types and implement the pipeline so that development artifacts go to the snapshot repository while the rest of the environments refer only to the release repository.
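As a minimal sketch (repository ids and URLs are placeholders), this split is typically expressed in the Maven distributionManagement section, so that mvn deploy automatically routes -SNAPSHOT versions and release versions to the right Nexus repositories:

<distributionManagement>
  <repository>
    <id>nexus-releases</id>
    <url>http://NEXUS_HOST:8081/repository/releases/</url>
  </repository>
  <snapshotRepository>
    <id>nexus-snapshots</id>
    <url>http://NEXUS_HOST:8081/repository/snapshots/</url>
  </snapshotRepository>
</distributionManagement>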

Best Practice 6 – Isolate and Secure the CI/CD Environment

The Talend recommended architecture talks about “isolating” the CI/CD environment. The CI/CD system typically holds some of your most critical information, has complete access to your sources and targets, and stores all credentials, so it is critically important to secure and safeguard it.

As a best practice, the CI/CD system should be deployed to internal and protected networks. Setting up VPNs or other network access control technology is recommended to ensure that only authenticated operators are able to access the system. Depending on the complexity of your network topology, your CI/CD system may need to access several different networks to deploy code to different environments. The important point to keep in mind is that your CI/CD systems are highly valuable targets, and in many cases, they have a broad degree of access to your other vital systems. Protecting all external access to the servers and tightly controlling the types of internal access allowed will help reduce the risk of your CI/CD system being compromised.

The image below shows Talend’s recommended architecture, where the CI server, the SDLC server and the version control system (Git) are isolated and secured.

Best Practice 7 – Maintain a Production-like Environment

It is a best practice to have one environment (the QA environment) as close to the production environment as possible. This includes infrastructure, operating system, databases, patches, network topology, firewalls and configuration.

Having an environment close to production and validating code changes in it helps ensure that the integration accurately reflects how the change would behave in production. Mismatches and last-minute surprises are identified early and can be effectively eliminated, so code can be released safely to production at any time. Note that the more differences there are between your production and QA environments, the less likely it is that your tests measure how the code will perform when released. Some differences between QA and production are expected, but keeping them manageable and well understood is essential.

Best Practice 8 – Keep Your Continuous Integration process fast

CI/CD pipelines help drive changes through automated testing cycles to different environments such as test and staging, and finally to production. Keeping the CI/CD pipeline fast is very important for the team to be productive.

If a team has to wait a long time for the build to finish, it defeats the whole purpose. Since all changes must follow this CI/CD process, keeping the pipeline fast and dependable is essential. There are multiple aspects to keeping the CI/CD process fast. To begin with, the CI/CD infrastructure must be good enough not only to suit the current need but also to scale out if needed. It is also important to revisit the test cases regularly to ensure that no test case is adding unnecessary overhead to the system.

Best Practice 9 –  CI/CD should be the only way to deploy to Production

Promoting code through the CI/CD pipeline validates that each change fits the organization’s standards and doesn’t introduce bugs into the existing code. Any failure in the pipeline is immediately visible and should stop further code integration and deployment. This is a gatekeeping mechanism that safeguards the important environments from untrusted code.

To realize these advantages, it is important to ensure that every change to the production environment goes through the CI/CD pipeline and nothing else. The CI/CD pipeline should be the only mechanism by which code enters the production environment, whether it is triggered automatically or manually.

Generally, a team follows all this until a production fix or a show-stopper error occurs. When the error is critical, there is pressure to resolve it quickly. Even in such scenarios, it is recommended that the fix be introduced to the other environments via the CI/CD pipeline. Putting your fix through the pipeline (or simply using the CI/CD system to roll back) also prevents the next deployment from erasing an ad hoc hotfix that was applied directly to production. The pipeline protects the validity of your deployments whether it is a regular, planned release or a fast fix to resolve an ongoing issue. This use of the CI/CD system is yet another reason to keep your pipeline fast.

As an example, the image below shows the option to publish the job directly from Studio. This approach is not recommended; the recommended approach is to use the pipeline. The example here uses Jenkins.

Best Practice 10 – Build binaries only once

A primary goal of a CI/CD pipeline is to build confidence in your changes and minimize the chance of unexpected impact. If your pipeline requires a building, packaging, or bundling step, that step should be executed only once and the resulting output reused throughout the entire pipeline. This practice prevents problems that can arise when the code is compiled or packaged repeatedly. In addition, if you rebuild the same code for each environment, you end up duplicating the testing effort and time in every environment.

To avoid these problems, CI systems should include a build process as the first step in the pipeline that creates and packages the software in a clean environment or temporary workspace. The resulting artifact should be versioned and uploaded to an artifact storage system, then pulled down by subsequent stages of the pipeline, ensuring that the build does not change as it progresses through the system.
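To make the build-once pattern concrete, here is a minimal, hedged Jenkins declarative pipeline sketch; the Maven goals, Nexus wiring and deployment script are generic placeholders rather than the Talend-specific CommandLine/CI Builder invocation:

pipeline {
    agent any
    stages {
        stage('Build once') {
            steps {
                // compile, test and package in a clean workspace
                sh 'mvn -B clean verify'
            }
        }
        stage('Publish artifact') {
            steps {
                // push the versioned binary to Nexus (snapshot or release
                // repository is chosen by the version and distributionManagement)
                sh 'mvn -B deploy -DskipTests'
            }
        }
        stage('Deploy to QA') {
            steps {
                // later stages pull the same artifact from Nexus instead of
                // rebuilding it; deploy.sh is a hypothetical deployment script
                sh './deploy.sh qa'
            }
        }
    }
}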

Conclusion

In this blog, we’ve covered a first set of CI/CD best practices. Hopefully, it has been useful. My next blog in this series will focus on some more best practices in the CI/CD world, so keep watching and happy reading. Until next time!

The post Continuous Integration Best Practices – Part 1 appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

The AI mindset: Bridging industry and academic perspectives

SnapLogic - Tue, 10/23/2018 - 15:54

Eventually, all organizations will have to leverage machine learning to stay competitive. Just like with the move to web applications, mobile applications, and the cloud, machine learning will be the norm, not the exception, to a modern technology stack. While there is certainly a lot of hype surrounding artificial intelligence and machine learning technology, there[...] Read the full article here.

The post The AI mindset: Bridging industry and academic perspectives appeared first on SnapLogic.

Categories: ETL

Take Advantage of Elastic Stack to monitor Liferay DXP

Liferay - Tue, 10/23/2018 - 07:49

As many of you probably know, starting with Liferay DXP, Elasticsearch is the default Search Engine. In fact, by default, an Elasticsearch instance is embedded in Liferay DXP (it’s a good moment to remind everyone that this is not supported for Production environments, where a separate Elasticsearch instance must be created).

In this article we’re going to outline how to use the Elastic Stack in order to monitor Liferay DXP.

The first idea for this article was to create a step-by-step guide, but I decided against that approach for two reasons: it would require very detailed configurations that would end up slightly different on each installation, and anyone who uses the Elastic Stack should gain a minimum level of knowledge in order to monitor the things that matter for his or her particular situation, so the learning process itself is beneficial.

 

So what’s the Elastic Stack?

The Elastic Stack is a set of products created by Elastic. It’s also known as ELK, an acronym formed from Elasticsearch, Logstash and Kibana.

  • Elasticsearch is the index engine where information is stored.

  • Logstash is a program that sends information to Elasticsearch.

  • Kibana is a server where we can create dashboards and visualizations from the information that Logstash stored in Elasticsearch.

In this example we are not going to alter or consume the indexes that Liferay DXP creates on Elasticsearch; we are going to create our own indexes purely for monitoring purposes.

 

Setting up an Elastic Stack environment.

Kibana is a central service in the ELK stack, so for practical purposes we chose to install it on the same server as Elasticsearch. Because Logstash has to access the Liferay logs, we chose to install a Logstash service on each Liferay node. This layout is not mandatory, though: you can install Kibana on a different server from Elasticsearch and point it to the right address. The following examples are based on a two-node Liferay cluster (node1 and node2).

 

Collecting data with Logstash:

The first step for the ELK stack to work is to have Logstash collect data from the Liferay logs and send it to Elasticsearch. We can create different Logstash pipelines to define which data it should ingest and how to store it in Elasticsearch. Each pipeline is a configuration file where you can define three sections:

  • input → how the data is going to be collected, here we specify the Logstash plugin we are going to use and set its parameters.
  • filter → where we can alter the data that our plugin has collected before sending it to Elasticsearch
  • output → it indicates the endpoint where our Elasticsearch is and also the name of the index where it’s going to store the data (if it doesn’t exist, it’ll be created automatically).

We are going to focus on two different pipelines in this article; here is some general information about how to configure pipelines.

 

Liferay logs pipeline:

This pipeline collects the information from the Liferay logs using the file input plugin. This plugin reads the logs (specified in the input phase) and parses each line according to some conditions. In our case that parsing happens during the filter phase, where we tokenize the log message to extract the log level, java class, time, and all the data we want to extract from each log line. Finally, we choose in which index we want to store the extracted data using the output phase:

input {
  file {
    path => "/opt/liferay/logs/liferay*.log"
    start_position => beginning
    ignore_older => 0
    type => "liferaylog"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => previous
    }
  }
}
filter {
  if [type] == "liferaylog" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:realtime}\s*%{LOGLEVEL:loglevel}\s*\[%{DATA:thread}\]\[%{NOTSPACE:javaclass}:%{DATA:linenumber}\]\s*%{GREEDYDATA:logmessage}" }
      tag_on_failure => ["error_message_not_parsed"]
      add_field => { "hostname" => "LIFERAY_SERVER_HOSTNAME" }
    }
    date {
      match => [ "realtime", "ISO8601" ]
      timezone => "UTC"
      remove_field => ["realtime"]
    }
  }
}
output {
  if [type] == "liferaylog" {
    elasticsearch {
      hosts => ["ELASTIC_SERVER_HOSTNAME:9200"]
      index => "logstash-liferay-log-node1-%{+YYYY.MM.dd}"
    }
    #stdout { codec => rubydebug }
  }
}

 

JVM statistics pipeline:

The previous plugin is installed by default with Logstash, but the plugin we'll use to collect JVM statistics (logstash-input-jmx) has to be installed manually.
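Assuming a standard Logstash 5.x/6.x installation (older releases shipped the plugin manager as bin/plugin instead), the plugin can be installed from the Logstash home directory with:

bin/logstash-plugin install logstash-input-jmx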

When using this plugin, we should tell Elasticsearch that the values this plugin sends to a particular index have a decimal format; otherwise Elasticsearch will infer the type from the first value it receives, which could be interpreted as a long.

To configure this we can execute a simple curl command. We’ll make an HTTP call to Elasticsearch with some JSON data to tell it that all indexes whose names begin with “logstash-jmx-node” should treat their metric values as doubles. We only need to do this once, and Elasticsearch will then know how to deal with our JMX data:

curl -H "Content-Type: application/json" -XPUT ELASTIC_SERVER_HOSTNAME:9200/_template/template-logstash-jmx-node* -d '
{
  "template" : "logstash-jmx-node*",
  "settings" : { "number_of_shards" : 1 },
  "mappings": {
    "doc": {
      "properties": {
        "metric_value_number" : { "type" : "double" }
      }
    }
  }
}'

Before using this plugin, we also need to enable the JMX connection in the Liferay JVM. For example, if running on Tomcat, we can add this to setenv.sh:

CATALINA_OPTS="$CATALINA_OPTS -Djava.rmi.server.hostname=LIFERAY_SERVER_HOSTNAME -Dcom.sun.management.jmxremote.port=5000 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"

To set up this plugin we need to create two different configuration files: the usual Logstash pipeline configuration and a JMX configuration file whose location is specified in the Logstash pipeline, in the jmx input:

  • Logstash pipeline: here we specify where the jmx configuration file (/opt/logstash/jmx) is and which index we are going to use.
input {
  jmx {
    path => "/opt/logstash/jmx"
    polling_frequency => 30
    type => "jmx"
    nb_thread => 3
  }
}
filter {
  mutate {
    add_field => { "node" => "node1" }
  }
}
output {
  if [type] == "jmx" {
    elasticsearch {
      hosts => [ "ELASTIC_SERVER_HOSTNAME:9200" ]
      index => "logstash-jmx-node1-%{+YYYY.MM.dd}"
    }
  }
}
  • The JMX Configuration file:  where we decide which JMX statistics we want to collect and send to the index:
{ "host" : "localhost", "port" : 5000, "alias" : "reddit.jmx.elasticsearch", "queries" : [ { "object_name" : "java.lang:type=Memory", "object_alias" : "Memory" }, { "object_name" : "java.lang:type=Threading", "object_alias" : "Threading" }, { "object_name" : "java.lang:type=Runtime", "attributes" : [ "Uptime", "StartTime" ], "object_alias" : "Runtime" },{ "object_name" : "java.lang:type=GarbageCollector,name=ParNew", "object_alias" : "ParNew" },{ "object_name" : "java.lang:type=GarbageCollector,name=ConcurrentMarkSweep", "object_alias" : "MarkSweep" },{ "object_name" : "java.lang:type=OperatingSystem", "object_alias" : "OperatingSystem" },{ "object_name" : "com.zaxxer.hikari:type=Pool (HikariPool-1)", "object_alias" : "Hikari1" },{ "object_name" : "com.zaxxer.hikari:type=Pool (HikariPool-2)", "object_alias" : "Hikari2" },{ "object_name" : "Catalina:type=ThreadPool,name=\"http-nio-8080\"", "object_alias" : "HttpThread" },{ "object_name" : "java.lang:type=MemoryPool,name=Metaspace", "object_alias" : "Metaspace" },{ "object_name" : "java.lang:type=MemoryPool,name=Par Eden Space", "object_alias" : "Eden" },{ "object_name" : "java.lang:type=MemoryPool,name=CMS Old Gen", "object_alias" : "Old" },{ "object_name" : "java.lang:type=MemoryPool,name=Par Survivor Space", "object_alias" : "Survivor" }] }

 

Monitoring with Kibana:

Once all this information is being processed and indexed, it's time to create dashboards and visualizations on Kibana.

First we have to point Kibana to the Elasticsearch instance whose data we are going to consume. We’ll use the kibana.yml configuration file for this purpose.
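As a minimal sketch (the hostnames are placeholders, and in Kibana 7.x the elasticsearch.url setting was renamed to elasticsearch.hosts), kibana.yml would contain something like:

server.host: "0.0.0.0"
elasticsearch.url: "http://ELASTIC_SERVER_HOSTNAME:9200"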

 

Logtrail:

We are also going to install a plugin (Logtrail) that lets us view the logs through Kibana’s UI, much like the tail command does. This way, we can share logs with developers, sysadmins, DevOps engineers, project managers… without having to give all those people access to the actual server where the logs live.
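Assuming a standard Kibana installation, Logtrail is typically installed with the Kibana plugin manager; the release archive URL is a placeholder and must match your exact Kibana version:

bin/kibana-plugin install <logtrail-release-zip-url>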

 

 

How to create visualizations and dashboards

Once we have pointed Kibana to the indexes and installed Logtrail, we can start creating dashboards and visualizations. Creating them is easy in Kibana; we just need to use the UI. The steps to create a visualization are:

  1. Indicate which index patterns we are going to deal with. In our case we wanted to use JMX and log information from two separate Liferay nodes. In our example:

    logstash-jmx-node1-*

    logstash-liferay-log-node1-*

    logstash-jmx-node2-*

    logstash-liferay-log-node2-*
  2. Create a search query in Kibana using Lucene queries. In this example we’re retrieving the process and system CPU loads collected via JMX:

    metric_path: "reddit.jmx.elasticsearch.OperatingSystem.ProcessCpuLoad" || metric_path: "reddit.jmx.elasticsearch.OperatingSystem.SystemCpuLoad"
  3. Create different visual components to show that information. There are different kinds of visualizations we can create, like a histogram showing the CPU usage we have recorded using the JMX plugin:

    Or a table counting how many times a Java class appears in the log with the ERROR log level:

  4. Create dashboards that group all the visualizations you want. Dashboards are collections of visualizations:

To conclude:

The ELK stack is a very powerful way to monitor and extract information from running Liferay instances, and in my opinion the main advantages of this system are:

  • Democratize logs: so everybody can access them, not only sysadmins. The logs can also be searched easily.

  • A history of JMX stats, so it’s possible to know how the CPU, memory and database pools were behaving at a given time.

I hope I have convinced you that ELK can help you monitor your Liferay DXP installations, and you feel you're ready to start creating the monitoring system that covers your needs.

Eduardo Perez 2018-10-23T12:49:00Z
Categories: CMS, ECM

LiferayPhotos: a fully functional app in 10 days using Liferay Screens. Part 2 Architecture and planning

Liferay - Tue, 10/23/2018 - 05:36

In the previous blog post we covered the beginning of the project, including a description of the app features, the advantages of using Liferay Screens and, lastly, the wireframe.

In this one, we will cover how we can “screenletize” the different screens of the application and build them by combining different screenlets. We will also organise the 10 days and plan which tasks we are going to work on each day.

“Screenletizing” the app

In this step, we take each screen and think about which screenlet we can use to build each part; this way we can combine those screenlets to build the final screen. Before we continue, you can check out the available screenlets here.

We will extract these images from the prototype included in the previous post.

Log In

In this case, the solution is very straightforward, we need the Login Screenlet.

Let’s cover a more complex example

Feed

We have a feed screen where the user will see all the pictures uploaded to the platform, together with their descriptions and number of likes.

This screen is basically a list of images, but each image has a lot of additional information (user, user image, number of likes, description). To render a List of Images we have a dedicated screenlet, the ImageGalleryScreenlet.

The default version of the ImageGalleryScreenlet only displays a list of images in a Grid, but as we learned in the previous post, we can easily customise the UI.

We have to display more information in each image cell, and… exactly, we will also be using screenlets to do this.

 

 

  • To render the user information we will be using the Asset Display Screenlet. This screenlet can show any Liferay asset and, depending on the inner type, it will choose a specific UI. In this case, as the asset is a user, we will render the name and the profile picture; and of course, to render the profile picture we will use another screenlet, the User Portrait Screenlet.
  • To display the likes, we have another screenlet: the Rating Screenlet. In this case, we don’t have a default UI that represents likes, so we will need to create one.
 

The great thing about separating view parts into screenlets is that it allows us to reuse them in other parts of the application, or even in other applications.

Planning

After we have finished extracting screenlets of our app, we can create a schedule for the development of the application.

The app will be developed by two people. One will be working part-time and the other full time. The person that will be developing it full time is Luis Miguel Barcos, our great intern. He doesn’t have previous experience using Liferay Screens, which will make the journey even more interesting. He will write the next blog post talking about the experience and a more technical look at the project.

Here is the detailed planning of the app:

Conclusion

In this blog post, we have shown how we are going to “screenletize” the application in order to divide it into small and reusable pieces. We have also created a planning scheme to organise the 10 days we have to work on this app.

Be sure to check out the next and last blog post with the final result.

 

Victor Galan 2018-10-23T10:36:00Z
Categories: CMS, ECM

LiferayPhotos: a fully functional app in 10 days using Liferay Screens. Part 1 Introduction 

Liferay - Tue, 10/23/2018 - 05:28

Before we begin with this story, a quick introduction for those of you who don’t know about Liferay Screens: Liferay Screens is a library for iOS, Android and Xamarin that aims to speed up the development of native applications that use Liferay as a backend. It’s basically a collection of components, called screenlets, that are very easy to connect to your Liferay instance and provide plenty of features.

This is going to be a series of three blog posts describing the creation of an iOS application, LiferayPhotos. We are going to describe how we did it from beginning to end and what the advantages of using Liferay Screens in your projects are, all within a 10-day timeframe.

This first blog post will cover the first aspects of the application, from the initial idea to the mockups of how the application should look.

Idea

The application that we want to build is inspired by a well-known app: Instagram. We want to build an application where a user can:

  • Log in.

  • Sign up.

  • Upload pictures, and add stickers too!

  • See other people’s uploaded pictures.

  • Comment on pictures.

  • Like pictures.

  • Be able to chat with other users.

  • See your profile with your uploaded pictures.

With all of this in mind, let's move on to the decision of using Liferay Screens to develop this application.

Why use Liferay Screens?

As you read in the title, we are going to build this app in only 10 days. Yes, you read that right. This is one of the greatest benefits of using Liferay Screens: it makes app development really fast. But… how is this even possible? As I mentioned earlier, the library contains a set of ready-to-use components that cover most of our use cases.

Are you tired of implementing a login screen over and over again? Just use the Login Screenlet, add a couple of lines of code, configure it with your Liferay parameters, and it’s done! You now have a working login screen in your application.

Okay, this seems promising, but what if the UI doesn’t fit your brand style? No problem whatsoever: the screenlets are designed to support this scenario. UI and business logic are decoupled, so you can change the UI easily without having to rewrite the logic.

One last thing you might ask: what if the use case I want is not supported by the available screenlets? This is not a problem either; Liferay Screens also includes a toolkit to build your own screenlets and reuse them in your projects.

Prototyping

Before working on the app, we have to visualize the application wireframe in order to split and schedule the tasks involved. We have created a basic prototype of the different screens of the application:


Conclusion

As you can see here, there are a lot of advantages to using Liferay Screens in your developments. But this is not all… in the next two posts, we will be unveiling other interesting parts of the project and its lifecycle.

Check out the next post of the series

Victor Galan 2018-10-23T10:28:00Z
Categories: CMS, ECM

How To Save $$$ on Advertising With Automated Personalization?

PrestaShop - Tue, 10/23/2018 - 02:45
Every online marketer has to make sure that the flow of new visitors into their PrestaShop store never dries up.
Categories: E-commerce

Vtiger recognized as a 2018 Gartner Peer Insights Customers’ Choice for Sales Force Automation

VTiger - Mon, 10/22/2018 - 16:46
We are excited to announce that we have been named a 2018 Gartner Peer Insights Customers’ Choice for Sales Force Automation. All of us here at Vtiger are proud of this recognition of our growing commitment to putting customers at the heart of what we do. The award, Gartner explains, “is a recognition of vendors […]
Categories: CRM

Asset Display Contributors in Action

Liferay - Sun, 10/21/2018 - 14:19

The Display Pages functionality in Liferay has always been tightly coupled to Web Content articles, and we never had plans to support the same technology for other types of assets, even though we have many of them: Documents & Media, Bookmarks, Wiki, etc. Even User is an asset, and every user has a corresponding AssetEntry in the database. For Liferay 7.1 we decided to change this significantly and introduced a new concept for Display Pages, based on fragments: very flexible and much more attractive than the old ones, and... we still support only Web Content article visualization :). The good news for developers is that the framework is now extensible, and it is easy to implement an AssetDisplayContributor and visualize any type of asset using our new fragment-based display pages. In this article I want to show you how to do it with an example.

Let's imagine that we want to launch a recruitment site, a typical one with tons of job offers, candidate profiles, thematic blogs, etc. One of the main features must be a candidate profile page: some sort of landing page with the basic candidate information, photo, personal summary, and skills. This task can be solved using the new Display Pages.

As I mentioned before, User is an asset in Liferay and there is a corresponding AssetEntry for each user, which is good because, for now, we support visualization only for asset entries. To achieve our goal we need two things: first, an AssetDisplayContributor implementation for the User, to define which fields are mappable and which values correspond to those fields; and second, a custom friendly URL resolver to be able to reach a user's profile page through a friendly URL containing the user's screen name.

Let's implement the contributor first; it is very simple (some repeated code is skipped, the full class is available on GitHub):

@Component(immediate = true, service = AssetDisplayContributor.class)
public class UserAssetDisplayContributor implements AssetDisplayContributor {

    @Override
    public Set<AssetDisplayField> getAssetDisplayFields(
            long classTypeId, Locale locale)
        throws PortalException {

        Set<AssetDisplayField> fields = new HashSet<>();

        fields.add(
            new AssetDisplayField(
                "fullName", LanguageUtil.get(locale, "full-name"), "text"));

        /* some fields skipped here, see project source for the full implementation */

        fields.add(
            new AssetDisplayField(
                "portrait", LanguageUtil.get(locale, "portrait"), "image"));

        return fields;
    }

    @Override
    public Map<String, Object> getAssetDisplayFieldsValues(
            AssetEntry assetEntry, Locale locale)
        throws PortalException {

        Map<String, Object> fieldValues = new HashMap<>();

        User user = _userLocalService.getUser(assetEntry.getClassPK());

        fieldValues.put("fullName", user.getFullName());

        /* some fields skipped here, see project source for the full implementation */

        ServiceContext serviceContext =
            ServiceContextThreadLocal.getServiceContext();

        fieldValues.put(
            "portrait", user.getPortraitURL(serviceContext.getThemeDisplay()));

        return fieldValues;
    }

    @Override
    public String getClassName() {
        return User.class.getName();
    }

    @Override
    public String getLabel(Locale locale) {
        return LanguageUtil.get(locale, "user");
    }

    @Reference
    private UserLocalService _userLocalService;

}

As you can see, there are two main methods: getAssetDisplayFields, which defines the set of AssetDisplayField objects with the field name, label and type (for the moment we support two types, text and image, and try to convert all non-text values such as numbers, booleans, dates, and lists of strings to text), and getAssetDisplayFieldsValues, which provides the values for those fields for a specific AssetEntry instance. It is also possible to provide different field sets for different subtypes of an entity, as we do for the different Web Content structures, using the classTypeId parameter.

The second task is to implement the corresponding friendly URL resolver so we can reach our profiles by the user's screen name. Here I'll show only the implementation of the getActualURL method of the FriendlyURLResolver interface, because it is the method that matters; the full code of this resolver is also available on GitHub.

@Override
public String getActualURL(
        long companyId, long groupId, boolean privateLayout, String mainPath,
        String friendlyURL, Map<String, String[]> params,
        Map<String, Object> requestContext)
    throws PortalException {

    String urlSeparator = getURLSeparator();

    String screenName = friendlyURL.substring(urlSeparator.length());

    User user = _userLocalService.getUserByScreenName(companyId, screenName);

    AssetEntry assetEntry = _assetEntryLocalService.getEntry(
        User.class.getName(), user.getUserId());

    HttpServletRequest request =
        (HttpServletRequest)requestContext.get("request");

    ServiceContext serviceContext = ServiceContextFactory.getInstance(request);

    AssetDisplayPageEntry assetDisplayPageEntry =
        _assetDisplayPageEntryLocalService.fetchAssetDisplayPageEntry(
            assetEntry.getGroupId(), assetEntry.getClassNameId(),
            assetEntry.getClassPK());

    if (assetDisplayPageEntry == null) {
        LayoutPageTemplateEntry layoutPageTemplateEntry =
            _layoutPageTemplateEntryService.fetchDefaultLayoutPageTemplateEntry(
                groupId, assetEntry.getClassNameId(), 0);

        _assetDisplayPageEntryLocalService.addAssetDisplayPageEntry(
            layoutPageTemplateEntry.getUserId(), assetEntry.getGroupId(),
            assetEntry.getClassNameId(), assetEntry.getClassPK(),
            layoutPageTemplateEntry.getLayoutPageTemplateEntryId(),
            serviceContext);
    }

    String requestUri = request.getRequestURI();

    requestUri = StringUtil.replace(requestUri, getURLSeparator(), "/a/");

    return StringUtil.replace(
        requestUri, screenName, String.valueOf(assetEntry.getEntryId()));
}

The key part here is that we need to know which AssetDisplayPageEntry corresponds to the current user. For Web Content articles, there is a UI to define the display page during content editing. In the case of User it would also be possible to create such a UI and save the ID of the page in the database, but to keep my example simple I prefer to fetch the default display page for the User class and create the corresponding AssetDisplayPageEntry if it doesn't exist. At the end of the method, we just redirect the request to our Asset Display Layout Type Controller to render the page using the corresponding page fragments.

That's it for the code; there are a few tasks left, but there is no need to deploy anything else. Now let's prepare the fragments, create a display page and try it out! For our display page we need three fragments: Header, Summary, and Skills. You can create your own fragments with editable areas and map them as you like, but if you are still not familiar with the new Display Pages mapping concept, I recommend downloading my fragments collection and importing it into your site.

When you have your fragments ready you can create a display page: just go to Build -> Pages -> Display Pages, click the plus button and put the fragments in the order you like. This is how it looks using my fragments:

By clicking on any editable area (marked with a dashed background) you can map that area to any available field of the available asset types (there should be two: Web Content Article and User). Choose the User type, map all the fields you would like to show on the display page and click the Publish button. After publishing, it is necessary to mark our new display page as the default for this asset type; this action is available in the kebab menu of the display page entry:

Now we can create a user and try our new display page. Make sure you fill in all the fields you mapped; in my case the fields are First name, Last name, Job title, Portrait, Birthday, Email, Comments (as the summary), Tags (as the skills list) and Organization (as the company). Save the user and use its screen name to get the final result:

It is possible to create a corresponding AssetDisplayContributor for any type of Asset and use it to visualize your assets in a brand new way using Fragment-based Display Pages.

Full code of this example is available here.

Hope it helps! If you need any help implementing contributors for your Assets feel free to ask in comments.

Pavel Savinov 2018-10-21T19:19:00Z
Categories: CMS, ECM

Online training: Custom Fields and Profiles, October 24th

CiviCRM - Fri, 10/19/2018 - 10:12

Would you like to gather information from your contacts that is relevant to your organization when they sign up for an event, membership or make an online donation? Do you have questions about creating an online newsletter sign up? 

The upcoming training session "Customize your Database with Custom Fields and Profiles" taught by Cividesk will demonstrate best practices for creating custom fields and how to include those fields in a profile. You will also learn about the many uses of profiles and how this feature can be maximized to improve your organization's efficiency.

Categories: CRM

How to use the price comparison for an online shop?

PrestaShop - Fri, 10/19/2018 - 07:47
Still don´t know the big advantages of using a price comparer for your business site?
Categories: E-commerce

UK CiviSprint as a newbie

CiviCRM - Fri, 10/19/2018 - 03:41

I didn’t hide the fact that I’d been feeling daunted by the prospect of the Sprint. Knowing that I’d be the least techie by some way even amongst the non-devs, I was also acutely aware of being a newbie to the community - after a year and a half as a CiviCRM user, I’d only had five weeks of working with Rose Lanigan and learning the basics of implementation. But I needn’t have worried, soon realising that:

a)       In any group, someone has to be the least technical. It’s an opportunity to learn and to bring a different perspective.

Categories: CRM

Fragments extension: Fragment Entry Processors

Liferay - Fri, 10/19/2018 - 00:02

In Liferay 7.1 we presented a new vision of the page authoring process. The main idea was to empower business users to create pages and visualize content in a very visual way, without needing to know technical things like FreeMarker or Velocity for Web Content templates. To make this possible we introduced the fragment concept.

In our vision, a fragment is a building block which can be used to build new content pages, display pages or content page templates. A fragment consists of HTML markup, a CSS stylesheet, and JavaScript code.

Even though we really wanted to create a business-user-friendly application, we always keep our strong developer community and their needs in mind. The Fragments API is extensible and allows you to create custom markup tags to enrich your fragment code, and in this article I would like to show you how to create your own processors for the fragment markup.

As an example, we are going to create a custom tag which shows a UNIX-style fortune cookie :). Our fortune cookie module has the following structure:

We use the Jsoup library to parse the fragment's HTML markup, so we have to include it in our build file (since it doesn't ship with the Portal core) among other dependencies.

sourceCompatibility = "1.8"
targetCompatibility = "1.8"

dependencies {
    compileInclude group: "org.jsoup", name: "jsoup", version: "1.10.2"
    compileOnly group: "com.liferay", name: "com.liferay.fragment.api", version: "1.0.0"
    compileOnly group: "com.liferay", name: "com.liferay.petra.string", version: "2.0.0"
    compileOnly group: "com.liferay.portal", name: "com.liferay.portal.kernel", version: "3.0.0"
    compileOnly group: "javax.portlet", name: "portlet-api", version: "3.0.0"
    compileOnly group: "javax.servlet", name: "javax.servlet-api", version: "3.0.1"
    compileOnly group: "org.osgi", name: "org.osgi.service.component.annotations", version: "1.3.0"
}

The OSGi bnd.bnd descriptor has nothing special because we don't export any packages or provide any capabilities:

Bundle-Name: Liferay Fragment Entry Processor Fortune
Bundle-SymbolicName: com.liferay.fragment.entry.processor.fortune
Bundle-Version: 1.0.0

Every Fragment Entry Processor implementation has two main methods: the first processes the fragment's HTML markup, and the second validates the markup to avoid saving fragments with invalid markup.

/**
 * @param fragmentEntryLink Fragment Entry link object to get editable
 *        values needed for a particular case processing.
 * @param html Fragment markup to process.
 * @param mode Processing mode (@see FragmentEntryLinkConstants)
 * @return Processed Fragment markup.
 * @throws PortalException
 */
public String processFragmentEntryLinkHTML(
        FragmentEntryLink fragmentEntryLink, String html, String mode)
    throws PortalException;

/**
 * @param html Fragment markup to validate.
 * @throws PortalException In case of any invalid content.
 */
public void validateFragmentEntryHTML(String html) throws PortalException;

The FragmentEntryLink object gives us access to the particular fragment usage on a page, display page or page template, and can be used if we want our result to depend on the parameters of that particular usage. The mode parameter can be used to add extra processing (or skip unnecessary processing) in EDIT (or VIEW) mode.

In this particular case, we don't need the validation method, but we have a good example in the Portal code.

Let's implement our fortune cookie tag processor! The only thing we have to do here is iterate over every fortune tag we find and replace it with a random cookie text. As I mentioned before, we use Jsoup to parse the markup and work with the document:

@Override
public String processFragmentEntryLinkHTML(
    FragmentEntryLink fragmentEntryLink, String html, String mode) {

    Document document = _getDocument(html);

    Elements elements = document.getElementsByTag(_FORTUNE);

    Random random = new Random();

    elements.forEach(
        element -> {
            Element fortuneText = document.createElement("span");

            fortuneText.attr("class", "fortune");
            fortuneText.text(_COOKIES[random.nextInt(7)]);

            element.replaceWith(fortuneText);
        });

    Element bodyElement = document.body();

    return bodyElement.html();
}

private Document _getDocument(String html) {
    Document document = Jsoup.parseBodyFragment(html);

    Document.OutputSettings outputSettings = new Document.OutputSettings();

    outputSettings.prettyPrint(false);

    document.outputSettings(outputSettings);

    return document;
}

private static final String[] _COOKIES = {
    "A friend asks only for your time not your money.",
    "If you refuse to accept anything but the best, you very often get it.",
    "Today it's up to you to create the peacefulness you long for.",
    "A smile is your passport into the hearts of others.",
    "A good way to keep healthy is to eat more Chinese food.",
    "Your high-minded principles spell success.",
    "The only easy day was yesterday."
};

private static final String _FORTUNE = "fortune";

 

That is it. After deploying this module to our Portal instance, the fortune tag is ready to use in the Fragments editor:
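For instance, based on the processor above (which looks up elements by the tag name fortune and swaps each one for a span with the fortune class), a fragment's markup could contain a hypothetical snippet like this:

<div class="fragment-fortune">
  <h3>Cookie of the day</h3>
  <!-- replaced at render time by a <span class="fortune"> element -->
  <fortune></fortune>
</div>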

It is up to you how to render your own tag, which attributes to support, and which technology to use to process the tag's content. You can even create your own scripting language, or reuse the one you already have in your CMS to avoid massive refactoring and keep using existing templates as-is.

Full Fortune Fragment Entry Processor code can be found here.

Hope it helps!

Pavel Savinov 2018-10-19T05:02:00Z
Categories: CMS, ECM

Introduction to the Agile Data Lake

Talend - Thu, 10/18/2018 - 09:41

Let’s be honest, the ‘Data Lake’ is one of the latest buzzwords everyone is talking about. As with many buzzwords, few really know how to explain what it is, what it is supposed to do, and/or how to design and build one.  As pervasive as they appear to be, you may be surprised to learn that Gartner predicts that only 15% of Data Lake projects make it into production.  Forrester predicts that 33% of enterprises will take their attempted Data Lake projects off life support.  That’s scary!  Data Lakes are about getting value from enterprise data, and given these statistics, that nirvana appears to be quite elusive.  I’d like to change that by sharing my thoughts and hopefully providing some guidance for your consideration on how to design, build, and use a successful Data Lake: an Agile Data Lake.  Why agile? Because to be successful, it needs to be.

Ok, to start, let’s look at the Wikipedia definition for what a Data Lake is:

“A data lake is a storage repository that holds a vast amount of raw data in its native format, incorporated as structured, semi-structured, and unstructured data.”

Not bad.  Yet considering we need to get value from a Data Lake, this Wikipedia definition is not quite sufficient. Why? The reason is simple: you can put any data into the lake, but you also need to get data out, and that means some structure must exist. The real idea of a data lake is to have a single place to store all enterprise data, ranging from raw data (which implies an exact copy of source system data) through transformed data, which is then used for various business needs including reporting, visualization, analytics, machine learning, data science, and much more.

I like a ‘revised’ definition from Tamara Dull, Principal Evangelist, Amazon Web Services, who says:

“A data lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data, where the data structure and requirements are not defined until the data is needed.”

Much better!  Even Agile-like. The reason why this is a better definition is that it incorporates both the prerequisite for data structures and that the stored data would then be used in some fashion, at some point in the future.  From that we can safely expect value and that exploiting an Agile approach is absolutely required.  The data lake therefore includes structured data from relational databases (basic rows and columns), semi-structured data (like CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and even binary data (typically images, pictures, audio, & video) thus creating a centralized data store accommodating all forms of data.  The data lake then provides an information platform upon which to serve many business use cases when needed.  It is not enough that data goes into the lake, data must come out too.

And, we want to avoid the ‘Data Swamp’ which is essentially a deteriorated and/or unmanaged data lake that is inaccessible to and/or unusable by its intended users, providing little to no business value to the enterprise.  Are we on the same page so far?  Good.

Data Lakes – In the Beginning

Before we dive deeper, I’d like to share how we got here.  Data Lakes represent an evolution resulting from an explosion of data (volume-variety-velocity), the growth of legacy business applications plus numerous new data sources (IoT, WSL, RSS, Social Media, etc.), and the movement from on-premise to cloud (and hybrid). 

Additionally, business processes have become more complex, new technologies have recently been introduced enhancing business insights and data mining, plus exploring data in new ways like machine learning and data science.  Over the last 30 years we have seen the pioneering of a Data Warehouse (from the likes of Bill Inmon and Ralph Kimball) for business reporting all the way through now to the Agile Data Lake (adapted by Dan Linstedt, yours truly, and a few other brave souls) supporting a wide variety of business use cases, as we’ll see.

To me, Data Lakes represent the result of this dramatic data evolution and should ultimately provide a common foundational, information warehouse architecture that can be deployed on-premise, in the cloud, or a hybrid ecosystem. 

Successful Data Lakes are pattern-based, metadata-driven (for automation) business data repositories, accounting for data governance and data security (a la GDPR and PII) requirements.  Data in the lake should present coalesced data and aggregations of the “record of truth”, ensuring information accuracy (which is quite hard to accomplish unless you know how) and timeliness.  Following an Agile/Scrum methodology, using metadata management, applying data profiling, master data management, and such, I think a Data Lake must represent a ‘Total Quality Management’ information system.  Still with me?  Great!

What is a Data Lake for?

Essentially a data lake is used for any data-centric, business use case, downstream of System (Enterprise) Applications, that help drive corporate insights and operational efficiency.  Here are some common examples:

  • Business Information, Systems Integration, & Real Time data processing
  • Reports, Dashboards, & Analytics
  • Business Insights, Data Mining, Machine Learning, & Data Science
  • Customer, Vendor, Product, & Service 360

How do you build an Agile Data Lake?

As you can see, there are many ways to benefit from a successful Data Lake.  My question to you is: are you considering any of these?  My bet is that you are.  My next questions are: Do you know how to get there?  Are you able to build a Data Lake the RIGHT way and avoid the swamp?  I’ll presume you are reading this to learn more.  Let’s continue…

There are three key principles I believe you must first understand and must accept:

  • ⇒ A PROPERLY implemented Ecosystem, Data Models, Architecture, & Methodologies
  • ⇒ The incorporation of EXCEPTIONAL Data Processing, Governance, & Security
  • ⇒ The deliberate use of Job Design PATTERNS and BEST PRACTICES

A successful Data Lake must also be agile, becoming a data processing and information delivery mechanism designed to augment business decisions and enhance domain knowledge.  A Data Lake, therefore, must have a managed life cycle.  This life cycle incorporates three key phases:

  1. INGESTION:
    • Extracting raw source data, accumulating (typically written to flat files) in a landing zone or staging area for downstream processing & archival purposes
  2. ADAPTATION:
    • Loading & Transformation of this data into usable formats for further processing and/or use by business users
  3. CONSUMPTION:
    • Data Aggregations (KPI’s, Data-points, or Metrics)
    • Analytics (actuals, predictive, & trends)
    • Machine Learning, Data Mining, & Data Science
    • Operational System Feedback & Outbound Data Feeds
    • Visualizations, & Reporting

The challenge is how to avoid the swamp.  I believe you must use the right architecture, data models, and methodology.  You really must shift away from ‘legacy’ thinking and adopt a ‘modern’ approach.  This is essential.  Don’t fall into the trap of thinking you know what a data lake is and how it works until you consider these critical points.

Ok then, let’s examine these three phases a bit more.  Data Ingestion is about capturing data, managing it, and getting it ready for subsequent processing.  I think of this like a crate of data dumped onto the sandy beach of the lake: a landing zone called a ’Persistent Staging Area’.  Persistent because once data arrives, it stays there; for all practical purposes, once processed downstream, it becomes an effective archive (and you don’t have to copy it somewhere else).  This PSA will contain data, text, voice, video, or whatever it is, and it accumulates.

You may notice that I am not talking about technology yet.  I will but, let me at least point out that depending upon the technology used for the PSA, you might need to offload this data at some point.  My thinking is that an efficient file storage solution is best suited for this 1st phase.

Data Adaptation is a comprehensive, intelligent coalescence of the data, which must adapt organically to survive and provide value.  These adaptations take several forms (we’ll cover them below), yet essentially the data resides first in a raw, lowest-level-of-granularity data model which can then be further processed, or as I call it, business purposed, for a variety of domain use cases.  The data processing requirements here can be quite involved, so I like to automate as much of this as possible.  Automation requires metadata.  Metadata management presumes governance.  And don’t forget security.  We’ll talk more about these shortly.

Data Consumption is not just about business users; it is about business information, the knowledge it supports, and hopefully the wisdom derived from it.  You may be familiar with the DIKW pyramid: Data > Information > Knowledge > Wisdom.  I like to insert ‘Understanding’ after ‘Knowledge’, as it leads to wisdom.

Data should be treated as a corporate asset and invested in as such.  Data then becomes a commodity and allows us to focus on the information, knowledge, understanding, and wisdom derived from it.  Therefore, it is about the data and getting value from it. 

Data Storage Systems: Data Stores

Ok, as we continue to formulate the basis for building a Data Lake, let’s look at how we store data.  There are many ways we do this.  Here’s a review:

  • DATABASE ENGINES:
    • ROW: traditional Relational Database System (RDBMS) (ie: Oracle, MS SQL Server, MySQL, etc)
    • COLUMNAR: relatively unknown; feels like an RDBMS but optimized for columns (ie: Snowflake, Presto, Redshift, Infobright, & others)
  • NoSQL – “Not Only SQL”:
    • Non-Relational, eventual consistency storage & retrieval systems (ie: Cassandra, MongoDB, & more)
  • HADOOP:
    • Distributed data processing framework supporting high data Volume, Velocity, & Variety (ie: Cloudera, Hortonworks, MapR, EMR, & HD Insights)
  • GRAPH – “Triple-Store”:
    • Subject-Predicate-Object, index-free ‘triples’; based upon Graph theory (ie: AlegroGraph, & Neo4J)
  • FILE SYSTEMS:
    • Everything else under the sun (ie: ASCII/EBCDIC, CSV, XML, JSON, HTML, AVRO, Parquet)

There are many ways to store our data, and many considerations to make, so let’s simplify our life a bit and call them all ‘Data Stores’, regardless of whether they are Source, Intermediate, Archive, or Target data storage.  Simply pick the technology for each type of data store as needed.

Data Governance

What is Data Governance?  Clearly another industry enigma.  Again, Wikipedia to the rescue:

Data Governance is a defined process that an organization follows to ensure that high quality data exists throughout the complete lifecycle.”

Does that help?  Not really?  I didn’t think so.  The real idea of data governance is to affirm data as a corporate asset, and to invest in and manage it formally throughout the enterprise, so it can be trusted for accountable and reliable decision making.  To achieve these lofty goals, it is essential to appreciate Source-through-Target lineage.  Management of this lineage is a key part of Data Governance and should be well defined and deliberately managed.  Separated into three areas, lineage is defined as:

  • ⇒ Schematic Lineage maintains the metadata about the data structures
  • ⇒ Semantic Lineage maintains the metadata about the meaning of data
  • ⇒ Data Lineage maintains the metadata of where data originates & its auditability as it changes, allowing ‘current’ & ‘back-in-time’ queries
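
As a small illustration of those three areas, here is one way a single column's lineage could be recorded.  The field names are invented for this sketch and do not represent any particular tool's metadata schema.

# Hypothetical lineage record for one column, covering the three lineage areas above.
customer_email_lineage = {
    "schematic": {  # metadata about the data structures
        "source": "crm.contacts.email VARCHAR(255)",
        "target": "raw_vault.sat_customer.email STRING",
    },
    "semantic": {  # metadata about the meaning of the data
        "business_term": "Customer Email Address",
        "definition": "Primary email used to contact the customer",
    },
    "data": {  # where the data originates and how it changed, enabling back-in-time queries
        "loaded_from": "crm_extract_2018-10-15.csv",
        "load_timestamp_utc": "2018-10-15T02:00:00Z",
        "transformations": ["trim whitespace", "lower-case"],
    },
}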

It is fair to say that a proper, in-depth discussion of data governance, metadata management, data preparation, data stewardship, and data glossaries is essential, but if I did that here we'd never get to the good stuff.  Perhaps another blog?  Ok, but later….

Data Security

Data Lakes must also ensure that personal data (PII, subject to regulations like GDPR) is secure and can be removed (or disabled) or updated upon request.  Securing data requires access policies, policy enforcement, encryption, and record maintenance techniques.  In fact, all corporate data assets need these protections, which should be a cornerstone of any Data Lake implementation.  There are three states of data to consider here (a brief illustrative sketch follows the list):

  • ⇒ DATA AT REST: in some data store, ready for use throughout the data lake life cycle
  • ⇒ DATA IN FLIGHT: as it moves through the data lake life cycle itself
  • ⇒ DATA IN USE: perhaps the most critical, at the user-facing elements of the data lake life cycle
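
To make the removal requirement concrete, here is a generic tokenization sketch: a PII value is replaced by a random token before it lands in the lake, and honoring a removal request only requires deleting the mapping held in a separately secured vault.  This is a common pattern shown with invented names; it is not how Protegrity or Talend implement their security features.

import secrets

token_vault = {}  # in practice a separately secured store, never an in-memory dict

def tokenize(value: str) -> str:
    """Replace a PII value with a random token and keep the mapping in the secure vault."""
    token = "tok_" + secrets.token_hex(16)
    token_vault[token] = value
    return token

def forget(token: str) -> None:
    """Honor a removal request: once the mapping is gone, the lake only holds a meaningless token."""
    token_vault.pop(token, None)

# The data lake stores the token, not the email address itself.
record = {"customer_id": 42, "email": tokenize("jane.doe@example.com")}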

Talend works with several technologies offering data security features.  In particular, ‘Protegrity Cloud Security’ provides these capabilities using Talend-specific components and integrated features well suited for building an Agile Data Lake.  Please feel free to read “BUILDING A SECURE CLOUD DATA LAKE WITH AWS, PROTEGRITY AND TALEND” for more details.  We are working together with some of our largest customers using this valuable solution.

Agile Data Lake Technology Options

Processing data into and out of a data lake requires technology (hardware and software) to implement.  Grappling with the many, many options can be daunting.  It is so easy to take these choices for granted, picking anything that sounds good.  Often it is only after better understanding the data involved, the systems chosen, and the development effort required that one discovers the wrong choice was made.  Isn't this the definition of a data swamp?  How do we avoid this?

A successful Data Lake must incorporate a pliable architecture, data model, and methodology.  We've been talking about that already.  But picking the right 'technology' is more about the business data requirements and expected use cases.  I have some good news here.  You can de-couple the data lake design from the technology stack.  To illustrate this, here is a 'Marketecture' diagram depicting the many different technology options that cut across the agile data lake architecture.

As shown above, there are many popular technologies available, and you can choose different capabilities to suit each phase in the data lake life cycle.  Those who follow my blogs already know I have a soft spot for Data Vault.  Since I've detailed this approach before, let me simply point you to some interesting links:

You should know that Dan Linstedt created this approach and has developed considerable content you may find interesting.  I recommend these:

I hope you find all this content helpful.  Yes, it is a lot to ingest, digest, and understand (Hey, that sounds like a data lake), but take the time.  If you are serious about building and using a successful data lake you need this information.

The Agile Data Lake Life Cycle

Ok, whew – a lot of information already and we are not quite done.  I have mentioned that a data lake has a life cycle.  A successful Agile Data Lake Life Cycle incorporates the 3 phases I’ve described above, data stores, data governance, data security, metadata management (lineage), and of course: ‘Business Rules’.  Notice that what we want to do is de-couple ‘Hard’ business rules (that transform physical data in some way) from ‘Soft’ business rules (that adjust result sets based upon adapted queries).  This separation contributes to the life cycle being agile. 

Think about it: if you push physical data transformations upstream, then when the inevitable changes occur, the impact on everything downstream is smaller.  On the flip side, when the dynamics of the business impose new criteria, changing a SQL 'where' clause downstream has little impact on the data models it pulls from.  The Business Vault provides this insulation from the Raw Data Vault, as it can be reconstituted when radical changes occur.
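
Here is a minimal sketch of that separation, using invented table and column names purely for illustration; the point is simply that physical standardization happens once, upstream, while business criteria live in the query and can change freely.

def apply_hard_rule(raw_row: dict) -> dict:
    """Hard rule: standardize the physical data once, before it lands in the Raw Data Vault."""
    return {**raw_row, "country_code": raw_row["country_code"].strip().upper()}

def build_soft_rule_query(min_order_total: float) -> str:
    """Soft rule: the business criterion lives in the query, so changing it touches no stored data."""
    return (
        "SELECT customer_id, order_total "
        "FROM business_vault.orders "
        f"WHERE order_total >= {min_order_total}"
    )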

Additionally, a Data Lake is not a Data Warehouse but, in fact, encapsulates one as a use case.  This is a critical takeaway from this blog.  Taking this further, we are not creating 'Data Marts' anymore; we want 'Information Marts'.  Did you review the DIKW Pyramid link I mentioned above?  Data should, of course, be considered and treated as a business asset.  Yet simultaneously, data is now a commodity, leading us to information, knowledge, and, hopefully, wisdom.

This diagram walks through the Agile Data Lake Life Cycle from Source to Target data stores.  Study this.  Understand this.  You may be glad you did.  Ok, let me finish by saying that to be agile, a data lake must (a brief sketch of the insert-only idea follows the list):

  • BE ADAPTABLE
    • Data Models should be additive, without impact to the existing model, when new sources appear
  • BE INSERT ONLY
    • Especially for Big Data technologies where Updates & Deletes are expensive
  • PROVIDE SCALABLE OPTIONS
    • Hybrid infrastructures can offer extensive capabilities
  • ALLOW FOR AUTOMATION
    • Metadata, in many aspects, can drive the automation of data movement
  • PROVIDE AUDITABLE, HISTORICAL DATA
    • A key aspect of Data Lineage
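
As a brief illustration of the insert-only and auditability points above, here is a sketch that appends new record versions with a load timestamp instead of updating rows in place; the record layout and helper functions are hypothetical.

from datetime import datetime, timezone

customer_history = []  # stands in for an append-only (insert-only) table

def record_change(customer_id: int, attributes: dict) -> None:
    """Append a new version of the record; earlier versions are never updated or deleted."""
    customer_history.append({
        "customer_id": customer_id,
        "load_timestamp_utc": datetime.now(timezone.utc).isoformat(),
        **attributes,
    })

def as_of(customer_id: int, timestamp_utc: str) -> dict:
    """'Back-in-time' query: the latest version loaded on or before the given ISO timestamp."""
    versions = [v for v in customer_history
                if v["customer_id"] == customer_id and v["load_timestamp_utc"] <= timestamp_utc]
    return max(versions, key=lambda v: v["load_timestamp_utc"]) if versions else {}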

And finally, consider that STAR schemas are, and always were, designed to be 'Information Delivery Mechanisms'; treating them as anything more is a misunderstanding some in the industry have fostered for many years.  For years we have all built Data Warehouses using STAR schemas to deliver reporting and business insights.  All too often these efforts stored the data warehouse's raw data in rigid structures, requiring heavy data cleansing and, frankly, incurring high impact when upstream systems were changed or added.

The cost in resources and budget has been at the root of many delays, failed projects, and inaccurate results.  This is a legacy mentality, and I believe it is time to shift our thinking to a more modern approach.  The Agile Data Lake is that new way of thinking.  STAR schemas do not go away, but their role has shifted downstream, where they belong and were always intended to be.

Conclusion

This is just the beginning, yet I hope this blog post gets you thinking about all the possibilities now.

As a versatile technology, coupled with a sound architecture, pliable data models, strong methodologies, thoughtful job design patterns, and best practices, Talend can deliver cost-effective, process-efficient, and highly productive data management solutions.  Incorporate all of this as I've shown above and not only will you create an Agile Data Lake, but you will avoid the SWAMP!

Till next time…

The post Introduction to the Agile Data Lake appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

Using BOMs to Manage Liferay Dependency Versions

Liferay - Wed, 10/17/2018 - 15:15

Liferay is a large project, and many developers who are attempting to get their customizations to work with Liferay will often end up asking the question, "What version of module W should I use at compile time when I'm running on Liferay X.Y.Z?" To answer that question, Liferay has some instructions on how to find versions in its document, Configuring Dependencies.

This blog entry is really to talk about what to do in situations where those instructions just aren't very useful.

Path to Unofficial BOMs

First, a little bit of background, because I think context is useful to know, though you can skip it if you want to get right to working with BOMs.

Back in late 2016, I started to feel paranoid that we'd start introducing major version changes to packages in the middle of a release and nobody would notice. To ease those concerns, I wrote a tool that indexed all of the packageinfo files in Liferay at each tag, and then I loaded up these metadata files with a Jupyter notebook and did a check for a major version change.

Then, like many other "is it worth the time" problems, it evolved into a script that I'd run once a week, and a small web-based tool so that I wouldn't have to fire up Jupyter every time I needed to check what was essentially static information.

Fast forward to February 2017, and our internal knowledge base was updated to allow for a new wiki format which (accidentally?) provided support for HTML with script tags. So, I chose to share my mini web-based tool on the wiki, which then led our support team in Spain to share a related question that they'd been pondering for a while.

Imagine if you happened to need to follow the Liferay document, Configuring Dependencies, for a lot of modules. Doesn't that lookup process get old really fast?

So, given that it was clearly possible to create an unofficial reference document for every Liferay exported package, wouldn't it be nice if we could create an official reference document that recorded every Liferay module version?

Since I had all of the metadata indexed anyway, I put together a separate tool that displayed the information stored in bnd.bnd files at every tag, which sort of let you look up module version changes between releases. This let people get a sense for what an official reference document might look like.

(Note: I learned a year later that bnd.bnd files are not the correct file to look at if you want to know the version at a given tag. Rather, you need to look at the artifact.properties files saved in the modules/.releng folder for that information. So in case it helps anyone feel better, Liferay's release and versioning process isn't obvious to anyone not directly involved with the release process, whether you're a Liferay employee or not.)

From the outside looking in, you might ask, why is it that our team didn't ask for Liferay to officially provide a "bill of materials" (BOMs), as described in the Introduction to Dependency Mechanism in the official Maven documentation? That way, you'd only specify the version of Liferay you're using, and the BOM would take care of the rest. If such a BOM existed, a lookup tool would be completely unnecessary.

Well, that's how the request actually started at the time of DXP's release, but since it wasn't happening, it got downgraded to a reference document which looked immediately achievable.

Fast forward to today. Still no official reference document for module versions, still no official Liferay BOMs.

However, by chance, I learned that the IDE team has been evaluating unofficial BOMs currently located on repository.liferay.com. These BOMs were generated as proof of concepts on what that BOM might include, and are referenced in some drafts of Liferay IDE tutorials. Since I now had an idea of what the IDE team itself felt a BOM should look like, I updated my web-based tool to use all of the collected metadata to dynamically generate BOMs for all indexed Liferay versions.

Install the Unofficial BOM

For sake of an example, assume that you want to install release-dxp-bom-7.1.10.pom.

The proof of concept for this version exists in the liferay-private-releases repository of repository.liferay.com, and Liferay employees can setup access to that repository to acquire that file. Since there are no fix packs, it is also functionally equivalent to the original 7.1.0 GA1 release, and you can use com.liferay.portal:release.portal.bom:7.1.0 instead.

However, if you wish to use a version for which a proof of concept has not been generated (or if you're a non-employee wanting to use an unofficial DXP BOM), you can try using Module Version Changes Since DXP Release to use the indexed metadata and generate a BOM for your Liferay version. If you go that route, open up the file in a text editor, and you should find something that looks like the following near the top of the file:

<groupId>com.liferay</groupId>
<artifactId>release.dxp.bom</artifactId>
<version>7.1.10</version>
<packaging>pom</packaging>

With those values for the GROUP_ID, ARTIFACT_ID, VERSION, and PACKAGING, you would install the BOM to your local Maven repository by substituting in the appropriate values into the following mvn install:install-file command:

mvn install:install-file -Dfile=release.dxp.bom-7.1.10.pom -DgroupId="${GROUP_ID}" \
    -DartifactId="${ARTIFACT_ID}" -Dversion="${VERSION}" \
    -Dpackaging="${PACKAGING}"

And that's basically all you need to do when installing a BOM that's not available in any repository you can access.

Install Multiple BOMs

If you only have a handful of BOMs, you could repeat the process mentioned above for each of your BOMs. If you have a lot of BOMs to install (for example, you're a Liferay employee that might need to build against arbitrary releases, and you decided to use the earlier linked page and generate something for every Liferay fix pack), you may want to script it.

To keep things simple when pulling values out of an XML file, you should install the Python package yq, which provides the xq tool for XML processing at the command line. This tool is similar to the popular tool jq, which provides JSON processing at the command line.

pip install yq

Once yq is installed, you can add the following to a Bash script to install the auto-generated BOMs to your local Maven repository:

#!/bin/bash

install_bom() {
    local GROUP_ID=$(cat ${1} | xq '.project.groupId' | cut -d'"' -f 2)
    local ARTIFACT_ID=$(cat ${1} | xq '.project.artifactId' | cut -d'"' -f 2)
    local VERSION=$(cat ${1} | xq '.project.version' | cut -d'"' -f 2)
    local PACKAGING=$(cat ${1} | xq '.project.packaging' | cut -d'"' -f 2)

    mvn install:install-file -Dfile=${1} -DgroupId="${GROUP_ID}" \
        -DartifactId="${ARTIFACT_ID}" -Dversion="${VERSION}" \
        -Dpackaging="${PACKAGING}"

    echo "Installed ${GROUP_ID}:${ARTIFACT_ID}:${VERSION}:${PACKAGING}"
}

for bom in *.pom; do
    install_bom ${bom}
done

Use BOMs in Blade Samples

In case you've never used a BOM before, I'll show you how you would use them if you were to build projects in the Blade Samples repository.

Reference the BOM in Maven

First, update the parent pom.xml so that child projects know which dependency versions are available by simply adding the BOM as a dependency.

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.liferay.portal</groupId>
            <artifactId>release.dxp.bom</artifactId>
            <version>7.1.10</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

We set this to be an import scope dependency, so that we don't have to download all of Liferay's release artifacts just to have the version numbers (they will be downloaded as-needed when the specific artifacts are resolved).

Then, in order to have a child project use this version, simply update the pom.xml in the child project to not include the version explicitly for any of the dependencies.

<dependency>
    <groupId>com.liferay.portal</groupId>
    <artifactId>com.liferay.portal.kernel</artifactId>
    <scope>provided</scope>
</dependency>

As noted in Introduction to Dependency Mechanism, the version specified in the parent POM will then be chosen as the dependency version. Running mvn package in the child project will then download the actual versions for the noted Liferay release.

Note: If this is the first time you've run any Maven commands in the liferay-blade-samples repository, you'll want to make sure that all of the parent projects are installed or the build will fail. If you are new to Maven and aren't sure how to read pom.xml files, this is achieved with the following steps:

  1. Run mvn -N install in liferay-blade-samples/maven
  2. Run mvn install in liferay-blade-samples/parent.bnd.bundle.plugin.
Reference the BOM in Gradle

Liferay workspace uses an older version of Gradle, and so BOMs aren't supported by default. To get support for BOMs, we'll first need to bring in io.spring.gradle:dependency-management-plugin:1.0.6.RELEASE.

The first step in doing this is to update the parent build.gradle so that Gradle knows where to find the plugin.

buildscript {
    dependencies {
        ...
        classpath group: "io.spring.gradle", name: "dependency-management-plugin", version: "1.0.6.RELEASE"
    }
    ...
}

The next step in doing this is to update the parent build.gradle so that Gradle makes sure to apply the plugin to each child project.

subprojects { subproject ->
    ...
    apply plugin: "io.spring.dependency-management"
    ...
}

Because we're installing the BOMs to our local Maven repository, the next step in doing this is to update the parent build.gradle so that Gradle knows to check that local repository. We can then also add the BOM to a dependencyManagement block.

subprojects { subproject ->
    ...
    repositories {
        mavenLocal()
        ...
    }

    dependencyManagement {
        imports {
            mavenBom "com.liferay.portal:release.dxp.bom:7.1.10"
        }
    }
    ...
}

Then, in order to have a child project use this version, simply update the build.gradle in the child project to not include the version explicitly for any of the dependencies.

dependencies {
    compileOnly group: "com.liferay.portal", name: "com.liferay.portal.kernel"
    ...
}

Minhchau Dang 2018-10-17T20:15:00Z
Categories: CMS, ECM

Open Source Program Benefits Survey Results

Open Source Initiative - Wed, 10/17/2018 - 08:37

There are many organizations out there, from companies like Red Hat to internet-scale giants like Google and Facebook, that have established an open source program office (OSPO). The TODO Group, a network of open source program managers, recently performed the first-ever annual survey of corporate open source programs and revealed some interesting findings on the actual benefits of open source programs.

According to the survey, the top three benefits of managing an open source program are:

  • awareness of open source usage/dependencies
  • increased developer agility/speed
  • better and faster license compliance
Corporate Open Source Programs on the Rise

According to the survey, 53% of companies have an open source program or plan to establish one in the near future:

An interesting finding is that large companies are about twice as likely to run an open source program as smaller companies (63 percent vs. 37 percent). Also, technology industry organizations were more likely to have an open source program than traditional industry verticals such as the financial services industry. Another interesting trend was that most open source programs tend to start informally as a working group, a committee, or a few key open source developers, and then evolve into formal programs over time, typically within a company’s engineering department.

Research Shows Giving Back Is A Competitive Advantage

It’s important to note that companies aren’t forming open source programs and giving back to open source for purely altruistic reasons. Recent research from Harvard Business School shows that companies that contribute to open source capture up to 100% more productive value from open source than companies that do not contribute back. In particular, the example of Linux was showcased in the research:

"It’s not necessarily that the firms that contribute are more productive on the whole. It’s that they get more in terms of productivity output from their usage of the Linux operating system than do companies that use Linux without contributing."

In the survey, it was notable that 44 percent of companies with open source programs contribute code upstream compared to only 6 percent for companies without an open source program. If you want to sustain open source and give your business a competitive advantage, an open source program can help.

Finally, you’ll be happy to learn that the survey results and questions are open sourced under CC-BY-SA. The TODO Group plans to run this survey on an annual basis moving forward and, in true open source fashion, we’d love your feedback on any new questions to ask; please leave your thoughts in the comments or on GitHub.

 About the Author: Chris Aniszczyk is currently a Vice President at OSI Affiliate Member Linux Foundation, focused on developer relations and running the Open Container Initiative (OCI) / Cloud Native Computing Foundation (CNCF).

Image credit: OSPO.png (CC BY-SA 4.0) by The Open Source Initiative is a derivative of "Trollback + Company office.JPG" by Trollbackco from Wikimedia Commons, and used/edited with permission via CC BY-SA 4.0

Categories: Open Source

Why the cloud can save big data

SnapLogic - Tue, 10/16/2018 - 14:36

This article originally appeared on computable.nl. Many companies like to use big data to make better decisions, strengthen customer relationships, and increase efficiency within the company. They are confronted with a dizzying array of technologies – from open source projects to commercial software – that can help to get a better grip on the large[...] Read the full article here.

The post Why the cloud can save big data appeared first on SnapLogic.

Categories: ETL

We want to invite you to DEVCON 2018

Liferay - Tue, 10/16/2018 - 10:04

Every year we, the developers doing amazing things with Liferay's products, have this unique opportunity to meet, learn, and enjoy those long technical discussions with each other. All of that happens at DEVCON, our main developer conference, which is taking place in Amsterdam from November 6th to 8th. This year's agenda is filled with sessions delivering in-depth technical details about the products, presenting new technologies, and showcasing spectacular use cases.

Following tradition, DEVCON starts with an un-conference the day before the main conference. That is a full day consisting solely of the best parts of any traditional conference: the meetings and conversations in the halls between the talks. It's a day full of discussions with experts and colleagues on topics that attendees bring in.

We try to keep the prices for DEVCON at a reasonable level and provide several kinds of promotions for partners and organizations we have business relationships with. Yet there are talented developers in our community who work alone, or for non-profit organizations, or in less developed parts of the world, or who for whatever reason cannot afford a DEVCON ticket. This year we want to help some of them.

We have free DEVCON tickets to give away!

As much as we would love to invite the whole community, we have to live by market rules! So we only have a limited number of tickets. To help us decide, please send an email to developer-relations@liferay.com with the subject "Free DEVCON ticket" and tell us why you think you should be one of the people we give a free ticket to. We will choose those with the most convincing, creative, and fun reasons.

See you in Amsterdam!

David Gómez 2018-10-16T15:04:00Z
Categories: CMS, ECM

Introducing Talend Data Catalog: Creating a Single Source of Trust

Talend - Tue, 10/16/2018 - 06:07

Talend Fall ’18 is here! We’ve released a big update to the Talend platform this time around, including support for APIs as well as new big data and serverless capabilities. You will see blogs from my colleagues highlighting those major new product and feature introductions. On my side, I’ve been working passionately to introduce Talend Data Catalog, which I believe has the potential to change the way data is consumed and managed within the enterprise. Our goal with this launch is to help our customers deliver insight-ready data at scale so they can make better and faster decisions, all while spending less time looking for data or making decisions with incomplete data.

You Can’t Be Data Driven without a Data Catalog

Before we jump into features, let’s look at why you need a data catalog. Remember the early days of the Internet? Suddenly, it became so easy and cheap to create content and publish it to anyone that everybody actually did. Soon enough, that created a content sprawl, and the challenge was no longer to create content but to find it. After two decades, we know that the winners in the web economy are those that created a single point of access to content in their category: Google, YouTube, Baidu, Amazon, Wikipedia.

Now, we are faced with a similar data sprawl in our data-driven economy. IDC research has found that data professionals today spend 81% of their time searching for, preparing, and protecting data, with little time left to turn it into business outcomes. It has become crucial that organizations establish this same single point of access to their data to be in the winner’s circle.

Although technology can help to fix the issue, and I’ll come back to it later in the article, enterprises first need to set up a discipline to organize their data at scale, and this discipline is called data governance. But traditional data governance must be re-invented to cope with this data sprawl: according to Gartner, “through 2022, only 20% of organizations investing in information will succeed in scaling governance for digital business.” Given the sheer number of companies that are awash in data, that percentage is just too small.

Modern data governance is not only about minimizing data risks but also about maximizing data usage, which is why traditional authoritative data governance approaches are not enough. There is a need for a more agile, bottom-up approach. That strategy starts with the raw data, links it to its business context so that it becomes meaningful, takes control of its data quality and security, and fully organizes it for massive consumption.

Empowering this new discipline is the promise of data catalogs, which leverage modern technologies like smart semantics and machine learning to organize data at scale and turn data governance into a team sport by engaging anyone in social curation.

With the newly introduced Talend Data Catalog, companies can organize their data at scale to make data accessible like never before and address these challenges head-on. By empowering organizations to create a single source of trusted data, it’s a win both for the business, which can find the right data, and for the CIO and CDO, who can now control data better and improve data governance. Now let’s dive into some details on what the Talend Data Catalog is.

Intelligently discover your data

Data catalogs are a perfect fit for companies that have modernized their data infrastructures with data lakes or cloud-based data warehouses, where thousands of raw data items can reside and be accessed at scale. The catalog acts as the fish finder for that data lake, leveraging crawlers across different file systems (traditional, Hadoop, or cloud) and across typical file formats. It then automatically extracts metadata and profiling information for referencing, change management, classification, and accessibility.

Not only can it bring all of that metadata together in a single place, but it can also automatically draw the links between datasets and connect them to a business glossary. In a nutshell, this allows businesses to:

  • Automate the data inventory
  • Leverage smart semantics for auto-profiling, relationships discovery and classification
  • Document and drive usage now that the data has been enriched and becomes more meaningful

The goal of the data catalog is to unlock data from the applications where it resides.

Orchestrate data curation

Once the metadata has been automatically harvested into a single place, data governance can be orchestrated in a much more efficient way. Talend Data Catalog allows businesses to define critical data elements in its business glossary and assign data owners for them. The data catalog then relates those critical data elements to the data points that reference them across the information system.

Now data is under control, and data owners can make sure that their data is properly documented and protected. Comments, warnings, or validations can be crowdsourced from any business user for collaborative, bottom-up governance. Finally, the data catalog draws end-to-end data lineage and manages version control. It guarantees accuracy and provides a complete view of the information chain, both of which are critical for data governance and data compliance.

Easy search-based access to trusted data

Talend Data Catalog makes it possible for businesses to locate, understand, use, and share their trusted data faster by searching and verifying data’s validity before sharing with peers. Its collaborative user experience enables anyone to contribute metadata or business glossary information.

Data governance is most often associated with control: a discipline that allows businesses to centrally collect, process, and consume data under certain rules and policies. The beauty of Talend Data Catalog is that it not only controls data but also liberates it for consumption. This allows data professionals to find, understand, and share data ten times faster. Now data engineers, scientists, analysts, or even developers can spend their time extracting value from those data sets rather than searching for them or recreating them, removing the risk of your data lake turning into a data swamp.

A recently published IDC report, “Data Intelligence Software for Data Governance,” advocates the benefits of modern data governance and positions the Data Catalog as the cornerstone of what they define as Data Intelligence Software. In the report, IDC calls it a “technology that supports enablement through governance is called data intelligence software and is delivered in metadata management, data lineage, data catalog, business glossary, data profiling, mastering, and stewardship software.”

For more information, check out the full capabilities of the Talend Data Catalog here.

The post Introducing Talend Data Catalog: Creating a Single Source of Trust appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL