Assistance with Open Source adoption

Open Source News

Give your users customised automated tours of CiviCRM

CiviCRM - Tue, 10/02/2018 - 18:57

The Civi Summit was a great event - full of lots of nice surprises. One that stands out for me was that what started out as some wishful thinking - namely having the ability to provide on-page tours/tutorials - ended up with us being able to beta test a 'proof of concept' before we left.

While some amongst us (ahem) were sampling the whiskey and fine IPAs late in the evening along to the strumming of the musically-able folk, others remained focussed on their laptops - and in Coleman's case this meant getting us a working prototype of a tour/tutorial system for civi pages.

Categories: CRM

Moving Big Data to the cloud: A big problem?

SnapLogic - Tue, 10/02/2018 - 13:49

Originally published on Data Centre Review. Digital transformation is overhauling the IT approach of many organizations and data is at the center of it all. As a result, organizations are going through a significant shift in where and how they manage, store and process this data. To manage big data in the not so distant[...] Read the full article here.

The post Moving Big Data to the cloud: A big problem? appeared first on SnapLogic.

Categories: ETL

California Leads the US in Online Privacy Rules

Talend - Tue, 10/02/2018 - 12:12

With California often being looked to as the state of innovation, the newly enforced California Consumer Privacy Act (CCPA) came as no surprise. This new online privacy law gives consumers the right to know what information companies are collecting about them, why they are collecting that data, and who they are sharing it with.

Some specific industries such as Banking or Health Sciences had already considered this type of compliance at the core of their digital transformation. But as the CCPA applies to potentially any company, no matter its size or industry, anyone serious about personalizing interactions with their visitors, prospects, customers, and employees needs to pay attention.

Similarities to GDPR

Although there are indeed some differences between GDPR and the CCPA, the two are similar in terms of the data management and governance frameworks that need to be established. These similarities include:

  • You need to know where your personal data is across your different systems, which means that you need to run a data mapping exercise
  • You need to create a 360° view of your personal data and manage consent at a fine grain, although CCPA looks more permissive on consent than GDPR
  • You need to publish a privacy notice where you tell the regulation authorities, customers and other stakeholders what you are doing with the personal information within your database. You need to anonymize data (e.g. through data masking) for any other systems that include personal data but that you want to scope out from your compliance effort and privacy notice.
  • You need to foster accountabilities so that the people in the companies that participate in the data processing effort are engaged for compliance
  • You need to know where your data is, including when it is shared or processed through third parties such as business partners or cloud providers. You need to control cross border data transfers and potential breaches while transparently communicating in cases of breaches 
  • You need to enact the data subject access rights, such as the rights to data access, data rectification, data deletion, and data portability. CCPA allows a little more time to respond to a request: 45 days versus 1 month under GDPR.
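The two response windows in the last bullet can be compared with a short date calculation. The Python sketch below rests on simplifying assumptions that are not from the text: the clock starts on the day the request is received, "1 month" means the same calendar day in the following month (clamped to the last day of a shorter month), and no extensions apply. The function names are illustrative only.

```python
import calendar
from datetime import date, timedelta


def add_one_month(start: date) -> date:
    """Same day next month, clamped to the last day when that month is shorter."""
    year = start.year + (start.month // 12)
    month = start.month % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def response_deadlines(received: date) -> dict:
    """Latest response dates under the simplified 45-day and 1-month windows."""
    return {
        "ccpa_45_days": received + timedelta(days=45),
        "gdpr_one_month": add_one_month(received),
    }


if __name__ == "__main__":
    for rule, deadline in response_deadlines(date(2018, 10, 2)).items():
        print(rule, deadline.isoformat())
```

For a request received on 2 October 2018, this gives 16 November 2018 for the 45-day window and 2 November 2018 for the one-month window.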

Download Data Governance & Sovereignty: 16 Practical Steps towards Global Data Privacy Compliance now.

Key Takeaways from the CCPA

The most important takeaway is that data privacy regulations are burgeoning for companies all over the world. With the stakes getting higher and higher, from the steep fines to the reputation risks, compliance consumers that can negatively affect the benefits of digital transformation).

While this law in its current state is specific to California, the idea of a ripple effect at the federal level might not be far off.  So instead of seeing it as a burden, such regulations should be taken as an opportunity. In fact, one of the side effects of all those regulations, today with data scandals now negatively impacting millions of consumers, is that data privacy now makes the headlines. Consumers are now understanding how valuable their data can be and how damaging the impact of losing control over personal data could be.

The lesson learned is that, although regulatory compliance is often what triggers a data privacy compliance project, it shouldn’t be the only driver. The goal is rather to establish a system of trust with your customers for their personal data. In a recent benchmark, where we exercised our right of data access and privacy against more than 100 companies, we could demonstrate that most companies are at a low level of maturity in achieving that goal. But it also demonstrated that the best in class are setting the standard for turning it into a memorable experience.

The post California Leads the US in Online Privacy Rules appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

CiviCRM - Mobile Application – Smartcivi – PART 2

CiviCRM - Tue, 10/02/2018 - 11:50


The Smartcivi mobile application is now available for iOS and Android users

This blog post is an update to the previous post, published on 16 Sep 2018.

Additional Feature in this Release 

An event list has been added to the application. It displays the registered events, along with a map option that opens the event location in Google Maps.

General testing: 

Categories: CRM

Upgrade WURFL's database into Liferay Mobile Device Detection Lite 

Liferay - Tue, 10/02/2018 - 10:02

If you're reading this post, it is because you need to know which devices are currently accessing your Liferay instance through Liferay Mobile Device Detection Lite. In particular, you cannot explain why Liferay detects a different version of your modern, cool and fully updated device!

Don't worry! I'll try to explain what to do.

WURFL's database

Before I explain, do you know the WURFL database? If you don't, you can watch this short video!

To detect your device, you already know that you need to download and install Liferay Mobile Device Detection Lite from the Marketplace.

This app contains a WURFL database prepopulated inside the bundle through an external service called 51Degrees. The database is populated only during the build of the bundle, not at runtime.

processResources {
    into("META-INF") {
        from {
            FileUtil.get(project, "")
        }
    }
}

code from build.gradle of this app

The result is a 51Degrees.dat file inside the META-INF folder and, as you can imagine, this file is the engine of the device detection process.

The latest release (build) of Liferay Mobile Device Detection Lite on the Marketplace is now a year old, so its device data is very out of date.

How to upgrade this WURFL's database

The following image shows the configuration of Liferay Mobile Device Detection Lite (51Degrees Device Detection) and how the WURFL database is linked.

Unfortunately, this configuration only looks for the file inside the bundle; we can't point it to a URL or to an absolute path for a data file stored elsewhere.

The only way to add or replace files inside an existing bundle is a fragment. We will use this approach to add a new WURFL database.

You can check my project on GitHub, where I have put the file under the META-INF folder; the bnd file tells Liferay to "put" this file inside the original bundle.
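Before deploying a fragment of this kind, it can help to confirm that the built JAR really declares a host bundle and ships the data file under META-INF. The Python sketch below is one hypothetical way to do that and is not part of the project mentioned above. It relies only on the standard OSGi `Fragment-Host` manifest header; the function name, the host name and the data file name are placeholders to replace with your own values.

```python
import io
import zipfile


def check_fragment_jar(jar_bytes: bytes, data_file: str) -> dict:
    """Report the declared host bundle and whether META-INF/<data_file> is present."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        names = set(jar.namelist())
        manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")

    # Long manifest lines are wrapped onto continuation lines that start
    # with a single space, so unfold them before looking for the header.
    unfolded = manifest.replace("\r\n ", "").replace("\n ", "")

    host = None
    for line in unfolded.splitlines():
        if line.startswith("Fragment-Host:"):
            host = line.split(":", 1)[1].strip()

    return {
        "fragment_host": host,
        "has_data_file": f"META-INF/{data_file}" in names,
    }


def build_example_jar() -> bytes:
    """Build a tiny in-memory JAR with placeholder names, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        jar.writestr(
            "META-INF/MANIFEST.MF",
            "Manifest-Version: 1.0\nFragment-Host: example.host.bundle\n",
        )
        jar.writestr("META-INF/51Degrees-new.dat", b"placeholder")
    return buffer.getvalue()


if __name__ == "__main__":
    print(check_fragment_jar(build_example_jar(), "51Degrees-new.dat"))
```

With a real fragment, read the JAR from disk with `open(path, "rb").read()` and pass the file name you chose for the new database.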


Finally, we can change the configuration to link the new WURFL database and restart the server.


This database is not updated daily, and you can check the update status of this file here. When you add a new file, don't reuse the same filename; change it.

Davide Abbatiello 2018-10-02T15:02:00Z
Categories: CMS, ECM

A Recap of the 2018 CiviCRM Governance Summit & DevCamp

CiviCRM - Mon, 10/01/2018 - 15:40

Last week, stakeholders from the CiviCRM community came together to discuss issues of governance and sustainability, and to review recent developments in CiviCRM as well as how it’s managed. We called it a Governance Summit because there was a lot of interest in governance. Regardless, this was fundamentally a “Community Summit” where members of the community could work together to improve CiviCRM as a whole. This post is a recap of much of the work that took place at the summit. 

Categories: CRM

CiviVolunteer online training session: Fri. October 5th

CiviCRM - Mon, 10/01/2018 - 12:18

Cividesk is offering the Essentials of CiviVolunteer training session this Friday, October 5th at 9 am MT/ 10 am CT/ 11 am ET.  This 2-hour live training session will cover all you need to know to successfully manage and track volunteers in CiviCRM.

Click here for more details on this training session and/or to register for Friday's class.  

Categories: CRM

CiviCamp Manchester this Friday!

CiviCRM - Mon, 10/01/2018 - 08:42

We are looking forward to the next CiviCamp in the UK! With only a few days to go we thought you might be interested to hear which sessions we’ve got lined up. If you’ve not booked on, there are still a few places so please get in soon.

We’ve been busy in the background confirming sessions and making sure everything is in place for a great day. For those planning travel or needing information about the venue, you can check out the information here:

Categories: CRM

Managing Feature Requests in Gitlab

CiviCRM - Sun, 09/30/2018 - 18:55

We’re continuing to use Gitlab more and more as both a project management and development tool. One area that we’ve been tinkering with over the past several months is using Gitlab for feature requests in CiviCRM. As you can imagine, there’s real potential here to empower the CiviCRM community to create, discuss and promote new features and functionality in CiviCRM.

Categories: CRM

Understanding CiviCRM Form Builder for the layman

CiviCRM - Sun, 09/30/2018 - 18:46

The recent DevCamp in New Jersey presented several sessions on new developments in CiviCRM land as well as showcased several of its inner workings. One session presented by Core Team member Tim Otten stood out for me: Form Builder. If you’re like me, you listen to folks like Tim with a great deal of respect and appreciation for what they say (and do).

Categories: CRM

Making the Bet on Open Source

Talend - Fri, 09/28/2018 - 17:20

Today, Docker or Kubernetes are obvious choices. But, back in 2015, these technologies were just emerging and hoping for massive adoption. How do tech companies make the right open source technology choices early?

As a CTO today, if you received an email from your head of Engineering saying, “Can we say that Docker is Enterprise production ready now?”, your answer would undoubtedly be “yes”. If you hadn’t started leveraging Docker already, you would be eager to move on the technology that Amazon and so many other leading companies are now using as the basis of their applications’ architectures. However, what would your reaction be if you had received that email four years ago when Docker was still far from stable, lacked integration, support or tooling with all the major operating systems and Enterprise platforms, On-Premise or Cloud? Well, that is the situation that we at Talend were facing in 2015.

By sharing our approach and our learnings from choosing to develop with Docker and Kubernetes, I hope that we can help other CTOs and tech companies’ leaders with their decisions to go all-in with today’s emerging technologies.

Increasing Docker use from demos to enterprise-ready products

Back in 2014, as we were architecting our next generation Cloud Integration platform, micro-services and containerization were two trends that we closely monitored.

Talend, which is dedicated to monitoring emerging projects and technologies, identified Docker as a very promising containerization technology that we could use to run our micro-services. That same year, one of our pre-sales engineers had heard about Docker at a Big Data training and learned about its potential to accelerate the delivery of product demos to prospects as a more efficient alternative to VMWare or Virtual Box images.

From that day, Docker usage across Talend has seen explosive growth, from the pre-sales use case of packaging demos to providing reproduction environments to tech support or quality engineering and of course its main usage around service and application containerization for R&D and Talend Cloud.

During our evaluation, we did consider some basic things like we would with any up-and-coming open source technology. First, we needed to determine the state of the security features offered by Docker. Luckily, we found that we didn’t need to build anything on top of what Docker already provided which was a huge plus for us.

Second, like many emerging open source technologies, Docker was not as mature as it is today, so it was still “buggy.” Containers would sometimes fail without any clear explanation, which would mean that we would have to invest time to read through the logs to understand what went wrong—a reality that anyone who has worked with a new technology understands well. Additionally, we had to see how this emerging technology would fit with our existing work and product portfolio, and determine whether they would integrate well. In our case, we had to check how Docker would work with our Java-based applications & services and evaluate if the difficulties that we ran into there would be considered a blocker for future development.

Despite our initial challenges, we found Docker to be extremely valuable and promising as it greatly improved our development life cycle by facilitating the rapid exchange and reuse of pieces of work between different teams. In the first year of evaluation, Docker quickly became the primary technology used by QA to rapidly set up testing environments at a fraction of the cost and with better performance compared to the more traditional virtual environments (VMWare or VirtualBox).

After we successfully used Docker in our R&D processes, we knew we had made the right choice and that it was time to take it to the next level and package our own services for the benefit of our customers. With the support of containers and more specifically Docker by major cloud players such as AWS, Azure or Google, we had the market validation that we needed to completely “dockerize” our entire cloud-based platform, Talend Cloud.

While the choice to containerize our software with Docker was relatively straightforward, the choice to use Kubernetes to orchestrate those containers was not so evident at the start.

Talend’s Road to Success with Kubernetes

In 2015, Talend started to consider technologies that might orchestrate containers which were starting to make up our underlying architectures, but the technology of choice wasn’t clear. At this point, we faced a situation that every company has experienced: deciding what technology to work with and determining how to decide what technology would be the best fit.

At Talend, portability and agility are key concepts, and while Docker was clearly multiplatform, each of the cloud platform vendors had their own flavor of the orchestration layer.

We had to bet on an orchestration layer that would become the de facto standard or be compatible with major cloud players. Would it be Kubernetes, Apache Mesos or Docker Swarm?

Initially, we were evaluating both Mesos and Kubernetes. Although Mesos was more stable than Kubernetes at the time and its offering was consistent with Talend’s Big Data roadmap, we were drawn to the comprehensiveness of the Kubernetes applications. The fact that Google was behind Kubernetes gave us some reassurance around its scalability promises.

At the time, we were looking for container orchestration offerings, and using Mesos required that we bundle several other applications for it to have the functionality we needed. On the other hand, Kubernetes’ applications had everything we needed already bundled together. We also thought about our customers: we wanted to make sure we chose the solution that would be the easiest for them to configure and maintain. Last, but certainly not least, we looked at the activity of the Kubernetes community. We found it promising that many large companies were not only contributing to the project but were also creating standards for it as well. The comprehensive nature of Kubernetes and the vibrancy of its community led us to switch gears and go all-in with Kubernetes.

As with any emerging innovative technology, there are constant updates and project releases with Kubernetes, which results in several iterations of updates in our own applications. However, this was a very small concession to make to use such a promising technology.

Similar to our experience with Docker, I tend to believe that we made the right choice with Kubernetes. Its market adoption (AWS EKS, Azure AKS, OpenShift Kubernetes) proved us right. The technology has now been incorporated into several of our products, including one of our recent offerings, Data Streams.

Watching the team go from exploring a new technology to actually implementing it was a great learning experience that was both exciting and very rewarding.

Our Biggest Lessons in Working with Emerging Technologies

Because we have been working with and contributing to the open source community since we released our open source offering Talend Open Studio for Data Integration in 2006, we are no strangers to working with innovative, but broadly untested technologies or sometimes uncharted territories. However, this experience with Docker and Kubernetes has emphasized some of the key lessons we have learned over the years working with emerging technologies:

  • Keep your focus: During this process, we learned that working with a new, promising technology requires that you keep your end goals in mind at all times. Because the technologies we worked with are in constant flux, it could be easy to get distracted by any new features added to the open source projects. It is incredibly important to make sure that the purpose of working with a particular emerging technology remains clear so that development won’t be derailed by new features that could be irrelevant to the end goal.
  • Look hard at the community: It is incredibly important to look to the community of the project you choose to work with. Be sure to look at the roadmap and the vision of the project to make sure it aligns with your development (or product) vision. Also, pay attention to the way the community is run—you should be confident that it is run in a way that will allow the project to flourish.
  • Invest the time to go deep into the technology: Betting on innovation is hard and does not work overnight. Even if it is buggy, dive into the technology because it can be worth it in the end. From personal experience, I know it can be a lot of work to debug but be sure to keep in mind that the technology’s capabilities—and its community—will grow, allowing your products (and your company) to leverage the innovations that would be very time consuming, expensive and difficult to build on your own.

Since we first implemented Docker and Kubernetes, we have made a new bet on open source: Apache Beam. Will it be the next big thing like Docker and Kubernetes? There’s no way to know at this point—but when you choose to lead with innovation, you can never take the risk-free, well-travelled path. Striving for innovation is a never-ending race, but I wouldn’t have it any other way.

The post Making the Bet on Open Source appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

CiviCamp in Halle (Saale), Germany, on 15 November 2018

CiviCRM - Fri, 09/28/2018 - 01:55

Bringing together CiviCRM users, implementers and interested people: that is the goal of the CiviCamps organised by the CiviCRM community all over the world.

The “CiviCamp” is a full-day conference with two parallel tracks of talks: one track for people who are interested in CiviCRM or already use it themselves, and another for people who already know a lot about CiviCRM and now want to learn even more, covering the latest developments and extensions as well as clever approaches to everyday problems.

Categories: CRM

Membership online training - Tuesday, October 2nd

CiviCRM - Thu, 09/27/2018 - 15:02

If you missed the Cividesk online training course, the Fundamentals of Membership Management, in August, we have scheduled another session for this Tuesday, October 2nd from 9 to 11 am MT.  This informative course covers all the basics of managing memberships and will help you get off to a great start using CiviMember.

Categories: CRM

How to load your Salesforce data into NetSuite

SnapLogic - Thu, 09/27/2018 - 13:06

Connecting customer relationship management (CRM) software with enterprise resource planning (ERP) technology is a fairly common integration requirement for organizations looking to complete a series of business goals from sales forecasting, revenue accounting by product or portfolio, to identifying highest revenue by industry or geography. To achieve these goals, integrators need to synchronize data across[...] Read the full article here.

The post How to load your Salesforce data into NetSuite appeared first on SnapLogic.

Categories: ETL

From Dust to Trust: How to Make Your Salesforce Data Better

Talend - Wed, 09/26/2018 - 17:15

Salesforce is like a goldmine. You own it but it’s up to you to extract gold out of it. Sound complicated? With Dreamforce in full swing, we are reminded that trusted data is the key to success for any organization.

According to a Salesforce survey, “68% of sales professionals say it is absolutely critical or very important to have a single view of the customer across departments/roles. Yet, only 17% of sales teams rate their single view of the customer capabilities as outstanding.”

While sales teams are willing to become high-performing trusted advisors, they are still spending most of their time on non-selling activities. The harsh reality is that sales people cannot wait to get clean, complete, accurate and consistent data into their systems. They often end up spending lots of time on their own correcting bad records and reuniting customer insights. To minimize their time spent on data and boost their sales numbers, they need your help so they can rely on a single customer view filled with trusted data.

Whether you’re working for a nonprofit that’s looking for more donors or at a company looking to get qualified leads, managing data quality in your prospect or donor CRM pipeline is crucial.

Watch Better Data Quality for All now.

Quick patches won’t solve your data quality problem in the long run

Salesforce was intentionally designed to digitally transform your business processes but was unfortunately not natively built to process and manage your data. As data is exploding, getting trusted data is becoming more and more critical. As a result, lots of Incubators’ apps started emerging on the Salesforce Marketplace. You may be tempted to use them and patch your data with quick data quality operations. 

But you may end up with separate features built by separate companies with different levels of integration, stability, and performance. You also take the risk of having the app not supported over the long term, putting your data pipeline and operations at risk. This in turn, will only make things worse by putting all the data quality on your shoulders whereas you may rely on your sales representative to resolve data. And you do not want to become the bottleneck of your organization.

After the fact Data Quality is not your best option

Some Business Intelligence solutions have started emerging, further allowing you to prepare your data at the analytical level. But this is often a one-shot option for a single need and does not fulfil the full need. You will still have bad data to input into Salesforce. Salesforce data can be used in multiple scenarios by multiple people. Operating data quality directly in Salesforce Marketing, Service or Commerce Cloud is the best approach to deliver trusted data at its source so that everybody can benefit from it.

The Rise of Modern Apps to boost engagement:

Fortunately, Data Quality has evolved to become a team activity rather than a single isolated job. You then need to find ways and tools to engage your sales org in data resolution initiatives. Modern apps are key here to make it a success.

Data Stewardship to delegate error resolution to business experts

Next-generation data stewardship tools such as Talend Data Stewardship give you the ability to reach everyone who knows the data best within the organization. In parallel, business experts will be comfortable editing and enriching data within a UI-friendly tool that makes the job easier. Once you have captured tacit knowledge from end users, you can scale it to millions of records through built-in machine learning capabilities within Talend Data Stewardship.

Data Preparation to discover and clean data directly with Salesforce

Self-service is the way to get data quality standards to scale. Data analysts spend 60% of their time cleaning data and getting it ready to use. Reduced time and effort mean more value and more insight to be extracted from data. Talend Data Preparation deals with this problem. It is a self-service application that allows potentially anyone to access a data set and then cleanse, standardize, transform, or enrich the data. With its ease of use, Data Preparation helps solve organizational pain points where employees often spend a great deal of time crunching data in Excel or expect their colleagues to do that on their behalf.

Here are two use cases to learn from:

Use Case 1: Standardizing Contact Data and removing duplicates from Salesforce

Duplicates are the bane of CRM systems. When entering data into Salesforce, sales reps can be in a rush and create duplicates that linger for a long time. Let them pollute your CRM and it will undermine every user's and sales rep's confidence in your data.

Data Quality here has a real direct business impact on your sales productivity and your marketing campaigns too.

Bad data means unreachable customers or untargeted prospects that escape your customized campaigns, leading to low conversion rates and lower revenue.

With Talend Data Prep, you can really be a game changer: Data Prep allows you to connect natively and directly to your Salesforce platform and perform some ad-hoc data quality operations.

  • By entering your SFDC credentials, you will get native access to the customer fields you want to clean
  • Once data is displayed into Data Prep, Quality Bar and smart assistance will allow you to quickly spot your duplicates
  • Click the header of any column containing duplicates from your dataset.
  • Click the Table tab of the functions panel to display the list of functions that can be applied on the whole table
  • Point your mouse over the Remove duplicate rows function to preview its result and click to apply it
  • Once you perform this operation, your duplicates can be removed
  • You can also save this as a recipe that you can apply to other data sources
  • You also have some options in Data Prep to certify your dataset so other team members know this data source can be trusted
  • Collaborate with IT to expand your jobs with Talend Studio to fully automate your data quality operations and proceed with advanced matching operations
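The standardize-then-deduplicate idea behind these steps can be pictured in a few lines of code. This is a minimal Python sketch and not Talend Data Preparation's implementation: the `name` and `email` field names and the matching rule (trim, collapse whitespace, ignore case) are assumptions chosen for illustration.

```python
def normalize_contact(record: dict) -> tuple:
    """Build a comparison key: trimmed, lower-cased email plus a collapsed, case-folded name."""
    email = record.get("email", "").strip().lower()
    name = " ".join(record.get("name", "").split()).casefold()
    return (email, name)


def remove_duplicate_rows(records: list) -> list:
    """Keep the first record seen for each normalized key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = normalize_contact(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


if __name__ == "__main__":
    contacts = [
        {"name": "Ada  Lovelace", "email": "Ada@Example.com "},
        {"name": "ada lovelace", "email": "ada@example.com"},
        {"name": "Grace Hopper", "email": "grace@example.com"},
    ]
    for contact in remove_duplicate_rows(contacts):
        print(contact)
```

Exact matching on a normalized key only catches duplicates that differ in spacing or letter case; near-duplicates such as misspelled names need the advanced matching mentioned in the last step.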

Use case 2:  Real time Data Masking into Salesforce

The GDPR defines pseudonymization as “the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information.” Pseudonymization or anonymization therefore, may significantly reduce the risks associated with data processing, while also maintaining the data’s utility.

Using Talend Cloud, you can process it directly into Salesforce. Talend Data Preparation enables any business users to obfuscate data the easy way. After native connection with Salesforce Dataset:

  • Click the header of any column containing data to be masked from your dataset
  • Click the Table tab of the functions panel to display the list of functions that can be applied
  • Point your mouse over the Obfuscation function and click to apply it
  • Once you perform this operation, data will be masked and anonymized
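To make the GDPR definition quoted above concrete, here is a minimal Python sketch of pseudonymization with a keyed hash (HMAC-SHA256). It illustrates the general technique and is not how the Obfuscation function in Talend Data Preparation is implemented; the column name, the sample values and the key are hypothetical. The secret key plays the role of the "additional information": whoever holds it can recompute a token from a candidate value, which is why this is pseudonymization rather than anonymization.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a value with a keyed hash; the same value and key always give the same token."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def mask_column(rows: list, column: str, secret_key: bytes) -> list:
    """Return copies of the rows with one column pseudonymized; other fields are unchanged."""
    return [{**row, column: pseudonymize(row[column], secret_key)} for row in rows]


if __name__ == "__main__":
    key = b"replace-with-a-secret-kept-outside-the-dataset"  # hypothetical key
    rows = [
        {"id": 1, "phone": "555-0100"},
        {"id": 2, "phone": "555-0100"},
    ]
    for row in mask_column(rows, "phone", key):
        print(row)
```

Because identical inputs map to identical tokens, masked records can still be joined and counted, which is one way the data keeps its utility.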

When confronted with in-depth fields and more sophisticated data masking techniques, data engineers will take the lead operating pattern data masking techniques directly into Talend Studio and perform them into Salesforce within personal fields such as Security Numbers or Credit Cards.  You can still easily spot data to be masked into Data Prep and ask data engineers to perform anonymization techniques into Talend Studio in a second phase.


Without data quality tools and methodology, you will end up with unqualified, unsegmented or unprotected customer accounts, leading to lower revenue, lower marketing effectiveness and, more importantly, frustrated sales reps spending their time searching for trusted client data.  As strong as it may be, your Salesforce goldmine can easily transform itself into dust if you don’t put trust into your systems. Only platforms such as Talend Cloud with powerful data quality solutions can help you to extract hidden gold from your Salesforce data and deliver it trusted to the whole organization.

Want to know more? Go to Talend Connect London on October 15th & 16th or Talend Connect Paris on October 17th & 18th to learn from real business cases such as Greenpeace, Petit Bateau.

Whatever your background, technical or not, there will be a session that meets your needs.  We have plenty of use cases and data quality jobs we’ll expose both in technical and customer tracks.

The post From Dust to Trust: How to Make Your Salesforce Data Better appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

Building Agile Data Lakes with Robust Ingestion and Transformation Frameworks – Part 1

Talend - Wed, 09/26/2018 - 10:31

This post was authored by Venkat Sundaram from Talend and Ramu Kalvakuntla from Clarity Insights.

With the advent of Big Data technologies like Hadoop, there has been a major disruption in the information management industry. The excitement around it is not only about the three Vs – volume, velocity and variety – of data but also the ability to provide a single platform to serve all data needs across an organization. This single platform is called the Data Lake. The goal of a data lake initiative is to ingest data from all known systems within an enterprise and store it in this central platform to meet enterprise-wide analytical needs.

However, a few years back Gartner warned that a large percentage of data lake initiatives have failed or will fail – becoming more of a data swamp than a data lake. How do we prevent this? We have teamed up with one of our partners, Clarity Insights, to discuss the data challenges enterprises face, what causes data lakes to become swamps, and the characteristics of a robust data ingestion framework that can help make the data lake more agile. We have partnered with Clarity Insights on multiple customer engagements to build these robust ingestion and transformation frameworks for customers' enterprise data lake solutions.

Download Hadoop and Data Lakes now.

Current Data Challenges:

Enterprises face many challenges with data today, from siloed data stores and massive data growth to expensive platforms and lack of business insights. Let’s take a look at these individually:

1. Siloed Data Stores

Nearly every organization is struggling with siloed data stores spread across multiple systems and databases. Many organizations have hundreds, if not thousands, of database servers. They’ve likely created separate data stores for different groups such as Finance, HR, Supply Chain, Marketing and so forth for convenience’s sake, but they’re struggling big time because of inconsistent results.

I have personally seen this across multiple companies: they can’t tell exactly how many active customers they have or what the gross margin per item is because they get varying answers from groups that have their own version of the data, calculations and key metrics.

2. Massive Data Growth

It is no surprise that data is growing exponentially across all enterprises. Back in 2002, when we first built a terabyte warehouse, our team was so excited! But today even a petabyte is considered small. Data has grown a thousandfold, in many cases in less than two decades, and organizations can no longer manage it all with their traditional databases.

Traditional systems scale vertically rather than horizontally, so when a database reaches its capacity, we can’t just add another server to expand; we have to forklift into newer, higher-capacity servers. But even that has its limits. IT has become stuck in this deep web and is unable to manage systems and data efficiently.

Diagram 1: Current Data Challenges


3. Expensive Platforms

Traditional relational MPP databases are appliance-based and come with very high costs. There are cases where companies are paying more than $100K per terabyte and are unable to keep up with this expense as data volumes rapidly grow from terabytes to exabytes.
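
To put that price in perspective, here is a rough back-of-the-envelope sketch. It takes the $100K per terabyte figure above as a flat rate (the post says "more than", so this is a lower bound) and assumes decimal units, where 1 PB = 1,000 TB; the function name is made up for illustration.

```python
# Illustrative arithmetic only. The flat $100K/TB rate and decimal units
# (1 PB = 1,000 TB, 1 EB = 1,000,000 TB) are assumptions, not vendor pricing.
COST_PER_TB_USD = 100_000
TB_PER_UNIT = {"TB": 1, "PB": 1_000, "EB": 1_000_000}


def platform_cost_usd(size, unit):
    """Return the platform cost in USD for `size` units of data."""
    return size * TB_PER_UNIT[unit] * COST_PER_TB_USD


for size, unit in [(1, "TB"), (1, "PB"), (1, "EB")]:
    print(f"{size} {unit}: ${platform_cost_usd(size, unit):,}")
```

At that rate a single petabyte already costs $100 million, which is why this pricing model stops working long before exabyte scale.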

4. Lack of Business Insights

Because of all of the above challenges, the business is focused only on descriptive analytics, a rear-view mirror look at what happened yesterday, last month, last year, year over year, etc., instead of on predictive and prescriptive analytics that find key insights on what to do next.

What is the Solution?

One possible solution is consolidating all disparate data sources into a single platform called a data lake. Many organizations have started down this path and failed miserably. Their data lakes have morphed into unmanageable data swamps.

What does a data swamp look like? Here’s an analogy: when you go to a public library to borrow a book or video, the first thing you do is search the catalog to find out whether the material you want is available, and if so, where to find it. Usually, you are in and out of the library in a couple of minutes. But instead, let’s say when you go to the library there is no catalog, and books are piled all over the place—fiction in one area and non-fiction in another and so forth. How would you find the book you are looking for? Would you ever go to that library again? Many data lakes are like this, with different groups in the organization loading data into it, without a catalog or proper metadata and governance.

A data lake should be more like a data library, where every dataset is indexed and cataloged, and where a gatekeeper decides what data should go into the lake to prevent duplicates and other issues. For this to happen properly, we need an ingestion framework, which acts like a funnel as shown below.

Diagram 2: Data Ingestion Framework / Funnel
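
The "data library" idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular product implements a catalog: the class names, fields and lake paths are all hypothetical, and the gatekeeper rule here (every dataset needs an owner and a unique name) is just one assumed policy.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    owner: str
    location: str          # hypothetical path inside the lake
    description: str = ""


@dataclass
class DataCatalog:
    entries: dict = field(default_factory=dict)

    def register(self, entry: CatalogEntry) -> None:
        """Gatekeeper: reject entries with no owner or a name already cataloged."""
        key = entry.name.strip().lower()
        if not entry.owner:
            raise ValueError(f"{entry.name!r} has no owner")
        if key in self.entries:
            raise ValueError(f"{entry.name!r} is already cataloged")
        self.entries[key] = entry

    def search(self, term: str) -> list:
        """Find entries whose name or description contains the term."""
        term = term.lower()
        return [e for e in self.entries.values()
                if term in e.name.lower() or term in e.description.lower()]


catalog = DataCatalog()
catalog.register(CatalogEntry("customers", "marketing", "/lake/raw/customers",
                              "active customer master"))
catalog.register(CatalogEntry("item_margin", "finance", "/lake/raw/item_margin",
                              "gross margin per item"))

try:
    # The same dataset loaded again by another group: the gatekeeper refuses it.
    catalog.register(CatalogEntry("Customers", "finance", "/lake/raw/customers_v2"))
except ValueError as err:
    print("rejected:", err)

print([e.location for e in catalog.search("margin")])
```

The point is that nothing lands in the lake without passing through `register`, so the catalog and the data can never drift apart.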

A data ingestion framework should have the following characteristics:
  • A single framework to perform all data ingestions consistently into the data lake.
  • Metadata-driven architecture that captures which datasets are to be ingested, when and how often they need to be ingested, how to capture the metadata of the datasets, and what credentials are needed to connect to the data source systems.
  • Template design architecture to build generic templates that can read the metadata supplied in the framework and automate the ingestion process for different formats of data, both in batch and real-time.
  • Tracking metrics, events and notifications for all data ingestion activities
  • Single consistent method to capture all data ingestion along with technical metadata, data lineage, and governance
  • Proper data governance with “search and catalog” to find data within the data lake
  • Data Profiling to collect the anomalies in the datasets so data stewards can look at them and come up with data quality and transformation rules
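
To make the metadata-driven and template ideas in this list a little more concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the metadata fields, the dataset names and the inline `payload` strings (which stand in for real source systems so the example runs on its own). A real framework would keep this metadata in a metadata store, along with schedules and credential references.

```python
import csv
import io
import json

# Hypothetical metadata: one record per dataset to ingest.
DATASETS = [
    {"name": "customers", "format": "csv", "frequency": "daily",
     "payload": "id,name\n1,Ada\n2,Grace\n"},
    {"name": "orders", "format": "json", "frequency": "hourly",
     "payload": '[{"id": 10, "customer_id": 1}]'},
    {"name": "clickstream", "format": "avro", "frequency": "hourly",
     "payload": ""},
]


def read_csv(text):
    """Template for delimited text: one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json(text):
    """Template for JSON arrays of records."""
    return json.loads(text)


# Generic templates, selected by the "format" field of the metadata.
TEMPLATES = {"csv": read_csv, "json": read_json}


def ingest(datasets):
    """Run every dataset through its template and record simple metrics."""
    lake, metrics = {}, []
    for meta in datasets:
        reader = TEMPLATES.get(meta["format"])
        if reader is None:
            # No template for this format yet: track it instead of failing.
            metrics.append({"dataset": meta["name"], "status": "skipped", "rows": 0})
            continue
        rows = reader(meta["payload"])
        lake[meta["name"]] = rows
        metrics.append({"dataset": meta["name"], "status": "ok", "rows": len(rows)})
    return lake, metrics


lake, metrics = ingest(DATASETS)
for metric in metrics:
    print(metric)
```

Adding a new source then means adding a metadata record, and adding a new format means adding one template, rather than writing a new ingestion job each time.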

Diagram 3: Data Ingestion Framework Architecture

Modern Data Architecture Reference Architecture

Data lakes are a foundational structure for Modern Data Architecture solutions, where they become a single platform to land all disparate data sources and: stage raw data, profile data for data stewards, apply transformations, move data and run machine learning and advanced analytics, ultimately so organizations can find deep insights and perform what-if analysis.
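
As a rough illustration of the "profile data for data stewards" step, the sketch below counts missing and distinct values per column. The function name, the sample rows and the choice of statistics are assumptions for the example; real profiling tools report far more than this.

```python
def profile(rows):
    """Per-column row count, missing count and distinct non-missing values."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


# Hypothetical staged rows: one missing email and one repeated email.
staged = [
    {"id": "1", "email": "ada@example.com"},
    {"id": "2", "email": ""},
    {"id": "3", "email": "ada@example.com"},
]

for column, stats in profile(staged).items():
    print(column, stats)
```

A data steward looking at this output would see at a glance that `email` has gaps and repeats, which is the kind of anomaly that turns into a data quality or transformation rule.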

Unlike traditional data warehouses, where the business won’t see the data until it’s curated, with the modern data architecture businesses can ingest new data sources through the framework and analyze them within hours or days, instead of months or years.

In the next part of this series, we’ll discuss “What is Metadata Driven Architecture?” and see how it enables organizations to build robust ingestion and transformation frameworks for successful agile data lake solutions. Let me know what your thoughts are in the comments, and head to Clarity Insights for more info.

The post Building Agile Data Lakes with Robust Ingestion and Transformation Frameworks – Part 1 appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

The Future of Java and How It Affects Java and Liferay Developers

Liferay - Wed, 09/26/2018 - 05:25

Java 11 has just been released (on September 25th), and it consolidates a series of changes, not only to the language and the platform, but also to the release and support model, which has led to some noise about the future of Java.

Probably the two most notable concerns are the end of public updates for Java 8 and the uncertainty of the rights to use Oracle JDK without paying for commercial support.

Although it is true that, with the new changes, Oracle is going to focus only on the latest Java version and will offer commercial support for its JDK, it is also true that we, as Java and Liferay developers, will still be able to use Java and the JDK freely.


The changes in the release cadence and model

In 2017, it was already announced that Java was going to move faster, scheduling a new feature release every six months, in March and September. That meant that after Java 9, released in September 2017, Java 10 was going to be released in March 2018 and Java 11 in September 2018, which has just happened.
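
The six-month cadence is simple enough to express as date arithmetic. The sketch below assumes the schedule holds exactly as announced (one feature release every six months, counting from Java 9 in September 2017); the function name is made up, and anything it returns for versions after Java 11 is a projection under that assumption rather than an announced date.

```python
from datetime import date


def feature_release_month(version, base_version=9, base=date(2017, 9, 1)):
    """First day of the month in which a feature release is expected.

    Assumes one release every six months starting from Java 9 (September 2017).
    """
    if version < base_version:
        raise ValueError("the six-month cadence only starts with Java 9")
    months = base.year * 12 + (base.month - 1) + (version - base_version) * 6
    return date(months // 12, months % 12 + 1, 1)


for version in (9, 10, 11, 12):
    print(f"Java {version}: {feature_release_month(version):%B %Y}")
```

For Java 9, 10 and 11 this reproduces the months mentioned above; Java 12 comes out as March 2019 under the same assumption.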

The second big change has been the introduction of the concept of LTS (Long-Term Support) versions, which are versions that are 'marked' to be maintained for more than six months. This mark is not a commitment from Oracle, but a recommendation for the industry and community.

On the other hand, the differences between Oracle JDK and OpenJDK have been eliminated. In fact, Oracle is leading the work on the OpenJDK LTS code base during the first six months after the release. This makes OpenJDK the new default.

After that, Oracle will provide updates for its Oracle JDK only to those customers that have a commercial license. But at the same time, Oracle will allow and encourage other vendors (like IBM, Red Hat, Azul or the community-based AdoptOpenJDK) to work on the OpenJDK LTS codebase to keep providing updates.

That means that Oracle will provide free updates for every Java version during the first six months after release, and other vendors and community initiatives will provide free updates for LTS versions for a longer period.


Will Java 8 still be freely available?

Java 8 was an LTS release, and it is now replaced by Java 11, which is also an LTS release. As a result, Oracle has announced that its official support for OpenJDK 8 for commercial use will end in January 2019.

But the good news is that Red Hat has already applied to lead the development and updates of OpenJDK 8 after that date, and other companies like Amazon, Azul Systems or IBM have also announced that they will support Red Hat.

So we will actually have free Java 8 updates at least until September 2023, based on OpenJDK.


In conclusion

Although Oracle is focusing its effort on the six-month releases, there will still be free updates for the LTS versions of Java, first provided by Oracle and, after that, maintained by other vendors, which will offer free updates and, in some cases, commercial support as well.

If you want to dig a little deeper into the details of all these changes, there is a comprehensive document titled "Java is Still Free", written and updated by the community of Java Champions, that has a lot of detail on this topic and includes an updated table with the plans for support and updates, which so far is as follows:

And for Liferay, we will also pay attention to these changes and to the plans to support the different versions of Java, in order to update our Liferay JDK Compatibility Support accordingly.

David Gómez 2018-09-26T10:25:00Z
Categories: CMS, ECM

Product maintenance in CiviCRM

CiviCRM - Tue, 09/25/2018 - 21:23

As our North American colleagues (and those who have made the big trip over there) head into the governance sprint, now seems like a good time to recap product maintenance in CiviCRM. Product maintenance, as I discuss it here, is the set of monthly routine processes we follow to incorporate patches & contributions into the CiviCRM product. This blog is kinda long & weedsy - so if it’s not for you then take a look at this baby octopus instead.


Categories: CRM

How JDK11 Affects Liferay Portal/DXP Users

Liferay - Tue, 09/25/2018 - 16:20

With the release of JDK11, Oracle's new Java SE Support Policy brings sweeping changes to the Java Enterprise community.

If you would like a good explanation of the changes to come, I highly recommend this video.

Here are my thoughts on how some of these changes will affect Liferay Portal/DXP users:

Starting with JDK11, you will no longer be able to use Oracle JDK for free for commercial purposes.

All commercial projects will require a subscription/service from Oracle to get the Oracle JDK. The majority of Liferay users are commercial users who deploy on Oracle JDK. If you do not pay for support from Oracle for their JDK or one of their other products such as WebLogic, you will need to decide whether you wish to continue to use Oracle JDK.

An OpenJDK binary is now your free option

Oracle will continue to provide a free alternative with their Oracle OpenJDK binary. Others such as Azul, Red Hat, IBM, and AdoptOpenJDK will also provide their own binaries. For now, Liferay has only certified Oracle's OpenJDK binary. We have yet to determine whether all OpenJDK libraries can fall under the same name or if we need to certify them individually.

A new JDK will be released every 6 months and some of them will be marked as LTS releases.

Prior to JDK9, all JDKs were essentially LTS releases. We were able to receive years of bug fixes before we had to worry about a new release. We will now see a new JDK release every 6 months (March, September).

As of now, Liferay has decided we will not certify every single major release of the JDK. We will instead follow Oracle's lead and certify only those marked for LTS. If you have seen our latest compatibility matrix, you will notice that we did not certify JDK9 or JDK10. We will instead certify JDK11, JDK17, and JDK23, as those are the ones marked as LTS. This is subject to change.
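
The versions listed above follow a simple pattern: every sixth feature release, starting at JDK11. The sketch below encodes that pattern purely as an assumption drawn from those three version numbers; it is not an official Oracle or Liferay rule, the function name is hypothetical, and as noted the plan is subject to change.

```python
def is_planned_lts(version, first_lts=11, interval=6):
    """True if `version` falls on the assumed every-sixth-release LTS pattern."""
    return version >= first_lts and (version - first_lts) % interval == 0


# Versions from JDK9 up to JDK23 that would be certified under this assumption.
to_certify = [v for v in range(9, 24) if is_planned_lts(v)]
print(to_certify)  # [11, 17, 23]
```

If the LTS interval changes, only the `interval` argument needs to change, which is a reminder that the list above is a plan rather than a guarantee.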

Oracle will only guarantee 2 updates per major release

JDK8 has received 172 updates so far. In contrast, JDK9, the first release to which Oracle applied this policy, had 4 updates, while JDK10 got only the minimum 2 updates. Although JDK11 is designated as an LTS release, there is no guarantee of more than 2 updates from Oracle.

We will have to wait until JDK12 is released to see what will happen with JDK11.  The optimistic side of me feels that the Java open source community will continue to backport security issues and bugs well after Oracle decides to stop.  We will have to wait and see.

January 2019 will be the last free public update for JDK8. 

If you are a Liferay user and you have not made a plan for your production servers, please start!

I will provide the paths available currently but these are in no way recommendations provided by Liferay.  It is up to you to make the best decision for your own company.

  • Continue to use Oracle JDK8 without any future patches
  • Continue to use Oracle JDK8 and pay for a subscription
  • Switch to Oracle JDK11 and pay for a subscription
  • Switch to Oracle OpenJDK11 binary (knowing that you will have to make this decision again in 6 months)
  • Switch to a certified version of IBM JDK

I will try to update this list as more options become available, e.g. if we decide to certify AdoptOpenJDK, Azul Zulu, or Red Hat JDK.

I am eager to see how the rest of the enterprise community reacts to these changes. 

Please leave a comment below with any thoughts on Oracle's changes or suggestions on what you would like to see Liferay do with regard to JDK support.

David Truong 2018-09-25T21:20:00Z
Categories: CMS, ECM

How code-heavy approaches to integration impede digital transformation

SnapLogic - Tue, 09/25/2018 - 15:30

At the heart of digital transformation lies the urge to survive. Any company, no matter how powerful, can go bankrupt, suffer a wave of layoffs, or get thrust into the bad end of an acquisition deal. Market disruption, led by those who have put digital transformation into practice, contributes heavily to such calamities. Look no[...] Read the full article here.

The post How code-heavy approaches to integration impede digital transformation appeared first on SnapLogic.

Categories: ETL