AdoptOS

Assistance with Open Source adoption

ETL

IP Expo Europe 2018 recap: AI and machine learning on display

SnapLogic - Thu, 10/11/2018 - 17:06

Artificial Intelligence (AI) and machine learning (ML) have come to dominate business and IT discussions the world over. From boardrooms to conferences to media headlines, you can’t escape the buzz, the questions, the disruption. And for good reason – more than any other recent development, AI and ML are transformative, era-defining technologies that are fundamentally[...] Read the full article here.

The post IP Expo Europe 2018 recap: AI and machine learning on display appeared first on SnapLogic.

Categories: ETL

Bitwise: Cloud Data Warehouse Modernization – Inside Look at Talend Connect London

Talend - Thu, 10/11/2018 - 09:28

With the expectations of business users evolving beyond the limitations of traditional BI capabilities, we see a general push among organizations to develop a cloud-based data strategy that enterprise users can leverage to build better analytics and make better business decisions. While this vision for a cloud strategy is fairly straightforward, the journey of identifying and implementing the right technology stack to serve BI and analytical requirements across the enterprise can create stumbling blocks if not properly planned from the get-go.

As a data management consulting and services company, Bitwise helps organizations with their modernization efforts. Based on what we see at our customers when helping to consolidate legacy data integration tools onto newer platforms, modernize data warehouse architectures, or implement an enterprise cloud strategy, Talend fits as a key component of a modern data approach that addresses top business drivers and delivers ROI for these efforts.

For this reason, we are very excited to co-present “Modernizing Your Data Warehouse” with Talend at Talend Connect UK in London. If you are exploring cloud as an option to overcome limitations you may be experiencing with your current data warehouse architecture, this session is for you. Our Talend partner is well equipped to address the many challenges with the conventional data warehouse (that will sound all too familiar to you) and walk through the options, innovations, and benefits for moving to cloud in a way that makes sense to the traditional user.

For our part, we aim to show “how” people are moving to cloud by sharing our experiences for building the right business case, identifying the right approach, and putting together the right strategy. Maybe you are considering whether Lift & Shift is the right approach, or if you should do it in one ‘big bang’ or iterate – we’ll share some practical know-how for making these determinations within your organization.

With so many tools and technologies available, how do you know which are the right fit for you? This is where a vendor-neutral assessment and business case development, along with an ROI assessment for the identified business case, become essential for getting the migration roadmap and architecture right from the start. We will highlight a real-world example of going from CIO vision to operationalized cloud assets, with some lessons learned along the way.

Ultimately, our session is geared to demonstrate that by modernizing your data warehouse in the cloud, you not only get the benefits of speed, agility, flexibility, scalability, and cost efficiency – you also land in a framework with inherent data governance, self-service, and machine learning capabilities (no need to develop these from scratch on your own), which are the cutting-edge areas where you can show ROI to your business stakeholders…and become a data hero.

Bitwise, a Talend Gold Partner for consulting and services, is proud to be a Gold Sponsor of Talend Connect UK. Be sure to visit our booth to get a demo on how we convert ANY ETL (such as Ab Initio, OWB, Informatica, SSIS, DataStage, and PL/SQL) to Talend with maximum possible automation.

About the author:

Ankur Gupta

EVP Worldwide Sales & Marketing, Bitwise

https://www.linkedin.com/in/unamigo/

The post Bitwise: Cloud Data Warehouse Modernization – Inside Look at Talend Connect London appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

New Talend APAC Cloud Data Infrastructure Now Available!

Talend - Wed, 10/10/2018 - 16:37

As businesses in Asia's primary economic hubs such as Tokyo, Bangalore, Sydney, and Singapore grow at a historic rate, they are moving to the cloud like never before. For those companies, the first and foremost priority is to fully leverage the value of their data while meeting strict local data residency, governance, and privacy requirements. Keeping data in a cloud data center on the other side of the globe simply won't do.

That's why Talend is launching a new cloud data infrastructure in Japan – in addition to its US data center and its EU data centers in Frankfurt and Dublin – in a secure and highly scalable Amazon Web Services (AWS) environment, allowing APAC customers to get cloud data integration and data management services closer to where their data is stored. This is most beneficial to local enterprise businesses and to foreign companies planning to open offices in the region.

There are several benefits Talend Cloud customers can expect from this launch.

Accelerating Enterprise Cloud Adoption

Whether your cloud-first strategy is about modernizing legacy IT infrastructure, leveraging a hybrid cloud architecture, or building a multi-cloud platform, Talend's new APAC cloud data infrastructure will make your transition to the cloud more seamless. With a Talend Cloud instance independently available in APAC, companies can more easily build a cloud data lake or a cloud data warehouse for faster, more scalable, and more agile analytics.

More Robust Performance

For customers using Talend Cloud services in the Asia Pacific region, this new cloud data infrastructure will lead to faster extract, transform, and load times regardless of data volume. Additionally, it will boost performance for customers using AWS services such as Amazon EMR, Amazon Redshift, Amazon Aurora, and Amazon DynamoDB.

Increased Data Security with Proximity

Keeping data within the local region means it does not have to make a long trip outside of the immediate area, which reduces the risk of data security breaches at rest, in transit, and in use, and eases companies' worries about security measures.

Reduced Compliance and Operational Risks

Because the new data infrastructure offers an instance of Talend Cloud deployed independently from the US and the EU, companies can maintain higher standards for data stewardship, data privacy, and operational best practices.

Japanese customers, in particular, are more likely to stay compliant with Japan's stringent data privacy and security standards. And if industry or government regulations change, Talend Cloud customers will still be able to maintain the flexibility and agility to keep up.

If you are a Talend customer, you will soon have the opportunity to migrate your site to the new APAC data center. Log in or contact your account manager for more information.

Not a current Talend Cloud customer? Test drive Talend Cloud free of charge for 30 days, or learn how Talend Cloud can help you connect data from more than 900 data sources to deliver big data cloud analytics instantly.

The post New Talend APAC Cloud Data Infrastructure Now Available! appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

Five questions to ask about data lakes

SnapLogic - Tue, 10/09/2018 - 18:15

Data is increasingly being recognized as the corporate currency of the digital age. Companies want to leverage data to achieve deeper insights that lead to competitive advantage over their peers. According to IDC projections, total worldwide data will surge to 163 zettabytes (ZB) by 2025, a tenfold increase over what exists today. The[...] Read the full article here.

The post Five questions to ask about data lakes appeared first on SnapLogic.

Categories: ETL

5 Questions to Ask When Building a Cloud Data Lake Strategy

Talend - Tue, 10/09/2018 - 15:41

In my last blog post, I shared some thoughts on the common pitfalls when building a data lake. As the move to the cloud becomes more and more common, I'd like to further discuss some best practices for building a cloud data lake strategy. When going beyond the scope of integration tools or platforms for your cloud data lake, here are five questions to ask, which can serve as a checklist:

1. Does your Cloud Data Lake strategy include a Cloud Data Warehouse?

As many differences as there are between the two, people often compare the two technology approaches: data warehouses centralize structured data, while data lakes are often positioned as the holy grail for all types of data. (You can read more about the two approaches here.)

Don't confuse the two, though – these technology approaches should actually be brought together. You will need a data lake to accommodate all types of data your business deals with today, be it structured, semi-structured, or unstructured, on-premises or in the cloud, or newer types of data such as IoT data. The data lake often has a landing zone and a staging zone for raw data – data at this stage is not yet consumable, but you may want to keep it for future discovery or data science projects. A cloud data warehouse, on the other hand, comes into the picture after data has been cleansed, mapped, and transformed, so that it is more consumable for business analysts to access and use for reporting or other analytical purposes. Data at this stage is often highly processed to fit the data warehouse.
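To make these zones concrete, here is a minimal sketch of how such a layered lake could be laid out. This is an illustration only; the bucket name, zone prefixes, and source names are all assumptions rather than anything prescribed in this post:

    from datetime import date

    # Illustrative zone layout for a cloud data lake (all names are assumptions).
    LAKE = "s3://acme-data-lake"
    ZONES = {
        "landing": f"{LAKE}/landing",   # raw files exactly as received
        "staging": f"{LAKE}/staging",   # raw but validated and deduplicated
        "curated": f"{LAKE}/curated",   # cleansed and ready for the warehouse
    }

    def zone_path(zone: str, source: str, day: date) -> str:
        """Build a partitioned path for a given zone and source system."""
        return f"{ZONES[zone]}/{source}/dt={day.isoformat()}"

    print(zone_path("landing", "crm_orders", date(2018, 10, 9)))
    # -> s3://acme-data-lake/landing/crm_orders/dt=2018-10-09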

If your approach currently works only with a cloud data warehouse, you are often losing the raw data, and some formats of data altogether; that is not helpful for prescriptive or advanced analytics projects, or for machine learning and AI initiatives, because some of the meaning within the data is already lost. Vice versa, if you don't have a data warehouse alongside your data lake strategy, you will end up with a data swamp where all data is kept with no structure and is not consumable by analysts.

From the integration perspective, make sure your integration tool works with both data lake and data warehouse technologies, which leads us to the next question. 


2. Does your integration tool have ETL & ELT?

As much as you may know about ETL in your current on-premises data warehouse, moving it to the cloud is a different story, not to mention in a cloud data lake context. Where and how data is processed really depends on what your business needs.

Similar to what we described in the first question, sometimes you need to keep more of the raw nature of the data, and other times you need more processing. This requires your integration tool to offer both ETL and ELT capabilities, where data transformation can be handled either before the data is loaded to its final target, e.g. a cloud data warehouse, or after the data has landed there. ELT is more often leveraged when the speed of data ingestion is key to your project, or when you want to retain more of the information within your data. Typically, cloud data lakes have a raw data store, then a refined (or transformed) data store. Data scientists, for example, prefer to access the raw data, whereas business users want normalized data for business intelligence.

Another use of ELT leverages the massively parallel processing capabilities that come with big data technologies such as Spark and Flink. If your use case requires that kind of processing power, ELT is the better choice because the processing scales further.
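To make the ETL/ELT distinction concrete, here is a minimal sketch in Python, using pandas and an in-memory SQLite database as a stand-in for a cloud data warehouse; the table and column names are illustrative assumptions:

    import sqlite3
    import pandas as pd

    # A raw feed where amounts arrive as strings, including bad values.
    orders = pd.DataFrame({
        "order_id": [1, 2, 3],
        "amount": ["10.5", "n/a", "42.0"],
    })

    conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

    # ETL: transform in the integration tool, then load only the cleansed result.
    cleansed = orders.assign(
        amount=pd.to_numeric(orders["amount"], errors="coerce")
    ).dropna()
    cleansed.to_sql("orders_etl", conn, index=False)

    # ELT: load the raw feed first, then transform inside the target engine,
    # keeping the raw table around for data science or reprocessing.
    orders.to_sql("orders_raw", conn, index=False)
    conn.execute("""
        CREATE TABLE orders_refined AS
        SELECT order_id, CAST(amount AS REAL) AS amount
        FROM orders_raw
        WHERE amount != 'n/a'
    """)
    print(pd.read_sql("SELECT * FROM orders_refined", conn))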

3. Can your cloud data lake handle both simple ETL tasks and complex big data ones?

This may look like an obvious question, but when you ask it, put yourself in the users' shoes and really think through whether your tool of choice can meet both requirements.

Not all of your data lake usage will be complex work that requires advanced processing and transformation; much of it can be simple activities such as ingesting new data into the data lake. Often, the tasks extend beyond the data engineering or IT team as well. So ideally, the tool of your choice should handle simple tasks quickly and easily, but also scale in complexity to meet the requirements of advanced use cases. Building a data lake strategy that copes with both helps make your data lake more consumable and practical for various types of users and purposes.

4. How about batch and streaming needs?

You may think your current architecture and technology stack are good enough, and that your business is not really in the Netflix business where streaming is a necessity. Get it? Well, think again.

Streaming data has become part of our everyday lives whether you realize it or not. The “me” culture has put everything in the moment of now. If your business is on social media, you are in streaming. If IoT and sensors are the next growth market for your business, you are in streaming. If you have a website for customer interaction, you are in streaming. In IDC's 2018 Data Integration and Integrity End User Survey, 93% of respondents indicated plans to use streaming technology by 2020. Real-time and streaming analytics have become a must for modern businesses to create a competitive edge. So, this naturally raises the questions: can your data lake handle both your batch and streaming needs? Do you have the technology and people to work with streaming, which is fundamentally different from typical batch processing?

Streaming data is particularly challenging to handle because it is continuously generated by an array of sources and devices as well as being delivered in a wide variety of formats.

One prime example of just how complicated streaming data can be comes from the Internet of Things (IoT). With IoT devices, the data is always on; there is no start and no stop, it just keeps flowing. A typical batch processing approach doesn’t work with IoT data because of the continuous stream and the variety of data types it encompasses.

So make sure your data lake strategy and data integration layer are agile enough to support both use cases.
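As a minimal illustration of the difference, the sketch below uses PySpark to run the same aggregation in batch and as a stream; the ./iot_events directory of JSON files and the schema are assumptions for the example:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("batch-vs-streaming").getOrCreate()

    schema = StructType([
        StructField("device_id", StringType()),
        StructField("temperature", DoubleType()),
    ])

    # Batch: one bounded pass over whatever files exist right now.
    batch_df = spark.read.schema(schema).json("./iot_events")
    batch_df.groupBy("device_id").avg("temperature").show()

    # Streaming: the same query over an unbounded source; Spark picks up
    # new files as they arrive and keeps the aggregate up to date.
    stream_df = spark.readStream.schema(schema).json("./iot_events")
    query = (stream_df.groupBy("device_id").avg("temperature")
             .writeStream.outputMode("complete").format("console").start())
    query.awaitTermination()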

You can find more tips on streaming data here.

5. Can your data lake strategy help cultivate a collaborative culture?

Last but not least, collaboration.

It may take one person to implement the technology, but it will take a whole village to implement it successfully. The only way to make sure your data lake is a success is to have people use it, improving the workflow one way or another.

In a smaller scope, the workflows in your data lake should be reusable and shareable among data engineers: less rework is needed, and operationalization can happen much faster. In a bigger scope, the data lake approach can help improve collaboration between IT and business teams. For example, your business teams are the experts on their data; they know the meaning and context of the data better than anyone else. Data quality can be much improved if the business team can work on the data for business-rule transformations while IT still governs that activity. Drawing such a line with governance in place is delicate work and no easy task. But think through whether your data lake approach is governed yet open at the same time – encouraging not only the final consumption of the data, but also the improvement of data quality in the process, so that data can be recycled and made available to the broader organization.

To summarize, those are the five questions I would recommend asking when building a cloud data lake strategy. By no means are these the only questions you should think about, but hopefully they trigger some thinking outside of your typical technical checklist. 

The post 5 Questions to Ask When Building a Cloud Data Lake Strategy appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

How to Implement a Job Metadata Framework using Talend

Talend - Tue, 10/09/2018 - 14:39

Today, data integration projects are not just about moving data from point A to point B; there is much more to it. The ever-growing volume of data and the speed at which data changes present many challenges in managing the end-to-end data integration process. To address these challenges, it is paramount to track the data's journey from source to target in terms of start and end timestamps, job status, business area, subject area, and the individuals responsible for a specific job. In other words, metadata is becoming a major player in data workflows. In this blog, I want to review how to implement a job metadata framework using Talend. Let's get started!

Metadata Framework: What You Need to Know

The centralized management and monitoring of this job metadata is crucial to data management teams. An efficient and flexible job metadata framework architecture requires a number of things, namely a metadata-driven model and the job metadata itself.

A typical Talend Data Integration job performs the following tasks for extracting the data from source systems and loading them into target systems.

  1. Extracting data from source systems
  2. Transforming the data involves:
    • Cleansing source attributes
    • Applying business rules
    • Data Quality
    • Filtering, Sorting, and Deduplication
    • Data aggregations
  3. Loading the data into target systems
  4. Monitoring, Logging, and Tracking the ETL process

Figure 1: ETL process

Over the past few years, job metadata has evolved into an essential component of any data integration project. What happens when you don't have job metadata in your data integration jobs? It can lead to incorrect ETL statistics and logging, as well as hard-to-handle errors during the data integration process. A successful Talend Data Integration project depends on how well the job metadata framework is integrated with the enterprise data management process.

Job Metadata Framework

The job metadata framework is a metadata-driven model that integrates well with the Talend product suite. Talend provides a set of components for capturing statistics and logging information while the data integration process is in flight.

Remember, the primary objective of this blog is to provide an efficient way to manage the ETL operations with a customizable framework. The framework includes the Job management data model and the Talend components that support the framework.

Figure 2: Job metadata model

Primarily, the Job Metadata Framework model includes:

  • Job Master
  • Job Run Details
  • Job Run Log
  • File Tracker
  • Database High Water Mark Tracker for extracting the incremental changes

This framework is designed to let production support monitor the job refresh cycle and look for issues relating to job failures and any discrepancies while processing the data loads. Let's go through each piece of the framework step by step.

Talend Jobs

Talend_Jobs is a Job Master Repository table that manages the inventory of all the jobs in the Data Integration domain.

  • JobID – Unique identifier for a specific job
  • JobName – The job name as per the naming convention (<type>_<subject area>_<table_name>_<target_destination>)
  • BusinessAreaName – Business unit/department or application area
  • JobAuthorDomainID – Job author information
  • Notes – Additional information related to the job
  • LastUpdateDate – The last updated date
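The post does not prescribe any DDL, but as a minimal sketch, the Job Master table could be created like this (SQLite via Python; the column types are assumptions inferred from the attribute list above):

    import sqlite3

    conn = sqlite3.connect("job_metadata.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS Talend_Jobs (
            JobID             INTEGER PRIMARY KEY,
            JobName           TEXT NOT NULL,
            BusinessAreaName  TEXT,
            JobAuthorDomainID TEXT,
            Notes             TEXT,
            LastUpdateDate    TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.commit()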

Talend Job Run Details

Talend_Job_Run_Details registers every run of a job and its sub jobs with statistics and run details such as job status, start time, end time, and total duration of main job and sub jobs.

  • ID – Unique identifier for a specific job run
  • BusinessAreaName – Business unit/department or application area
  • JobAuthorDomainID – Job author information
  • JobID – Unique identifier for a specific job
  • JobName – The job name as per the naming convention (<type>_<subject area>_<table_name>_<target_destination>)
  • SubJobID – Unique identifier for a specific sub job
  • SubJobName – The sub job name as per the naming convention (<type>_<subject area>_<table_name>_<target_destination>)
  • JobStartDate – Main job start timestamp
  • JobEndDate – Main job end timestamp
  • JobRunTimeMinutes – Main job total execution duration
  • SubJobStartDate – Sub job start timestamp
  • SubJobEndDate – Sub job end timestamp
  • SubJobRunTimeMinutes – Sub job total execution duration
  • SubJobStatus – Sub job status (Pending/Complete)
  • JobStatus – Main job status (Pending/Complete)
  • LastUpdateDate – The last updated date
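As a hedged sketch of how a run might be registered and closed out against this table (the helper names are hypothetical, the columns mirror the model above, and a Talend_Job_Run_Details table with these columns is assumed to exist):

    from datetime import datetime

    def start_job_run(conn, job_id, job_name, business_area):
        """Register the start of a run and return its row id."""
        cur = conn.execute(
            "INSERT INTO Talend_Job_Run_Details "
            "(JobID, JobName, BusinessAreaName, JobStartDate, JobStatus) "
            "VALUES (?, ?, ?, ?, 'Pending')",
            (job_id, job_name, business_area, datetime.now().isoformat(sep=" ")),
        )
        conn.commit()
        return cur.lastrowid

    def finish_job_run(conn, run_id):
        """Mark the run complete and record its end time and duration."""
        end = datetime.now().isoformat(sep=" ")
        conn.execute(
            "UPDATE Talend_Job_Run_Details "
            "SET JobEndDate = ?, JobStatus = 'Complete', "
            "JobRunTimeMinutes = ROUND((JULIANDAY(?) - JULIANDAY(JobStartDate)) * 1440, 2) "
            "WHERE ID = ?",
            (end, end, run_id),
        )
        conn.commit()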

Talend Job Run Log

Talend_Job_Run_Log logs all errors that occur during a particular job execution. It extracts the details from the Talend components specifically designed for catching logs (tLogCatcher) and statistics (tStatCatcher).

Figure 3: Error logging and Statistics

The tLogCatcher component in Talend operates as a log function triggered during the process by Java exceptions, tDie, or tWarn. In order to catch exceptions coming from the job, the tCatch function needs to be enabled on all the components.

The tStatCatcher component gathers the job processing metadata at the job level.

  • runID – Unique identifier for a specific job run
  • JobID – Unique identifier for a specific job
  • Moment – The time when the message is caught
  • Pid – The process ID of the job
  • parent_pid – The parent process ID
  • root_pid – The root process ID
  • system_pid – The system process ID
  • project – The name of the project
  • Job – The name of the job
  • job_repository_id – The ID of the job file stored in the repository
  • job_version – The version of the current job
  • context – The name of the current context
  • priority – The priority sequence
  • Origin – The name of the component, if any
  • message_type – Begin or End
  • message – The error message generated by the component when an error occurs. This is an After variable, and it functions only if the “Die on error” checkbox is cleared
  • Code – The code associated with the caught event (as set by tDie or tWarn)
  • duration – Time for the execution of a job or a component with the tStatCatcher Statistics checkbox selected
  • Count – Record counts
  • Reference – Job references
  • Thresholds – Log thresholds for managing error handling workflows

Talend High Water Mark Tracker

Talend_HWM_Tracker helps in processing delta and incremental changes for a particular table. The high water mark tracker is helpful when Change Data Capture is not enabled and changes are extracted based on specific conditions such as “last_updated_date_time” or “revision_date_time.” In some cases, the high water mark relates to the highest sequence number when records are processed based on a sequence number. (A sketch of this pattern follows the table below.)

  • Id – Unique identifier for a specific source table
  • jobID – Unique identifier for a specific job
  • job_name – The name of the job
  • table_name – The name of the source table
  • environment – The source table environment
  • database_type – The source table database type
  • hwm_datetime – High water field (datetime)
  • hwm_integer – High water field (number)
  • hwm_Sql – High water SQL statement
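A minimal sketch of this high-water-mark pattern in Python; the source_orders table and its columns are hypothetical, and the tracker table is assumed to follow the model above:

    def extract_incremental(conn, job_id):
        """Pull only the rows changed since the stored high water mark."""
        (hwm,) = conn.execute(
            "SELECT hwm_datetime FROM Talend_HWM_Tracker WHERE jobID = ?",
            (job_id,),
        ).fetchone()

        rows = conn.execute(
            "SELECT order_id, amount, last_updated_date_time "
            "FROM source_orders "
            "WHERE last_updated_date_time > ? "
            "ORDER BY last_updated_date_time",
            (hwm,),
        ).fetchall()

        if rows:
            # Advance the mark to the newest change actually extracted,
            # so rows committed mid-extract are not skipped on the next run.
            conn.execute(
                "UPDATE Talend_HWM_Tracker SET hwm_datetime = ? WHERE jobID = ?",
                (rows[-1][-1], job_id),
            )
            conn.commit()
        return rows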

Talend File Tracker

Talend_File_Tracker registers all transactions related to file processing. The transaction details include the source file location, destination location, file name pattern, file name suffix, and the name of the last file processed. (A sketch of how this tracker can gate file processing follows the table below.)

  • Id – Unique identifier for a specific source file
  • jobID – Unique identifier for a specific job
  • job_name – The name of the job
  • environment – The file server environment
  • file_name_pattern – The file name pattern
  • file_input_location – The source file location
  • file_destination_location – The target file location
  • file_suffix – The file suffix
  • latest_file_name – The name of the last file processed for a specific file pattern
  • override_flag – The override flag to re-process a file with the same name
  • update_datetime – The last updated date
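A hedged sketch of how this tracker might gate processing, so a file is skipped unless it is new or the override flag is set; the column names follow the model above, and the directory handling is an assumption:

    import fnmatch
    import os

    def files_to_process(conn, job_id, input_dir):
        """Return files matching the tracked pattern that still need processing."""
        pattern, latest, override = conn.execute(
            "SELECT file_name_pattern, latest_file_name, override_flag "
            "FROM Talend_File_Tracker WHERE jobID = ?",
            (job_id,),
        ).fetchone()

        candidates = sorted(
            f for f in os.listdir(input_dir) if fnmatch.fnmatch(f, pattern)
        )
        # Re-process everything when the override flag is set; otherwise
        # take only files that sort after the last one recorded.
        if override:
            return candidates
        return [f for f in candidates if latest is None or f > latest]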

Conclusion

This brings us to the end of implementing a job metadata framework using Talend. The key takeaways from this blog:

  1. The need for and importance of a job metadata framework
  2. The data model to support the framework
  3. The customizable data model to support different types of job patterns

As always – let me know if you have any questions below and happy connecting!

The post How to Implement a Job Metadata Framework using Talend appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

Cloudera 2.0: Cloudera and Hortonworks Merge to form a Big Data Super Power

Talend - Thu, 10/04/2018 - 18:39

We’ve all dreamed of going to bed one day and waking up the next with superpowers – stronger, faster and even perhaps with the ability to fly.  Yesterday that is exactly what happened to Tom Reilly and the people at Cloudera and Hortonworks.   On October 2nd they went to bed as two rivals vying for leadership in the big data space. In the morning they woke up as Cloudera 2.0, a $700M firm, with a clear leadership position.  “From the edge to AI”…to infinity and beyond!  The acquisition has made them bigger, stronger and faster. 

Like any good movie, however, the drama is just getting started: innovation in the cloud, big data, IoT, and machine learning is simply exploding, transforming our world over and over, faster and faster. And of course, there are strong villains, new emerging threats, and a host of frenemies to navigate.

What’s in Store for Cloudera and Hortonworks 2.0

Overall, this is great news for customers, the Hadoop ecosystem, and the future of the market. Both companies' customers can now sleep at night knowing that the pace of innovation from Cloudera 2.0 will continue and accelerate. Combining the Cloudera and Hortonworks technologies means that instead of having to pick one stack or the other, customers can now have the best of both worlds. The statement from their press release, “from the edge to AI,” really sums up how complementary the two are: the investments Hortonworks made in IoT complement Cloudera's investments in machine learning. From an ecosystem and innovation perspective, we'll see fewer competing Apache projects with much stronger investments. This can only mean better experiences for any user of big data open source technologies.

At the same time, it’s no secret how much our world is changing with innovation coming in so many shapes and sizes.  This is the world that Cloudera 2.0 must navigate.  Today, winning in the cloud is quite simply a matter of survival.  That is just as true for the new Cloudera as it is for every single company in every industry in the world.  The difference is that Cloudera will be competing with a wide range of cloud-native companies both big and small that are experiencing explosive growth.  Carving out their place in this emerging world will be critical.

The company has so many of the right pieces, including connectivity, computing, and machine learning. Their challenge will be making all of it simple to adopt in the cloud while continuing to generate business outcomes. Today we are seeing strong growth from cloud data warehouses like Amazon Redshift, Snowflake, Azure SQL Data Warehouse, and Google BigQuery. Apache Spark and service players like Databricks and Qubole are also seeing strong growth. Cloudera now has decisions to make on how to approach this ecosystem: who they choose to compete with and who they choose to complement.

What’s In Store for the Cloud Players

For the cloud platforms like AWS, Azure, and Google, this recent merger is also a win. The better the cloud services that run on their platforms, the more benefit joint customers will get and the more their usage of these cloud platforms will grow. There is obviously a question of who will win – for example, EMR, Databricks, or Cloudera 2.0 – but at the end of the day the major cloud players win either way, as more and more data and insight run through the cloud.

Talend’s Take

From a Talend perspective, this recent move is great news. At Talend, we are helping our customers modernize their data stacks. Talend helps stitch together data, computing platforms, databases, and machine learning services to shorten the time to insight. 

Ultimately, we are excited to partner with Cloudera to help customers around the world leverage this new union.  For our customers, this partnership means a greater level of alignment for product roadmaps and more tightly integrated products. Also, as the rate of innovation accelerates from Cloudera, our support for what we call “dynamic distributions” means that customers will be able to instantly adopt that innovation even without upgrading Talend.  For Talend, this type of acquisition also reinforces the value of having portable data integration pipelines that can be built for one technology stack and can then quickly move to other stacks.  For Talend and Cloudera 2.0 customers, this means that as they move to the future, unified Cloudera platform, it will be seamless for them to adopt the latest technology regardless of whether they were originally Cloudera or Hortonworks customers. 

You have to hand it to Tom Reilly and the teams at both Cloudera and Hortonworks. They've given themselves a much stronger position to compete in the market at a time when people saw their positions eroding. It's going to be really interesting to see what they do with the projected $125 million in annualized cost savings: they will have a lot of dry powder to invest in or acquire innovation. And they will have a breadth of offerings, expertise, and customer base that will allow them to do things no one else in the market can do. 

The post Cloudera 2.0: Cloudera and Hortonworks Merge to form a Big Data Super Power appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

Tips for enhancing your data lake strategy

SnapLogic - Thu, 10/04/2018 - 18:10

As organizations grapple with how to effectively manage ever voluminous and varied reservoirs of big data, data lakes are increasingly viewed as a smart approach. However, while the model can deliver the flexibility and scalability lacking in traditional enterprise data management architectures, data lakes also introduce a fresh set of integration and governance challenges that[...] Read the full article here.

The post Tips for enhancing your data lake strategy appeared first on SnapLogic.

Categories: ETL

Why Cloud-native is more than software just running on someone else’s computer

Talend - Thu, 10/04/2018 - 10:17

The cloud is not “just someone else's computer,” even though that meme has spread fast across the internet. The cloud consists of extremely scalable data centers with highly optimized and automated processes. This makes a huge difference at the level of application software.

So what is “cloud-native” really?

“Cloud-native” is more than just a marketing slogan. And a “cloud-native application” is not simply a conventionally developed application running on “someone else's computer.” It is designed especially for the cloud: for scalable data centers with automated processes.

Software that is really born in the cloud (i.e. cloud-native) automatically leads to a change in thinking and a paradigm shift on many levels. From the outset, cloud-native applications are designed with scalability in mind and optimized for maintainability and agility.

They are based on the “continuous delivery” approach and thus lead to continuously improving applications. The time from development to deployment is reduced considerably and often takes only a few hours or even minutes. This can only be achieved with test-driven development and highly automated processes.

Rather than being monolithic, applications are usually designed as loosely coupled systems of comparatively simple components such as microservices. Agile methods are practically always used, and the DevOps approach is more or less essential. This, in turn, means that the demands made on developers increase, specifically requiring them to have well-founded “operations” knowledge.


Cloud-native = IT agility

With a “cloud-native” approach, organizations expect more agility, and especially more flexibility and speed. Applications can be delivered faster and continuously at a high level of quality; they are also better aligned to real needs, and their time to market is much shorter as well. In these times of “software is eating the world,” where software is an essential factor of survival for almost all organizations, the significance of these advantages should not be underestimated.

In this context: the cloud certainly is not “just someone else’s computer”. And the “Talend Cloud” is more than just an installation from Talend that runs in the cloud. The Talend Cloud is cloud-native.

In order to achieve the highest levels of agility, in the end it is just not possible to avoid moving to the cloud. Potentially there could even be a complete change of thinking in the direction of “serverless,” with the prospect of optimizing cost efficiency as well as agility. As with all things in enterprise technology, time will tell. But to be sure, cloud-native is an enabler on the rise.

About the author Dr. Gero Presser

Dr. Gero Presser is a co-founder and managing partner of Quinscape GmbH in Dortmund. Quinscape has positioned itself on the German market as a leading system integrator for the Talend, Jaspersoft/Spotfire, Kony and Intrexx platforms and, with their 100 members of staff, they take care of renowned customers including SMEs, large corporations and the public sector. 

Gero Presser did his doctorate in decision-making theory in the field of artificial intelligence and at Quinscape he is responsible for setting up the business field of Business Intelligence with a focus on analytics and integration.

The post Why Cloud-native is more than software just running on someone else’s computer appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

Moving Big Data to the cloud: A big problem?

SnapLogic - Tue, 10/02/2018 - 13:49

Originally published on Data Centre Review. Digital transformation is overhauling the IT approach of many organizations and data is at the center of it all. As a result, organizations are going through a significant shift in where and how they manage, store and process this data. To manage big data in the not so distant[...] Read the full article here.

The post Moving Big Data to the cloud: A big problem? appeared first on SnapLogic.

Categories: ETL

California Leads the US in Online Privacy Rules

Talend - Tue, 10/02/2018 - 12:12

With California often looked to as the state of innovation, the newly enacted California Consumer Privacy Act (CCPA) came as no surprise. This new online privacy law gives consumers the right to know what information companies are collecting about them, why they are collecting that data, and who they are sharing it with.

Some specific industries, such as banking or health sciences, had already put this type of compliance at the core of their digital transformation. But as the CCPA applies to potentially any company, no matter its size or industry, anyone serious about personalizing interactions with their visitors, prospects, customers, and employees needs to pay attention.

Similarities to GDPR

Although there are indeed some differences between GDPR and the CCPA, the two are similar in terms of the data management and governance framework that needs to be established. The similarities include:

  • You need to know where your personal data is across your different systems, which means you need to run a data mapping exercise
  • You need to create a 360° view of your personal data and manage consent at a fine grain, although the CCPA looks more permissive on consent than GDPR
  • You need to publish a privacy notice that tells regulators, customers, and other stakeholders what you are doing with the personal information in your databases. You need to anonymize data (e.g. through data masking – see the sketch after this list) in any other system that includes personal data but that you want to scope out of your compliance effort and privacy notice
  • You need to foster accountability so that the people in the company who participate in the data processing effort are engaged in compliance
  • You need to know where your data is, including when it is shared or processed through third parties such as business partners or cloud providers. You need to control cross-border data transfers and potential breaches, while communicating transparently in case of a breach
  • You need to enact data subject access rights, such as the rights to data access, rectification, deletion, and portability. The CCPA allows a little more time to answer a request: 45 days versus one month
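As a minimal illustration of the masking idea mentioned in the list above (an illustration only, not a compliance recommendation; the field names are hypothetical), one common approach is to replace direct identifiers with stable one-way hashes:

    import hashlib

    def mask(value: str, salt: str = "per-environment-secret") -> str:
        """Replace an identifier with a stable, irreversible token."""
        return hashlib.sha256((salt + value).encode()).hexdigest()[:16]

    record = {"email": "jane.doe@example.com", "plan": "premium"}
    masked = {**record, "email": mask(record["email"])}
    print(masked)  # the same email always maps to the same token, so joins still work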


Key Takeaways from the CCPA

The most important takeaway is that data privacy regulations are burgeoning for companies all over the world. And the stakes are getting higher and higher: from steep fines to reputation risks among consumers, non-compliance can negate the benefits of digital transformation.

While this law in its current state is specific to California, a ripple effect at the federal level might not be far off. So instead of seeing such regulations as a burden, take them as an opportunity. In fact, one side effect of all these regulations, with data scandals now negatively impacting millions of consumers, is that data privacy now makes the headlines. Consumers are coming to understand how valuable their data can be, and how damaging losing control over their personal data could be.

The lesson learned is that, although regulatory compliance is often what triggers a data privacy project, it shouldn't be the only driver. The goal is rather to establish a system of trust with your customers around their personal data. In a recent benchmark, where we exercised our right of data access against more than 100 companies, we demonstrated that most companies are very low on maturity in achieving that goal. But it also showed that the best in class are setting the standard for turning privacy into a memorable experience.

The post California Leads the US in Online Privacy Rules appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

Salesforce Connector in CloverETL

CloverETL - Tue, 08/09/2016 - 09:24

In an effort to constantly improve the lives of our users, we've enhanced our Salesforce connectivity and added a new, user-friendly (yet powerful) Salesforce connector in CloverETL 4.3. You can now easily read, insert, update, and delete Salesforce data with CloverETL, without having to expose yourself to the nuts and bolts of the two systems […]

The post Salesforce Connector in CloverETL appeared first on CloverETL Blog on Data Integration.

Categories: ETL

Building your own components in CloverETL

CloverETL - Tue, 07/19/2016 - 09:51

In this post, I’d like to cover a few things not only related to building components, but also related to: subgraphs and their ability to make your life easier; working with CloverETL public API; …and some other things I consider useful. This should give you a good idea of how to build your own (reusable) components and make them […]

The post Building your own components in CloverETL appeared first on CloverETL Blog on Data Integration.

Categories: ETL

Code Debugging in CloverETL Designer

CloverETL - Mon, 07/04/2016 - 03:53

EDIT: Updated to 4.3 Milestone 2 version, adding conditional breakpoints and watch/inspect options.   Code debugging is a productivity feature well known to developers from various programming environments. It allows you to control the execution of a piece of code line-by-line, and look for problems that are hard to spot during normal runs. Why would […]

The post Code Debugging in CloverETL Designer appeared first on CloverETL Blog on Data Integration.

Categories: ETL

Replacing legacy data software with CloverETL

CloverETL - Thu, 06/30/2016 - 02:56

  What does legacy data software mean to you: old software that’s currently outdated or existing software that works? Or, I should ask, are you a developer or a business stakeholder? No matter which side of the discussion you are on, replacing legacy data software is always a difficult conversation between developers and business stakeholders. On one […]

The post Replacing legacy data software with CloverETL appeared first on CloverETL Blog on Data Integration.

Categories: ETL

Data Partitioning: An Elegant Way To Parallelize Transformations Without Branching

CloverETL - Wed, 06/22/2016 - 05:54

Ever wondered what to do with those annoyingly slow operations inside otherwise healthy and fast transformations? You’ve done everything you could do to meet the processing time window, and now there’s this wicked API call that looks up some data, or a calculation that just sits there and takes ages to complete, record by record, […]

The post Data Partitioning: An Elegant Way To Parallelize Transformations Without Branching appeared first on CloverETL Blog on Data Integration.

Categories: ETL

Data Integration Challenges: Define Your Customer

Data integration blog - Fri, 04/29/2011 - 07:56

The alignment of IT and business is a widely discussed challenge of data integration. The major data integration problem adds up to this: define “customer.”

Data from different functional areas doesn't join up: sales orders are associated with newly contracted customers, but marketing campaign data is associated with prospects. Is a customer someone who has actually bought something from you, or someone who is interested in buying something from you? Should the definition include a certain demographic factor that reflects your typical buyer? If sales, marketing, service, and finance could all agree on a single definition of “customer,” then all the associated transactions could be easily integrated.

The thing is that each of these specialists has their own understanding of the word “customer.” That is why it is next to impossible for them to agree on a single definition, and you have to somehow manage data integration without one.

To solve this issue, you can define what each functional area (and each CRM system) means by “customer.” This is how we know that customer data coming from a marketing system includes prospects as well as existing customers. With this information, you can build a semantic model to understand how the different definitions of customer relate to one another.

Using this model, it becomes possible to associate supply data with parts, cost data with product classes, marketing data with brands, and so on. The relationships among these entities allow data from different functional areas to be integrated. This semantic model may be complex, but accept that complexity rather than rushing to simplify it: the world is complex. Data integration requires a sophisticated understanding of your business, and standardizing vocabulary is not the right answer to this challenge.
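Here is a minimal sketch of that idea in Python; the entity names and data are illustrative. Each functional area keeps its own definition of “customer,” and the semantic model records how the definitions relate instead of forcing a single standardized one:

    # Each functional area's own notion of "customer" (illustrative data).
    marketing_customers = {"alice", "bob", "carol"}   # prospects and buyers
    sales_customers = {"alice", "carol"}              # contracted buyers only

    # The semantic model records relationships between definitions
    # rather than replacing them with one standardized vocabulary.
    semantic_model = {
        ("marketing", "sales"): lambda marketing, sales: {
            "prospects_only": marketing - sales,  # interested, never bought
            "actual_buyers": marketing & sales,   # both areas call these customers
        }
    }

    relate = semantic_model[("marketing", "sales")]
    print(relate(marketing_customers, sales_customers))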

Categories: ETL

iPaaS: A New Trend In Data Integration?

Data integration blog - Wed, 04/20/2011 - 09:51

iPaaS (integration platform as a service) is a development platform for building integration applications. It provides a set of capabilities for data integration and application integration, both in the cloud and on-premises.

There are very few vendors offering iPaaS solutions at the moment. Although Gartner recognizes and uses the term, it still sounds confusing to researchers and data integration experts. So how does iPaaS work and can it benefit your data integration efforts?

An integration platform delivers a combination of data integration, governance, security, and other capabilities to link applications, SOA services, and cloud services. In addition to the basic features any cloud solution should have, such as multi-tenancy, elasticity, and reliability, several other capabilities are relevant for iPaaS:

    1. Intermediation, the ability to integrate applications and services across cloud scenarios, including SaaS and cloud services as well as on-premises apps and resources.
    2. Orchestration between services, which requires connectivity and the ability to map data.
    3. Service containers that enable users to publish their own services using either RESTful or SOAP technologies.
    4. Security, covering the ability to authenticate and authorize access to any resource on the platform, as well as to manage that access.
    5. An Enterprise Data Gateway installed on-premises and used as a proxy to access enterprise resources.

Data integration and application integration with and within the cloud is a concept business owners should be considering now. As of today, iPaaS would mostly appeal to companies that don't mind building their own IT solutions, or to ISVs that need to integrate cloud silos they created previously. It will be interesting to see whether iPaaS becomes the next trend in the data integration discipline.

Categories: ETL

Salesforce Integration with QuickBooks: Out-of-the-box Solution on its Way

Data integration blog - Wed, 04/06/2011 - 05:41

Salesforce.com and Intuit have signed a partnership agreement to provide Salesforce integration with QuickBooks to Intuit’s four million customers. The companies promise to finish developing the integrated solution in summer.

The solution is going to make CRM processes more convenient and transparent by displaying customer data alongside financial information. Salesforce integration with QuickBooks will enable businesses to synchronize customer data in Salesforce.com CRM with financial data in QuickBooks and QuickBooks Online. This will solve the issue of double data entry into two different systems.

Salesforce integration with QuickBooks will help small business owners make better decisions. According to Intuit's survey, more than 50% of small businesses perform CRM activities manually, with pen and paper, or with software that is not designed for the purpose.

With thousands of small businesses using both QuickBooks and Salesforce.com, the integration of the two systems is a great way to leverage the power of cloud computing and data integration strategies to help businesses grow.

Categories: ETL

Is Your Data Integration Technology Outdated?

Data integration blog - Sat, 04/02/2011 - 10:49

Spring is a good time to get rid of the old stuff and check out something new. This might as well be the time to upgrade your data integration tools. How can you tell whether your data integration solution is outdated and should be replaced by something more productive, or maybe just needs a little tuning? Here are the main checkpoints to see if your solution's performance still fits industry standards.

Data transformation schemas deal with both data structure and content. If data mappings are not as well-organized as possible, a single transformation may take twice as long as it should, and mapping problems can cause small delays that add up. The solution is to make sure that data maps are written as efficiently as possible. You can compare your data integration solution to similar ones to understand whether your data transformations run at the required speed.

Business rules processing covers the specific rules against which data has to be validated. Too many rules can stall your data integration processes. You have to make sure that the number of rules in your data integration system is optimal, meaning that there are not too many of them running at the same time.

Network bandwidth and traffic – in many cases performance is hindered not by the data integration tool itself, but by the size of the network you use. To avoid this issue, you need to calculate the predicted performance under various loads and make sure you use the fastest network available for your data integration needs.
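A back-of-the-envelope sketch of that calculation; the numbers are purely illustrative assumptions:

    # Rough transfer-time estimate for a nightly load (illustrative numbers).
    data_size_gb = 50        # volume moved per run
    bandwidth_mbps = 1000    # nominal network bandwidth, in megabits per second
    efficiency = 0.7         # protocol overhead, contention, retries

    effective_mbps = bandwidth_mbps * efficiency
    seconds = (data_size_gb * 8 * 1000) / effective_mbps
    print(f"Estimated transfer time: {seconds / 3600:.2f} hours")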

A data integration solution is like a car: it can run, but it becomes slow if it is not properly tuned and taken care of. As we become more dependent upon data integration technology, our ability to understand and optimize performance issues will make a substantial difference.

Categories: ETL