AdoptOS

Assistance with Open Source adoption

Open Source News

Vtiger recognized as a 2018 Gartner Peer Insights Customers’ Choice for Sales Force Automation

VTiger - Mon, 10/22/2018 - 16:46
We are excited to announce that we have been named a 2018 Gartner Peer Insights Customers’ Choice for Sales Force Automation. All of us here at Vtiger are proud of this demonstration of our growing commitment to putting customers at the heart of what we do. The award, Gartner explains, “is a recognition of vendors […]
Categories: CRM

Asset Display Contributors in Action

Liferay - Sun, 10/21/2018 - 14:19

Display page functionality in Liferay has always been tightly coupled to Web Content articles. We never had plans to support more or less the same technology for other types of assets, even though we have many of them: Documents & Media, Bookmarks, Wiki, etc. Even User is an asset, and every user has a corresponding AssetEntry in the database. But for Liferay 7.1 we decided to change this significantly: we introduced a new concept for Display Pages, based on Fragments, that is very flexible and much more attractive than the old one, and... we still support only Web Content article visualization :). The good news for developers is that the framework is now extensible, and it is easy to implement an AssetDisplayContributor and visualize any type of asset using the new fragment-based display pages. In this article I want to show you how to do it with an example.

Let's imagine that we want to launch a recruitment site: a typical one with tons of job offers, candidate profiles, thematic blogs, etc. One of the main features must be a candidate profile page, some sort of landing page with the basic candidate information: photo, personal summary, and skills. This task can be solved using the new Display Pages.

As I mentioned before, User is an asset in Liferay and there is a corresponding AssetEntry for each User, which is good because, for now, we support visualization only for Asset Entries. To achieve our goal we need two things: first, an AssetDisplayContributor implementation for the User, to define which fields are mappable and which values correspond to those fields; and second, a custom friendly URL resolver to be able to reach a user's profile page through a friendly URL containing the user's screen name.

Let's implement the contributor first. It is very simple (some repeated code is skipped; the full class is available on GitHub):

@Component(immediate = true, service = AssetDisplayContributor.class)
public class UserAssetDisplayContributor implements AssetDisplayContributor {

    @Override
    public Set<AssetDisplayField> getAssetDisplayFields(
            long classTypeId, Locale locale)
        throws PortalException {

        Set<AssetDisplayField> fields = new HashSet<>();

        fields.add(
            new AssetDisplayField(
                "fullName", LanguageUtil.get(locale, "full-name"), "text"));

        /* some fields skipped here, see project source for the full implementation */

        fields.add(
            new AssetDisplayField(
                "portrait", LanguageUtil.get(locale, "portrait"), "image"));

        return fields;
    }

    @Override
    public Map<String, Object> getAssetDisplayFieldsValues(
            AssetEntry assetEntry, Locale locale)
        throws PortalException {

        Map<String, Object> fieldValues = new HashMap<>();

        User user = _userLocalService.getUser(assetEntry.getClassPK());

        fieldValues.put("fullName", user.getFullName());

        /* some fields skipped here, see project source for the full implementation */

        ServiceContext serviceContext =
            ServiceContextThreadLocal.getServiceContext();

        fieldValues.put(
            "portrait", user.getPortraitURL(serviceContext.getThemeDisplay()));

        return fieldValues;
    }

    @Override
    public String getClassName() {
        return User.class.getName();
    }

    @Override
    public String getLabel(Locale locale) {
        return LanguageUtil.get(locale, "user");
    }

    @Reference
    private UserLocalService _userLocalService;

}

As you can see, there are two main methods: getAssetDisplayFields, which defines the set of AssetDisplayField objects with the field name, label, and type (for the moment we support two types, text and image, and try to convert all non-text values such as numbers, booleans, dates, and lists of strings to text), and getAssetDisplayFieldsValues, which provides the values for those fields from a specific AssetEntry instance. It is also possible to provide different field sets for different subtypes of an entity, as we do for the different Web Content structures, using the classTypeId parameter, as sketched below.
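As a rough illustration of that last point, a contributor could branch on classTypeId inside getAssetDisplayFields. This is only a sketch: the subtype constant and the extra "agency" field below are hypothetical and not part of the example project.

@Override
public Set<AssetDisplayField> getAssetDisplayFields(
        long classTypeId, Locale locale)
    throws PortalException {

    Set<AssetDisplayField> fields = new HashSet<>();

    // Fields shared by every subtype
    fields.add(
        new AssetDisplayField(
            "fullName", LanguageUtil.get(locale, "full-name"), "text"));

    // Hypothetical subtype-specific field, keyed on the classTypeId
    if (classTypeId == _EXTERNAL_CANDIDATE_CLASS_TYPE_ID) {
        fields.add(
            new AssetDisplayField(
                "agency", LanguageUtil.get(locale, "agency"), "text"));
    }

    return fields;
}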

The second task is to implement a corresponding friendly URL resolver so that we can reach profiles by the user's screen name. Here I'll show only the implementation of the getActualURL method of the FriendlyURLResolver interface, because it is the method that matters; the full code of this resolver is also available on GitHub.

@Override
public String getActualURL(
        long companyId, long groupId, boolean privateLayout, String mainPath,
        String friendlyURL, Map<String, String[]> params,
        Map<String, Object> requestContext)
    throws PortalException {

    String urlSeparator = getURLSeparator();

    String screenName = friendlyURL.substring(urlSeparator.length());

    User user = _userLocalService.getUserByScreenName(companyId, screenName);

    AssetEntry assetEntry = _assetEntryLocalService.getEntry(
        User.class.getName(), user.getUserId());

    HttpServletRequest request =
        (HttpServletRequest)requestContext.get("request");

    ServiceContext serviceContext = ServiceContextFactory.getInstance(request);

    AssetDisplayPageEntry assetDisplayPageEntry =
        _assetDisplayPageEntryLocalService.fetchAssetDisplayPageEntry(
            assetEntry.getGroupId(), assetEntry.getClassNameId(),
            assetEntry.getClassPK());

    if (assetDisplayPageEntry == null) {
        LayoutPageTemplateEntry layoutPageTemplateEntry =
            _layoutPageTemplateEntryService.fetchDefaultLayoutPageTemplateEntry(
                groupId, assetEntry.getClassNameId(), 0);

        _assetDisplayPageEntryLocalService.addAssetDisplayPageEntry(
            layoutPageTemplateEntry.getUserId(), assetEntry.getGroupId(),
            assetEntry.getClassNameId(), assetEntry.getClassPK(),
            layoutPageTemplateEntry.getLayoutPageTemplateEntryId(),
            serviceContext);
    }

    String requestUri = request.getRequestURI();

    requestUri = StringUtil.replace(requestUri, getURLSeparator(), "/a/");

    return StringUtil.replace(
        requestUri, screenName, String.valueOf(assetEntry.getEntryId()));
}
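For orientation, the method above lives inside an OSGi component registered as a FriendlyURLResolver. The skeleton below is only my assumption of how such a resolver is wired up; the class name and separator value are hypothetical, and the remaining interface methods and injected services are omitted, so check the full class on GitHub for the real implementation.

@Component(immediate = true, service = FriendlyURLResolver.class)
public class UserProfileFriendlyURLResolver implements FriendlyURLResolver {

    // getActualURL(...) as shown above

    @Override
    public String getURLSeparator() {

        // Hypothetical separator: profile URLs would then look like
        // http://localhost:8080/web/guest/-/profile/{screenName}
        return "/-/profile/";
    }

    // Other interface methods and the @Reference services used above
    // (_userLocalService, _assetEntryLocalService, ...) are omitted here;
    // see the full resolver on GitHub.

}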

The key part here is that we need to know which AssetDisplayPageEntry corresponds to the current user. For Web Content articles there is a UI to define the Display Page during content editing. In the case of User it is also possible to build such a UI and save the page ID in the database, but to keep my example simple I prefer to fetch the default display page for the User class and create the corresponding AssetDisplayPageEntry if it doesn't exist. At the end of the method we just redirect the request to our Asset Display Layout Type Controller, which renders the page using the corresponding page fragments.

That's it. There are a few more tasks left, but there is no need to deploy anything else. Now let's prepare the fragments, create a Display Page, and try it out! For our Display Page we need three fragments: Header, Summary, and Skills. You can create your own fragments with editable areas and map them as you like, but if you are not yet familiar with the new Display Pages mapping concept, I recommend downloading my fragments collection and importing it into your site.

When you have your fragments ready you can create a Display Page: go to Build -> Pages -> Display Pages, click the plus button, and arrange the fragments in the order you like. This is how it looks using my fragments:

By clicking on any editable area (marked with a dashed background) you can map that area to any available field of the available Asset Types (there should be two: Web Content Article and User). Choose the User type, map all the fields you would like to show on the Display Page, and click the Publish button. After publishing, it is necessary to mark our new Display Page as the default for this Asset Type; this action is available in the kebab menu of the display page entry:

Now we can create a user and try our new Display Page. Make sure you fill in all the fields you mapped; in my case the fields are First name, Last name, Job title, Portrait, Birthday, Email, Comments (as a Summary), Tags (as the Skills list), and Organization (as Company). Save the user and use its screen name to get the final result:

It is possible to create a corresponding AssetDisplayContributor for any type of Asset and use it to visualize your assets in a brand new way using Fragment-based Display Pages.

Full code of this example is available here.

Hope it helps! If you need any help implementing contributors for your assets, feel free to ask in the comments.

Pavel Savinov 2018-10-21T19:19:00Z
Categories: CMS, ECM

Online training: Custom Fields and Profiles, October 24th

CiviCRM - Fri, 10/19/2018 - 10:12

Would you like to gather information relevant to your organization from your contacts when they sign up for an event or membership, or make an online donation? Do you have questions about creating an online newsletter sign-up?

The upcoming training session "Customize your Database with Custom Fields and Profiles" taught by Cividesk will demonstrate best practices for creating custom fields and how to include those fields in a profile. You will also learn about the many uses of profiles and how this feature can be maximized to improve your organization's efficiency.

Categories: CRM

How to use the price comparison for an online shop?

PrestaShop - Fri, 10/19/2018 - 07:47
Still don't know the big advantages of using a price comparison tool for your business site?
Categories: E-commerce

UK CiviSprint as a newbie

CiviCRM - Fri, 10/19/2018 - 03:41

I didn’t hide the fact that I’d been feeling daunted by the prospect of the Sprint. Knowing that I’d be the least techie by some way even amongst the non-devs, I was also acutely aware of being a newbie to the community - after a year and a half as a CiviCRM user, I’d only had five weeks of working with Rose Lanigan and learning the basics of implementation. But I needn’t have worried, soon realising that:

a) In any group, someone has to be the least technical. It’s an opportunity to learn and to bring a different perspective.

Categories: CRM

Fragments extension: Fragment Entry Processors

Liferay - Fri, 10/19/2018 - 00:02

In Liferay 7.1 we presented a new vision of the page authoring process. The main idea was to empower business users to create pages and visualize content in a very visual way, without needing to know technical things like FreeMarker or Velocity for Web Content templates. To make this possible we introduced the fragment concept.

In our vision, a fragment is a building block which can be used to build new content pages, display pages, or content page templates. A fragment consists of HTML markup, a CSS stylesheet, and JavaScript code.

Even though we really wanted to create a business-user-friendly application, we always keep our strong developer community and its needs in mind. The Fragments API is extensible and allows you to create custom markup tags to enrich your fragment code, and in this article I would like to show you how to create your own processors for fragment markup.

As an example, we are going to create a custom tag which shows a UNIX-style fortune cookie :). Our fortune cookie module has the following structure:

We use the Jsoup library to parse the fragment's HTML markup, so we have to include it in our build file (since it doesn't ship with the Portal core) among the other dependencies.

sourceCompatibility = "1.8"
targetCompatibility = "1.8"

dependencies {
    compileInclude group: "org.jsoup", name: "jsoup", version: "1.10.2"
    compileOnly group: "com.liferay", name: "com.liferay.fragment.api", version: "1.0.0"
    compileOnly group: "com.liferay", name: "com.liferay.petra.string", version: "2.0.0"
    compileOnly group: "com.liferay.portal", name: "com.liferay.portal.kernel", version: "3.0.0"
    compileOnly group: "javax.portlet", name: "portlet-api", version: "3.0.0"
    compileOnly group: "javax.servlet", name: "javax.servlet-api", version: "3.0.1"
    compileOnly group: "org.osgi", name: "org.osgi.service.component.annotations", version: "1.3.0"
}

The OSGi bnd.bnd descriptor has nothing special because we don't export any packages and don't provide any capabilities:

Bundle-Name: Liferay Fragment Entry Processor Fortune
Bundle-SymbolicName: com.liferay.fragment.entry.processor.fortune
Bundle-Version: 1.0.0

Every Fragment Entry Processor implementation has two main methods: the first processes the fragment's HTML markup, and the second validates the markup to avoid saving fragments with invalid content.

/**
 * @param fragmentEntryLink Fragment Entry link object to get editable
 *        values needed for a particular case processing.
 * @param html Fragment markup to process.
 * @param mode Processing mode (@see FragmentEntryLinkConstants)
 * @return Processed Fragment markup.
 * @throws PortalException
 */
public String processFragmentEntryLinkHTML(
        FragmentEntryLink fragmentEntryLink, String html, String mode)
    throws PortalException;

/**
 * @param html Fragment markup to validate.
 * @throws PortalException In case of any invalid content.
 */
public void validateFragmentEntryHTML(String html) throws PortalException;

The FragmentEntryLink object gives us access to the particular fragment usage on a page, display page, or page template, and can be used if we want the result to depend on the parameters of that particular usage. The mode parameter can be used to add extra processing (or skip unnecessary processing) in EDIT (or VIEW) mode, as in the sketch below.
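As a minimal sketch of that idea, a processor could short-circuit in EDIT mode so that page authors see a stable placeholder instead of a value that changes on every refresh. This assumes the EDIT constant from the FragmentEntryLinkConstants class referenced in the Javadoc above; the placeholder markup and the _replaceFortuneTags helper (standing in for the replacement logic shown further below) are hypothetical.

@Override
public String processFragmentEntryLinkHTML(
    FragmentEntryLink fragmentEntryLink, String html, String mode) {

    // In the page editor, keep the output static so authors are not
    // distracted by a random value on every refresh
    if (FragmentEntryLinkConstants.EDIT.equals(mode)) {
        return html.replace(
            "<fortune></fortune>",
            "<span class=\"fortune\">[a fortune cookie will appear here]</span>");
    }

    // In VIEW mode, run the real random-cookie replacement
    return _replaceFortuneTags(html);
}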

In this particular case, we don't need the validation method, but we have a good example in the Portal code.
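If we did want validation, a simple rule could be to reject markup where a fortune tag contains nested content, since the tag is replaced entirely at render time. The check below is my own assumption of a reasonable rule, not the Portal's implementation; it reuses the _getDocument helper and _FORTUNE constant from the processor shown next.

@Override
public void validateFragmentEntryHTML(String html) throws PortalException {
    Document document = _getDocument(html);

    for (Element element : document.getElementsByTag(_FORTUNE)) {

        // Anything placed inside the tag would silently be lost when the
        // tag is replaced, so flag it to the fragment author instead
        if (element.childNodeSize() > 0) {
            throw new PortalException(
                "fortune tags must be empty: " + element.outerHtml());
        }
    }
}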

Let's implement our fortune cookie tag processor! The only thing we have to do here is iterate over all the fortune tags we find and replace each of them with a random cookie text. As I mentioned before, we use Jsoup to parse the markup and work with the document:

@Override
public String processFragmentEntryLinkHTML(
    FragmentEntryLink fragmentEntryLink, String html, String mode) {

    Document document = _getDocument(html);

    Elements elements = document.getElementsByTag(_FORTUNE);

    Random random = new Random();

    elements.forEach(
        element -> {
            Element fortuneText = document.createElement("span");

            fortuneText.attr("class", "fortune");
            fortuneText.text(_COOKIES[random.nextInt(7)]);

            element.replaceWith(fortuneText);
        });

    Element bodyElement = document.body();

    return bodyElement.html();
}

private Document _getDocument(String html) {
    Document document = Jsoup.parseBodyFragment(html);

    Document.OutputSettings outputSettings = new Document.OutputSettings();

    outputSettings.prettyPrint(false);

    document.outputSettings(outputSettings);

    return document;
}

private static final String[] _COOKIES = {
    "A friend asks only for your time not your money.",
    "If you refuse to accept anything but the best, you very often get it.",
    "Today it's up to you to create the peacefulness you long for.",
    "A smile is your passport into the hearts of others.",
    "A good way to keep healthy is to eat more Chinese food.",
    "Your high-minded principles spell success.",
    "The only easy day was yesterday."
};

private static final String _FORTUNE = "fortune";

 

That is it. After deploying this module to our Portal instance, the fortune tag is ready to use in the Fragments editor (just place a <fortune></fortune> element in the fragment's HTML markup):

It is up to you how to render your own tag, which attributes to use, and which technology to use to process the tag content. You can even create your own scripting language, or apply one you already have in your CMS to avoid massive refactoring and reuse existing templates as-is.

Full Fortune Fragment Entry Processor code can be found here.

Hope it helps!

Pavel Savinov 2018-10-19T05:02:00Z
Categories: CMS, ECM

Introduction to the Agile Data Lake

Talend - Thu, 10/18/2018 - 09:41

Let's be honest, the 'Data Lake' is one of the latest buzzwords everyone is talking about. Like many buzzwords, it is one that few really know how to explain: what it is, what it is supposed to do, and/or how to design and build one.  As pervasive as they appear to be, you may be surprised to learn that Gartner predicts that only 15% of Data Lake projects make it into production.  Forrester predicts that 33% of enterprises will take their attempted Data Lake projects off life support.  That's scary!  Data Lakes are about getting value from enterprise data, and given these statistics, that nirvana appears to be quite elusive.  I'd like to change that, share my thoughts, and hopefully provide some guidance for your consideration on how to design, build, and use a successful Data Lake: an Agile Data Lake.  Why agile? Because to be successful, it needs to be.

Ok, to start, let’s look at the Wikipedia definition for what a Data Lake is:

“A data lake is a storage repository that holds a vast amount of raw data in its native format, incorporated as structured, semi-structured, and unstructured data.”

Not bad.  Yet considering we need to get value from a Data Lake, this Wikipedia definition is just not quite sufficient. Why? The reason is simple: you can put any data in the lake, but you need to get data out, and that means some structure must exist. The real idea of a data lake is to have a single place to store all enterprise data, ranging from raw data (which implies an exact copy of source system data) through transformed data, which is then used for various business needs including reporting, visualization, analytics, machine learning, data science, and much more.

I like a ‘revised’ definition from Tamara Dull, Principal Evangelist, Amazon Web Services, who says:

“A data lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data, where the data structure and requirements are not defined until the data is needed.”

Much better!  Even Agile-like. The reason why this is a better definition is that it incorporates both the prerequisite for data structures and that the stored data would then be used in some fashion, at some point in the future.  From that we can safely expect value and that exploiting an Agile approach is absolutely required.  The data lake therefore includes structured data from relational databases (basic rows and columns), semi-structured data (like CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and even binary data (typically images, pictures, audio, & video) thus creating a centralized data store accommodating all forms of data.  The data lake then provides an information platform upon which to serve many business use cases when needed.  It is not enough that data goes into the lake, data must come out too.

And, we want to avoid the ‘Data Swamp’ which is essentially a deteriorated and/or unmanaged data lake that is inaccessible to and/or unusable by its intended users, providing little to no business value to the enterprise.  Are we on the same page so far?  Good.

Data Lakes – In the Beginning

Before we dive deeper, I’d like to share how we got here.  Data Lakes represent an evolution resulting from an explosion of data (volume-variety-velocity), the growth of legacy business applications plus numerous new data sources (IoT, WSL, RSS, Social Media, etc.), and the movement from on-premise to cloud (and hybrid). 

Additionally, business processes have become more complex, new technologies have recently been introduced enhancing business insights and data mining, plus exploring data in new ways like machine learning and data science.  Over the last 30 years we have seen the pioneering of a Data Warehouse (from the likes of Bill Inmon and Ralph Kimball) for business reporting all the way through now to the Agile Data Lake (adapted by Dan Linstedt, yours truly, and a few other brave souls) supporting a wide variety of business use cases, as we’ll see.

To me, Data Lakes represent the result of this dramatic data evolution and should ultimately provide a common foundational, information warehouse architecture that can be deployed on-premise, in the cloud, or a hybrid ecosystem. 

Successful Data Lakes are pattern-based, metadata-driven (for automation) business data repositories, accounting for data governance and data security (a la GDPR & PII) requirements.  Data in the lake should present coalesced data and aggregations of the "record of truth" ensuring information accuracy (which is quite hard to accomplish unless you know how) and timeliness.  Following an Agile/Scrum methodology, using metadata management, applying data profiling, master data management, and such, I think a Data Lake must represent a 'Total Quality Management' information system.  Still with me?  Great!

What is a Data Lake for?

Essentially a data lake is used for any data-centric business use case, downstream of System (Enterprise) Applications, that helps drive corporate insights and operational efficiency.  Here are some common examples:

  • Business Information, Systems Integration, & Real Time data processing
  • Reports, Dashboards, & Analytics
  • Business Insights, Data Mining, Machine Learning, & Data Science
  • Customer, Vendor, Product, & Service 360

How do you build an Agile Data Lake?

As you can see there are many ways to benefit from a successful Data Lake.  My question to you is, are you considering any of these?  My bet is that you are.  My next questions are: Do you know how to get there?  Are you able to build a Data Lake the RIGHT way and avoid the swamp?  I'll presume you are reading this to learn more.  Let's continue…

There are three key principles I believe you must first understand and must accept:

  • ⇒ A PROPERLY implemented Ecosystem, Data Models, Architecture, & Methodologies
  • ⇒ The incorporation of EXCEPTIONAL Data Processing, Governance, & Security
  • ⇒ The deliberate use of Job Design PATTERNS and BEST PRACTICES

A successful Data Lake must also be agile; it then becomes a data processing and information delivery mechanism designed to augment business decisions and enhance domain knowledge.  A Data Lake, therefore, must have a managed lifecycle.  This life cycle incorporates 3 key phases:

  1. INGESTION:
    • Extracting raw source data, accumulating (typically written to flat files) in a landing zone or staging area for downstream processing & archival purposes
  2. ADAPTATION:
    • Loading & Transformation of this data into usable formats for further processing and/or use by business users
  3. CONSUMPTION:
    • Data Aggregations (KPI’s, Data-points, or Metrics)
    • Analytics (actuals, predictive, & trends)
    • Machine Learning, Data Mining, & Data Science
    • Operational System Feedback & Outbound Data Feeds
    • Visualizations, & Reporting

The challenge is how to avoid the swamp.  I believe you must use the right architecture, data models, and methodology.  You really must shift away from your 'Legacy' thinking and adapt and adopt a 'Modern' approach.  This is essential.  Don't fall into the trap of thinking you know what a data lake is and how it works until you consider these critical points.

Ok then, let's examine these three phases a bit more.  Data Ingestion is about capturing data, managing it, and getting it ready for subsequent processing.  I think of this like a box crate of data, dumped onto the sandy beach of the lake; a landing zone called a 'Persistent Staging Area'.  Persistent because once it arrives, it stays there; for all practical purposes, once processed downstream, it becomes an effective archive (and you don't have to copy it somewhere else).  This PSA will contain data, text, voice, video, or whatever it is, which accumulates.

You may notice that I am not talking about technology yet.  I will, but let me at least point out that depending upon the technology used for the PSA, you might need to offload this data at some point.  My thinking is that an efficient file storage solution is best suited for this 1st phase.

Data Adaptation is a comprehensive, intelligent coalescence of the data, which must adapt organically to survive and provide value.  These adaptations take several forms (we'll cover them below) yet essentially reside first in a raw data model at the lowest level of granularity, which can then be further processed, or as I call it, business purposed, for a variety of domain use cases.  The data processing requirements here can be quite involved, so I like to automate as much of this as possible.  Automation requires metadata.  Metadata management presumes governance.  And don't forget security.  We'll talk about these more shortly.

Data Consumption is not just about business users; it is about business information, the knowledge it supports, and hopefully, the wisdom derived from it.  You may be familiar with the DIKW Pyramid: Data > Information > Knowledge > Wisdom.  I like to insert 'Understanding' after 'Knowledge' as it leads to wisdom.

Data should be treated as a corporate asset and invested in as such.  Data then becomes a commodity and allows us to focus on the information, knowledge, understanding, and wisdom derived from it.  Therefore, it is about the data and getting value from it.

Data Storage Systems: Data Stores

Ok, as we continue to formulate the basis for building a Data Lake, let’s look at how we store data.  There are many ways we do this.  Here’s a review:

  • DATABASE ENGINES:
    • ROW: traditional Relational Database System (RDBMS) (ie: Oracle, MS SQL Server, MySQL, etc)
    • COLUMNAR: relatively unknown; feels like a RDBMS but optimized for Columns  (ie: Snowflake, Presto, Redshift, Infobright, & others)
  • NoSQL – “Not Only SQL”:
    • Non-Relational, eventual consistency storage & retrieval systems (ie: Cassandra, MongoDB, & more)
  • HADOOP:
    • Distributed data processing framework supporting high data Volume, Velocity, & Variety (ie: Cloudera, Hortonworks, MapR, EMR, & HD Insights)
  • GRAPH – “Triple-Store”:
    • Subject-Predicate-Object, index-free ‘triples’; based upon Graph theory (ie: AllegroGraph, & Neo4J)
  • FILE SYSTEMS:
    • Everything else under the sun (ie: ASCII/EBCDIC, CSV, XML, JSON, HTML, AVRO, Parquet)

There are many ways to store our data, and many considerations to make, so let's simplify our life a bit and call them all 'Data Stores', regardless of whether they are Source, Intermediate, Archive, or Target data storage.  Simply pick the technology for each type of data store as needed.

Data Governance

What is Data Governance?  Clearly another industry enigma.  Again, Wikipedia to the rescue:

"Data Governance is a defined process that an organization follows to ensure that high quality data exists throughout the complete lifecycle."

Does that help?  Not really?  I didn’t think so.  The real idea of data governance is to affirm data as a corporate asset, invest & manage it formally throughout the enterprise, so it can be trusted for accountable & reliable decision making.  To achieve these lofty goals, it is essential to appreciate Source through Target lineage.  Management of this lineage is a key part of Data Governance and should be well defined and deliberately managed.  Separated into 3 areas, lineage is defined as:

  • ⇒ Schematic Lineage maintains the metadata about the data structures
  • ⇒ Semantic Lineage maintains the metadata about the meaning of data
  • ⇒ Data Lineage maintains the metadata of where data originates & its auditability as it changes allowing ‘current’ & ‘back-in-time’ queries

It is fair to say that a proper, in-depth discussion on data governance, metadata management, data preparation, data stewardship, and data glossaries is essential, but if I did that here we'd never get to the good stuff.  Perhaps another blog?  Ok, but later….

Data Security

Data Lakes must also ensure that personal data (GDPR & PII) is secure and can be removed (disabled) or updated upon request.  Securing data requires access policies, policy enforcement, encryption, and record maintenance techniques.  In fact, all corporate data assets need these features which should be a cornerstone of any Data Lake implementation.  There are three states of data to consider here:

  • ⇒ DATA AT REST in some data store, ready for use throughout the data lake life cycle
  • ⇒ DATA IN FLIGHT as it moves through the data lake life cycle itself
  • ⇒ DATA IN USE perhaps the most critical, at the user-facing elements of the data lake life cycle

Talend works with several technologies offering data security features.  In particular, ‘Protegrity Cloud Security’ provides these capabilities using Talend specific components and integrated features well suited for building an Agile Data Lake.  Please feel free to read “BUILDING A SECURE CLOUD DATA LAKE WITH AWS, PROTEGRITY AND TALEND” for more details.  We are working together with some of our largest customers using this valuable solution.

Agile Data Lake Technology Options

Processing data into and out of a data lake requires technology (hardware/software) to implement.  Grappling with the many, many options can be daunting.  It is so easy to take these for granted, picking anything that sounds good.  It's only after better understanding the data involved, the systems chosen, and the development effort that one finds the wrong choice has been made.  Isn't this the definition of a data swamp?  How do we avoid this?

A successful Data Lake must incorporate a pliable architecture, data model, and methodology.  We've been talking about that already.  But picking the right 'technology' is more about the business data requirements and expected use cases.  I have some good news here.  You can de-couple the data lake designs from the technology stack.  To illustrate this, here is a 'Marketecture' diagram depicting the many different technology options crossing through the agile data lake architecture.

As shown above, there are many popular technologies available, and you can choose different capabilities to suit each phase in the data lake life cycle.  For those who follow my blogs you already know I do have a soft spot for Data Vault.  Since I’ve detailed this approach before, let me simply point you to some interesting links:

You should know that Dan Linstedt created this approach and has developed considerable content you may find interesting.  I recommend these:

I hope you find all this content helpful.  Yes, it is a lot to ingest, digest, and understand (Hey, that sounds like a data lake), but take the time.  If you are serious about building and using a successful data lake you need this information.

The Agile Data Lake Life Cycle

Ok, whew – a lot of information already and we are not quite done.  I have mentioned that a data lake has a life cycle.  A successful Agile Data Lake Life Cycle incorporates the 3 phases I’ve described above, data stores, data governance, data security, metadata management (lineage), and of course: ‘Business Rules’.  Notice that what we want to do is de-couple ‘Hard’ business rules (that transform physical data in some way) from ‘Soft’ business rules (that adjust result sets based upon adapted queries).  This separation contributes to the life cycle being agile. 

Think about it: if you push physical data transformations upstream, then when the inevitable changes occur, the impact to everything downstream is smaller.  On the flip side, when the dynamics of business impose new criteria, changing a SQL 'where' clause downstream will have less impact on the data models it pulls from.  The Business Vault provides this insulation from the Raw Data Vault as it can be reconstituted when radical changes occur.

Additionally, a Data Lake is not a Data Warehouse but in fact, encapsulates one as a use case.  This is a critical takeaway from this blog.  Taking this further, we are not creating ‘Data Marts’ anymore, we want ‘Information Marts’.   Did you review the DIKW Pyramid link I mentioned above?  Data should, of course, be considered and treated as a business asset.  Yet simultaneously, data is now a commodity leading us to information, knowledge, and hopefully: wisdom.

This diagram walks through the Agile Data Lake Life Cycle from Source to Target data stores.  Study this.  Understand this.  You may be glad you did.  Ok, let me finish by saying that to be agile a data lake must:

  • BE ADAPTABLE
    • Data Models should be additive without impact to existing model when new sources appear
  • BE INSERT ONLY
    • Especially for Big Data technologies where Updates & Deletes are expensive
  • PROVIDE SCALABLE OPTIONS
    • Hybrid infrastructures can offer extensive capabilities
  • ALLOW FOR AUTOMATION
    • Metadata, in many aspects, can drive the automation of data movement
  • PROVIDE AUDITABLE, HISTORICAL DATA
    • A key aspect of Data Lineage

And finally, consider that STAR schemas are, and always were, designed to be 'Information Delivery Mechanisms', a misunderstanding some in the industry have fostered for many years.  For many years we have all built Data Warehouses using STAR schemas to deliver reporting and business insights.  These efforts all too often resulted in raw data storage of the data warehouse in rigid data structures, requiring heavy data cleansing, and frankly high impact when upstream systems are changed or added.

The cost in resources and budget has been a cornerstone of many delays, failed projects, and inaccurate results.  This is a legacy mentality, and I believe it is time to shift our thinking to a more modern approach.  The Agile Data Lake is that new way of thinking.  STAR schemas do not go away, but their role has shifted downstream, where they belong and were always intended to be.

Conclusion

This is just the beginning, yet I hope this blog post gets you thinking about all the possibilities now.

As a versatile technology and coupled with a sound architecture, pliable data models, strong methodologies, thoughtful job design patterns, and best practices, Talend can deliver cost-effective, process efficient and highly productive data management solutions.  Incorporate all of this as I’ve shown above and not only will you create an Agile Data Lake, but you will avoid the SWAMP!

Till next time…

The post Introduction to the Agile Data Lake appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

Using BOMs to Manage Liferay Dependency Versions

Liferay - Wed, 10/17/2018 - 15:15

Liferay is a large project, and many developers who are attempting to get their customizations to work with Liferay will often end up asking the question, "What version of module W should I use at compile time when I'm running on Liferay X.Y.Z?" To answer that question, Liferay has some instructions on how to find versions in its document, Configuring Dependencies.

This blog entry is really to talk about what to do in situations where those instructions just aren't very useful.

Path to Unofficial BOMs

First, a little bit of background, because I think context is useful to know, though you can skip it if you want to get right to working with BOMs.

Back in late 2016, I started to feel paranoid that we'd start introducing major version changes to packages in the middle of a release and nobody would notice. To ease those concerns, I wrote a tool that indexed all of the packageinfo files in Liferay at each tag, and then I loaded up these metadata files with a Jupyter notebook and did a check for a major version change.

Then, like many other "is it worth the time" problems, it evolved into a script that I'd run once a week, and a small web-based tool so that I wouldn't have to fire up Jupyter every time I needed to check what was essentially static information.

Fast forward to February 2017, and our internal knowledge base was updated to allow for a new wiki format which (accidentally?) provided support for HTML with script tags. So, I chose to share my mini web-based tool on the wiki, which then led our support team in Spain to share a related question that they'd been pondering for a while.

Imagine if you happened to need to follow the Liferay document, Configuring Dependencies, for a lot of modules. Doesn't that lookup process get old really fast?

So, given that it was clearly possible to create an unofficial reference document for every Liferay exported package, wouldn't it be nice if we could create an official reference document that recorded every Liferay module version?

Since I had all of the metadata indexed anyway, I put together a separate tool that displayed the information stored in bnd.bnd files at every tag, which sort of let you look up module version changes between releases. This let people get a sense for what an official reference document might look like.

(Note: I learned a year later that bnd.bnd files are not the correct file to look at if you want to know the version at a given tag. Rather, you need to look at the artifact.properties files saved in the modules/.releng folder for that information. So in case it helps anyone feel better, Liferay's release and versioning process isn't obvious to anyone not directly involved with the release process, whether you're a Liferay employee or not.)

From the outside looking in, you might ask, why is it that our team didn't ask for Liferay to officially provide a "bill of materials" (BOMs), as described in the Introduction to Dependency Mechanism in the official Maven documentation? That way, you'd only specify the version of Liferay you're using, and the BOM would take care of the rest. If such a BOM existed, a lookup tool would be completely unnecessary.

Well, that's how the request actually started at the time of DXP's release, but since it wasn't happening, it got downgraded to a reference document which looked immediately achievable.

Fast forward to today. Still no official reference document for module versions, still no official Liferay BOMs.

However, by chance, I learned that the IDE team has been evaluating unofficial BOMs currently located on repository.liferay.com. These BOMs were generated as proof of concepts on what that BOM might include, and are referenced in some drafts of Liferay IDE tutorials. Since I now had an idea of what the IDE team itself felt a BOM should look like, I updated my web-based tool to use all of the collected metadata to dynamically generate BOMs for all indexed Liferay versions.

Install the Unofficial BOM

For the sake of an example, assume that you want to install release-dxp-bom-7.1.10.pom.

The proof of concept for this version exists in the liferay-private-releases repository of repository.liferay.com, and Liferay employees can set up access to that repository to acquire that file. Since there are no fix packs, it is also functionally equivalent to the original 7.1.0 GA1 release, and you can use com.liferay.portal:release.portal.bom:7.1.0 instead.

However, if you wish to use a version for which a proof of concept has not been generated (or if you're a non-employee wanting to use an unofficial DXP BOM), you can try using Module Version Changes Since DXP Release to use the indexed metadata and generate a BOM for your Liferay version. If you go that route, open up the file in a text editor, and you should find something that looks like the following near the top of the file:

<groupId>com.liferay</groupId>
<artifactId>release.dxp.bom</artifactId>
<version>7.1.10</version>
<packaging>pom</packaging>

With those values for the GROUP_ID, ARTIFACT_ID, VERSION, and PACKAGING, you would install the BOM to your local Maven repository by substituting the appropriate values into the following mvn install:install-file command:

mvn install:install-file -Dfile=release.dxp.bom-7.1.10.pom -DgroupId="${GROUP_ID}" \
    -DartifactId="${ARTIFACT_ID}" -Dversion="${VERSION}" \
    -Dpackaging="${PACKAGING}"

And that's basically all you need to do when installing a BOM that's not available in any repository you can access.

Install Multiple BOMs

If you only have a handful of BOMs, you could repeat the process mentioned above for each of your BOMs. If you have a lot of BOMs to install (for example, you're a Liferay employee that might need to build against arbitrary releases, and you decided to use the earlier linked page and generate something for every Liferay fix pack), you may want to script it.

To keep things simple when pulling values out of an XML file, you should install the Python package yq, which provides the xq tool for XML processing at the command line. It is similar to the popular tool jq, which provides JSON processing at the command line.

pip install yq

Once yq is installed, you can add the following to a Bash script to install the auto-generated BOMs to your local Maven repository:

#!/bin/bash

install_bom() {
    local GROUP_ID=$(cat ${1} | xq '.project.groupId' | cut -d'"' -f 2)
    local ARTIFACT_ID=$(cat ${1} | xq '.project.artifactId' | cut -d'"' -f 2)
    local VERSION=$(cat ${1} | xq '.project.version' | cut -d'"' -f 2)
    local PACKAGING=$(cat ${1} | xq '.project.packaging' | cut -d'"' -f 2)

    mvn install:install-file -Dfile=${1} -DgroupId="${GROUP_ID}" \
        -DartifactId="${ARTIFACT_ID}" -Dversion="${VERSION}" \
        -Dpackaging="${PACKAGING}"

    echo "Installed ${GROUP_ID}:${ARTIFACT_ID}:${VERSION}:${PACKAGING}"
}

for bom in *.pom; do
    install_bom ${bom}
done

Use BOMs in Blade Samples

In case you've never used a BOM before, I'll show you how you would use them if you were to build projects in the Blade Samples repository.

Reference the BOM in Maven

First, update the parent pom.xml so that child projects know which dependency versions are available by simply adding the BOM as a dependency.

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.liferay.portal</groupId>
            <artifactId>release.dxp.bom</artifactId>
            <version>7.1.10</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

We set this to be an import scope dependency, so that we don't have to download all of Liferay's release artifacts just to have the version numbers (they will be downloaded as-needed when the specific artifacts are resolved).

Then, in order to have a child project use this version, simply update the pom.xml in the child project to not include the version explicitly for any of the dependencies.

<dependency>
    <groupId>com.liferay.portal</groupId>
    <artifactId>com.liferay.portal.kernel</artifactId>
    <scope>provided</scope>
</dependency>

As noted in Introduction to Dependency Mechanism, the version specified in the parent POM will then be chosen as the dependency version. Running mvn package in the child project will then download the actual versions for the noted Liferay release.

Note: If this is the first time you've run any Maven commands in the liferay-blade-samples repository, you'll want to make sure that all of the parent projects are installed or the build will fail. If you are new to Maven and aren't sure how to read pom.xml files, this is achieved with the following steps:

  1. Run mvn -N install in liferay-blade-samples/maven
  2. Run mvn install in liferay-blade-samples/parent.bnd.bundle.plugin.

Reference the BOM in Gradle

Liferay workspace uses an older version of Gradle, and so BOMs aren't supported by default. To get support for BOMs, we'll first need to bring in io.spring.gradle:dependency-management-plugin:1.0.6.RELEASE.

The first step in doing this is to update the parent build.gradle so that Gradle knows where to find the plugin.

buildscript {
    dependencies {
        ...

        classpath group: "io.spring.gradle", name: "dependency-management-plugin", version: "1.0.6.RELEASE"
    }

    ...
}

The next step in doing this is to update the parent build.gradle so that Gradle makes sure to apply the plugin to each child project.

subprojects { subproject ->
    ...

    apply plugin: "io.spring.dependency-management"

    ...
}

Because we're installing the BOMs to our local Maven repository, the next step in doing this is to update the parent build.gradle so that Gradle knows to check that local repository. We can then also add the BOM to a dependencyManagement block.

subprojects { subproject ->
    ...

    repositories {
        mavenLocal()

        ...
    }

    dependencyManagement {
        imports {
            mavenBom "com.liferay.portal:release.dxp.bom:7.1.10"
        }
    }

    ...
}

Then, in order to have a child project use this version, simply update the build.gradle in the child project to not include the version explicitly for any of the dependencies.

dependencies {
    compileOnly group: "com.liferay.portal", name: "com.liferay.portal.kernel"

    ...
}

Minhchau Dang 2018-10-17T20:15:00Z
Categories: CMS, ECM

Open Source Program Benefits Survey Results

Open Source Initiative - Wed, 10/17/2018 - 08:37

There are many organizations out there, from companies like Red Hat to internet-scale giants like Google and Facebook, that have established an open source program office (OSPO). The TODO Group, a network of open source program managers, recently performed the first ever annual survey of corporate open source programs and revealed some interesting findings on the actual benefits of open source programs.

According to the survey, the top three benefits of managing an open source program are:

  • awareness of open source usage/dependencies
  • increased developer agility/speed
  • better and faster license compliance
Corporate Open Source Programs on the Rise

According to the survey, 53% of companies have an open source program or plan to establish one in the near future:

An interesting finding is that large companies are about twice as likely to run an open source program as smaller companies (63 percent vs. 37 percent). Also, technology industry organizations were more likely to have an open source program than traditional industry verticals such as the financial services industry. Another interesting trend was that most open source programs tend to start informally as a working group, committee, or a few key open source developers and then evolve into formal programs over time, typically within a company’s engineering department.

Research Shows Giving Back Is A Competitive Advantage

It’s important to note that companies aren’t forming open source programs and giving back to open source for purely altruistic reasons. Recent research from Harvard Business School shows that open source contributing companies capture up to 100% more productive value from open source than companies who do not contribute back. In particular, the example of Linux was showcased in the research:

"It’s not necessarily that the firms that contribute are more productive on the whole. It’s that they get more in terms of productivity output from their usage of the Linux operating system than do companies that use Linux without contributing."

In the survey, it was notable that 44 percent of companies with open source programs contribute code upstream compared to only 6 percent for companies without an open source program. If you want to sustain open source and give your business a competitive advantage, an open source program can help.

Finally, you’ll be happy to learn that the survey results and questions are open sourced under CC-BY-SA. The TODO Group plans to run this survey on an annual basis moving forward, and in true open source fashion we’d love your feedback on any new questions to ask; please leave your thoughts in the comments or on GitHub.

About the Author: Chris Aniszczyk is currently a Vice President at the Linux Foundation (an OSI Affiliate Member), focused on developer relations and running the Open Container Initiative (OCI) / Cloud Native Computing Foundation (CNCF).

Image credit: OSPO.png (CC BY-SA 4.0) by The Open Source Initiative is a derivative of "Trollback + Company office.JPG" by Trollbackco from Wikimedia Commons, and used/edited with permission via CC BY-SA 4.0

Categories: Open Source

Why the cloud can save big data

SnapLogic - Tue, 10/16/2018 - 14:36

This article originally appeared on computable.nl. Many companies like to use big data to make better decisions, strengthen customer relationships, and increase efficiency within the company. They are confronted with a dizzying array of technologies – from open source projects to commercial software – that can help to get a better grip on the large[...] Read the full article here.

The post Why the cloud can save big data appeared first on SnapLogic.

Categories: ETL

We want to invite you to DEVCON 2018

Liferay - Tue, 10/16/2018 - 10:04

Every year we, the developers doing amazing things with Liferay's products, have this unique opportunity to meet, learn, and enjoy those long technical discussions with each other. All that happens at DEVCON, our main developer conference, which is taking place in Amsterdam from November 6th to 8th. This year's agenda is filled with sessions delivering in-depth technical details about the products, presenting new technologies, and showcasing spectacular use cases.

Following tradition, DEVCON starts with an unconference the day before the main conference. That is a full day consisting solely of the best parts of any traditional conference: the meetings and conversations in the halls between the talks. It's a day full of discussions with experts and colleagues on topics that attendees bring in.

We try to keep the prices for DEVCON at a reasonable level and provide several kinds of promotions for partners and organizations we have business relationships with. Yet there are talented developers in our community who work alone, or for non-profit organizations, or in less well-developed parts of the world, or who for whatever reason cannot afford a DEVCON ticket. This year we want to help some of them.

We have free DEVCON tickets to give away!

As much as we would love to invite the whole community, we have to live by market rules! So we only have a limited number of tickets. To help us decide, please send an email to developer-relations@liferay.com with the subject "Free DEVCON ticket" and tell us why you think you should be one of the people we give a free ticket to. We will decide between those that have the most convincing, creative, and fun reasons.

See you in Amsterdam!

David Gómez 2018-10-16T15:04:00Z
Categories: CMS, ECM

Introducing Talend Data Catalog: Creating a Single Source of Trust

Talend - Tue, 10/16/2018 - 06:07

Talend Fall ’18 is here! We’ve released a big update to the Talend platform this time around, including support for APIs as well as new big data and serverless capabilities. You will see blogs from my colleagues highlighting those major new product and feature introductions. On my side, I’ve been working passionately to introduce Talend Data Catalog, which I believe has the potential to change the way data is consumed and managed within the enterprise. Our goal with this launch is to help our customers deliver insight-ready data at scale so they can make better and faster decisions, all while spending less time looking for data or making decisions with incomplete data.

You Can’t Be Data Driven without a Data Catalog

Before we jump into features, let’s look at why you need a data catalog. Remember the early days of the Internet? Suddenly, it became so easy and cheap to create content and publish it to anyone that everybody actually did it. Soon enough, that created a data sprawl, and the challenge was no longer to create content but to find it. After two decades we know that the winners in the web economy are those that created a single point of access to content in their category: Google, YouTube, Baidu, Amazon, Wikipedia.

Now, we are faced with a similar data sprawl in our data-driven economy. IDC research has found that today data professionals are spending 81% of their time searching, preparing, and protecting data, with little time left to turn it into business outcomes. It has become crucial that organizations establish this same single source of access to their data to be in the winner’s circle.

Although technology can help fix the issue, and I’ll come back to it later in the article, enterprises above all need to set up a discipline to organize their data at scale, and this discipline is called data governance. But traditional data governance must be reinvented in the face of this data sprawl: according to Gartner, “through 2022, only 20% of organizations investing in information will succeed in scaling governance for digital business.” Given the sheer number of companies that are awash in data, that percentage is just too small.

Modern data governance is not only about minimizing data risks but also about maximizing data usage, which is why traditional authoritative data governance approaches are not enough. There is a need for a more agile, bottom-up approach. That strategy starts with the raw data, links it to its business context so that it becomes meaningful, takes control of its data quality and security, and fully organizes it for massive consumption.

Empowering this new discipline is the promise of data catalogs, which leverage modern technologies like smart semantics and machine learning to organize data at scale and turn data governance into a team sport by engaging anyone in social curation.

With the newly introduced Talend Data Catalog, companies can organize their data at scale to make data accessible like never before and address these challenges head-on. By empowering organizations to create a single source of trusted data, it’s a win both for the business, which can find the right data, and for the CIO and CDO, who can now control data better to improve data governance. Now let’s dive into some details on what the Talend Data Catalog is.

Intelligently discover your data

Data catalogs are a perfect fit for companies that have modernized their data infrastructures with data lakes or cloud-based data warehouses, where thousands of raw data items can reside and be accessed at scale. The catalog acts as the fish finder for that data lake, leveraging crawlers across different file systems (traditional, Hadoop, or cloud) and across typical file formats. It then automatically extracts metadata and profiling information for referencing, change management, classification, and accessibility.

Not only can it bring all of that metadata together in a single place, but it can also automatically draw the links between datasets and connect them to a business glossary. In a nutshell, this allows businesses to:

  • Automate the data inventory
  • Leverage smart semantics for auto-profiling, relationships discovery and classification
  • Document and drive usage now that the data has been enriched and becomes more meaningful

The goal of the data catalog is to unlock data from the applications where it resides.

Orchestrate data curation

Once the metadata has been automatically harvested into a single place, data governance can be orchestrated in a much more efficient way. Talend Data Catalog allows businesses to define the critical data elements in its business glossary and assign data owners for those critical data elements. The data catalog then relates those critical data elements to the data points that refer to them across the information system.

Now data is under control, and data owners can make sure that their data is properly documented and protected. Comments, warnings, or validations can be crowdsourced from any business user for collaborative, bottom-up governance. Finally, the data catalog draws end-to-end data lineage and manages version control. It guarantees accuracy and provides a complete view of the information chain, both of which are critical for data governance and data compliance.

Easy search-based access to trusted data

Talend Data Catalog makes it possible for businesses to locate, understand, use, and share their trusted data faster by searching and verifying data’s validity before sharing with peers. Its collaborative user experience enables anyone to contribute metadata or business glossary information.

Data governance is most often associated with control: a discipline that allows businesses to centrally collect, process, and consume data under certain rules and policies. The beauty of Talend Data Catalog is that it not only controls data but liberates it for consumption as well. This allows data professionals to find, understand, and share data ten times faster. Now data engineers, scientists, analysts, or even developers can spend their time extracting value from those data sets rather than searching for them or recreating them – removing the risk of your data lake turning into a data swamp.

A recently published IDC report, “Data Intelligence Software for Data Governance,” advocates the benefits of modern data governance and positions the data catalog as the cornerstone of what it defines as Data Intelligence Software. In the report, IDC explains that the “technology that supports enablement through governance is called data intelligence software and is delivered in metadata management, data lineage, data catalog, business glossary, data profiling, mastering, and stewardship software.”

For more information, check out the full capabilities of the Talend Data Catalog here.

The post Introducing Talend Data Catalog: Creating a Single Source of Trust appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

Gradle Plugin for manage properties files

Liferay - Mon, 10/15/2018 - 13:01

GitHub repository: https://github.com/ironcero/devtools-liferay-portal-properties

Summary

This Gradle plugin lets you manage the properties files in your Liferay workspace (versions 7.0 and 7.1).

The Blade tool initializes this kind of workspace with a folder named configs. There are several folders inside the configs folder:

  • common
  • dev
  • local
  • prod
  • uat

It is very common to need different values for the same properties depending on your environment. This plugin helps you manage that setup: it copies all properties files from one common folder to each environment folder and replaces every property found in the filter files with the correct value.

How to use

First you will need the plugin jar file. You can download the latest version from https://github.com/ironcero/devtools-liferay-portal-properties/blob/master/release/portal-properties-1.0.0.jar (Maven Central version coming soon) or download the source code from this GitHub repository and compile it. If you download the jar file, you will need to move it to the correct path in your local repository (the Gradle coordinates are devtools.liferay:portal-properties:1.0.0). If you download the source code and compile it, you will need to execute the Maven install task to install the jar file into the correct path in your local repository.

Once the jar file is available, you will need to set up your Liferay workspace. You will need to create two new folders. You can create these folders in any path you want, but we recommend creating them inside the common folder (in the configs folder), as sketched below.
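For reference, here is a minimal sketch of the recommended layout, assuming the folder names origin and keys used throughout this example (any names work as long as the build.gradle paths shown below point to them):

configs/
  common/
    origin/   (properties templates containing ${...} placeholders)
    keys/     (one filter file per environment: dev, local, prod, uat)
  dev/
  local/
  prod/
  uat/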

Now you will need to set up this plugin in your build.gradle file by adding these lines:

buildscript {
    dependencies {
        classpath group: "devtools.liferay", name: "portal-properties", version: "1.0.0"
    }
    repositories {
        mavenLocal()
        maven {
            url "https://repository-cdn.liferay.com/nexus/content/groups/public"
        }
    }
}

apply plugin: "devtools-liferay-portal-properties"

buildproperties {
    descFolderPath = 'configs'
    originFolderPath = 'configs/common/origin'
    keysFolderPath = 'configs/common/keys'
}

build.finalizedBy(buildproperties)

In this example we use the configs/common/origin folder to keep the original properties files with ${} placeholders, and the configs/common/keys folder to keep the per-environment values. In detail:

  • Dependencies: the Gradle coordinates of DevTools Liferay Portal Properties are devtools.liferay:portal-properties:1.0.0.
  • Repositories: you will need the mavenLocal repository because you have moved the plugin jar file to your local Maven repository.
  • Apply plugin: the DevTools Liferay Portal Properties plugin id is devtools-liferay-portal-properties.
  • BuildProperties: this section holds all the configuration parameters. The 1.0.0 release supports:
    • descFolderPath: path where the properties files will be copied and the properties replaced.
    • originFolderPath: location of the original properties files (with ${} filter placeholders).
    • keysFolderPath: location of the filter properties files.
  • build.finalizedBy: with this command the plugin runs as part of the build stage and not only when the buildproperties task is invoked explicitly.

It's time to add your properties files.

In the example we have created 4 filter files in the keysFolderPath folder (configs/common/keys):

  • dev.properties
  • local.properties
  • prod.properties
  • uat.properties

The content of these files is very similar; for example, local.properties contains:
test1=Local

The file name (without the .properties extension) must be equal to the corresponding environment folder inside the descFolderPath folder.

In the example we have created only one properties file in the originFolderPath folder (configs/common/origin), but we could put more properties files there and all of them would be copied and processed. The portal-ext.properties file in configs/common/origin contains:

testKey=testValue
test1Key=${test1}

Now you are able to generate your portal-ext.properties filtered by environment with the buildproperties Gradle task, or with the standard build Gradle task:

gradle buildproperties
gradle build
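With the files from this example, each environment folder should receive a filtered copy of portal-ext.properties. As a sketch (inferred from the filter values above, not taken from the plugin output), configs/local/portal-ext.properties would contain:

testKey=testValue
test1Key=Local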

This is a typical log of the process:

:buildproperties
Build properties task...configs
Settings:
destination folder path: configs
origin folder path: configs/common/origin
keys folder path: configs/common/keys
Parsing dev environment...
Copying C:\dev\workspaces\devtools\liferay\portal-properties-test\liferay-workspace\configs\common\origin\portal-ext.properties to C:\dev\workspaces\devtools\liferay\portal-properties-test\liferay-workspace\configs\dev
WARNING: Property not found in file portal-ext.properties on dev folder (${test1})
WARNING: Property not found in file portal-ext.properties on dev folder (${test2})
Parsing local environment...
Copying C:\dev\workspaces\devtools\liferay\portal-properties-test\liferay-workspace\configs\common\origin\portal-ext.properties to C:\dev\workspaces\devtools\liferay\portal-properties-test\liferay-workspace\configs\local
Parsing prod environment...
Copying C:\dev\workspaces\devtools\liferay\portal-properties-test\liferay-workspace\configs\common\origin\portal-ext.properties to C:\dev\workspaces\devtools\liferay\portal-properties-test\liferay-workspace\configs\prod
WARNING: Property not found in file portal-ext.properties on prod folder (${test1})
Parsing uat environment...
Copying C:\dev\workspaces\devtools\liferay\portal-properties-test\liferay-workspace\configs\common\origin\portal-ext.properties to C:\dev\workspaces\devtools\liferay\portal-properties-test\liferay-workspace\configs\uat
WARNING: Property not found in file portal-ext.properties on uat folder (${test1})
BUILD SUCCESSFUL
Total time: 0.275 secs

You will see a WARNING log entry when a property exists in your original properties files but there is no filter for it in your filter properties files.

You can review the Liferay test project at https://github.com/ironcero/devtools-liferay-portal-properties/tree/master/testProject/liferay-workspace

Ignacio Roncero Bazarra 2018-10-15T18:01:00Z
Categories: CMS, ECM

Astrazeneca: Building the Data Platform of the Future

Talend - Mon, 10/15/2018 - 06:28

AstraZeneca plc is a global, science-led biopharmaceutical company that is the world’s seventh-largest pharmaceutical business, with operations in more than 100 countries. The company focuses on the discovery, development, and commercialization of prescription medicines, which are used by millions of patients worldwide.

It’s one of the few companies to span the entire lifecycle of a medicine; from research and development to manufacturing and supply, and the global commercialization of primary care and specialty care medicines.

Beginning in 2013, AstraZeneca was faced with industry disruption and competitive pressure. For business sustainability and growth, AstraZeneca needed to change their product and portfolio strategy.

As the starting point, they needed to transform their core IT and finance functions. Data is at the heart of these transformations. They had a number of IT-related challenges, including inflexible and non-scalable infrastructure; data silos and diverse data models and file sizes within the organization; a lack of enterprise data governance; and infrastructure over-provisioning for required performance.

The company had grown substantially, including through mergers and acquisitions, and had data dispersed throughout the organization in a variety of systems. Additionally, financial data volume fluctuates depending on where the company is in the financial cycle, and peaks at month-end, quarter-end, or financial year-end are common.

In addition to causing inconsistencies in reporting, silos of information prevented the company and its Science and Enabling Unit division from finding insights hiding in unconnected data sources.

For transforming their IT and finance functions and accelerating financial reporting, AstraZeneca needed to put in place a modern architecture that could enable a single source of truth. As part of its solution, AstraZeneca began a move to the cloud, specifically Amazon Web Services (AWS), where it could build a data lake to hold data from a range of source systems. The potential benefits of a cloud-based solution included increased innovation and accelerated time to market, lower costs, and simplified systems.

But the AWS data lake was only part of the answer. The company needed a way to capture the data, and that’s where solutions such as Talend Big Data and Talend Data Quality come into play. AstraZeneca selected Talend for its AWS connectivity, flexibility, and licensing model, and valued its ability to scale rapidly without incurring extra costs.

The Talend technologies are responsible for lifting, shifting, transforming, and delivering data into the cloud, extracting from multiple sources, and then pushing that data into Amazon S3.

Their IT and Business Transformation initiative was successful and has paved the way for Business Transformation initiatives across five business units, and they are now leveraging this modern data platform to drive new business opportunities.

Attend this session at Talend Connect UK 2018 to learn more about how AstraZeneca transformed its IT and finance functions by developing an event-driven, scalable data platform to support massive month-end peak activity, leading to financial reporting in half the time and at half the cost.

The post Astrazeneca: Building the Data Platform of the Future appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

Elsevier: How to Gain Data Agility in the Cloud

Talend - Fri, 10/12/2018 - 12:50

Presenting at Talend Connect London 2018 is Reed Elsevier (part of RELX Group), a $7 billion data and analytics company with 31,000 employees, serving scientists, lawyers, doctors, and insurance companies among its many clients. The company helps scientists make discoveries, lawyers win cases, doctors save lives, insurance companies offer customers lower prices, and save taxpayers money by preventing fraud.

Standardizing business practices for successful growth

As the business grew over the years, different parts of the organization began buying and deploying integration tools, which created management challenges for central IT. It was a “shadow IT” situation, where individual business departments were implementing their own integrations with their own different tools.

With the lack of standardization, integration was handled separately between different units, which made it more difficult for different components of the enterprise to share data. Central IT wanted to bring order to the process and deploy a system that was effective at meeting the company’s needs as well as scalable enough to keep pace with growth.

Moving to the cloud

One of the essential requirements was that any new solution be a cloud-based offering. Elsevier a few years ago became a “cloud first” company, mandating that any new IT services be delivered via the cloud and nothing be hosted on-premises. It also adopted agile methodologies and a continuous deployment approach, to become as nimble as possible when bringing new products or releases to market.

Elsevier selected Talend as a solution and began using it in 2016. Among the vital selection factors were platform flexibility, alignment with the company’s existing infrastructure, and its ability to generate Java code as output and support microservices and containers.

In their Talend Connect session, Delivering Agile Integration Platforms, Elsevier will discuss how it got up and running rapidly with Talend despite having a diverse development environment, and how it’s using Talend, along with Amazon Web Services, to build a data platform for transforming raw data into insight at scale across the business. You’ll learn how Elsevier created a dynamic platform using containers, serverless data processing, and continuous integration/continuous deployment to reach a new level of agility and speed.

Agility is among the most significant benefits of their approach using Talend. Elsevier spins up servers as needed and enables groups to independently develop integrations on a common platform without central IT being a bottleneck. Since building the platform, internal demand has far surpassed the company’s expectations—as it is delivering cost savings and insight at a whole new level.

Attend this session to learn more about how you can transform your integration environment.


The post Elsevier: How to Gain Data Agility in the Cloud appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

Events online training session - Tuesday, October 16th

CiviCRM - Fri, 10/12/2018 - 09:41

Cividesk will be offering the intermediate-level online training, "Organizing Successful Events," designed for current users familiar with CiviEvent, on Tuesday, October 16th from 12 pm MT / 1 pm CT / 2 pm ET.

Categories: CRM

Maintenance extension for PrestaShop 1.6

PrestaShop - Fri, 10/12/2018 - 05:12
It’s been a while since we’ve shared any updates concerning PrestaShop 1.6. To make up for it, here is the latest information to help you transition to PrestaShop 1.7.
Categories: E-commerce

IP Expo Europe 2018 recap: AI and machine learning on display

SnapLogic - Thu, 10/11/2018 - 17:06

Artificial Intelligence (AI) and machine learning (ML) have come to dominate business and IT discussions the world over. From boardrooms to conferences to media headlines, you can’t escape the buzz, the questions, the disruption. And for good reason – more than any other recent development, AI and ML are transformative, era-defining technologies that are fundamentally[...] Read the full article here.

The post IP Expo Europe 2018 recap: AI and machine learning on display appeared first on SnapLogic.

Categories: ETL

Bitwise: Cloud Data Warehouse Modernization – Inside Look at Talend Connect London

Talend - Thu, 10/11/2018 - 09:28

With expectations of business users evolving beyond limitations of traditional BI capabilities, we see a general thrust of organizations developing a cloud-based data strategy that enterprise users can leverage to build better analytics and make better business decisions. While this vision for cloud strategy is fairly straightforward, the journey of identifying and implementing the right technology stack that caters to BI and analytical requirements across the enterprise can create some stumbling blocks if not properly planned from the get-go.

As a data management consulting and services company, Bitwise helps organizations with their modernization efforts. Based on what we see at our customers when helping to consolidate legacy data integration tools to newer platforms, modernize data warehouse architectures or implement enterprise cloud strategy, Talend fits as a key component of a modern data approach that addresses top business drivers and delivers ROI for these efforts.

For this reason, we are very excited to co-present “Modernizing Your Data Warehouse” with Talend at Talend Connect UK in London. If you are exploring cloud as an option to overcome limitations you may be experiencing with your current data warehouse architecture, this session is for you. Our Talend partner is well equipped to address the many challenges with the conventional data warehouse (that will sound all too familiar to you) and walk through the options, innovations, and benefits for moving to cloud in a way that makes sense to the traditional user.

For our part, we aim to show “how” people are moving to cloud by sharing our experiences for building the right business case, identifying the right approach, and putting together the right strategy. Maybe you are considering whether Lift & Shift is the right approach, or if you should do it in one ‘big bang’ or iterate – we’ll share some practical know-how for making these determinations within your organization.

With so many tools and technologies available, how do you know which are the right fit for you? This is where vendor-neutral assessment and business case development, along with ROI assessment for the identified business case, become essential for getting the migration roadmap and architecture right from the start. We will highlight a real-world example of going from CIO vision to operationalized cloud assets, with some lessons learned along the way.

Ultimately, our session is geared to help demonstrate that by modernizing your data warehouse in cloud, you not only get the benefits of speed, agility, flexibility, scalability, cost efficiency, etc. – but it puts you in a framework with inherent Data Governance, Self-Service and Machine Learning capabilities (no need to develop these from scratch on your own), which are the cutting-edge areas where you can show ROI for your business stakeholders…and become a data hero.

Bitwise, a Talend Gold Partner for consulting and services, is proud to be a Gold Sponsor of Talend Connect UK. Be sure to visit our booth to get a demo on how we convert ANY ETL (such as Ab Initio, OWB, Informatica, SSIS, DataStage, and PL/SQL) to Talend with maximum possible automation.

About the author:

Ankur Gupta

EVP Worldwide Sales & Marketing, Bitwise

https://www.linkedin.com/in/unamigo/

The post Bitwise: Cloud Data Warehouse Modernization – Inside Look at Talend Connect London appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL

New Talend APAC Cloud Data Infrastructure Now Available!

Talend - Wed, 10/10/2018 - 16:37

As businesses in the primary economic hubs in Asia such as Tokyo, Bangalore, Sydney, and Singapore grow at a historic pace, they are moving to the cloud like never before. For those companies, the first and foremost priority is to fully leverage the value of their data while meeting strict local data residency, governance, and privacy requirements. Therefore, keeping data in a cloud data center on the other side of the globe simply won’t be enough.

That’s why Talend is launching a new cloud data infrastructure in Japan, in addition to its US data center and the EU data center across Frankfurt and Dublin, in a secure and highly scalable Amazon Web Services (AWS) environment, to allow APAC customers to get cloud data integration and data management services closer to where the data is stored. This is most beneficial to local enterprise businesses and foreign companies who have plans to open up offices in the local region.

There are several benefits Talend Cloud customers can expect from this launch.

Accelerating Enterprise Cloud Adoption

Whether your cloud-first strategy is about modernizing legacy IT infrastructure, leveraging a hybrid cloud architecture, or building a multi-cloud platform, Talend’s new APAC cloud data infrastructure will make your transition to the cloud more seamless. With a Talend Cloud instance independently available in APAC, companies can build a cloud data lake or a cloud data warehouse for faster, more scalable, and more agile analytics with greater ease.

More Robust Performance

For customers who are using Talend Cloud services in the Asia Pacific region, this new cloud data infrastructure will lead to faster extract, transform, and load times regardless of data volume. Additionally, it will boost performance for customers using AWS services such as Amazon EMR, Amazon Redshift, Amazon Aurora, and Amazon DynamoDB.

Increased Data Security with Proximity

Maintaining data within the local region means the data does not have to make a long trip outside of the immediate area, which can reduce the risk of data security breaches at rest, in transit, and in use, and ease companies’ worries about security measures.

Reduced Compliance and Operational Risks

Because the new data infrastructure offers an instance of Talend Cloud that is deployed independently from the US or the EU, companies can maintain higher standards regarding data stewardship, data privacy, and operational best practices.

Japanese customers, in particular, are more likely to remain compliant with Japan’s stringent data privacy and security standards. In the case of industry or government regulation changes, Talend Cloud customers would still be able to maintain the flexibility and agility to keep up with them.

If you are a Talend customer, you will soon have the opportunity to migrate your site to the new APAC data center. Log in or contact your account manager for more information.

Not a current Talend Cloud customer? Test drive Talend Cloud for 30 days free of charge or learn how Talend Cloud can help you connect your data from 900+ data sources to deliver big data cloud analytics instantly.


The post New Talend APAC Cloud Data Infrastructure Now Available! appeared first on Talend Real-Time Open Source Data Integration Software.

Categories: ETL