LeapFrogBI Blog | Agile SQL Server Data Warehousing Platform



Create a Copy of an Existing Component

by Paul Felix December 10, 2015

Components are typically unique and apply only to a particular unit of work. A component's source provides a unique set of outputs, and the component exposes a transformed set of outputs. However, there are some cases where a component can be applied to numerous data flows with only minor differences. Consider a data warehousing effort that consolidates many sources that share the same schema. We may want to perform the same transformations on each source. In this use case, the ability to copy a component and apply all of its properties to a new data flow is a time saver. The new "Create Copy" feature does exactly that. Here's how it works.

 

We will make a copy of the highlighted transformation component in the following example.

 

Navigate to the detail page of the component that you wish to copy and click the "Create Copy" link that is above the field list.

 

Provide a name for the new component & click the Copy button.

 

The new component will be created and you will be redirected to the new component's detail page.

 

Notice that the new component uses the same source as the old component. The new component also contains all of the old component's expressions, fields, filters, statistics, and other properties. The new component will not, however, contain any of the old component's dependents.

If you want to use the new component in a new data flow, then use the "Redirect Source" feature which will allow you to map the component's required inputs to a selected source component.

Enjoy!

 

Have you ever fished in a Data Lake?

by Paul Felix November 04, 2015

Why build a data warehouse when you can have _____? Fill in the blank with the latest marketing epiphany.

What is a data lake? Here's how Gartner defines it: "…enterprise-wide data management platforms for analyzing disparate sources of data in its native format…" (http://www.gartner.com/newsroom/id/2809117) Basically, a data lake is a collection of data from multiple sources that is loaded into a central repository in its native format.

A data lake would be an ideal repository for data scientists to use when attempting to train models, uncover trends, and the like. A data lake is not a good solution for organizations seeking to track promotion performance, identify the most valuable customers, track performance against enterprise goals, and so on. Don't take my word for it. Connect to your organization's ERP system and start digging through the hundreds or thousands of tables. Then try to combine this data with the ERP systems from the last three mergers. The problem gets extremely complex, to say the least.

In other words, while a data lake and a data warehouse both contain data, their use cases differ greatly. One does not replace the other, and both are very useful. The term data lake may be new, but the concept is quite old. Data warehousing often includes an initial staging point that could be considered a data lake. This data is not conformed, not integrated, and typically not trimmed. It is raw data that will be useful to skilled data scientists. It would be a very bad idea to unleash a data lake on the typical business analyst. The analyst will spend countless hours attempting to conform data and create data structures that accurately represent operations.

Data lakes are a very valuable resource. They can be fast to create, very flexible, and can support many kinds of analysis. Data lakes are not data warehouses. That is not to say that a data lake cannot produce an accurate EBITDA by business unit, but to do so you would first need to integrate & consolidate the source data, which is essentially equivalent to building a data warehouse.

Early Arriving Facts / Late Arriving Dimensions

by Paul Felix June 08, 2015

A resilient ETL process deals with data quality issues without causing a process failure while also meeting business requirements. One issue that should be anticipated is the early arriving fact (aka late arriving dimension) situation. As the name implies, an early arriving fact is a record bound for a fact table which references a dimension record that does not yet exist. In other words, the natural key in the fact record has not yet been loaded into a related dimension, preventing a successful foreign key lookup to the dimension's surrogate key.
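
To make the failure concrete, here is a minimal T-SQL sketch of how such records can be detected. The table and column names (StageSales, DimCustomer, CustomerNK, CustomerSK) are hypothetical placeholders, not objects from this post.

-- Stage rows whose natural key has no match in the dimension are
-- early arriving facts; the surrogate key lookup fails for them.
SELECT s.*
FROM   StageSales AS s
LEFT   JOIN DimCustomer AS d
       ON d.CustomerNK = s.CustomerNK
WHERE  d.CustomerSK IS NULL;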

How can this happen?

There are many situations that lead to failed fact foreign key lookups. In some cases a data source may simply be unconstrained, such as simple flat files or Excel workbooks. Another common situation involves multiple applications that are not always in sync. Yet another involves large scale applications that may be replicated across an enterprise, always leaving a short time window which may lead to inconsistencies.

How can we handle late arriving dimensions?

The answer depends on both business requirements and the nature of the data. The following schematic describes our options.

[Schematic: treatment options for early arriving facts / late arriving dimensions]

There are four main categories of options. Note that in all cases we need to test the foreign key lookup to determine whether a failed lookup is going to occur when loading a fact record. The strategy used to do so depends on the treatment of such failures. A brief T-SQL sketch of the two most common treatments follows the list.

Never process – If the fact record to be loaded is of no value without a reference to the late arriving dimension, and there is no expectation that the late arriving dimension will ever show up, then there is no reason to load the fact. Simply omit the record from the fact load. This is rarely the case.

Queue & Retry – If the fact record to be loaded is of no value without a reference to the late arriving dimension, and the late arriving dimension is expected to show up eventually, then we can queue the record until the dimension record is loaded. The queued record must be flagged so that it is retried at a future time and loaded once a related dimension record is loaded. It is typically a good idea to create a queued-record report to monitor this process.

Unknown Member – If the fact record does have value without a reference to a related late arriving dimension, and we either do not expect or do not care whether a dimension record shows up at a later time, then we can simply load the fact record with a foreign key reference to the unknown record in the related dimension, typically the -1 surrogate key value. Note that we must pay particular attention to the fact grain if we are going to take the unknown member approach to failed foreign key lookups. Most fact table primary keys are set as a composite across foreign keys and degenerate dimensions. If allowing a foreign key to be set to -1 could cause a primary key violation, then we need to look for other options, such as adding a degenerate dimension.

Inferred Member – If the fact record has value without a reference to a related late arriving dimension, and we expect the late arriving dimension to show up at a later time, then we can use the natural key in the record bound for the fact table to seed a record (an inferred member) in the related dimension. Once the dimension record shows up, it will have the same natural key, and the dimension record's attributes can then be updated. Particular attention should be paid to the inferred member's record time in cases where dimension history is tracked, such as SCD Type 2 situations.
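
As promised above, here is a minimal T-SQL sketch of the two most common treatments. All table and column names (StageSales, FactSales, DimCustomer, CustomerNK, CustomerSK) are hypothetical, and this shows the general pattern rather than LeapFrogBI-generated code.

-- Unknown member: a failed lookup defaults to the -1 surrogate key.
INSERT INTO FactSales (CustomerSK, OrderNumber, SalesAmount)
SELECT ISNULL(d.CustomerSK, -1), s.OrderNumber, s.SalesAmount
FROM   StageSales AS s
LEFT   JOIN DimCustomer AS d
       ON d.CustomerNK = s.CustomerNK;

-- Inferred member: seed the dimension from the fact source first,
-- flagged for later update, so the fact load's lookup succeeds.
INSERT INTO DimCustomer (CustomerNK, CustomerName, IsInferred)
SELECT DISTINCT s.CustomerNK, 'Inferred', 1
FROM   StageSales AS s
WHERE  NOT EXISTS (SELECT 1 FROM DimCustomer AS d
                   WHERE  d.CustomerNK = s.CustomerNK);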

When determining the proper treatment of early arriving facts, first consider the value of the record in the case that a foreign key cannot be set to the related dimension. If there is value in having the fact loaded regardless of the available dimension record, then either an inferred member or unknown member approach is appropriate. If there is no value in loading the fact without a related dimension record, then the record should either be queued until a related dimension record is loaded or ignored indefinitely.

The LeapFrogBI platform makes implementing all late arriving fact strategies a cinch. Queues & ignored records can be quickly set up using filters within standard design patterns. LeapFrogBI sets failed foreign key lookups to the related dimension's unknown member surrogate by default; no development required. Inferred members are easily handled using standard design patterns that reuse the fact table source records to first load a dimension. Contact us to learn about the LeapFrogBI platform and discuss late arriving dimension treatment options.

New Feature - Precedence Graphs

by Paul Felix March 30, 2015

The LeapFrogBI precedence solution automates component execution while respecting both functional and resource constraints.  Users can now view a visual representation of functional constraints via the Precedence Graph option. 

In precedence settings, click the "Precedence Graph" button.  A full screen graph will be displayed representing current precedence settings. 

Setup Steps:  None. As usual, SaaS rules!

Note:  Large projects will result in large precedence graphs.  Even with a full screen rendering, the graph can span numerous screen-sized pages. 


Tags:

News

Easy Access To Recent History

by Paul Felix March 19, 2015

At times it is useful to retrace your navigation steps within a project.  Even better, it would be great if you could jump to the page you were viewing 8 pages ago, for example.  The new "History" menu option enables this type of quick access to the pages you have viewed recently. 

Setup Steps:  Do nothing.  Software as a Service is awesome!



Notes:  History is maintained within the currently selected project, and will be cleared when the current project selection is changed.  Also, the recent history list will persist regardless of the target objects being deleted.  This is by design & provides an uninterrupted breadcrumb trail.  Of course, if you navigate to a deleted object, then you will get the "Oops" page.  Finally, the history list maintains a distinct list of the past 15 pages visited in chronological order.  If you visit the same page repeatedly, it will be listed only once, keeping your history list as full as possible.




Tags:

News

New Expression Type - LTrim(RTrim(...))

by Paul Felix February 15, 2015

Trim both sides of a string in one step using the new Left Trim & Right Trim combined expression type.

https://leapfrogbi.zendesk.com/hc/en-us/articles/203603629 
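
For reference, the hand-written T-SQL equivalent is simply the two functions nested (the column and table names here are hypothetical):

SELECT LTRIM(RTRIM(CustomerName)) AS CustomerName
FROM   StageCustomer;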

Tags:

News

More New Features - Custom Connection Attributes & Faster Build Access

by Paul Felix February 10, 2015

Custom Connection Attributes

This new feature adds a connection attributes dialog to connection string definitions, enabling users to define custom attributes at design time.

https://leapfrogbi.zendesk.com/hc/en-us/articles/204250845-Custom-Connection-Attributes
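
As an illustration only (not taken from the article above), a SQL Server OLE DB connection string with a few custom attributes appended might look like this:

Provider=SQLNCLI11;Data Source=MyServer;Initial Catalog=MyDW;Integrated Security=SSPI;Packet Size=32767;Connect Timeout=60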


Faster Access to Generated Code

Users can now view generated code, download a build (save or execute), view build status, and refresh generated code (build) directly from the component detail page.  Can it get any easier?

https://leapfrogbi.zendesk.com/hc/en-us/articles/203545409


Tags:

News

New Expression Type - Custom Script

by Paul Felix January 14, 2015

Have you ever wanted to just type some T-SQL instead of using the fully validated LeapFrogBI expression builder?  Well, you are in luck.  You can now do just that!  Check out the following link for details. 


Announcement - Custom Expressions
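
To give a flavor of the feature, here is a hypothetical custom T-SQL expression one might type instead of assembling it in the builder (the column name is a placeholder):

CASE WHEN LTRIM(RTRIM(OrderStatus)) = '' THEN 'Unknown' ELSE OrderStatus END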

Quick Tip - Data Flow Queue

by Paul Felix January 02, 2015

Have you ever needed to prevent a record from moving forward in a data flow until a condition is met?  A typical example of such a scenario is a fact load which should include only records that have related dimensional records.  Queues are one way to handle late arriving dimension situations.  Fact records are queued and retried on each subsequent load process.  When the condition is met, the fact record is loaded.  This quick video tip shows one way to set up queues using LeapFrogBI in just a few minutes.
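
For readers who prefer code to video, here is a minimal T-SQL sketch of the queue-and-retry idea. The table and column names are hypothetical, and this shows the general pattern rather than the LeapFrogBI setup demonstrated in the video: load the queued facts whose dimension rows now exist, then remove them from the queue so only unresolved records are retried.

-- Load queued facts that can now resolve their dimension lookup.
INSERT INTO FactSales (CustomerSK, OrderNumber, SalesAmount)
SELECT d.CustomerSK, q.OrderNumber, q.SalesAmount
FROM   QueueSales AS q
JOIN   DimCustomer AS d
       ON d.CustomerNK = q.CustomerNK;

-- Dequeue the records that were just loaded; the rest wait for the
-- next run.
DELETE q
FROM   QueueSales AS q
WHERE  EXISTS (SELECT 1 FROM DimCustomer AS d
               WHERE  d.CustomerNK = q.CustomerNK);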



Quick Tip - Batch Processing

by Paul Felix December 05, 2014

If you find that your data quantity exceeds the capacity of your hardware resources, then you will need to find a way to break large jobs into smaller pieces.  Batch processing does exactly that.  Process x records per batch and keep rerunning the process until all records are processed.  This short video demonstrates one way to batch process using LeapFrogBI.
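
For the curious, here is one way to express the idea in plain T-SQL; the table and column names are hypothetical, and the video demonstrates the LeapFrogBI approach rather than this hand-written loop.

DECLARE @BatchSize int = 50000;
WHILE 1 = 1
BEGIN
    -- Move one batch of rows from the stage table to the target.
    DELETE TOP (@BatchSize)
    FROM   StageSales
    OUTPUT deleted.OrderNumber, deleted.SalesAmount
    INTO   TargetSales (OrderNumber, SalesAmount);

    -- A short batch means the stage table is drained; stop looping.
    IF @@ROWCOUNT < @BatchSize BREAK;
END;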






Paul B. Felix
Paul is an agile business intelligence & data warehousing junkie living in Austin, Texas. He has spent his career helping organizations use data to make better decisions.

