Alena Semeshko

Data Integration Blog

In this blog you can find posts with useful links to, news on and analysis of things like data integration, mashups, data quality, data warehousing, application integration (EAI), data management…the list goes on.

Monday 13 October 2008, 12:21 PM

ETL of your own - wise or not?

Posted by Alena Semeshko

ETL - Extracting and reading data from the original source, Transforming it to suit your business (that is, cleansing and formatting it), and finally Loading (sending) it to your system, database or warehouse.
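To make the three steps concrete, here is a minimal sketch of a home-made ETL run in Python. Everything in it is an assumption made for illustration: the sample CSV text, the `orders` table, and the cleansing rules (trim whitespace, lowercase emails, skip rows with no amount). A real tool would read from actual source systems and apply your own business rules.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real extract step would read a file, an API or a database.
RAW_CSV = """name,email,amount
 Alice ,ALICE@EXAMPLE.COM,10.50
Bob,bob@example.com,
Carol,carol@example.com,7
"""

def extract(text):
    """Extract: read rows from the original source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cleanse and format rows, skipping those with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record, leave it out of the load
        cleaned.append((row["name"].strip(), row["email"].strip().lower(), float(amount)))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (name TEXT, email TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

# An in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
```

Even at this size, the script already raises the questions discussed below: who maintains the rules, and what happens when a new source arrives.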

So, ETL tools. Make one of your own or buy one from a trusted vendor? What's best for your company?

Before jumping into anything or rushing to a popular vendor, analyze how much an ETL tool will actually benefit your company and whether it might be wiser to build one of your own.

Building a customized solution of your own means hiring technical staff for that purpose, but it gives you a better chance of ending up with a simple solution that matches exactly your business needs.

When you purchase an ETL tool, your options for customization are limited by what the market offers. On top of that, you are likely to face the complexity of training your in-house staff, which might take up a lot of your company's time and resources. Regardless of this, however, most companies still resort to purchasing ready-made solutions. How come? Well, try asking yourself the following questions and you might actually come to the same decision.

1. What are your goals? Why do you need an ETL solution? Is this a one-time procedure with limited conditions or are you planning to make it a part of your organization's structure and strategy? What do you want from your ETL tool? Specify priorities.
2. How much can you spend on your solution? How much do you want to spend?
3. How many data sources are you working with, and what kind are they? What functionality do they already have that might be helpful during extraction and transformation?
4. How much time can you painlessly allow for the transformation process? For your entire ETL process?
5. How many people and how much time can you dedicate to this project? (Don't forget about training.)
6. If you decide to build your own solution, who is going to train your staff? Are you competent enough in the process (ETL, warehousing)?
7. Just how many ETL experts do you have in your company, and how do you rate their potential and skill? Are they replaceable?

Just these 7 for starters...

Friday 10 October 2008, 12:05 PM

What Companies Lack in BI

Posted by Alena Semeshko

As much as companies talk about committing to Business Intelligence principles in their daily work, the concept of BI still seems too utopian and vague to be successfully implemented throughout an enterprise. It's probably not that the definition is vague; it's that the practical side differs a bit depending on each company's needs.

The one thing that is more or less universal and requires the utmost attention in all cases is data quality.

Whether your data is already 'dirty' and needs to be reviewed on a regular basis, or whether there is no systematic process for checking it within your company, sooner or later you realize that something about your data needs to be fixed. Some companies prefer to conduct regular automatic check-ups; others choose to apply filtering techniques before the information even enters internal databases. One way or another, enough solutions already exist to help you make that first step into the world of BI and make it right.

One of the BEYE bloggers recently posted his list of the top ten things BI lacks. Aside from data quality, he singles out such foundational aspects of BI as the problem of structured and unstructured data, valuation techniques, Predictive Analytics / Data Mining, technology limitations, simulations, on-demand analytics, etc. The list will vary slightly from one company to another, but making one and working towards perfecting your Business Intelligence strategy through it is certainly helpful. No one can tell you how rewarding it is; you can only feel it for yourself while gradually putting "taken care of" or "implemented" next to each item on the list.
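As a rough idea of what a regular automatic check-up might look like, here is a small Python sketch that audits a batch of records for missing required values and duplicated keys. The record layout, the field names and the `audit` helper are all hypothetical; a real check would run against your own tables and rules.

```python
from collections import Counter

def audit(records, required_fields, key_field):
    """Count missing required values and list keys that occur more than once."""
    missing = Counter()
    for record in records:
        for field in required_fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1
    key_counts = Counter(record.get(key_field) for record in records)
    duplicates = sorted(k for k, n in key_counts.items() if k is not None and n > 1)
    return {"total": len(records), "missing": dict(missing), "duplicate_keys": duplicates}

# Hypothetical sample batch with one empty email, one absent email and a repeated id.
sample = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "b@example.com"},
    {"id": 3},
]
report = audit(sample, required_fields=["email"], key_field="id")
```

Running something like this on a schedule gives you numbers to track over time, which is often the first step towards deciding what to fix.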

Monday 6 October 2008, 12:57 PM

Data Quality - Upstream or Downstream?

Posted by Alena Semeshko

I keep wondering why data quality checks still exist as a procedure performed once in a while, rather than as a part of the front-end process. How come most companies start worrying about the quality of their data only when it's already dirty and in use? How come it doesn't occur to them that the quality of data needs to be thought through before it's actually captured? Even at the early stages of data capture, data quality already plays an important role in the future of the company. It is the early stages that determine how your data turns out and whether it will pay off later on.

A recent Forrester paper titled "It's Time To Invest In Upstream Data Quality" suggests that when companies realize short-term data cleanup ROI immediately, it's hard to justify front-end investments that may take years to pay off.

At the same time, Forrester says, IT budget planning committees tend to avoid the existing data quality (DQ) products that allow integrating downstream data hygiene rules into front-end processes, citing the solutions' cost and complexity.

The result? I&KM pros quickly reach diminishing returns on data quality investments, requiring even more investment later on to catch up with missed opportunities like verifying customer contact information, standardizing product data, and eliminating duplicate records.
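To illustrate what moving those hygiene rules upstream could mean in practice, here is a hedged Python sketch of a capture step that verifies, standardizes and de-duplicates a customer record before it is stored. The `CustomerIntake` class, its fields and the deliberately loose email pattern are assumptions made for this example, not a description of any DQ product.

```python
import re

# Deliberately loose pattern, assumed for illustration; real verification is stricter.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

class CustomerIntake:
    """Applies data hygiene rules at the point of capture."""

    def __init__(self):
        self._by_email = {}  # standardized email -> stored record

    def capture(self, name, email):
        """Return (record, created); reject bad input instead of storing it."""
        email = email.strip().lower()            # standardize the contact detail
        name = " ".join(name.split()).title()    # collapse whitespace, fix casing
        if not EMAIL_PATTERN.match(email):
            raise ValueError(f"invalid email: {email!r}")
        if email in self._by_email:
            return self._by_email[email], False  # duplicate: reuse the existing record
        record = {"name": name, "email": email}
        self._by_email[email] = record
        return record, True

intake = CustomerIntake()
first, created_first = intake.capture("  alice   smith ", " Alice@Example.com ")
second, created_second = intake.capture("ALICE SMITH", "alice@example.com")
```

The second call returns the record created by the first one, so the duplicate never reaches the database and never needs a downstream cleanup.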

The paper explores how to break this cycle and identify the optimal DQ solution downstream and audit source systems that cause the most significant data issues upstream.

Tuesday 30 September 2008, 2:41 PM

Data Federation vs. Data Integration?

Posted by Alena Semeshko

Data federation and data integration. What's the difference between the two?

I understand data federation as something that joins data from different sources distributed around the company without actually moving it from the original source. That is to say, data federation software creates a single repository that contains not the data itself but its metadata (information about the actual data and its location). This technology gives users a single standardized view of data in a single data layer, without having to deal with the variety of original data sources.

James Kobielus in his ZDNet blog explores the core difference between enterprise data warehousing (EDW) and data federation.
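Here is how I picture that idea in a few lines of Python. This is only a sketch under my own assumptions: two in-memory SQLite databases stand in for separate source systems, and the `CATALOG` dictionary, table names and `query` helper are hypothetical. The point is that the catalog stores only where the data lives and how to map it to a standard view; the rows are fetched from the sources on demand.

```python
import sqlite3

def make_source(ddl, insert_sql, rows):
    """Create a stand-in source system as an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn

crm = make_source("CREATE TABLE clients (client_name TEXT, city TEXT)",
                  "INSERT INTO clients VALUES (?, ?)", [("Acme", "London")])
billing = make_source("CREATE TABLE accounts (acct TEXT, town TEXT)",
                      "INSERT INTO accounts VALUES (?, ?)", [("Globex", "Leeds")])

# Metadata only: which source holds the entity and how its columns map
# to the standardized view (name, city). No data is copied here.
CATALOG = {
    "customers": [
        (crm, "SELECT client_name AS name, city FROM clients"),
        (billing, "SELECT acct AS name, town AS city FROM accounts"),
    ]
}

def query(entity):
    """Fetch the entity from every registered source at request time."""
    results = []
    for conn, sql in CATALOG[entity]:
        cursor = conn.execute(sql)
        columns = [description[0] for description in cursor.description]
        results.extend(dict(zip(columns, row)) for row in cursor)
    return results

customers = query("customers")
```

Because nothing is copied, a row added to one of the sources shows up in the next `query("customers")` call without any reload step.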

Data federation generally seems outdated, compared to data warehousing, which at first looks like a more reasonable approach:

Federated environments are not optimized for heavy-hitting data matching, merging, transformation and cleansing, all of which are essential functions to deliver a “single version of the truth” for business intelligence (BI).

However, James also lists the benefits data federation may deliver in the company's overall Business Intelligence strategy:

Data federation is an umbrella term for a wide range of operational BI topologies that provide decentralized, on-demand alternatives to the centralized, batch-oriented architectures characteristic of traditional EDW environments.

In the real world, data federation and EDW are not mutually exclusive and may very well target different markets, as data federation is better suited to near-real-time BI requirements than the batch-oriented EDWs deployed in many organizations.

So, where does data integration fit in the picture? Certain aspects of data integration intersect with both of the technologies discussed above. On the one hand, data integration may very well involve copying and moving data around, which is contrary to the definition of data federation, yet fits very well into the concept of data warehousing. On the other hand, data federation is in many respects just one form of data integration, in that the metadata it uses can be employed in the integration processes.




Alena Semeshko
  • Alena Semeshko
  • Sales / Marketing
  • Member since: July 2008
