April 2017, IDC #US41715317

TAXONOMY

IDC's Worldwide Data Integration and Integrity Software Taxonomy, 2017
Stewart Bond

IDC'S WORLDWIDE DATA INTEGRATION AND INTEGRITY SOFTWARE TAXONOMY

FIGURE 1
Data Integration and Integrity Software Primary Segments
Source: IDC, 2017

DATA INTEGRATION AND INTEGRITY SOFTWARE TAXONOMY CHANGES FOR 2017

Three primary changes have occurred since the last published taxonomy for the data integration and access (DIA) software functional market (see IDC's Worldwide Data Integration and Access Software Taxonomy, 2015, IDC #255856, June 2015):

- Name change from data integration and access software to data integration and integrity software
- Consolidation of the general data quality and domain-based matching and cleansing market segments into one data quality software segment
- Addition of the self-service data preparation software segment

Table 1 maps these changes, as they were made in 2016, to the 2015 market definition.

TABLE 1
Summary of Data Integration and Integrity Software Functional Market Changes for 2016

2015 market: Data integration and access software (within application development and deployment)
2016 market: Data integration and integrity software
Comments: Renamed and expanded the scope of this market. Self-service data preparation is added as a submarket, and the overall market name changed to data integration and integrity to emphasize the growing importance of data quality.

2015 market: General data quality segment and domain-based matching and cleansing segment
2016 market: Data quality segment
Comments: Market consolidation; software vendors with general data quality software are investing in domain-based matching and cleansing, and vice versa.

2015 market: (new in 2016)
2016 market: Self-service data preparation segment
Comments: New segment describing data access and preparation tools suitable for business users.

Source: IDC, 2017

Data is core to digital transformation, and data without integrity will not be able to support digital transformation initiatives. Data with integrity is data that is trusted, available, secure, and compliant across the enterprise IT environment, regardless of whether the data is persisted on-premises or in the cloud. Data integration software vendors have the tools, intellectual property, and products that can improve the integrity of data. Integrity has become a secondary focus of solutions in this market, and for this reason, IDC changed the name of the software market from data integration and access to data integration and integrity (DII).

Until 2016, IDC tracked data quality software spend within the data integration and access software functional market as two subsegments: general data quality software and domain-based matching and cleansing software. These two segments are consolidating as software vendors in the general data quality segment add domain-based matching and cleansing software and, conversely, domain-based matching and cleansing software vendors add general data quality software to their portfolios. While in some cases the products remain separate, it is becoming increasingly difficult to track spend as the lines blur. As the market consolidates, so must the segments that IDC tracks.

A lot happened on the way to big data insights, including a need to put data preparation into the hands of business users. 2015 saw a surge of activity in the self-service data preparation software category, and therefore, IDC started tracking spend in this category in 2016.
In many cases, the purpose of software in this segment is to prepare data for analysis, and because the functionality is similar to extract, transform, and load (ETL), IDC decided to record and track this revenue within the DII software market rather than the business analytics market.

TAXONOMY OVERVIEW

Data integration and integrity (DII) software enables the access, blending, movement, and integrity of data among multiple data sources. The purpose of data integration and integrity software is to ensure the consistency of information where there is a logical overlap of the information contents of two or more discrete systems. Data integration is increasingly being used to capture, prepare, and curate data for analytics. It is also the conduit through which new data types, structures, and content transformations can occur in modern IT environments that are inclusive of relational, nonrelational, and semistructured data repositories.

DII software employs a wide range of technologies, including, but not limited to: extract, transform, and load; change data capture (CDC); federated access; format and semantic mediation; data quality and profiling; and associated metadata management. Data access is increasingly becoming open with standards-based APIs; however, there is still a segment of this market focused on data connectivity software, which includes data connectors and connectivity drivers for legacy technologies and the orchestration of API calls.

DII software may be used in a wide variety of functions. The most common is building and maintaining a data warehouse, but other uses include enterprise information integration, data migration, database consolidation, master data management (MDM), reference data management, metadata management, and database synchronization, to name a few. Data integration may be deployed and executed as batch processes, typical for data warehouses, or in near-real-time modes for data synchronization or dedicated operational data stores. A data migration can also be considered a one-time data integration.

Although most applications of DII software are centered on structured data in databases, they may also include integrating data from disparate, distributed unstructured or semistructured data sources, including flat files, XML files, JSON, legacy applications, big data persistence technologies, and the proprietary data sources associated with commercial applications from vendors such as SAP and Oracle. More recently, an interactive form of data integration has emerged in the application of self-service data preparation.

The data integration and integrity software market includes the submarkets illustrated in Figure 2 and discussed in the Definitions section that follows.

FIGURE 2
IDC Data Integration and Integrity Software Taxonomy Model
Source: IDC, 2017

DEFINITIONS

Bulk Data Movement Software

This software, commonly referred to as extract, transform, and load software, selectively draws data from source databases, transforms it into a common format, merges it according to rules governing possible collisions, and loads it into a target. A derivative of this process, extract, load, and transform (ELT), can be considered synonymous with ETL for the purpose of market definition and analysis. ELT is an alternative commonly used by database software vendors to leverage the inherent data processing performance of database platforms. ETL and/or ELT software normally runs in batch but may also be invoked dynamically by command. It could be characterized as providing the ability to move many things (data records) in one process.
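As a rough illustration of the extract, transform, and load flow described above, the following Python sketch moves rows between two SQLite databases. The database files, table, and column names are illustrative assumptions, not part of any vendor's product:

```python
import sqlite3

# Minimal ETL sketch; "source.db" and its orders table are assumed to exist.
src = sqlite3.connect("source.db")
tgt = sqlite3.connect("warehouse.db")

# Extract: selectively draw data from the source database.
rows = src.execute(
    "SELECT customer_id, name, country, amount_cents FROM orders"
).fetchall()

# Transform: normalize into a common format (trimmed names, upper-case
# country codes, amounts converted from cents to currency units).
transformed = [
    (cid, name.strip(), country.upper(), cents / 100.0)
    for cid, name, country, cents in rows
]

# Load: merge into the target, replacing rows when keys collide.
tgt.execute(
    "CREATE TABLE IF NOT EXISTS fact_orders "
    "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT, amount REAL)"
)
tgt.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?)", transformed)
tgt.commit()
```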
The following are representative companies in this submarket:

- IBM (InfoSphere DataStage)
- Informatica (PowerCenter)
- Oracle (Oracle Data Integrator)
- SAP (SAP Data Integrator, SAP Data Services, HANA Smart Data Integration)
- SAS (SAS Data Management)
- Talend (Open Studio for Data Integration)

Dynamic Data Movement Software

Vendors often refer to this functionality as providing "real-time" data integration; however, it can only ever be "near" real time because of the latency inherent in sensing and responding to a data change event occurring dynamically inside a database. Data changes are captured either through a stored procedure associated with a data table or field trigger, or through a change data capture (CDC) facility that is either log or agent based. These solutions actively move the data associated with the change among correspondent databases, driven by metadata that defines the interrelationships among the data managed by those databases. The software can transform and route the data, inserting into or updating the target depending on the rules associated with the event that triggered the change. It normally either features a runtime environment or operates by generating the program code that does the extracting, transforming, and routing of the data and the updating of the target. It could be characterized as software that moves one thing (a data record or change) many times.

The following are representative companies in this segment:

- IBM (InfoSphere Data Replication)
- Informatica (PowerCenter Real Time Edition)
- Oracle (Oracle GoldenGate)
- SAP (SAP Replication Server, SAP Data Services)

Data Quality Software

This submarket includes products used to identify errors or inconsistencies in data, normalize data formats, infer update rules from changes in data, validate against data catalog and schema definitions, and match data entries with known values. Data quality activities are normally associated with data integration tasks such as match/merge and federated joins but may also be used to monitor the quality of data in the database, either in near real time or on a scheduled basis. The data quality software submarket also includes purpose-built software that can interpret, deduplicate, and correct data in specific domains such as customers, locations, and products. Vendors with domain-specific capabilities typically offer products that manage mailing lists and feed data into customer relationship management and marketing systems.

The following are representative companies in this submarket:

- Experian Data Quality
- IBM (InfoSphere Discovery, BigInsights BigQuality)
- Informatica (Informatica Data Quality and Data-as-a-Service)
- Melissa Data
- Pitney Bowes (Code-1 Plus and Finalist)
- SAP (SAP Data Services, SAP HANA Smart Data Quality)
- SAS (SAS Data Quality)
- Talend (Talend Open Studio for Data Quality)
- Syncsort (acquired Trillium Software)

Data Access Infrastructure

This software is used to establish connections between users or applications and data sources without requiring a direct connection to the data source's API or a hard-coded database interface. It includes cross-platform connectivity software, as well as ODBC and JDBC drivers and application and database adapters.
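To make the role of this layer concrete, the short sketch below connects to a database through a generic ODBC driver using the pyodbc package. The DSN, credentials, and table are illustrative assumptions; the point is that application code talks to a standard driver interface rather than a vendor-specific API:

```python
import pyodbc  # generic Python bridge to any installed ODBC driver

# "LegacyWarehouse" is a hypothetical data source name (DSN) configured in
# the ODBC driver manager; credentials and table are likewise illustrative.
conn = pyodbc.connect("DSN=LegacyWarehouse;UID=report_user;PWD=secret")
cursor = conn.cursor()

# The same application code works against any source the driver layer reaches.
cursor.execute("SELECT product_id, quantity FROM inventory")
for product_id, quantity in cursor.fetchall():
    print(product_id, quantity)

conn.close()
```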
The following are representative companies in this submarket:

- Micro Focus (Attachmate Databridge)
- Information Builders (iWay Integration Solutions)
- Progress Software (DataDirect Connect and DataDirect Cloud)
- Rocket Software (Rocket Data)

Composite Data Framework

This market segment includes data federation and data virtualization software, enabling multiple clients (usually applications, application components, or databases running on the same or different servers in a network) to share data in a way that reduces latency. Usually involving a cache, this software sometimes also provides transaction support, including maintaining a recovery log. Data federation permits access to multiple databases as if they were one database. Most implementations are read only, but some provide update capabilities. Data virtualization products are similar but offer full schema management coordinated with the source database schemas to create a complete database environment that sits atop multiple disparate physical databases. Originally created to provide the data services layer of a service-oriented architecture (SOA), data virtualization has been applied to multiple use cases, including data abstraction, rapid prototyping of data marts and reports, and data distribution without replication.

The following are representative companies in this submarket:

- Cisco (Data Virtualization Platform)
- Denodo
- IBM (InfoSphere Federation Server)
- Informatica (Data Virtualization)
- Red Hat (JBoss Data Virtualization)
- SAS (SAS Federation Server)

Master Data Definition and Control Software

The master data definition and control software market segment includes products that help organizations define and maintain master data, which is of significance to the enterprise and to multiple systems. Master data is usually categorized by one of four domains: party (people, organizations), product (including services), financial assets (accounts, investments), and location (physical property assets). Master data definition and control software manages metadata regarding entity relationships, attributes, hierarchies, and processing rules to ensure the integrity of master data wherever it is used. Key functionalities include master data modeling, import and export, and the definition and configuration of rules for master data handling such as versioning, access, synchronization, and reconciliation.

This market segment also includes reference data management software. Reference data is used to categorize or classify other data and is typically limited to a finite set of allowable values aligned with a value domain. Domains can be public, such as time zones, countries and subdivisions, currencies, and units of measurement, or proprietary codes used within an organization. Domains can also be industry specific, such as medical codes, SWIFT BIC codes, and ACORD ICD codes. As with master data, reference data is referenced in multiple places within business systems and, in some cases, can deviate across disparate systems. Reference data management software could be considered a subset of master data management (MDM) software and, in many cases, may share the same code base as master data management software sold by the same vendor.

Master data definition and control products can include operational orchestration facilities to coordinate master data management processes across multiple information sources in a batch, service, or event-based context. Master data definition and control software provides capabilities to facilitate single or multiple master data entity domain definition and processing and usually serves as the core technical component of a broader master data management solution (a competitive market in the IDC taxonomy).
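The following sketch illustrates the reference data idea described above: a finite value domain used to validate incoming records and an alias map used to reconcile codes that deviate across systems. The domains, aliases, and record fields are hypothetical examples, not drawn from any product:

```python
# Reference data sketch: finite value domains plus an alias map used to
# reconcile codes that deviate across disparate systems. All values are
# illustrative.
CURRENCY_DOMAIN = {"USD", "EUR", "GBP", "JPY"}
COUNTRY_ALIASES = {"UK": "GB", "USA": "US"}

def conform(record: dict) -> dict:
    """Validate a record against reference domains and normalize aliases."""
    if record["currency"] not in CURRENCY_DOMAIN:
        raise ValueError(f"unknown currency code: {record['currency']}")
    record["country"] = COUNTRY_ALIASES.get(record["country"], record["country"])
    return record

print(conform({"currency": "EUR", "country": "UK"}))
# {'currency': 'EUR', 'country': 'GB'}
```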
Representative vendors and products include:

- IBM (IBM InfoSphere Master Data Management)
- Informatica (Multi-Domain MDM, PIM)
- Oracle (Oracle Product Hub, Oracle Customer Hub, Oracle Site Hub, Oracle Higher Education Constituent Hub, and Oracle Data Relationship Management)
- SAP (SAP Master Data Governance)
- TIBCO (Master Data Management Platform)

Metadata Management Software

Metadata is data about data. It is commonly associated with unstructured data, as a structure in which to describe the content. However, it is increasingly being used in the structured data world as data becomes more complex in its definition, usage, and distribution across an enterprise. At a basic level, metadata includes definitions of the data and records when, how, and by whom the data was created and last modified. More advanced metadata adds context to the data, traces its lineage, cross-references where and how the data is used, and improves the interoperability of the data. Metadata has become critical in highly regulated industries as a useful source of compliance information.

The metadata management submarket has grown out of the database management, data integration, and data modeling markets. Metadata management solutions provide the functionality to define metadata schema, automated and/or manual population of the metadata values associated with structured data, and basic analytic capabilities for quick reporting. These solutions also offer application programming interfaces (APIs) for programmatic access to metadata for data integration and preparation for analytics.

Representative vendors and products include:

- ASG (Enterprise Data Intelligence)
- IBM (InfoSphere Information Governance Catalog)
- Informatica (Business Glossary, Enterprise Information Catalog, Live Data Map)
- Oracle (Enterprise Metadata Management)
- SAP (SAP Information Steward Metadata Management and Metapedia)

Self-Service Data Preparation Software

Self-service data preparation is an emerging segment in the DII software functional market. Demand is coming from today's tech-savvy business users, who want access to their data in increasingly complex environments without IT getting in the way. IT has also been looking for technology that can put data preparation capabilities into the hands of business users, where the requirements and desires of analytic outcomes are best understood. These solutions are targeted at business analysts, but some offer a range of user interfaces by persona. Some require deeper levels of data analytics knowledge, and others require deeper levels of data integration knowledge. Functionality includes cataloging data sets within repositories, interactive capabilities for cleansing and standardizing data sets, joining disparate data sets, performing calculations, and varying degrees of analytics for validation prior to visualization. This segment comprises vendors that explicitly sell self-service data preparation components. Business intelligence and analytics vendors that offer self-service data preparation capability within a suite, as part of the analytics process they support, are excluded from this segment.
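The kind of interactive preparation this segment describes can be approximated in a few lines with a general-purpose data frame library such as pandas; the sketch below cleanses, joins, and summarizes two illustrative data sets of the sort a business analyst might work with (the columns and values are invented for the example):

```python
import pandas as pd

# Two small, illustrative data sets standing in for disparate sources.
customers = pd.DataFrame({"cust_id": [1, 2], "region": ["east ", "WEST"]})
orders = pd.DataFrame({"cust_id": [1, 1, 2], "amount": [120.0, 80.0, 200.0]})

# Cleanse and standardize: trim whitespace and normalize casing.
customers["region"] = customers["region"].str.strip().str.title()

# Join disparate data sets and perform a calculation for validation
# prior to visualization.
prepared = orders.merge(customers, on="cust_id")
summary = prepared.groupby("region")["amount"].sum()
print(summary)
```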
Representative vendors and products include:

- Alteryx (Designer)
- Datawatch (Monarch)
- Paxata
- SAP (Agile Data Preparation)
- Talend (Data Preparation)
- Trifacta (Wrangler)
- UniFi

RELATED MARKETS

Table 2 lists markets related to data integration and integrity software.

TABLE 2
Data Integration and Integrity Software Related Markets

Competitive market: Master data management (MDM) software
Relationship: The master data definition and control segment of the DII market, combined with portions of the data access, movement, quality, and metadata segments, makes up much of the MDM platform segment of the MDM competitive market.

Competitive market: Business analytics software
Relationship: A portion of the data integration and integrity software market is used to derive the business analytics market, specifically within the data warehouse generation segment.

Source: IDC, 2017

LEARN MORE

Appendix: Patterns of Use

A pattern of use is a commonly observed assembly of software components, meeting the needs of commonly observed business and technical requirements within specific organizational and technical constraints. The DII software market segments are aligned with data integration patterns of use. Some patterns leverage components from across the segments, and conversely, some patterns support solutions in other segments and competitive markets (e.g., master data management). Vendors could consider patterns in the packaging and licensing of software products. Vendors are in a unique position to identify new patterns emerging in the install base and respond with new products, packaging, and licensing options to gain a competitive advantage.

The patterns discussed in the sections that follow are typical of many (but not all) data integration solution implementations, applied to meet business requirements within technical and organizational constraints. They are presented in a way that helps vendors work with customers to identify each pattern and the motivation for, and implications of, using it.

Scheduled Bulk Data Integration

Scheduled bulk data integration is used to move large sets of data from a source to a destination on a frequent schedule. This pattern primarily leverages software in the bulk data movement segment of the market, namely ETL or the derivative ELT. Figure 3 illustrates the two types of implementation within this pattern.

FIGURE 3
Scheduled Bulk Data Integration Patterns
Source: IDC, 2017

The extract process happens on a scheduled basis, depending on the data latency requirements of the target. This pattern is often used to populate data warehouses, data marts, or operational data stores (ODSs) on a regularly scheduled basis. It can also be used for a one-time data migration. This pattern is efficient for processing large data sets, as bulk data movement technology is built to handle large volumes of data. Some data transformation requirements can only be achieved with ETL/ELT, such as match/merge between multiple source data sets, complex denormalization, and summarizing and sorting of transactional data. Use of this pattern introduces latency between the source and target and is constrained by the physical data stores and processing capabilities of the underlying infrastructure. Although the bulk data movement market segment primarily serves this pattern, vendors could bundle components from the data access segment into solutions for this pattern. Vendors could also consider metadata management add-on components for traceability of data lineage through the transformation process, and data quality components for cleansing the data.
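In operational terms, the pattern amounts to wrapping a bulk job in a scheduler. A minimal sketch, assuming a run_etl() job like the one outlined under Bulk Data Movement Software:

```python
import time
from datetime import datetime

INTERVAL_SECONDS = 24 * 60 * 60  # hypothetical nightly batch window

def run_etl() -> None:
    """Placeholder for a bulk extract/transform/load job."""
    print(f"{datetime.now():%Y-%m-%d %H:%M} starting bulk load")

# The latency trade-off of the pattern lives in this loop: the target is
# only as fresh as the last completed run.
while True:
    run_etl()
    time.sleep(INTERVAL_SECONDS)
```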
Near-Real-Time Data Integration

Near-real-time data integration is used to move smaller sets of data more frequently. This pattern primarily leverages components in the dynamic data movement market segment. It is most often used when changes in source data need to be captured and distributed to other systems. Figure 4 illustrates the pattern and the multiple methods used to capture changes.

FIGURE 4
Near-Real-Time Data Integration Pattern
Source: IDC, 2017

Changes in source data can be captured using a stored procedure or database trigger. The trigger may populate a change log inside the source database or send the change to an external persistence service. CDC is a technology that runs outside the source database and monitors database log files for changes. Once a change is recognized, "before" and "after" images of the data in scope are sent to a persistence service. A micro-batch is another alternative for picking up changes. Micro-batches run every few minutes and pull only the data that has changed since the last time the micro-batch was executed; they may pull data out of a change log or directly out of source tables. The persistence service could be a message queue, file system, external data store, or another type of container. Mediation services pick up the changed data from the persistence service and perform the necessary transformation, formatting, and distribution to the target system(s) that need a copy of the change.

This pattern is efficient for processing data changes when latency between source and target needs to be minimized. Change data capture is an efficient, noninvasive method for recognizing changes in application databases when API-level access is not available, such as in packaged applications with proprietary APIs or legacy systems that are costly to modify. Database triggers, change tables, and other source data schema modifications are invasive and can nullify application vendor support agreements.

Dynamic data movement is the primary market segment that serves this pattern. Vendors could also bundle data access and metadata management components for solutions in this pattern, and could bundle this pattern as the underlying middleware in master data management solutions, distributing changes to master data from the systems of entry to systems of record and reference.
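Of the capture methods above, the micro-batch is the simplest to sketch. The following illustrative Python loop uses an updated_at column as a high-water mark to pull only rows changed since the previous run; the databases, table, and columns are assumptions for the example:

```python
import sqlite3
import time

src = sqlite3.connect("source.db")   # assumed to contain an accounts table
tgt = sqlite3.connect("replica.db")
tgt.execute(
    "CREATE TABLE IF NOT EXISTS accounts "
    "(id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT)"
)

high_water_mark = "1970-01-01 00:00:00"  # latest change seen so far

while True:
    # Pull only the rows changed since the previous micro-batch.
    changes = src.execute(
        "SELECT id, payload, updated_at FROM accounts WHERE updated_at > ?",
        (high_water_mark,),
    ).fetchall()
    for row_id, payload, updated_at in changes:
        # Mediation: transformation/routing would happen here before the
        # insert-or-update against the target.
        tgt.execute(
            "INSERT OR REPLACE INTO accounts VALUES (?, ?, ?)",
            (row_id, payload, updated_at),
        )
        high_water_mark = max(high_water_mark, updated_at)
    tgt.commit()
    time.sleep(180)  # run every few minutes
```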
Real-Time Data Integration

Real-time data integration provides blending and transformation of data across multiple disparate sources in a virtualized view. Also known as data virtualization, it could be considered data federation without persistence. Figure 5 illustrates this pattern, assuming two disparate data sources.

FIGURE 5
Real-Time Data Integration Pattern
Source: IDC, 2017

In this pattern, the composite data framework accesses the disparate data sources, queries both for the data in scope, joins the results in memory, and provides a virtual view of the integrated data, isolating the data access and integration logic from the consumer of the view. The virtual view is made available to consumers via APIs. This pattern is efficient when zero latency is required between the source data and consumers. It is also useful in situations where flexibility and agility of reporting schemas are important: a virtual view can be changed without any impact on the underlying sources.

With real-time access comes impact. Querying live systems can degrade the performance of the source databases, which in turn can slow the response times of the virtual views. Sufficient memory on the infrastructure running the composite data framework software, as well as sufficient network bandwidth, is important when considering this pattern. It is also important that all underlying data security policies be replicated in the virtual views to prevent unauthorized access. Packaging of composite data framework, metadata management, and data access software components is required for end-to-end solutions in this pattern. Vendors that have composite data frameworks, but not data quality solutions, could encourage the application of this pattern to identify quality inconsistencies across disparate data sources.

Data Federation

Data federation pulls data from multiple disparate sources, blends and transforms it into one data set, and persists the results into another physical data store. It could be used to populate an operational data store, reporting database, or data mart that needs a federated view of disparate data. Figure 6 illustrates the data federation pattern.

FIGURE 6
Data Federation Pattern
Source: IDC, 2017

Data federation can be used when latency is allowed between the source systems and the federated views. Latency can be reduced with near-real-time population of the target data store. Federation is also appropriate when memory and network bandwidth constraints preclude data virtualization. Federated data should be used in read-only mode for reporting and analysis purposes; allowing write access to the federated repository can cause a data governance issue, as it creates an alternate version of the source data, leading to inconsistencies. The data federation use case leverages software in the composite data framework segment and can use any combination of dynamic and bulk data movement components to move data from source to target. Vendors could consider bundling these components together and adding metadata management components for a comprehensive solution that also provides context and lineage for data in the federated data store.

Data in Motion Pattern

Data in motion represents the application of data integration technology to streaming data. Many 3rd Platform technologies constantly emit data, available to organizations that have an interest in or need for it. Data is constantly in motion, coming from social networks, the Internet of Things, stock market tickers, and so forth. Figure 7 illustrates an application of data integration technology to data in motion.

FIGURE 7
Streaming Integration Pattern
Source: IDC, 2017

This is an emerging pattern and is applicable to organizations that are working with, or want to leverage, streaming data. Any foray into the use of data streams needs to take into consideration the organization's ability to capture and respond to business-relevant events in the stream. Available system resources also need to be understood in the context of the velocity and volume of the streaming data. This pattern leverages several components in the data integration software market, and vendors are encouraged to consider bundling alternatives to meet the requirements of this pattern, as applicable to customer requirements. Data access software provides the capability to interface with social media, Internet of Things, and other data stream sources. Mediation capabilities available in bulk and dynamic data movement components assist with identifying and filtering the data in the streams that is relevant to the organization. Data quality components can be used to validate the data before allowing it into the organization's persistent storage.
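A compressed sketch of that pipeline, with a generator standing in for a hypothetical streaming source (the event fields and validation rule are invented for the example):

```python
import json

def event_stream():
    """Stand-in for a streaming source such as an IoT or social feed."""
    yield json.dumps({"sensor": "t-101", "temp_c": 21.4})
    yield json.dumps({"sensor": "t-102", "temp_c": None})  # will be rejected

def is_valid(event: dict) -> bool:
    """Data quality gate: reject events with missing measurements."""
    return event.get("temp_c") is not None

store = []                          # stand-in for persistent storage
for raw in event_stream():
    event = json.loads(raw)         # mediation: parse/normalize the format
    if is_valid(event):             # quality: validate before persistence
        store.append(event)

print(store)                        # only the valid, relevant events remain
```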
Related Research

- IDC's Worldwide Services Taxonomy, 2017 (IDC #US42356617, March 2017)
- Market Analysis Perspective: Worldwide Data Integration and Integrity Software, 2016 (IDC #US40799016, September 2016)
- Worldwide Data Integration and Integrity Software Market Shares, 2015: Year of Big Data and the Emergence of Self-Service (IDC #US40696216, June 2016)
- Worldwide Data Integration and Integrity Software Forecast, 2016–2020 (IDC #US40696116, June 2016)

Synopsis

This IDC study provides taxonomical definitions for the data integration and integrity software market. The study segments the market into data access, movement, integration, and governance functional categories. Market segments include bulk data movement, dynamic data movement, composite data frameworks, data quality, master data definition and control, metadata management, and self-service data preparation.

"Data is core to digital transformation, and data without integrity will not be able to support digital transformation initiatives," says Stewart Bond, research director for Data Integration and Integrity Software at IDC. "Data integration software provides organizations with the ability to catalog, move, transform, match, cleanse, and prepare data, improving the level of trust and integrity of data and improving the outcomes of digital transformation initiatives on the 3rd Platform."

About IDC

International Data Corporation (IDC) is the premier global provider of market intelligence, advisory services, and events for the information technology, telecommunications, and consumer technology markets. IDC helps IT professionals, business executives, and the investment community make fact-based decisions on technology purchases and business strategy. More than 1,100 IDC analysts provide global, regional, and local expertise on technology and industry opportunities and trends in over 110 countries worldwide. For 50 years, IDC has provided strategic insights to help our clients achieve their key business objectives. IDC is a subsidiary of IDG, the world's leading technology media, research, and events company.

Global Headquarters
5 Speen Street
Framingham, MA 01701
USA
508.872.8200
Twitter: @IDC
idc-community.com
www.idc.com

Copyright Notice

This IDC research document was published as part of an IDC continuous intelligence service, providing written research, analyst interactions, telebriefings, and conferences. Visit www.idc.com to learn more about IDC subscription and consulting services. To view a list of IDC offices worldwide, visit www.idc.com/offices. Please contact the IDC Hotline at 800.343.4952, ext. 7988 (or +1.508.988.7988) or sales@idc.com for information on applying the price of this document toward the purchase of an IDC service or for information on additional copies or web rights.

Copyright 2017 IDC. Reproduction is forbidden unless authorized. All rights reserved.