Data integration in the Internet of Things


In an interview with Metering & Smart Energy International, Franco Castaldini, vice president of marketing at Bit Stew Systems, speaks frankly about how the Industrial Internet of Things is affecting utilities and energy companies, focusing on the data integration challenges that arise as more and more devices connect to the grid and generate additional data streams.

The Industrial Internet of Things (IIoT) is creating a fundamental shift in advanced energy production and distribution technology, management and services while leveraging existing investments in infrastructure and operations.

Utilities need to become more agile and flexible in light of increasing renewable and distributed power generation, both of which demand a more flexible smart grid that can handle multiple energy sources in a decentralised and bi-directional network.

Utility companies are also expected to improve routine procedures such as outage management, predict asset performance and transform energy data into new services. To perform these tasks, data generated through sensors and other intelligent technologies need to be integrated, combining data from disparate sources into meaningful and valuable information.

Castaldini begins by defining the Industrial Internet of Things as the process of “leveraging data from connected devices and turning that data into new business value associated with asset performance and operational intelligence and efficiency.”

To gain a better understanding of some of the obstacles faced by industrial organisations taking on the IIoT, Bit Stew commissioned a survey that gathered the feedback of over 100 IT and operational executives. The focus was on the steps that industrial organisations are taking to prepare for the Industrial Internet of Things, the potential benefits the IIoT offers their businesses, and the major hurdles encountered along the way.

The survey also served to establish whether multiple industries employing IIoT are experiencing similar challenges with regard to data integration.

The survey revealed that as the IIoT matures, confidence in current data integration tools declines by half. Respondents reported higher confidence in their existing data integration tools during the planning stages, which declined steeply once they began implementing an IIoT solution. They discovered that their existing approaches and tools struggle to accommodate the volume and complexity of industrial data.

Early adopters of IIoT have reported challenges around the ‘technology limitations’ of current data integration tools. On the IT side, this refers to extract, transform and load (ETL) tools, which extract data from source systems and transform it into a new model that can be used for data warehousing. In Industrial IoT, data volumes are substantially higher than in consumer markets.

Apart from volume, there is also complexity around data variety. Castaldini says: “It’s one thing to integrate data that is highly structured from relational databases that support your typical enterprise systems. It’s another to take data from your OT systems such as your distribution management system, outage management system and GIS system and try to bring data from these systems together with enterprise data. It’s a much taller ask than simply integrating data from traditional applications that IT has typically managed. IT/OT data convergence is a huge challenge.”

Castaldini adds that there is a lack of access to the right skillsets, noting that early adopters are hiring data scientists and their equivalents on the OT side.

He says most IT and OT departments will outsource data integration tasks to a system integrator who will spend time with individuals close to that data.

“There tends to be a lot of back and forth between the system integrator and the utility, costing a significant amount of time and money, only to establish a baseline understanding of the data, so that you can start to map the data from source to target. Data mapping is one step in the data integration process whereby you are transforming the data to fit it to the target data model you are working with.”
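As a rough illustration of the data mapping step Castaldini describes, the following Python sketch renames and converts fields from one source record to fit a target model. It is not Bit Stew's method: the field names (`MTR_ID`, `READ_TS`, `KWH_X10`, `FW_VER`), the unit conversion and the target model are all hypothetical, chosen only to show why someone close to the data is needed to establish what each field means.

```python
# Hypothetical mapping from one source system's field names to a target model.
# None of these names come from a real utility schema.
FIELD_MAP = {
    "MTR_ID": ("meter_id", str),
    "READ_TS": ("timestamp", str),
    # Assumption: the source stores energy in tenths of a kWh.
    "KWH_X10": ("energy_kwh", lambda v: int(v) / 10),
}


def map_record(source: dict) -> dict:
    """Rename and convert each known source field; list unknown fields for review."""
    target, unmapped = {}, []
    for name, value in source.items():
        if name in FIELD_MAP:
            target_name, convert = FIELD_MAP[name]
            target[target_name] = convert(value)
        else:
            unmapped.append(name)
    # Fields nobody has mapped yet are kept visible rather than silently dropped.
    target["_unmapped"] = sorted(unmapped)
    return target


record = map_record(
    {"MTR_ID": 1042, "READ_TS": "2016-03-01T00:15:00", "KWH_X10": "127", "FW_VER": "2.1"}
)
print(record)
```

Every entry in `FIELD_MAP` here is written by hand, which is the kind of human effort the rest of the interview argues should be reduced.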

Data frequency is another consideration: data can be classified as either real-time or interval data.

“You might have isolated areas of connected devices that might have communication or telecommunication challenges, sending data to a data centre to be filed. You therefore have a need for processing and filtering data at the edge. So all of these complexities make industrial IoT unique and highlights the shortcomings of traditional ETL tools and using them for the unique technical requirements of IIoT.”
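One simple way to picture filtering at the edge is a deadband filter, which forwards a reading only when it differs enough from the last value sent. The Python sketch below is an assumption for illustration, not a technique named in the interview; the voltage values and the threshold are invented.

```python
def deadband_filter(readings, threshold):
    """Yield only readings that differ from the last forwarded value by more than threshold."""
    last_sent = None
    for value in readings:
        if last_sent is None or abs(value - last_sent) > threshold:
            last_sent = value
            yield value


# Invented sample readings from a single device.
voltages = [240.0, 240.1, 240.2, 243.5, 243.6, 239.0]
forwarded = list(deadband_filter(voltages, threshold=1.0))
print(forwarded)  # [240.0, 243.5, 239.0]
```

Here three of the six readings are sent over the constrained link; the right threshold would depend on the device and the use case.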

Castaldini notes that many of the vendors active in the IIoT market place the responsibility for data integration on the customer. Many solution providers in this space focus on analytics and visualisations and pull data from whatever data management solution the customer already has, for example a data warehouse, historians or data lakes. These solutions end up falling short of the requirements of a use case that is meant to deliver asset or operational value. Customers are also relying on antiquated technologies to integrate data from their connected devices and systems, which presents a significant roadblock to extracting usable information.

“I’ve seen what happens with utilities that have to deal with data management as they take on advanced distribution management systems. They simply do not budget enough time and effort – and the results are ugly.”

“You’re going to have to be prepared for a lot of vendor management and having to articulate a broader vision of what you want to achieve. You’ve got to first solve the data challenges before you can reach the business outcomes that you want, and then you’ve got something that can scale and allow you to iterate on top of it.”

Castaldini recommends small, manageable implementations of analytics that can be tested, before scaling out more broadly.

The survey results also show that proven models for data modelling and mapping are the number one capability industrial organisations look for when evaluating a vendor. Utilities are currently using tools and approaches that require significant human intervention. At the scale and speed of industrial data, this simply does not work. Utilities are relying on existing data warehouse infrastructure and business intelligence tools to create a data management solution. These solutions cannot operate in real time or at the edge.

What is needed is a data intelligence platform that removes the need for time-consuming human intervention: for instance, one that uses machine learning algorithms, pattern recognition, statistics and other forms of machine intelligence to automate most of the processing, integration and provisioning of data. It needs to integrate data for the data warehouse, the data lake and cloud services, pulling from both traditional and operational data sources, and it should also support self-service discovery, data preparation and advanced analytics use cases. Finally, it should integrate data for machine-driven use cases such as data mining, machine learning and deep learning.

“What’s interesting to note is that customers have said that they’re not seeing a decrease in data integration costs from one project to another. While this may not be the case for all utilities, this has been the general sentiment of the utilities that we have spoken to.”

According to research conducted by Gartner, 80% of data analytics project costs go toward data integration, while 50% of these projects go well over budget or fail because of inadequate data integration tools or practices.

In Castaldini’s opinion, any approach to data integration must recognise how unique utilities are: their environments are heterogeneous, the systems they have built are inherently complex, and connected devices now add to that complexity.

“You have to remove much of the human interpretation of data out of the process of integrating it.”

Castaldini recommends newer approaches around machine learning and artificial intelligence, where algorithms are applied to the data integration process. This method does not eliminate human involvement entirely, but does reduce the amount of time that an individual would have to spend on data integration.

“We term this approach ‘machine intelligence’ – a blend of machine learning and artificial intelligence practices and methodologies specific to the data integration problem.”

Castaldini provides an example of where a machine learning approach was used for data integration. General Electric partnered with Bit Stew to manage the integration of an oil and gas dataset in just over four hours. A similar integration had previously taken GE six man-months to complete.

“We developed that data model and mapped the data to that model. We ingested the data and then visualised it. GE used data historians and had a schema called ‘WITSML’ to persist data and store it in distributed file systems. They used an ‘extract, transform and load’ (ETL) tool and data mapping tool to do this.

“However, the bulk of the time was spent talking to subject matter experts in the upstream space. This was specifically well data. GE went back and forth trying to understand the data and how it related to the WITSML data model, which is much like the CIM used in the utility industry, but for upstream oil and gas,” said Castaldini.

A utility-specific example, Pennsylvania Power and Light, entailed mapping and ingesting data system by system across 54 data sources representing two million end points (two million smart meters). The data in this example was integrated in just over a week with the support of one engineer.