AMI system operations pose unexpected challenges


By David Gordon Kreiss and Masoud Abaei

Utilities have found that operating a smart meter/AMI system is more complex than expected. AMI is not a mature technology, nor is AMI operations a mature activity. Utilities have learned to expect the unexpected. This article describes the challenges that utilities are facing as they operate their AMI systems.

AMI vendors for the most part have supplied quite reliable systems. They are to be applauded for providing both hardware and software that reliably collects and stores billing data. But when it comes to the myriad of tasks associated with the operation of an AMI system, utilities have experienced many unexpected challenges. Below is a summary:

  • The identification and analysis of AMI meter and communications issues are more complex than expected
  • AMI operating tasks are more complicated and require more steps than expected
  • Operating and analysis tools beyond those supplied by the vendor are needed. These tools include geospatial, work management, reporting and analytical applications
  • Many additional processes are required as well as unexpected changes to existing established processes
  • The organization and staffing needed to operate an AMI system effectively were not clearly understood

AMI operations are focused on the performance and reliability of over-the-air (OTA) activities. AMI systems provide a number of daily OTA services. Billing interval data is generally collected daily. Many utilities also execute daily reads of network statistics, meter events, and voltage profiles. Other OTA services include remote service switch activations or de-activations, meter re-programming (configuration), and diagnostic tasks such as meter pings and load side voltage checks. In addition, a set of OTA activities associated strictly with communications backhaul devices runs continuously, checking the health of these devices and collecting communications metrics.

In addition, large OTA activities such as meter and backhaul firmware downloads as well as mass configuration changes are conducted throughout the year. It is important that these activities do not affect overall billing read performance.

While these OTA services are carefully scheduled, it is not completely understood how the system will be loaded during the course of the day. Packet message size can be determined, but what cannot be determined is the loading at any given point in the system or where bottlenecks will occur.

The primary goal of an AMI system is to ensure the reliable and timely completion of these OTA activities.

AMI operational objectives are to optimize network performance and manage risk and reliability. A metric for measuring network performance is meter read rate. A typical target is a 99.5% daily meter read rate, but actual rates of 98-99% are more common for fully deployed networks.
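
To put these percentages in perspective, the arithmetic can be sketched in a few lines of Python. The meter counts below are hypothetical, chosen only to illustrate a one-million-meter deployment, and the function name is an assumption rather than part of any vendor tool.

```python
TARGET_RATE = 99.5  # daily read rate target cited in the article, in percent


def daily_read_rate(meters_read: int, meters_expected: int) -> float:
    """Return the daily meter read rate as a percentage."""
    if meters_expected <= 0:
        raise ValueError("meters_expected must be positive")
    return 100.0 * meters_read / meters_expected


# Hypothetical figures for a one-million-meter deployment.
expected = 1_000_000
read = 987_500

rate = daily_read_rate(read, expected)
shortfall = max(0.0, TARGET_RATE - rate)  # percentage points below target
unread = expected - read                  # meters with no daily read

print(f"read rate {rate:.2f}%, {shortfall:.2f} points below target, {unread} meters unread")
```

At this scale, each percentage point below target represents 10,000 meters without a daily read, which is why a seemingly small gap translates into a large daily workload.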

Reliability can be measured in a few ways. Field device failures are a primary metric, but other incidents that affect the ability of the AMI system to perform its OTA activities must also be tracked. These could include head-end system interruptions, or failures or delays in the data uploads (including ETL) that operations depend on.

Risk will be discussed later, but as an operational objective, risk management aims to avoid an incident that could have a significant financial impact or seriously affect the utility’s reputation with the public or commission.

Below is a description of challenges being faced by utilities in meeting their operational objectives.

The necessity and effort to integrate many additional sources of data beyond those provided by the AMI system to effectively monitor and manage an AMI system
It was expected that an AMI system would provide data, such as meter events and head-end communications “result codes”, that would allow a client to directly identify hardware and communications problems. Utilities have found that this data alone is not sufficient and must be correlated and processed with a broad array of disparate data sources to provide actionable information.

AMI field devices, to include meters and backhaul, along with the head-end system provide quite a bit of information regarding the health of a device or system component. In fact, for a one million meter deployment, a utility can receive millions of records daily. This data by itself is not generally useful until it is correlated with other utility data. For example, a meter reporting a time synchronization event may not need to be addressed if it switched backhaul devices, and a meter reported by the system as non-responding may not be relevant if it had been scheduled for replacement. The challenge utilities face is first to set up the rules for converting data into addressable incidents and then to integrate the necessary data sources to allow for a correlated analysis.
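
The two examples in the preceding paragraph can be expressed as a minimal correlation sketch in Python. The event codes, field names and the two lookup sets are hypothetical; a real deployment would draw them from the head-end system, the work management system and other utility sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeterEvent:
    meter_id: str
    kind: str  # hypothetical codes: "TIME_SYNC", "NON_RESPONDING"


def actionable_incidents(events, switched_backhaul, scheduled_for_replacement):
    """Filter raw events down to those that still need attention.

    switched_backhaul: set of meter ids that recently changed backhaul device
    scheduled_for_replacement: set of meter ids with an open replacement order
    """
    incidents = []
    for ev in events:
        # A time sync event right after a backhaul switch is expected behaviour.
        if ev.kind == "TIME_SYNC" and ev.meter_id in switched_backhaul:
            continue
        # A silent meter that is already due for replacement needs no new incident.
        if ev.kind == "NON_RESPONDING" and ev.meter_id in scheduled_for_replacement:
            continue
        incidents.append(ev)
    return incidents


raw = [
    MeterEvent("M1", "TIME_SYNC"),
    MeterEvent("M2", "TIME_SYNC"),
    MeterEvent("M3", "NON_RESPONDING"),
    MeterEvent("M4", "NON_RESPONDING"),
]
result = actionable_incidents(raw, switched_backhaul={"M1"}, scheduled_for_replacement={"M3"})
print([ev.meter_id for ev in result])
```

Only the events without a benign explanation (here M2 and M4) survive as incidents. The value lies less in the filter itself than in having the correlating data sets available to it.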

AMI system analytics require an end-to-end view
For many AMI incidents it is difficult to immediately determine the cause of the issue. A drop in meter read rate performance could be due to backhaul device issues, backhaul system issues (such as cell tower issues), head-end or network management issues, or even security appliance issues. An end-to-end view is necessary to isolate the part of the system that has failed, become unavailable, or become a bottleneck.

Firmware and configuration downloads and management
Field device firmware and configuration upgrade projects can be treated as an art rather than a science. The activity cannot be executed using a simple playbook with a fixed collection of steps. The actual process is generally one or more broadcast downloads, followed by a collection of point-to-point jobs to fill in the gaps, and then a series of cleanup and activation steps. Near-real-time analytics are required to mitigate communications issues during the process so as to maximize the number of devices with a complete download. This is extremely important since in some cases devices that have not received the new firmware or configuration would need to be replaced. This occurs when the new firmware is not backward compatible, such as with a communication speedup, or when a necessary security patch is pushed.
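
The gap-filling step after a broadcast can be illustrated with a short Python sketch. It assumes, purely for illustration, that each device can report which firmware blocks it has received; actual status reporting varies by vendor, and the function names here are hypothetical.

```python
def plan_gap_fill(total_blocks, received_by_device):
    """Map each incomplete device to the sorted list of blocks it still needs.

    received_by_device: dict of device id -> set of block indices received
    during the broadcast phase (an assumed reporting format).
    """
    all_blocks = set(range(total_blocks))
    jobs = {}
    for device_id, received in received_by_device.items():
        missing = sorted(all_blocks - set(received))
        if missing:
            jobs[device_id] = missing  # candidate for a point-to-point job
    return jobs


def completion_rate(total_devices, jobs):
    """Percentage of devices that already hold the complete image."""
    return 100.0 * (total_devices - len(jobs)) / total_devices


status = {"m1": {0, 1, 2, 3}, "m2": {0, 2}, "m3": set()}
jobs = plan_gap_fill(4, status)
print(jobs, f"{completion_rate(len(status), jobs):.1f}% complete")
```

Re-running this calculation as status reports arrive gives the operations team a running view of how many devices remain incomplete and which ones to target next.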

Security alert identification and false positive analysis
Maintaining the cyber security of the field devices and communications system is a critical activity. While AMI devices do provide useful data for identifying a cyber-attack, the raw data must be correlated with other data to pick out possible threats. The data sources used for correlation are quite varied and include real-time field activity information. In addition, AMI users are experiencing large quantities of false positive cyber security alerts. Manual analysis often cannot keep pace with SLAs. Therefore, some utilities are implementing automated, rules-engine analytics to solve the volume problem.
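
A rules engine of this kind might be structured as in the following Python sketch. The alert codes, the two rules and the context keys are all hypothetical and serve only to show how field activity data can explain away an alert; they do not represent any vendor's alert scheme.

```python
from typing import Callable, NamedTuple, Optional


class Alert(NamedTuple):
    device_id: str
    code: str  # hypothetical codes: "TAMPER", "COVER_OPEN", "UNEXPECTED_REBOOT"


# Each rule returns a reason string when it explains the alert, else None.
Rule = Callable[[Alert, dict], Optional[str]]


def crew_on_site(alert: Alert, context: dict) -> Optional[str]:
    if alert.code in {"TAMPER", "COVER_OPEN"} and alert.device_id in context["open_work_orders"]:
        return "field crew working on device"
    return None


def firmware_in_progress(alert: Alert, context: dict) -> Optional[str]:
    if alert.code == "UNEXPECTED_REBOOT" and alert.device_id in context["firmware_targets"]:
        return "device is in an active firmware campaign"
    return None


RULES = [crew_on_site, firmware_in_progress]


def triage(alert: Alert, context: dict):
    """Return ("false_positive", reason) if any rule explains the alert, else escalate."""
    for rule in RULES:
        reason = rule(alert, context)
        if reason is not None:
            return ("false_positive", reason)
    return ("escalate", "no benign explanation found")


context = {"open_work_orders": {"m1"}, "firmware_targets": {"m2"}}
print(triage(Alert("m1", "TAMPER"), context))
print(triage(Alert("m3", "TAMPER"), context))
```

Keeping the rules in a plain list makes it straightforward to add, remove or reorder them as the criteria for identifying an attack change.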

Finally, the criteria for identifying a cyber-attack are constantly changing. Utilities often have periodic workshops with the vendor, meter organization, IT and their own security department to re-visit the rules and processes for cyber-attack management. Any changes must be implemented promptly.

Utilizing power outage and restoration events
A well promoted function of AMI meters was to issue real time notifications when connected power is lost and when power is returned. On the surface, the use of power outage events (POE) or last gasp transmissions seems rather simple. When a POE is received the expectation is that the premise has lost power, and when a power restoration event (PRE) is received the expectation is that power to the premise has been restored. However, that has not been the case. While AMI users did not expect 100% reliable reception of POEs and PREs, the actual percentages have been well below expectations. In addition, during complex outages (multiple outages and restorations due to the nature of the fault and response of the protection systems), it may not be possible to match a restoration message to a particular outage alert for a given premise.

Let us be clear that POEs and PREs are extremely useful data, of great value to storm outage management and no lights response, and to identify nested outages. But analytics and processes must be developed to support the raw POE and PRE data.
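
As an illustration of the kind of analytics involved, the Python sketch below pairs each outage event with the next restoration event for the same premise and flags whatever is left over. The tuple layout and integer timestamps are assumptions made for brevity, and the pairing rule is one simple heuristic, not an established method.

```python
def pair_outages(events):
    """Pair POEs with PREs per premise.

    events: iterable of (premise_id, kind, timestamp), kind is "POE" or "PRE".
    Returns (pairs, unmatched): pairs are (premise_id, outage_ts, restore_ts);
    unmatched are events for which no counterpart was received.
    """
    pairs, unmatched = [], []
    open_outage = {}  # premise_id -> timestamp of the outage awaiting restoration
    for premise, kind, ts in sorted(events, key=lambda e: e[2]):
        if kind == "POE":
            if premise in open_outage:
                # Second outage with no restoration in between: the PRE was likely lost.
                unmatched.append((premise, "POE", open_outage[premise]))
            open_outage[premise] = ts
        elif kind == "PRE":
            if premise in open_outage:
                pairs.append((premise, open_outage.pop(premise), ts))
            else:
                # Restoration with no known outage: the POE was likely lost.
                unmatched.append((premise, "PRE", ts))
    # Outages still open at the end of the data have no restoration yet.
    unmatched.extend((p, "POE", ts) for p, ts in open_outage.items())
    return pairs, unmatched


events = [
    ("A", "POE", 100), ("A", "PRE", 160), ("B", "PRE", 170),
    ("A", "POE", 200), ("A", "POE", 260), ("A", "PRE", 300),
]
pairs, unmatched = pair_outages(events)
print(pairs)
print(unmatched)
```

The unmatched list is where further correlation is needed, for example against outage management or meter ping results, to decide whether a premise is actually without power.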

One last note: the expectation that POEs can be mapped to an individual transformer or phase of a circuit may be optimistic. Utilities have found that the accuracy of the electrical connectivity mapping of these assets can be too low to support analytics that reliably identify a faulted phase or a faulted distribution transformer. To remedy this issue there are a number of active projects not only to improve the accuracy of the existing connectivity model but also to create an automated system to maintain an acceptable level of accuracy over time.

Managing overall system risk
Risk, as distinct from reliability, refers to the possibility of an event that could result in a large financial loss or a serious impact on the reputation of the utility with its customers and commission. Utilities have experienced quite surprising events that thankfully did not result in a catastrophic incident. But there is the possibility of incidents such as a large scale unauthorized remote disconnect or a cascading equipment failure. Strategies and processes to address these risk factors were not generally implemented in the initial plan and deployment of the AMI system.

Faced with this unexpected challenge, utilities are realizing the need for real-time or near-real-time monitoring and analytics; in other words, real-time situational awareness. The specific need is to identify a critical event as quickly as possible, to supply management with real-time information, and to create processes with very specific SLAs that mitigate the impact of, if not prevent, a potentially catastrophic event.

AMI supplied tools are insufficient
AMI system vendors have provided monitoring and analytical tools to support the operation of their system. These tools generally include the ability to perform OTA ad hoc data collection, a database for storing AMI data to include communications and meter events, and a reporting application that sits on top of the database. These basic tools have been insufficient for a variety of reasons.

As previously presented, AMI monitoring and analysis requires data from many different sources to include customer service, meter services, security appliance and backhaul data to name a few. The databases provided by the AMI vendors were not appropriate to store this data and therefore the AMI reporting system was deficient.

An integrated geospatial application is critical
Many utilities are now integrating a geospatial analysis application in their AMI operations processes. These applications provide essential information to analyze meter issues especially when diagnosing possible RF issues. A geospatial field tool is also needed when engineering solutions to address communications issues.

Work management
The need for issue tracking tools, to include work management applications, quickly became apparent as the AMI user was faced with an unexpectedly large number of daily non-responding meters and other system issues. Utilities needed an effective way of organizing and tracking the large number of AMI devices being worked on by the operations team.

Most utilities deploying AMI systems found themselves creating many additional processes that were not originally expected. The handling of the many variations of security alerts, the management of backhaul activation/de-activation, security key and appliance requirements, and the management of mitigation driven field communications devices required in many cases unexpected processes. Change management also became a bigger task to not only manage process, release and version changes but also to manage diagnostics rules and work activities.

It seems that every utility has its own unique organization structure to provide overall end-to-end management of the AMI system. Some utilities have a collection of loosely connected groups to address each component of the AMI system. This results in separate groups to monitor, analyze and manage OTA communications, field device mitigation and management, the head-end system, security appliances, MDM, and product management. In some cases still other groups are in charge of AMI related projects and processes.

What utilities have learned is that operating an AMI system requires a collection of skills that need to be tightly integrated rather than spread over disparate groups. This need becomes quite obvious when viewing typical AMI operational processes such as “mitigation of non-responding meters” and “security alert false positive analysis”. For a 1 million meter deployment, hundreds of new meters may not have communicated over multiple days. To effectively and efficiently process these non-responding meters requires effort from those with expertise in mesh networks as well as backhaul communications, head-end IT systems, field devices, and even customer service systems where the meter may have been removed or cut out flat. As noted earlier, utilities receive quite a few potential cyber alerts a day. Again, this requires expertise of the operations staff over a range of AMI components to effectively and quickly perform false positive analytics.

AMI systems have delivered great value. The OTA collection of billing data has been successful and utilities are piloting projects to utilize AMI functionality in smart grid oriented distribution operations to improve system efficiency and reliability and reduce O&M. But the operation of an AMI system clearly requires staffing, tools, and processes beyond what was generally planned. AMI systems are just more complex than expected.

Many utilities are finding it difficult to address the challenges discussed in this article, not because of technology but because the cost of addressing them was not included in the original project funding package. The result is that utilities are addressing each challenge with a piecemeal approach rather than architecting a holistic solution. While the piecemeal approach may make it easier to obtain funding, over the long haul it will cost more. Incident identification applications should be integrated with work management as well as geospatial tools. Instead of having an efficient operating centre with integrated operating system software, utilities have sets of standalone tools and are in a constant state of catch-up.

It is only recently that large scale deployments have been completed and utilities have begun assuming operational control from their vendors. Much is being learned and hopefully will be shared. One utility has just deployed an enterprise AMI operations system that provides real time monitoring and automated analytics. Others have integrated AMI data into their volt/var and distribution automation systems. We can expect that the systems, tools, staffing and processes associated with AMI operations will evolve over many years to come.

Further articles are available in Smart Energy International magazine, edition 3 2013.