Data: Data collection
The primary study area selected for the development of templates was South Wales, chosen to collect and analyze data on WPD's network areas that share characteristics with other DNOs. This approach aimed to leverage the investment from WAG's Arbed initiative in the deployment of Low Carbon Technologies (LCTs). Furthermore, the number of installed monitors was intended to provide WPD with a statistically relevant data sample to support the development of Low Voltage (LV) Network Templates which are both credible and representative for application by the other DNOs in the wider GB network.
In the development of the LV Network Templates, two sets of data are used: (i) fixed data and (ii) monitored data.
Fixed data
• 951 substation profiles, with data on circa 30 different factors, such as:
• Network topology characteristics (transformer size, feeder length and capacity)
• Details of customer mixes, including Elexon customer profiles and estimated annual consumption
The fixed data does not change during the research period. It is used to classify LV substations and to form typical LV networks for topology and energy flow analysis.
Monitored data
• 3,600 feeder ends monitored (including customer premises)
Monitored data includes measurements taken at 10-minute intervals for voltage, current, and real power delivered (kW). This variable data is used for mathematical clustering analysis to uncover underlying patterns in the real power delivered.
Table 3, two sources of data which are combined in developing LV Network Templates
The research area encompasses a diverse range of geographical locations, including inner-city, urban, suburban, rural, and industrial sites. The monitored substations reflect a variety of customer demographics, ranging from those primarily serving residential customers to others that cater exclusively to the industrial and commercial sectors.
To enable effective data analysis, over 824 substations and 3,600 remote feeder ends were monitored throughout the year. In September 2011, preliminary data was sent to the University of Bath, facilitating the creation of sense-checking procedures to reduce the incidence of poor or sporadic data. By March 2012, data delivery had become semi-automated through WPD, using an FTP link to a dedicated server at the University of Bath. At the time of the original LV Network Templates report, more than 500 million substation readings and over 101 million feeder-end data points were under analysis.
WPD's strategic installation of monitoring equipment at customer premises, street furniture, and substations enabled immediate data recording upon setup. As each feeder end and substation came online, the volume of data collected increased significantly. This included variable data such as voltage and real power (kW), alongside fixed data, which provided the University of Bath with valuable insights into the relationships between load, voltage flow, and the available network capacity on specific LV circuits or substations.
Data: Sense Checking
To ensure the reliability and accuracy of the final LV Network Templates, sense-checking activities were applied to the extensive real-time data collected from substations and feeder ends. This process is crucial because data errors can occur at any stage, from initial recording to reception by the University; checking for them maintains consistency and credibility throughout.
The robust sense checking methodology captured three main issues:
• Low currents are represented by only a few discrete readings due to the poor resolution of current meters
• The power calculated from voltage and current readings aligns closely with the metered power, known as 'real power delivered (RPD)', at the majority of substations; however, at a few substations there is a significant discrepancy between these two power measurements
• The majority of power readings at some substations are zero
In the UK, the nominal phase voltage for low voltage (LV) substations is set at 230V, with a tolerance of +10% and -6%. This tolerance allows for effective sense checking of both substation and feeder-end voltages, with any significant deviations flagged for further investigation. The voltage sense checking methodology categorizes feeders based on their length and power delivery, using averaged voltage as a benchmark to represent the voltages within each group.
Abnormal voltage data can therefore be spotted by comparison with the average voltage of the feeders in the same category.
Voltage drop (VD) is influenced by the active (P) and reactive (Q) powers along a feeder, alongside the feeder's resistance (R) and reactance (X). For detailed insights and examples of the comprehensive voltage sense checking activities conducted, please consult the "Stresses on the LV Network Caused by Low Carbon Technologies" report [25].
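As a rough illustration of the voltage checks described above, the sketch below flags readings outside the 230V +10%/-6% band, estimates voltage drop with the commonly used approximation VD ≈ (P·R + Q·X)/V, and compares readings against a group average. The approximation, the function names and the tolerance parameter are assumptions made for illustration; the project's actual procedure is documented in [25].

```python
NOMINAL_V = 230.0
UPPER_V = NOMINAL_V * 1.10  # +10% tolerance
LOWER_V = NOMINAL_V * 0.94  # -6% tolerance


def outside_statutory_band(voltage):
    """Return True if a phase voltage lies outside 230 V +10% / -6%."""
    return voltage < LOWER_V or voltage > UPPER_V


def approx_voltage_drop(p_watts, q_vars, r_ohms, x_ohms, v_volts=NOMINAL_V):
    """Assumed approximation: VD ~ (P*R + Q*X) / V, in volts."""
    return (p_watts * r_ohms + q_vars * x_ohms) / v_volts


def flag_against_group_average(voltages, tolerance_volts):
    """Flag readings further than tolerance_volts from the group's mean voltage."""
    mean_v = sum(voltages) / len(voltages)
    return [abs(v - mean_v) > tolerance_volts for v in voltages]
```

For example, `flag_against_group_average([240, 241, 239, 225], 5)` marks only the last reading, which sits more than 5V below the group mean.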
The currents at LV substations fluctuate significantly throughout the day based on customer demand, making it impractical to establish a low threshold for accurate current recording. An estimated upper limit can be derived from the maximum demand of domestic households, typically around 2 kW in the UK, allowing measured currents to be validated against the number of domestic customers served. However, this method is less effective in commercial and industrial areas due to limited information on customer demand patterns, so greater engineering judgment is needed for accurate sense checking. Below is a summary of current data that has been sense checked at the substation level.
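The upper-limit idea can be sketched as follows. The 2 kW figure is taken from the text; the assumptions of a balanced three-phase supply at 230V, unity power factor, and the function names and `margin` parameter are illustrative only and not part of the project's documented method.

```python
def max_expected_phase_current(n_domestic_customers, max_demand_kw=2.0,
                               phase_voltage=230.0, n_phases=3):
    """Upper bound on per-phase current (A).

    Assumes balanced phases and unity power factor (illustrative only).
    """
    total_watts = n_domestic_customers * max_demand_kw * 1000.0
    return total_watts / (n_phases * phase_voltage)


def current_is_plausible(measured_amps, n_domestic_customers, margin=1.0):
    """True if the reading is non-negative and within the bound times margin."""
    limit = max_expected_phase_current(n_domestic_customers) * margin
    return 0.0 <= measured_amps <= limit
```

Under these assumptions a substation serving 69 domestic customers would have an upper limit of 200A per phase, so a reading of 250A would be flagged for further investigation.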
Current readings within substations typically fluctuate according to their capacity; however, discrepancies in meter resolution have been identified. To illustrate these issues, Figure 23 presents an overview of the current readings from all 824 substations, with the X-axis showing the value of the current reading and the Y-axis the frequency with which each reading occurs.
Figure 23, Occurrence Frequency of Current Records from 0A to 100A
The resolution of current readings varies significantly, starting at around 2A and decreasing to 0.5A as current levels rise. It reaches a minimum of approximately 0.2A when currents exceed 90A. Notably, currents below 20A are represented by a limited number of discrete values, particularly at lower levels. The specific readings of 8.76A and 10.73A are observed approximately 200,000 and 100,000 times, respectively, with no other values recorded in between. Table 4 illustrates the percentage of low readings within the current dataset.
Table 4, Percentage of low current readings
The investigation of current readings at substation 536787 reveals a "saw tooth" pattern, as shown in Figure 24, indicating poor resolution rather than the expected smooth curve. The current readings fluctuate between 0A and 12A, and although the resolution improves for readings above 30A, the overall profile remains unsmooth. Fortunately, this issue has minimal impact on power measurements, leading the University of Bath to conclude that the quality and credibility of their templates will not be affected.
Figure 24, 10-minute interval of current for Substation 536787
Three-phase power sense checking at substations involves calculating the power using recorded phase voltages and currents, along with an estimated power factor. When the voltage, current, and power data are accurate, the resulting power factor should align with the expected values based on customer types. This calculated power can then be used to verify the metered power at low voltage substations.
The real active power delivered (PD), along with the phase voltages (V) and currents (I) of the three phases A, B, and C, are key components in assessing power performance. The power factor (PF) is also an essential metric in this evaluation. Table 5 below summarises the power sense checking results for all 824 substations involved in the project.
Group for use (Group A) 730
Group not for use (Group B) 94
Table 5, Power Sense Checking Results
Theoretically, the metered real power delivered should match the power calculated using voltages and currents multiplied by a power factor, typically ranging from 0 to 1 based on customer type. Domestic customers usually have a power factor above 0.9, indicating that the metered real power should closely align with the calculated real power. However, an analysis of the 824 substations revealed significant variations between these two values. Consequently, the substations can be categorized into two groups based on the discrepancies between calculated and delivered power: "Group for use" (Group A) and "Group not for use" (Group B).
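A minimal sketch of this comparison is given below. It sums V × I over phases A, B and C, takes the ratio of metered real power to that total over a period as an implied power factor, and assigns a group from the ratio. The function names and the thresholds (a lower bound of 0.7, borrowed from the ratios reported for Group A, and a small allowance above 1) are assumptions for illustration rather than the rule actually applied in the project.

```python
def apparent_power_kva(voltages, currents):
    """Sum of per-phase V * I for phases A, B, C, converted to kVA."""
    return sum(v * i for v, i in zip(voltages, currents)) / 1000.0


def implied_power_factor(delivered_kw, apparent_kva):
    """Ratio of total metered real power to total V*I power over a period."""
    total_apparent = sum(apparent_kva)
    if total_apparent == 0:
        return None  # no usable readings
    return sum(delivered_kw) / total_apparent


def assign_group(ratio, min_ratio=0.7, max_ratio=1.05):
    """Illustrative rule: Group A if the ratio is a plausible power factor."""
    if ratio is not None and min_ratio <= ratio <= max_ratio:
        return "A"
    return "B"
```

A substation whose metered power is about 92% of the V × I total would fall into Group A under this rule, while one reporting mostly zero power would fall into Group B.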
Group for use (Group A)
A total of 730 substations exhibit real power delivered readings that closely align with the calculated real power values once power factor is taken into account. The ratios between these two measurements typically exceed 0.7, reflecting the influence of power factor.
Figure 25 below demonstrates the relationship between the two values at substation 511028.
Both quantities are measured in kW. The benchmark is the red line, which has a slope of 1. The black line, which indicates the ratio of power delivered to power calculated over the entire period, closely follows the benchmark. For the substations in this group, either the power delivered or the power calculated can be used for effective clustering. The metered power delivered is, however, the better representation of real power consumption, as it incorporates the power factor.
Figure 25, Real power delivered against real power received (Group A)
Group not for use (Group B)
Ninety-four substations are suspected of being non-operational due to abnormal power factor readings. For instance, at substation 513569, the actual power delivered deviates significantly from the calculated power, as illustrated by the marked difference from the benchmark line. Although the power values fall within reasonable ranges, this discrepancy may be attributed to a low power factor at the substation, making it difficult to evaluate the overall reliability of the data.
The other type of substation in this group has zero or close to zero real power delivered readings, resulting in low power and current readings being calculated.
Substation 511361 is an example of this (Figure 27) with most readings being either zeros or 0.1 kW.
Figure 27, real power delivered against real power received (Group B)
A potential cause of the low real power delivered readings is poor metering resolution or a defective meter. Given the issues identified in Group B, further solutions would need to be identified, applied and tested before this data could provide the necessary confidence.
Analysis and Modelling: Moving from data to the development of LV Network Templates
As part of Step 2, a four-stage process was required to develop the LV Network Templates:
• Clustering – grouping substations according to patterns of real power delivered leading to clusters of substations
• Classification – characterising the relationship between these groups and fixed data including substation characteristics and customer profiles
• Scaling – transforming the results from the clustering and classification steps to appropriate scales that can be used in practice
• Quality assurance – assessing estimated demand profiles against actual observed data, analysing outliers in depth to determine the accuracy of the templates and identify potential issues, and applying the methodology to data from other Distribution Network Operators (DNOs) to evaluate its applicability across different license areas
The initial stage uses monitored substation data to form distinct clusters, setting it apart from the subsequent stages, which rely on fixed, routinely available data. The second and third stages develop classification rules and scaling factors based exclusively on this fixed data, allowing them to be applied in regions and timeframes that lack monitored data. An overview of these stages is given below.
Analysis and Modelling: Clustering: Techniques and Application
The objective is to analyze daily patterns from substations to form clusters where the daily patterns are more similar within each cluster than between different clusters. Statistically, this requires that the variation between clusters (inter-cluster variation) is greater than the variation within each cluster (intra-cluster variation).
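One way to quantify this requirement is to compare sums of squared distances, as in the sketch below. The choice of squared Euclidean distance and the function names are assumptions for illustration; the source does not state which measure of variation was used.

```python
def mean_profile(profiles):
    """Element-wise mean of a list of equal-length daily profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_variation(clusters):
    """Return (intra, inter) sums of squares for a list of clusters.

    Each cluster is a list of daily profiles (lists of numbers).
    """
    all_profiles = [p for c in clusters for p in c]
    grand_mean = mean_profile(all_profiles)
    intra = 0.0
    inter = 0.0
    for c in clusters:
        centre = mean_profile(c)
        intra += sum(sq_dist(p, centre) for p in c)
        inter += len(c) * sq_dist(centre, grand_mean)
    return intra, inter
```

A well-separated grouping gives an inter-cluster value that is large relative to the intra-cluster value.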
The project utilized agglomerative hierarchical clustering to define clusters based on detected data patterns, without any prior assumptions. This method organizes daily load patterns into clusters represented by a dendrogram, as illustrated in Figure 28. Hierarchical clusters can be formed through either a top-down or a bottom-up approach; in the top-down method, all objects start in a single cluster and are iteratively divided into smaller clusters based on dissimilarity measures until each object is isolated in its own cluster.
This project utilized a bottom-up clustering approach, where each object is initially classified as its own individual cluster. These clusters are then merged based on similarity measures until they form a single cohesive cluster. This method is advantageous as it demonstrates reduced sensitivity to errors during the agglomeration process.
When utilizing the agglomerative method, it is essential to focus on two key factors: first, the selection of an appropriate dissimilarity measure that determines the proximity between objects, and second, the criteria that establish whether two clusters are sufficiently close to be merged into one.
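The two factors can be seen in a minimal bottom-up sketch. Euclidean distance is used here as the dissimilarity measure and average linkage as the merge criterion; both are assumptions chosen for illustration, as the source does not specify which were used in the project.

```python
import math


def euclidean(a, b):
    """Dissimilarity between two daily profiles (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Merge criterion: mean pairwise distance between two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(profiles, n_clusters):
    """Start with singletons and merge the closest pair until n_clusters remain.

    Returns a list of clusters, each a list of indices into profiles.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

Recording the distance at each merge would give the heights needed to draw a dendrogram; that step is omitted here for brevity.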
The dissimilarity measure used in clustering largely depends on the intended output and purpose. For instance, when the goal is to identify clusters based on peak usage, the measure focuses on variations in daily peaks. In contrast, when creating LV templates, the emphasis is on detecting temporal patterns, necessitating the use of normalized data. Additionally, the measures can be tailored to minimize the impact of outliers and noise, depending on the specific analytical requirements.
Figure 28, Dendrogram: Substation Cluster Analysis on real power delivered
The dendrogram above (Figure 28) illustrates the clusters of substations using normalised data from South Wales.
The choice of data units used for clustering significantly influences the characteristics of the resulting clusters. For instance, using direct measurements of real power delivered creates clusters that represent load magnitudes. To uncover additional features, such as daily load patterns, data normalization or scaling is essential. Two normalization methods were evaluated alongside direct measurements, with the selection depending on the main focus of how substations are categorized. The three approaches considered were:
• Magnitude – where data on the original scale of the measurements is used
• Normalised I – where the data is normalised to the maximum value of real power delivered recorded at each substation over the entire period of study
• Normalised II – where the data is normalised to the maximum value of real power delivered at each substation for each day
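The difference between the two normalisations listed above can be illustrated with a short sketch. The function names and the representation of a substation's data as a list of daily lists of readings are assumptions for illustration.

```python
def normalise_by_period_max(days):
    """Normalised I: divide every reading by the maximum over the whole period."""
    peak = max(max(day) for day in days)
    if peak == 0:
        return [[0.0 for _ in day] for day in days]
    return [[x / peak for x in day] for day in days]


def normalise_by_daily_max(days):
    """Normalised II: divide each day's readings by that day's own maximum."""
    result = []
    for day in days:
        peak = max(day)
        result.append([x / peak if peak else 0.0 for x in day])
    return result
```

For two days with the same shape but different magnitudes, such as `[10, 20, 40]` and `[5, 10, 20]`, Normalised II yields identical profiles, whereas Normalised I retains the difference in level between the days.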
The first approach to creating clusters focuses primarily on the average real power delivered by each substation, which leads to the neglect of daily demand patterns. Consequently, the clusters formed primarily represent the number of customers and their mix at each substation, rather than providing insights into demand patterns. This limitation renders the approach ineffective for planning purposes, and it adds nothing to the existing fixed data. As a result, this method was deemed unsuitable for the analysis.
The second approach partially addresses the issue but is influenced by daily and seasonal fluctuations in maximum values, leading to clusters that primarily reflect temporal effects. Consequently, clusters may be established based on seasonal demand differences, with winter driving higher maximum values and summer resulting in lower ones. This complicates the detection of daily demand patterns. Additionally, both the first and second approaches face challenges when a scaling factor is applied, as they create template versions of one another. This is problematic because substations may shift clusters with the changing seasons, complicating the practical application of Low Voltage (LV) Network Templates.
The third approach focuses on quickly detecting demand patterns, making it ideal for LV template development. This method generates clusters based on demand variations over a 24-hour period, further refined by distinguishing between weekdays and weekends, as well as between seasons: Spring, Summer, High Summer, Autumn, and Winter. Ultimately, these clusters are used to create tailored templates.
The normalized approach requires a scaling factor to adjust templates to actual values, which, while seemingly complex, enhances flexibility in template usage. Instead of relying on a limited set of around ten templates that must accommodate both load shape and magnitude for all substations, scaling these normalized templates allows for a diverse range of load shapes. This customization ensures that the template output is specifically tailored to meet the unique requirements of each substation.
Since the release of the LV Network Templates Report, new data has been collected over the past year, allowing for a comprehensive clustering analysis of five seasons from summer 2012 to spring 2013. This method enables seasonal differentiation, ensuring that clusters can adapt to the unique characteristics of each season. Load patterns for each season are documented, facilitating the mapping of clusters over time and providing insights into the dynamic shifts of substations across different clusters.
Analysis and Modelling: Classification
The University of Bath established a methodology for assigning substations to predefined clusters in the absence of monitored data. This approach aims to create a statistical model that predicts cluster membership using only routinely available data, thereby generating effective rules for classification.
Multinomial logistic regression models were employed to analyze the relationships between cluster membership and fixed data. The model comprises nine logistic regression equations, which together give the probability of a substation belonging to each of the ten clusters. Nine equations are used instead of ten because each is expressed relative to the probability of being in cluster 1.
Each substation was assigned to the cluster with the highest predicted probability. The probabilities for all ten clusters provide insight into the confidence of this selection. Validation and accuracy were determined by comparing predicted cluster memberships with actual memberships.
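The step from nine equations to ten probabilities can be sketched as follows, using the standard multinomial logit form with cluster 1 as the reference category. The inputs are the nine linear predictors (log-odds relative to cluster 1) already evaluated for one substation; the function names are hypothetical, and the fitted coefficients are not reproduced here.

```python
import math


def cluster_probabilities(linear_predictors):
    """Convert K-1 log-odds (relative to cluster 1) into K probabilities."""
    exps = [1.0] + [math.exp(eta) for eta in linear_predictors]
    total = sum(exps)
    return [e / total for e in exps]


def assign_cluster(linear_predictors):
    """Return (cluster number, probability) for the most probable cluster."""
    probs = cluster_probabilities(linear_predictors)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best + 1, probs[best]


def accuracy(predicted, actual):
    """Share of substations whose predicted cluster matches the actual one."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)
```

When all nine predictors are zero, every cluster receives a probability of 0.1; a large positive predictor in one equation shifts the assignment to the corresponding cluster.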
The variables chosen for the classification model were carefully selected from the comprehensive list of factors available in the fixed dataset, as detailed in Table 6.
Number of customers in each Elexon class
Estimated annual consumption for each Elexon class
Percentage of industrial and commercial customers *
Percentage half hourly metered load
Total length of HV feeder
Percentage overhead lines (at HV feeder)
Table 6, Variables considered for use in classification model. Selected variables are highlighted with blue background. Variables marked with * indicate that categorised versions were used in the model.
The selection of factors for the model was based on statistical significance, improvement in model fit and accuracy in predicting cluster membership, while also considering the completeness of individual variable data. A key aspect of the analysis involved exploring potential non-linear relationships between variables, such as transformer rating, and the probability of cluster membership. This was achieved by using non-linear functions and categorization to identify statistically and practically meaningful groups. Extensive testing led to the development of categories that not only preserved but also improved the utility of the information for predicting clusters. The final set of categories is as follows:
• Transformer rating: classified as < 0, 200- 500 (indicates rural, suburban, urban)
• Number of LV feeders: classified as 5 (indicates rural, suburban, urban)
• Percentage I&C: classified as