An Anomaly Detection Algorithm for Cathode Voltage of Aluminum Electrolytic Cell

Danyang Cao*; Yanhong Ma*; Lina Duan*

doi:10.3745/JIPS.04.0150

ISSN: 2092-805X

Volume 15, No 6 (2019), pp. 1392 - 1405

Abstract: The cathode voltage of an aluminum electrolytic cell is relatively stable under normal conditions and fluctuates greatly when an anomaly occurs. To detect the abnormal ranges of the cathode voltage, an anomaly detection algorithm based on a sliding window is proposed. The algorithm combines the time series piecewise linear representation method with the k-nearest neighbor local anomaly detection algorithm, which is more efficient than detecting anomalies directly on the original sequence. The algorithm first segments the cathode voltage time series, then calculates the length, the slope, and the mean of each line segment pattern and maps them into a set of spatial objects. The local anomaly detection algorithm is then used to detect abnormal patterns according to the local anomaly factor and the pattern length. The experimental results show that the algorithm can effectively detect the abnormal ranges of the cathode voltage.

Keywords: Abnormal Pattern, Cathode Voltage, K-Nearest Neighbor, Sliding Window

1. Introduction

Improving current efficiency is important for research on low-temperature aluminum electrolysis, and anomaly detection for the cathode voltage is key to low-temperature control. This work is also important for improving the environment. It is difficult to calculate the cathode voltage of an aluminum electrolytic cell accurately. The cathode voltage depends not only on the design and selection of lining materials but also on the furnace building quality and on the roasting, start-up, and operation quality of the electrolytic cell. More importantly, the cathode voltage varies with the age of the cell. The cathode voltage is mainly composed of the voltage of the cathode steel bar, the contact voltage between the cathode steel bar and the cathode carbon block, and the voltage of the cathode carbon block itself [1]. An aluminum factory uses a new type of cathode signal acquisition device to collect the voltage data of the cathode steel bar, and the cathode voltage in this paper refers to the voltage of the cathode steel bar. The magnitude of the cathode voltage depends on the length and temperature of the cathode steel bar: when the temperature rises, the cathode voltage rises, and when the temperature is constant, the cathode voltage remains almost unchanged. The change of cathode voltage is also related to crusts and precipitations [2], and the cathode voltage may differ between measurement points. Under normal conditions the cathode voltage is stable and its fluctuations are relatively smooth. A large fluctuation therefore indicates a likely anomaly, either at the measurement point or in the aluminum electrolytic cell. Anomaly detection of the cathode voltage helps to prolong the service life of the electrolytic cell, supports safe and efficient production, and increases economic benefit.

To detect abnormalities of the cathode voltage, an anomaly detection algorithm is proposed. The algorithm works on the data of a single cathode voltage measurement point, and these data form a time series. Therefore, a time-series-based anomaly detection algorithm is proposed to solve the anomaly detection problem of cathode voltage. The purpose of time series anomaly detection is to discover anomalies in a time series. According to their form, anomalies can be divided into point anomalies, sequence anomalies, and pattern anomalies [3,4]. A pattern anomaly is a pattern of variation that is significantly different from the other patterns in a time series. Such a pattern is generally rare, so it is called an abnormal pattern. Knorr and Ng [5] proposed a distance-based anomaly detection algorithm, which detects outliers by computing the distances between objects. Breunig et al. [6] proposed a density-based anomaly detection algorithm, but on large data sets its complexity is too high and its effect is not good enough. Zhang et al. [7] proposed an outlier sub-sequence detection algorithm based on important-point segmentation of time series, which is suitable for one-dimensional data anomaly detection. Some researchers proposed the permutation entropy algorithm [8,9], but permutation entropy has mostly been used to study the complexity of time series [10] and in mechanical fault detection [11].

A time series is a sequence with a time attribute: the sampled values of a physical quantity of an object at different time points, arranged in chronological order. It has high-dimensional characteristics. The pattern representation of time series [12,13] is essentially a dimensionality reduction of the time series, and it is a feature representation method that abstracts and generalizes the time series. At present, the main methods are symbolic representation, principal component analysis representation, piecewise linear representation, and so on. This paper focuses on piecewise linear representation, which extracts feature points from the time series and connects them to form a sequence of line segments. It compresses data and filters noise well, and it supports fast similarity search of time series data. Therefore, this paper combines the sliding window piecewise linear representation with the k-nearest neighbor local anomaly detection algorithm to detect anomalous patterns in the cathode voltage by pattern length and local anomaly factor, and verifies the effectiveness of the algorithm in experiments.

2. Related Definitions

The new cathode signal acquisition equipment in the aluminum factory has a sampling interval of 5 seconds. Over time, the amount of cathode voltage data becomes huge, and detection on the original sequence is very complicated. To reduce the complexity of anomaly detection and make the detection results more accurate, the dimension of the cathode voltage time series must be reduced. The cathode voltage time series is divided into non-overlapping sub-sequences by the sliding window piecewise linear representation method [14,15] and converted into a collection of line segment patterns. The sliding window does not have a fixed length; instead, the fitting error of each segment is bounded by a threshold. A distinct, rarely occurring line segment pattern can be defined as an abnormal pattern [16]. In this paper, the density of a pattern is used to measure the similarity between a line segment pattern and the surrounding line segment patterns. The greater the density of a pattern, the more surrounding patterns are similar to it and the less likely it is to be anomalous. The smaller the density, the fewer surrounding patterns are similar to it and the more likely it is to be anomalous.

DEFINITION 1. Cathode voltage time series

The cathode voltage time series is an ordered set consisting of a series of observed values and their observation times. It is denoted as:

[TeX:] $$X=\left\{\left(x_{1}, t_{1}\right),\left(x_{2}, t_{2}\right) \ldots\left(x_{n}, t_{n}\right)\right\}$$

DEFINITION 2. Least square method

The least square method finds the best-fitting function for the data by minimizing the sum of squared errors. With it, unknown data can be easily estimated such that the sum of squared errors between the estimated data and the actual data is minimal. A group of observations in the plane, [TeX:] $$\left(x_{1}, y_{1}\right),\left(x_{2}, y_{2}\right), \ldots,\left(x_{n}, y_{n}\right)$$, can be fitted by a line [TeX:] $$y=a x+b$$. The slope [TeX:] $$a$$, the intercept [TeX:] $$b$$, and the sum of squared errors [TeX:] $$Q$$ are calculated as follows, where [TeX:] $$\bar{x}$$ is the average value of [TeX:] $$x$$ and [TeX:] $$\bar{y}$$ is the average value of [TeX:] $$y$$:

(1)

[TeX:] $$a=\frac{\sum x y-\frac{1}{n} \sum x \sum y}{\sum x^{2}-\frac{1}{n}\left(\sum x\right)^{2}}$$

(2)

[TeX:] $$b=\bar{y}-a \bar{x}$$

(3)

[TeX:] $$Q=\sum_{i=1}^{n}\left(y_{i}-a x_{i}-b\right)^{2}$$
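As an illustration of Eqs. (1) to (3), the following minimal Python sketch fits a line to a handful of points. The function name `fit_line` and the sample values are assumptions made for this example; they are not taken from the authors' implementation.

```python
def fit_line(xs, ys):
    """Return slope a, intercept b, and squared error Q as in Eqs. (1)-(3)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = sum_xx - sum_x ** 2 / n
    # Degenerate case (all x equal, or a single point): fall back to a flat line.
    a = (sum_xy - sum_x * sum_y / n) / denom if denom else 0.0
    b = sum_y / n - a * sum_x / n
    q = sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))
    return a, b, q


# Illustrative points lying exactly on y = 2x + 1, so the error Q is zero.
a, b, q = fit_line([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0])
print(a, b, q)
```

The running sums mirror the terms of Eq. (1) directly, so the same function can be applied to the points inside any candidate segment.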

DEFINITION 3. Pattern representation of time series

After the sliding window segments the time series [TeX:] $$X$$, the length [TeX:] $$l$$, the slope [TeX:] $$s$$, and the average value [TeX:] $$m$$ of each line segment pattern are calculated, and the triple [TeX:] $$\left(l_{i}, s_{i}, m_{i}\right)$$ represents the [TeX:] $$i$$th line segment pattern. Assuming that [TeX:] $$X$$ is divided into [TeX:] $$c$$ segments, the pattern representation of [TeX:] $$X$$ is expressed as:

(4)

[TeX:] $$X=\left(\left(l_{1}, s_{1}, m_{1}\right),\left(l_{2}, s_{2}, m_{2}\right) \ldots\left(l_{c}, s_{c}, m_{c}\right)\right)$$
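The mapping from a segment to its triple can be sketched as follows. This is a minimal example under two assumptions that the text does not spell out: the pattern length is taken as the number of samples in the segment, and the slope is the least-squares slope over the sample index. The name `pattern_triple` and the sample segments are hypothetical.

```python
def pattern_triple(values):
    """Map one segment (a list of voltages) to the triple (length, slope, mean)."""
    n = len(values)
    mean_x = (n - 1) / 2                      # mean of the indices 0..n-1
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx if sxx else 0.0         # a single point has no slope
    return n, slope, mean_y


# Two illustrative segments: a flat one and a steadily rising one.
segments = [[2.0, 2.0, 2.0, 2.0], [2.0, 3.0, 4.0]]
patterns = [pattern_triple(seg) for seg in segments]
print(patterns)
```

Each triple is then treated as one object in three-dimensional space for the neighbor-based definitions that follow.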

DEFINITION 4. Distance between patterns

Because the length [TeX:] $$l$$, the slope [TeX:] $$s$$, and the average value [TeX:] $$m$$ generally differ from one line segment pattern to another, the distance between the pattern [TeX:] $$p\left(l_{1}, s_{1}, m_{1}\right)$$ and the pattern [TeX:] $$q\left(l_{2}, s_{2}, m_{2}\right)$$ is defined as follows in order to detect abnormal patterns effectively:

(5)

[TeX:] $$d(p, q)=\sqrt{\frac{\left(l_{1}-l_{2}\right)^{2}+\left(s_{1}-s_{2}\right)^{2}+\left(m_{1}-m_{2}\right)^{2}}{\left|\left(l_{1}, s_{1}, m_{1}\right)\right|}}$$

DEFINITION 5. k-nearest neighbor distance of object [TeX:] $$p$$

The k-nearest neighbor distance of object [TeX:] $$p$$ is denoted as [TeX:] $$dist_{k}(p)$$. It is defined as [TeX:] $$dist_{k}(p)=d(p, o)$$, where the object [TeX:] $$o$$ meets two conditions:

i) There are at least [TeX:] $$k$$ objects [TeX:] $$o^{\prime} \in C \setminus\{p\}$$ in the collection [TeX:] $$C$$ that satisfy [TeX:] $$d\left(p, o^{\prime}\right) \leq d(p, o)$$;

ii) There are at most [TeX:] $$k-1$$ objects [TeX:] $$o^{\prime} \in C \setminus\{p\}$$ in the collection [TeX:] $$C$$ that satisfy [TeX:] $$d\left(p, o^{\prime}\right)<d(p, o)$$.

In other words, the k-nearest neighbor distance of [TeX:] $$p$$ is the distance between [TeX:] $$p$$ and its [TeX:] $$k^{\text{th}}$$ nearest neighbor.

DEFINITION 6. [TeX:] $$k$$ distance neighborhood of object [TeX:] $$p$$

The [TeX:] $$k$$ distance neighborhood of object [TeX:] $$p$$ is denoted as [TeX:] $$N_{k}(p)$$, that is, all the objects within the k-nearest neighbor distance of [TeX:] $$p$$. So, the number of objects in the [TeX:] $$k$$ distance neighborhood of object [TeX:] $$p$$ is greater or equal to [TeX:] $$k$$, which is expressed as [TeX:] $$\left|N_{k}(p)\right| \geq k$$.
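Definitions 5 and 6 can be illustrated with a short Python sketch. It uses the plain Euclidean distance from `math.dist` instead of the pattern distance of Eq. (5), and the function names and the four sample points are assumptions made for this example.

```python
import math


def k_distance(i, points, k):
    """Distance from points[i] to its k-th nearest neighbor (itself excluded)."""
    dists = sorted(math.dist(points[i], o)
                   for j, o in enumerate(points) if j != i)
    return dists[k - 1]


def k_neighborhood(i, points, k):
    """Indices of all objects within the k-distance of points[i]; ties are kept."""
    radius = k_distance(i, points, k)
    return [j for j, o in enumerate(points)
            if j != i and math.dist(points[i], o) <= radius]


# Two neighbors are tied at distance 1 from the first point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 0.0)]
print(k_distance(0, points, 1))
print(k_neighborhood(0, points, 1))
```

With k = 1 the neighborhood of the first point contains both tied neighbors, which is why the definition only guarantees that the neighborhood has at least k members.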

DEFINITION 7. k reachable distance from object [TeX:] $$q$$ to object [TeX:] $$p$$:

(6)

[TeX:] $$reachdist_{k}(p, q)=\max \left\{dist_{k}(q), d(p, q)\right\}$$

The k reachable distance from object [TeX:] $$q$$ to object [TeX:] $$p$$ is the maximum value between the k-nearest neighbor distance of [TeX:] $$q$$ and the true distance between the object [TeX:] $$p$$ and [TeX:] $$q$$.

DEFINITION 8. k local reachable density of the object [TeX:] $$p$$

The k local reachable density of the object [TeX:] $$p$$ is denoted as [TeX:] $$lrd_{k}(p)$$.

(7)

[TeX:] $$lrd_{k}(p)=\frac{\left|N_{k}(p)\right|}{\sum_{o \in N_{k}(p)} reachdist_{k}(p, o)}$$

It is the reciprocal of the average reachable distance from the objects in [TeX:] $$N_{k}(p)$$ to [TeX:] $$p$$. Note that the reachable distance is taken from each object in [TeX:] $$N_{k}(p)$$ to [TeX:] $$p$$, not from [TeX:] $$p$$ to the object in [TeX:] $$N_{k}(p)$$. The value [TeX:] $$lrd_{k}(p)$$ represents a density: the lower the density, the more likely [TeX:] $$p$$ is an outlier. If [TeX:] $$p$$ and its neighborhood objects belong to the same cluster, the reachable distance tends to take the smaller value [TeX:] $$dist_{k}(o)$$, which results in a smaller sum of reachable distances and a higher density. If [TeX:] $$p$$ is far away from its neighborhood objects, the reachable distance tends to take the larger value [TeX:] $$d(p, o)$$, which results in a lower density, so [TeX:] $$p$$ is more likely to be an outlier.

DEFINITION 9. Local anomaly factor of [TeX:] $$p$$

The local anomaly factor of [TeX:] $$p$$ is denoted as [TeX:] $$LOF_{k}(p)$$.

(8)

[TeX:] $$L O F_{k}(p)=\frac{\frac{1}{\left|N_{k}(p)\right|} \sum_{o \in N_{k}(p)} lrd_{k}(o)}{lrd_{k}(p)}$$

It is the ratio of the average local reachable density of the objects in [TeX:] $$N_{k}(p)$$ to the k local reachable density of the object [TeX:] $$p$$. If the ratio is close to 1, the density of [TeX:] $$p$$ is almost the same as that of its neighborhood, and [TeX:] $$p$$ may belong to the same cluster as its neighborhood. If the ratio is greater than 1, the density of [TeX:] $$p$$ is lower than that of its neighborhood objects, and [TeX:] $$p$$ is more likely to be abnormal.
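Definitions 5 to 9 can be combined into one compact Python sketch. It is an illustration only: it uses the plain Euclidean distance rather than Eq. (5), the function name `lof_scores` and the sample points are hypothetical, and the small floor on the summed reachable distance is an added guard against duplicated objects that the definitions do not mention.

```python
import math


def lof_scores(points, k):
    """Return the local anomaly factor of every point (Definitions 5 to 9)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Definition 5: distance to the k-th nearest neighbor (self excluded).
    kdist = [sorted(d[i][j] for j in range(n) if j != i)[k - 1]
             for i in range(n)]
    # Definition 6: every object within the k-distance (ties are kept).
    nbrs = [[j for j in range(n) if j != i and d[i][j] <= kdist[i]]
            for i in range(n)]
    # Definitions 7 and 8: reachable distance and local reachable density.
    lrd = []
    for i in range(n):
        reach_sum = sum(max(kdist[j], d[i][j]) for j in nbrs[i])
        lrd.append(len(nbrs[i]) / max(reach_sum, 1e-12))  # assumed guard
    # Definition 9: mean density of the neighbors divided by the own density.
    return [sum(lrd[j] for j in nbrs[i]) / len(nbrs[i]) / lrd[i]
            for i in range(n)]


# Three nearby one-dimensional points and one isolated point.
points = [(0.0,), (1.0,), (2.0,), (10.0,)]
scores = lof_scores(points, k=2)
print(scores)
```

In this toy data the isolated point receives by far the largest factor, while the clustered points stay near 1.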

3. Cathode Voltage Anomaly Detection Algorithm

3.1 Algorithm Design Model

The cathode voltage anomaly detection algorithm proposed in this paper consists of two parts. First, the sliding window piecewise linear representation is used to segment the cathode voltage time series. Then the k-nearest neighbor local anomaly detection algorithm is used to detect anomalies. The design model of the anomaly detection algorithm is shown in Fig. 1.

Fig. 1.

The design model of anomaly detection algorithm.

The length of the sliding window is not fixed; instead, the segmentation error is bounded. A new segment starts from the first point of the cathode voltage time series and keeps growing toward later points. The points in the segment are fitted by the least squares method, and the fitting error is calculated according to formula (3). The segment ends when its fitting error exceeds the error threshold. The following sequence point is then used as the beginning of a new segment, and the process repeats until the end of the time series. The larger the error threshold, the smaller the number of segments; the smaller the error threshold, the larger the number of segments.
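The segmentation procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the x coordinate of each sample is its index within the window, the point that pushes the error over the threshold becomes the first point of the next segment, and the names `squared_error`, `sliding_window_segments`, and `max_error` are hypothetical.

```python
def squared_error(ys):
    """Squared error Q of Eq. (3) for a least-squares line through ys."""
    n = len(ys)
    if n < 3:
        return 0.0                      # one or two points are fitted exactly
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return sum((y - a * x - b) ** 2 for x, y in enumerate(ys))


def sliding_window_segments(ys, max_error):
    """Split ys into consecutive, non-overlapping (start, end) index ranges."""
    segments, start = [], 0
    for end in range(1, len(ys) + 1):
        # Grow the window; close it just before the point that breaks the bound.
        if squared_error(ys[start:end]) > max_error:
            segments.append((start, end - 1))
            start = end - 1
    if start < len(ys):
        segments.append((start, len(ys)))
    return segments


# A stable stretch followed by a strongly fluctuating stretch.
ys = [0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 4.0, 0.0]
segments = sliding_window_segments(ys, max_error=1.0)
print(segments)
```

Refitting the whole window at every step keeps the sketch short; the stable stretch ends up in one long segment and the fluctuating stretch in short ones.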

After the sliding window segmentation, the cathode voltage is expressed as a collection of line segment patterns. By the principle of sliding window segmentation, the pattern length is small where the sequence changes greatly and large where the sequence is stable. Anomalies of the cathode voltage often occur where the sequence changes greatly. The pattern length, the pattern slope, and the pattern average of each line segment pattern are calculated and mapped into a collection of objects in space. Because the algorithm calculates three features, each object is mapped to a point in three-dimensional space.

The k reachable distance of each object in the object set is calculated according to formula (6), the k local reachable density according to formula (7), and the local anomaly factor according to formula (8). If the local anomaly factor of a pattern is greater than 1 and its pattern length is less than the average pattern length, the pattern is more likely to be abnormal.
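The decision rule in the last sentence can be written as a small filter. The following Python sketch assumes the pattern lengths and local anomaly factors have already been computed; the function name and the numbers are made up for illustration and are not taken from the experiments.

```python
def abnormal_patterns(lengths, lof_values):
    """Indices of patterns with LOF > 1 and length below the average length."""
    mean_length = sum(lengths) / len(lengths)
    return [i for i, (length, lof) in enumerate(zip(lengths, lof_values))
            if lof > 1.0 and length < mean_length]


# Illustrative values only: five patterns with an average length of 40.
lengths = [50, 48, 12, 52, 38]
lof_values = [0.98, 1.02, 2.40, 1.30, 0.95]
flagged = abnormal_patterns(lengths, lof_values)
print(flagged)
```

Patterns 1 and 3 have a factor above 1 but are longer than average, so only the short pattern at index 2 is flagged.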

3.2 Algorithm Description

Input: Cathode voltage time series [TeX:] $$X=\left\{\left(x_{1}, t_{1}\right),\left(x_{2}, t_{2}\right) \ldots\left(x_{n}, t_{n}\right)\right\}$$, segmentation error threshold value [TeX:] $$w$$, nearest neighbor value [TeX:] $$k$$.

Output: Abnormal pattern set.

(1) [TeX:] $$w$$ is a multiple of the maximum of [TeX:] $$X$$, and a segmentation error is obtained according to [TeX:] $$w$$;

(2) The sliding window slides backwards from the first point of [TeX:] $$X$$, and the points in the window are fitted with the least squares according to Definition 2. When the fitting error exceeds the segmentation error, end the segmentation and move to the next segment;

(3) After segmentation, a line segment pattern set is obtained;

(4) Calculate the triples [TeX:] $$(l, s, m)$$ of each line segment pattern according to Definition 3, and map them into a collection of objects in three-dimensional space;

(5) Calculate the k-nearest neighbor distance of any object [TeX:] $$p$$ in the collection of objects according to Definition 4 and Definition 5;

(6) Calculate the k reachable distance between any object and a known object according to Definition 7;

(7) Calculate the k local reachable density and local anomaly factor of any object [TeX:] $$p$$ according to Definition 8 and Definition 9;

(8) If the local anomaly factor of the object [TeX:] $$p$$ is greater than 1 and the pattern length is less than the average pattern length, then [TeX:] $$p$$ is considered to be an abnormal pattern;

Output the abnormal pattern set.
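Steps (2) to (8) can be tied together in one end-to-end Python sketch. It is not the authors' implementation: the helper names are hypothetical, the error bound is passed directly as `max_error` instead of being derived from the maximum of the series as in step (1), the plain Euclidean distance replaces Eq. (5), and the floor on the summed reachable distance is an added guard for duplicated patterns. The series is assumed to be non-empty and to produce more than k segments.

```python
import math


def line_fit(ys):
    """Least-squares slope and squared error for ys over x = 0..n-1."""
    n = len(ys)
    if n < 2:
        return 0.0, 0.0
    mx, my = (n - 1) / 2, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    a = sum((x - mx) * (y - my) for x, y in enumerate(ys)) / sxx
    q = sum((y - my - a * (x - mx)) ** 2 for x, y in enumerate(ys))
    return a, q


def segment(ys, max_error):
    """Step (2)-(3): contiguous (start, end) ranges with bounded fit error."""
    bounds, start = [], 0
    for end in range(1, len(ys) + 1):
        if line_fit(ys[start:end])[1] > max_error:
            bounds.append((start, end - 1))
            start = end - 1
    bounds.append((start, len(ys)))
    return bounds


def triples(ys, bounds):
    """Step (4): (length, slope, mean) of every segment."""
    return [(e - s, line_fit(ys[s:e])[0], sum(ys[s:e]) / (e - s))
            for s, e in bounds]


def lof_scores(objs, k):
    """Steps (5)-(7): local anomaly factor of every object."""
    n = len(objs)
    d = [[math.dist(p, q) for q in objs] for p in objs]
    kdist = [sorted(d[i][j] for j in range(n) if j != i)[k - 1]
             for i in range(n)]
    nbrs = [[j for j in range(n) if j != i and d[i][j] <= kdist[i]]
            for i in range(n)]
    lrd = [len(nbrs[i]) / max(sum(max(kdist[j], d[i][j]) for j in nbrs[i]), 1e-12)
           for i in range(n)]
    return [sum(lrd[j] for j in nbrs[i]) / len(nbrs[i]) / lrd[i]
            for i in range(n)]


def detect(ys, max_error, k):
    """Step (8): segments with LOF > 1 and length below the average length."""
    bounds = segment(ys, max_error)
    pats = triples(ys, bounds)
    scores = lof_scores(pats, k)
    mean_len = sum(p[0] for p in pats) / len(pats)
    return [b for b, p, s in zip(bounds, pats, scores)
            if s > 1.0 and p[0] < mean_len]


def triangle(i):
    """Synthetic, slowly varying signal standing in for a stable voltage."""
    phase = i % 40
    return 3.0 + 0.05 * (phase if phase < 20 else 40 - phase)


series = [triangle(i) for i in range(400)]
for index, delta in [(200, 1.5), (203, -2.0), (207, 1.0)]:
    series[index] += delta              # inject a few large deviations
found = detect(series, max_error=0.05, k=3)
print(found)
```

The returned list holds the index ranges of the flagged segments. Which ranges are flagged depends on the error bound and on k, so the synthetic series here only demonstrates the data flow.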

3.3 Algorithm Analysis

The time complexity of the algorithm mainly consists of the following four parts:

(1) The cathode voltage time series is segmented by sliding window, the time complexity is [TeX:] $$O(n l)$$, [TeX:] $$n$$ is the length of the cathode voltage time series, [TeX:] $$l$$ is the average length of the segments;

(2) Calculate the distance between objects, the time complexity is [TeX:] $$O\left(c^{2}\right)$$, [TeX:] $$c$$ is the number of objects, that is, the number of segments;

(3) Calculate the local reachable density [TeX:] $$lrd$$ of the object, the time complexity is [TeX:] $$O(c k)$$, [TeX:] $$k$$ is the nearest neighbor value;

(4) Calculate the local anomaly factor LOF of the object, and the time complexity is [TeX:] $$O(c k)$$.

The value of [TeX:] $$c$$ is the number of segments and should satisfy [TeX:] $$c \ll n$$, so the overall time complexity of the algorithm is [TeX:] $$O(n l)$$.

4. Experiment Analysis

4.1 Data Acquisition

An aluminum factory used a new type of cathode signal acquisition equipment to collect three types of data: the cathode steel bar voltage, the cathode steel bar temperature, and the groove shell temperature. This paper studies the voltage of the cathode steel bar, which is what the cathode voltage refers to here. The aluminum factory has 14 cathode signal acquisition devices, each with 16 channels. Channels 01 to 06 collect the voltage of the cathode steel bar, channels 07 to 12 collect the cathode steel bar temperature, channels 13 and 14 collect the groove shell temperature, and channels 15 and 16 are unused. Cathode devices 01 to 07 are on the A side, and cathode devices 08 to 14 are on the B side. The status of the cathode devices is shown in Table 1.

As can be seen from Table 1, the cathode voltage measurement points are divided into two sides, A and B, each with 40 measurement points. The sampling interval of the cathode device is 5 seconds, so the amount of cathode voltage data becomes huge as time accumulates.

Table 1.

Cathode device status

Signal category	Cathode steel bar voltage						Cathode steel bar temperature						Groove shell temperature
Channel	CH01	CH02	CH03	CH04	CH05	CH06	CH07	CH08	CH09	CH10	CH11	CH12	CH13	CH14
A side
Cathode device 01	01	02	03	04	05	06	01	02	03	04	05	06	01	02
Cathode device 02	07	08	09	10	11	12	07	08	09	10	11	12	03	04
Cathode device 03	13	14	15	16	17	18	13	14	15	16	17	18	05	06
Cathode device 04	19	20	21	22	23	24	19	20	21	22	23	24	07	-
Cathode device 05	25	26	27	28	29	30	25	26	27	28	29	30	08	-
Cathode device 06	31	32	33	34	35	36	31	32	33	34	35	36	09	-
Cathode device 07	37	38	39	40	-	-	37	38	39	40	-	-	10	-
B side
Cathode device 08	41	42	43	44	45	46	41	42	43	44	45	46	01	02
Cathode device 09	47	48	49	50	51	52	47	48	49	50	51	52	03	04
Cathode device 10	53	54	55	56	57	58	53	54	55	56	57	58	05	06
Cathode device 11	59	60	61	62	63	64	59	60	61	62	63	64	07	-
Cathode device 12	65	66	67	68	69	70	65	66	67	68	69	70	08	-
Cathode device 13	71	72	73	74	75	76	71	72	73	74	75	76	09	-
Cathode device 14	77	78	79	80	-	-	77	78	79	80	-	-	10	-
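To give a sense of the data volume implied by the 5-second sampling interval and the 80 voltage measurement points, the following Python sketch does the arithmetic for the one-month period used in the experiments. It assumes uninterrupted sampling with no missing values, which the paper does not state.

```python
SAMPLE_INTERVAL_S = 5      # sampling interval of the cathode device
POINTS_PER_SIDE = 40       # voltage measurement points on each side
SIDES = 2                  # A side and B side
DAYS = 31                  # March 1 to March 31, 2017

samples_per_day = 24 * 60 * 60 // SAMPLE_INTERVAL_S
samples_per_point = samples_per_day * DAYS
total_voltage_samples = samples_per_point * POINTS_PER_SIDE * SIDES

print(samples_per_day)        # 17280
print(samples_per_point)      # 535680
print(total_voltage_samples)  # 42854400
```

Under this assumption a single measurement point already yields more than half a million values per month, which motivates reducing the series to line segment patterns before detection.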

4.2 Results and Analysis

All programs in the experiment were written in Python, and the algorithm was verified on a computer with a 3.40 GHz CPU, 4 GB of memory, and the Windows 7 operating system. The data are the cathode voltage data of an aluminum factory from March 1, 2017 to March 31, 2017.

The goal is to detect the possible abnormal ranges in the data of a single cathode voltage measurement point. After testing, the segmentation error threshold was set to [TeX:] $$w$$ = 10 and different nearest neighbor values k (7, 9, 11) were used. The anomaly detection results are shown in Table 2, which lists only the first seven abnormal patterns for each k. The number of abnormal patterns varied with k, as shown in Fig. 2.

As can be seen from Table 2, when the segmentation error threshold [TeX:] $$w$$ was fixed, the anomaly detection results were mostly the same and changed little with k; the patterns common to all settings are the abnormal patterns we want. Fig. 2 shows that the number of abnormal patterns has no definite relationship with k but tends to be stable when k is large.

Table 2.

Anomaly detection results for different k-nearest neighbor values

k-nearest neighbor	No. of anomaly patterns	Pattern length	Average length	Local anomaly factor	Time range
k=7	19	33	39.16	2.827	3-13 22:00 to 3-15 15:00
		31	39.16	1.782	3-09 11:00 to 3-09 12:00
		36	39.16	1.631	3-01 08:00 to 3-01 09:00
		28	39.16	1.503	3-01 07:00 to 3-01 08:00
		25	39.16	1.353	3-09 21:00 to 3-09 22:00
		29	39.16	1.345	3-09 09:00 to 3-09 10:00
		27	39.16	1.339	3-09 20:00 to 3-09 21:00
k=9	18	40	41.33	2.760	3-13 22:00 to 3-15 15:00
		35	41.33	1.648	3-09 11:00 to 3-09 12:00
		37	41.33	1.511	3-01 07:00 to 3-01 08:00
		33	41.33	1.496	3-01 08:00 to 3-01 09:00
		36	41.33	1.256	3-09 09:00 to 3-09 10:00
		32	41.33	1.248	3-09 20:00 to 3-09 21:00
		31	41.33	1.237	3-09 10:00 to 3-09 11:00
k=11	17	34	43.76	2.702	3-13 22:00 to 3-15 15:00
		33	43.76	1.651	3-09 11:00 to 3-09 12:00
		43	43.76	1.522	3-01 08:00 to 3-01 09:00
		42	43.76	1.509	3-01 07:00 to 3-01 08:00
		35	43.76	1.274	3-09 09:00 to 3-09 10:00
		41	43.76	1.267	3-09 20:00 to 3-09 21:00
		29	43.76	1.238	3-09 16:00 to 3-09 17:00

Fig. 2.

The number of abnormal patterns varies with the k-nearest neighbor.

After testing, the nearest neighbor value was set to k = 9 and different segmentation error thresholds [TeX:] $$w$$ (3, 10, 20) were used. The anomaly detection results are shown in Table 3, which lists only the first seven abnormal patterns for each threshold. The number of abnormal patterns varied with the segmentation error threshold [TeX:] $$w$$, as shown in Fig. 3.

As can be seen from Table 3, when the nearest neighbor value k is fixed, different segmentation error thresholds [TeX:] $$w$$ yield different abnormal patterns, each with a pattern length less than the average pattern length and a local anomaly factor greater than 1. Fig. 3 shows that the number of abnormal patterns decreases as [TeX:] $$w$$ increases: the smaller the error threshold [TeX:] $$w$$, the more possible abnormal patterns there are. However, the anomalies are concentrated, and the patterns common to all settings are the abnormal patterns we want. This verifies the validity of the algorithm.

Fig. 3.

The number of abnormal patterns varies with the segmentation error threshold [TeX:] $$w$$.

Table 3.

Anomaly detection results for different segmentation error thresholds [TeX:] $$w$$

Error threshold	No. of anomaly patterns	Pattern length	Average length	Local anomaly factor	Time range
[TeX:] $$w$$=3	31	23	24	2.978	3-13 22:00 to 3-15 15:00
		21	24	2.219	3-01 07:00 to 3-01 08:00
		20	24	2.081	3-09 11:00 to 3-09 12:00
		22	24	1.982	3-09 10:00 to 3-09 11:00
		15	24	1.889	3-01 08:00 to 3-01 09:00
		21	24	1.860	3-09 09:00 to 3-09 10:00
		20	24	1.848	3-09 20:00 to 3-09 21:00
[TeX:] $$w$$=10	18	41	41.33	2.760	3-13 22:00 to 3-15 15:00
		21	41.33	1.648	3-09 11:00 to 3-09 12:00
		20	41.33	1.511	3-01 07:00 to 3-01 08:00
		23	41.33	1.496	3-01 08:00 to 3-01 09:00
		36	41.33	1.256	3-09 09:00 to 3-09 10:00
		22	41.33	1.248	3-09 20:00 to 3-09 21:00
		21	41.33	1.237	3-09 10:00 to 3-09 11:00
[TeX:] $$w$$=20	11	54	67.63	2.655	3-13 22:00 to 3-15 15:00
		53	67.63	1.571	3-09 11:00 to 3-09 12:00
		53	67.63	1.449	3-01 08:00 to 3-01 09:00
		52	67.63	1.377	3-01 07:00 to 3-01 08:00
		45	67.63	1.181	3-09 21:00 to 3-09 22:00
		54	67.63	1.180	3-09 09:00 to 3-09 10:00
		51	67.63	1.175	3-09 20:00 to 3-09 21:00

4.3 Comparison

The proposed algorithm uses both the pattern length and the local anomaly factor as criteria for judging abnormal patterns, which is an optimization of the traditional local anomaly detection algorithm. The traditional algorithm usually sorts the patterns by local anomaly factor in descending order and takes the first several as abnormal patterns. However, a normal pattern whose surrounding patterns are abnormal can also have a relatively large local anomaly factor. The detection results are therefore not good enough if only the local anomaly factor is used as the standard for judging abnormal patterns.

The proposed algorithm was compared with the GSWCLOF algorithm of Li et al. [17], a density-based anomaly detection algorithm for data streams. When the amount of data is small, the execution efficiency of the two algorithms differs little, but as the amount of data increases, the algorithm in this paper becomes more efficient than GSWCLOF, as shown in Fig. 4. For a dynamic data stream, the proportion of normal data is much larger than that of abnormal data, and because GSWCLOF prunes the data, its detection accuracy is improved. The density-based approach was later improved by introducing a local sparse distance to detect anomalies and reduce the computational complexity. The density-based view is closer to the definition of an anomaly than the distance-based view. However, some shortcomings remain. First, when dealing with large data sets, the time complexity is still relatively high. Second, the detection results are sensitive to the selection of parameters [18,19] such as the anomaly factor threshold, and there is no simple, unified, and effective method to determine the parameters.

Fig. 4.

Execution efficiency comparison chart.

The sliding-window-based cathode voltage anomaly detection algorithm already makes a preliminary prediction when it segments the cathode voltage time series. Normally, the abnormal part varies over a relatively large range, so its pattern length is relatively small, whereas the normal part is relatively smooth, so its pattern length is relatively large. When the pattern length is used together with the local anomaly factor as the standard for judging abnormal patterns in local anomaly detection, the detection results are more accurate. A pattern is abnormal when its local anomaly factor is greater than 1 and its pattern length is less than the average pattern length [20]. After that, the patterns can of course also be sorted by local anomaly factor, and the first several taken as abnormal patterns.

An experiment was carried out by using a measurement point data of the cathode voltage, and the segmentation of the cathode voltage is shown in Fig. 5. The fluctuation curve was the cathode voltage, and the line segment was the fitting straight line. Experiments were done when pattern length was used and not used as a criterion for judging abnormal patterns. The anomaly detection results are shown in Figs. 6 and 7, and the abnormal patterns were marked with line segments.

Fig. 5.

Segmentation of cathode voltage.

As can be seen from Figs. 6 and 7, the cathode voltage was relatively stable under normal conditions and its fluctuations were relatively smooth. Areas with large fluctuations are more likely to be abnormal. When the pattern length was also used as a criterion for judging abnormal patterns, the anomaly detection result was more accurate, as Fig. 7 shows. Without the pattern length criterion, as in Fig. 6, normal points are likely to be treated as abnormal, which reduces the accuracy. Moreover, it can be seen from the experimental part of this paper that the algorithm is more efficient and accurate.

Fig. 6.

Pattern length not used as a criterion for judging abnormal patterns.

Fig. 7.

Use pattern length as a criterion for judging abnormal patterns.

5. Conclusions

The cathode voltage of an aluminum electrolytic cell is relatively stable under normal conditions and fluctuates greatly when an anomaly occurs. Anomaly detection of the cathode voltage can effectively reveal possible anomalies in the electrolytic cell, which allows the staff to make timely adjustments and prolongs the service life of the electrolytic cell. Based on the k-nearest neighbor local anomaly detection algorithm and the time series piecewise linear representation method, a sliding-window-based cathode voltage anomaly detection algorithm was proposed, and the experiments show that it is valid and feasible on the cathode voltage data of an aluminum electrolytic cell. However, the algorithm is only suitable for the data of a single cathode voltage measurement point, and anomaly detection across multiple measurement points needs to be implemented by other methods. For example, the k-nearest neighbor local anomaly detection algorithm in this paper could be combined with a dimension reduction method based on principal component analysis for multivariate time series. The specific implementation remains to be studied.

Acknowledgement

This paper is supported by the National Natural Science Foundation of China (No. 41471303), Basic Scientific Research Plan Project of Beijing Municipal Commission of Education (2018), Special Research Foundation of North China University of Technology (No. PXM2017_014212_000014), Yuyou Talents Support Program of North China University of Technology (2019), and Beijing Natural Science Foundation (No. 4162022).

Biography

Danyang Cao

He received his B.S. and M.S. degrees in Computer Science and Technology from North China University of Technology in 2000 and 2006, respectively. He received his Ph.D. degree in Computer Application Technology from University of Science and Technology Beijing, China, in 2012. He is currently an Associate Professor in the Department of Computer Science at North China University of Technology, China. His research interests cover artificial intelligence and data mining.

Biography

Yanhong Ma

https://orcid.org/0000-0001-5258-371X

She received her B.S. degree from the School of Computer at Changchun Institute of Technology in 2017. She is currently a graduate student at North China University of Technology. Her current research interests include data mining and machine learning.

Biography

Lina Duan

https://orcid.org/0000-0002-3654-9420

She received her B.S. degree from the School of Computer at North China University of Technology in 2015. She is currently a graduate student at North China University of Technology. Her research interests include data mining.

References

1 J. Li, W. Liu, Y. Lai, Z. Wang, Y. Liu, "Analysis of cathode voltage drop of aluminum electrolytic cell using electrical contact model," Journal of Materials and Metallurgy, vol. 7, no. 2, pp. 99-102, 2008.
2 T. Kang, China Aluminum Qinghai Branch, "Methods research to lower the furnace bottom pressure-drop of pre-baked aluminum electrolytic cell," Non-ferrous Metallurgical Equipment, vol. 2014, no. 4, pp. 22-26, 2014.
3 X. Guo, F. Li, X. Song, "The outlier detection approach for multivariate time series based on PCA analysis," Journal of Jiangxi Normal University (Natural Sciences Edition), vol. 36, no. 3, pp. 280-283, 2012.
4 D. Zhou, "Cluster of multivariable time series, similar query and anomaly detection," Tianjin University, Tianjin, China, 2009.
5 E. M. Knorr, R. T. Ng, "A unified notion of outliers: properties and computation," in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD), Newport Beach, CA, 1997, pp. 219-222.
6 M. M. Breunig, H. P. Kriegel, R. T. Ng, J. Sander, "LOF: identifying density-based local outliers," ACM Sigmod Record, vol. 29, no. 2, pp. 93-104, 2000.
7 L. Zhang, M. Yang, D. Lei, "Outlier sub-sequences detection for importance points segmentation of time series," Computer Science, vol. 39, no. 5, pp. 183-186, 2012.
8 C. Bandt, B. Pompe, "Permutation entropy: a natural complexity measure for time series," Physical Review Letters, vol. 88, no. 17, 2002. doi:10.1103/PhysRevLett.88.174102
9 M. Zanin, L. Zunino, O. A. Rosso, D. Papo, "Permutation entropy and its main biomedical and econophysics applications: a review," Entropy, vol. 14, no. 8, pp. 1553-1577, 2012. doi:10.3390/e14081553
10 X. Zhao, "Time series of relevance and complexity of the study," Beijing Jiaotong University, Beijing, China, 2015.
11 F. Feng, G. Rao, A. Si, "Abnormality detection and diagnosis of rolling bearing based on permutation entropy and neural network," Noise and Vibration Control, vol. 33, no. 3, pp. 212-217, 2013.
12 H. Xiao, F. Shang, "Representation of time series based on trend turning points," Science Technology and Engineering, vol. 10, no. 13, pp. 3254-3257, 2010.
13 S. Chen, X. Lv, R. Qi, L. Wang, L. Yu, "Linear representation method based on key points for time series," Computer Science, vol. 43, no. 5, pp. 234-237, 2016.
14 X. Weng, J. Shen, "Outlier mining for multivariate time series based on sliding window," Computer Engineering, vol. 33, no. 12, pp. 102-104, 2007.
15 Y. Yu, Y. Zhu, D. Wan, X. Guan, "Time series outlier detection based on sliding window prediction," Journal of Computer Applications, vol. 34, no. 8, pp. 2217-2220, 2014. doi:10.1155/2014/879736
16 H. Xiao, "Time series similarity query and anomaly detection," Fudan University, Shanghai, China, 2005.
17 S. Li, W. Meng, J. Qu, "GSWCLOF: density-based outlier detection algorithm on data stream," Computer Engineering and Applications, vol. 52, no. 19, pp. 7-11, 2016.
18 Y. Chen, F. Wu, L. Wu, B. Liu, "Research on time series based on anomaly detection," Computer Technology and Development, vol. 25, no. 4, pp. 166-170, 2015.
19 X. Guo, F. Li, H. Ye, "Outlier pattern research of multivariate time series," Journal of Xinyang Normal University (Natural Science Edition), vol. 25, no. 4, pp. 555-559, 2012.
20 X. Hui, "Time series anomaly detection based on user behavior pattern features," Ph.D. dissertation, Chongqing University, Chongqing, China, 2017.

Published (Print): December 31 2019

Published (Electronic): December 31 2019

Corresponding Author: Danyang Cao* (ufocdy@163.com)

Danyang Cao*, School of Information Science and Technology, North China University of Technology, Beijing, China, ufocdy@163.com

Yanhong Ma*, School of Information Science and Technology, North China University of Technology, Beijing, China, 531973086@qq.com

Lina Duan*, School of Information Science and Technology, North China University of Technology, Beijing, China, dln_zly@qq.com