Research Article - (2016) Volume 5, Issue 2
Keywords: Optimization; Particle swarm optimization; Scheduling functions; Population-based methods; Multi armed bandit algorithms
Population based optimization algorithms are algorithms in which a set of input values (or candidate solutions, or positions within the problem search space) is maintained through the course of the algorithm, as opposed to non-population-based methods which generally examine only one input value at each iteration of the algorithm. We shall refer to these input values as “individuals”, although common terminology varies based on the context of and (often biological) inspiration for specific algorithms. In general terms, a typical procedure for a population-based single-objective optimization algorithm is as follows:
Initialize all individuals within the search space
while Termination condition not met do
Choose next individual to examine based on
Scheduling function
Apply algorithm-specific operations
Update the individual’s position within the search space
end
Choose the best individual for the final optimized value
Note that in the for loop above, each individual is processed in order at every iteration. This is the dominant method of updating individuals’ positions for most population-based optimization algorithms. However, when the individuals are processed separately within the algorithm, it is not necessary to follow this update ordering throughout the optimization. In this paper, we introduce a generalization of this concept in which a scheduling function is defined in order to choose the next individual to examine at each iteration. Thus, the outline above becomes instead:
Initialize all individuals within the search space
while Termination condition not met do
for Each Individual in Population do
Apply algorithm-specific operations
Update the individual’s position within the search space
end
end
Choose the best individual for the final optimized value
With this generalization defined, the typical procedure outlined first becomes a special case of a “round-robin” schedule function, in which each individual is examined in a clockwork manner. We will also show in this paper that even other simple schedule functions significantly outperform the typical round-robin schedule.
In order to show this, we will employ one of the most widely used simple population-based optimization algorithms, Particle Swarm Optimization (PSO). For the sake of consistency and reproducibility, we will use the Standard Particle Swarm Optimization implementation standardized in 2011, known as SPSO-2011 [1-3]. Using a generalization of this algorithm to include schedule functions, we will compare several such schedule functions on a series of common optimization benchmark problems, and show that some of these functions consistently outperform the standard round-robin schedule used in many population-based algorithms.
Particle swarm optimization
Particle Swarm Optimization (PSO) [4] is a population-based optimization method involving a collection of “particles” iteratively moving within a problem search space according to position updating formulas. These formulas include movement toward the current best-known position by the individual particle, movement toward the current best-known position by the swarm as a whole, and some random movement. These position update formulas form the heart of the PSO algorithm [5,6]. In basic form for a single particle, they are:
where:
D is the number of dimensions in the search space
x2 is dimension d of the particle’s position
vd is dimension d of the particle’s velocity
rp and rs are random numbers in [0, 1]
pd is dimension d of the particle’s best-known location
gd is dimension d if the swarm’s best-known location ω, φp and φg are parameters to control behavior of the algorithm
Much research has been done on the selection of PSO parameters (e.g., [7-9]), and many improvements to the algorithm have been proposed since its introduction. Many of these improvements have been incorporated into the successive standard PSO implementations, with the latest being SPSO-2011, the version utilized in this paper. A complete discussion of these improvements and alterations is outside the scope of this paper, but both descriptions of the SPSO-2011 implementation and source code in various languages are available at [3,10,11].
Scheduling functions
Our aim in this paper is not to find the “best” scheduling function for population-based algorithms of the typse under consideration, but rather to demonstrate that some other relatively simple scheduling functions outperform the standard round-robin schedule. Since one of the most basic considerations in any optimization task is the tradeoff between exploration and exploitation, a natural choice for potential scheduling algorithms is those based on the widely studied Multi-Armed Bandit problem [12-14], which is a classic exploration/ exploitation problem in which a player at a series of slot machines seeks to maximize overall reward. In particular, we will examine some of the best-performing schedule functions from the comparison work of Kuleshov and Precup [15] and others [16,17] – the e -greedy algorithm, Boltzmann Exploration (Softmax), and Upper Confidence Bounds, in addition to the standard round-robin schedule and a random scheduling function.
All together, we will test the following eight scheduling functions, about which details are given in the following sections: Round-robin Schedule
• Random Schedule
• Fixed t greedy Schedule
• Adaptive e greedy Schedule
• Fixed t Softmax Schedule
• Adaptive t Softmax Schedule
• UCB1 Schedule (Upper Confidence Bounds)
• UCB1-Tuned Schedule
e-Greedy schedule
The e -greedy algorithm is a simple and popular decision algorithm.At each decision point, the algorithm selects the option with the highest current reward with probability 1-e, and a random option with probability e. That is, given rewards at time t of , the probability at time t+1 of selecting option i is given by:
Clearly, the choice of e significantly affects this algorithm. A full exploration of ideal values for e in this context is outside the scope of this paper, but based on results in the literature and from our own testing, we will use a fixed value of e = 0 and a simple linear decay from e = 1 to e = 0 for our fixed and adaptive e -greedy strategies, respectively
Boltzmann exploration (Softmax)
In general, Softmax algorithms arise from Luce’s choice axiom [18], which states that the probability of choosing an item i from a pool of j items is given by:
where w is some context-appropriate weighting. Boltzmann Exploration is one such Softmax method that uses a Boltzmann distribution for selection. That is, given rewards at time t of the probability at time t+1 of selecting option i is given by:
Where G is a “temperature” parameter controlling the exploration/ exploitation tradeoff. Again, a full discussion of ideal values for G in this context is outside the scope of this paper, but based on available literature we will use a fixed value of G = 0.05 and a simple linear decay from G = 1 to G = 0.05 for our fixed and adaptive Softmax strategies, respectively.
Upper confidence bounds
Upper Confidence Bounds (UCB) algorithms arose from the Multi-Armed Bandit problem to produce a limit on regret for decision problems of that type [19,20]. These algorithms have proved successful in a number of applications, including, notably, Artificial Intelligence programs for the game of Go.
For these algorithms, one additional variable is needed – the number of times option i has been selected, denoted by ni. Unlike the previous two schedule functions, the basic UCB algorithm, UCB1, does not assign a probability to each option, but rather first chooses each option once, and from then on, given rewards at tim t of always chooses option i that maximizes:
The creators of this algorithm also propose an alternative “tuned” version, UCB1-Tuned [19], which has been found to perform better in practice. This version incorporates the variance of each option
into the calculation of the next choice. UCB1-Tuned, given rewards at time t of
always chooses option i that maximizes:
where
We will use both UCB1 and UCB1-Tuned as separate scheduling functions in our experiments.
Round-robin and random schedules
The round-robin and random schedules are straightforward. The round-robin schedule always chooses the next option in a clockwork manner throughout the algorithm. The random schedule chooses an option (pseudo-) randomly from a discrete uniform distribution.
Benchmark functions
In order to compare the effects of these different scheduling functions, we will use as the optimization problem a set of common optimization benchmark functions. These minimization functions are some of the most common to appear in optimization literature (e.g., [21- 23]), and were chosen to display a variety of function characteristics. Those functions are shown in Table 1, where D is the number of search space dimensions (Table 2).
Name | Function | Range |
---|---|---|
Sphere | ![]() |
![]() |
Rastrigin (generalized) | ![]() |
![]() |
Rosenbrock | ![]() |
![]() |
Griewank (generalized) | ![]() |
![]() |
Schwefel (normalized) | ![]() |
![]() |
Salomon | ![]() |
![]() |
Table 1: Benchmark functions.
Schedule | Average Function | Performance relative |
---|---|---|
Value | to Round-Robin | |
Round-robin | 1.000 | 100 % |
Random | 0.846 | 118 % |
Fixed -greedy | 0.272 | 368 % |
Adaptive -greedy | 0.199 | 502 % |
Fixed Softmax | 1.085 | 92 % |
Adaptive Softmax | 0.883 | 113 % |
UCB1 | 0.840 | 119 % |
UCB1-Tuned | 0.833 | 120 % |
Table 2: Averaged results - 2 dimensions
For all of these functions, the global minimum is. Each of.Each of these functions is defined for an arbitrary number of dimensions, as discussed in the following section.
Experimental design
In this section, we detail the design of our experiments to compare the scheduling functions on the benchmark problems defined above. Of particular consideration are the parameters to be used for the Particle Swarm Optimization algorithm, the number of dimensions used for the benchmark functions, and the methods of comparing the schedule effects.
PSO parameters
In order to isolate the effect of the scheduling functions, as little as possible was changed from the SPSO-2011 implementation. Thus the PSO parameters were left at their default values, which included a swarm size of 40, and parameter values of:
See [1] and [2] for discussion and definition of parameters in this context.
All other details of the SPSO-2011 implementation will be left unchanged aside from the introduction of the schedule functions defined above, and one other small detail – the initial particle positions will be randomly calculated once, then those same starting positions will be used for each schedule and each independent run. This again is to isolate the effect of the schedules by eliminating random starting biases.
Benchmark function dimensions
Each of the benchmark functions we will use is defined for an arbitrary number of dimensions, and it is common in benchmarking optimization algorithms to run an algorithm over a variety of numbers of dimensions in order to see performance characteristics at different levels. We will also adopt this approach and show results for each function at 2, 10, and 50 dimensions.
Other considerations
In the Results section, we will use two different methods of comparing the different scheduling functions. First, we will compare the best-known function values after a specified number of objective function evaluations, as a way of showing a snapshot of the performance of each schedule. Specifically, we will compare the results after 50d + N function evaluations, where d is the number of dimensions of the problem and N is the number of PSO particles, in this case 40 (to account for the evaluations of the initial placements of the particles). In addition, we will show a trace of the best-known function values after each objective function evaluation, as a way of showing the performance of each schedule over time.
Since SPSO-2011 is a stochastic algorithm, for both of these methods we will show the best-known function values averaged over 500 independent runs.
Finally, since several of the scheduling functions defined above require an estimate of current reward, we will use each particle’s current best-known objective function value as that estimate within the schedule. We also conducted experiments using the particle’s current function value as the reward estimate, but the results did not differ significantly.
In presenting our results, we begin with a summary of the overall performance of the different scheduling algorithms, and then break down these results to show more specific details of the schedules, the benchmark functions, and the number of dimensions involved in the experiments.
These results demonstrate the main thesis of the paper – that significant performance gains can be achieved in population-based optimization algorithms through the use of alternative scheduling functions for updating. In particular, the results indicate that massive performance gains are possible at low dimensions, and more modest gains at higher dimensions with the relatively simple scheduling functions tested. With more advanced functions or hybrid approaches, it is very likely that scheduling functions can be defined that provide consistent advantages across a broad spectrum of problems and numbers of dimensions.
Overall summary
Overall, the adaptive -greedy schedule performed the best by far at low dimensions, outperforming the standard round-robin schedule by over five-fold on average on 2-dimensional benchmark problems. On the 10-dimensional benchmark problems, again the adaptive e-greedy schedule performed the best, narrowly outperforming the round-robin schedule, which itself outperformed all others. On the 50-dimensional benchmark problems, most of the scheduling functions performed similarly, with the adaptive Softmax schedule performing the best, and the fixed e -greedy schedule as a low-performing outlier.
Average comparisons for 2 dimensions, 10 dimensions, and 50 dimensions are shown in three Tables 3 and 4, with the columns normalized by the round-robin schedule results (Table 3).
Schedule | Average Function | Performance relative |
---|---|---|
Value | to Round-Robin | |
Round-robin | 1.000 | 100 % |
Random | 0.856 | 117 % |
Fixed ε-greedy | 17.270 | 6 % |
Adaptive ε-greedy | 1.916 | 52 % |
Fixed τ Softmax | 0.969 | 103 % |
Adaptive τ Softmax | 0.857 | 117 % |
UCB1 | 0.958 | 104 % |
UCB1-Tuned | 0.961 | 104 % |
Table 4: Averaged results - 50 dimensions.
Results in 2 dimensions
Both of the e-greedy scheduling functions were by far the best performers for the 2-dimensional benchmark functions. A bar graph of the average function values for each schedule after 100 evaluations is shown in Figure 1. Each function value is normalized by the roundrobin schedule average value, in order to show the relative performance of each alternative scheduling function.
As a more complete view of the best-known function value over time for each schedule, Figure 2 shows a trace of these values for each number of function evaluations on each benchmark function. Note that the PSO swarm for each scheduling function was set to the same randomized starting location, so the initial best-known values for each are identical.
This figure in particular illustrates how well the e-greedy schedules performed. The figure also illustrates how closely aligned the behavior of the UCB1 and UCB1-Tuned algorithms are over these benchmark functions.
Results in 10 dimensions
For the 10-dimensional benchmark functions, again the adaptive e-greedy schedule was the best performer, although in this case the standard round-robin schedule was also one of the best. A bar graph of the average function values for each schedule after 500 evaluations is shown in Figure 3. For the sake of clarity in this figure, high values are clipped for the fixed e -greedy schedule on the Sphere, Rosenbrock’s function, and Griewank’s function. The averaged values for this schedule are listed in Table 4.
Figure 4 shows a trace of the best-known function values for each number of function evaluations on each benchmark function. This figure illustrates that the fixed e-greedy schedule’s performance begins to fall off as the number of dimensions in the optimization function increases. This trend continues to the 50-dimensional case, where the fixed e -greedy scheduling function shows by far the worst performance.
Results in 50 dimensions
For the 50-dimensional benchmark functions, the adaptive-t Softmax schedule performed the best, although the performance of many of the scheduling functions was similar. Although the adaptive e-greedy algorithm was the top performer in the 2-dimensional and 10-dimensional cases, and still outperformed all other schedules on two of the 50-dimensional benchmark functions (Rastrigin’s function and Schwefel’s function), overall it did not perform as well as the Softmax or UCB schedules for the 50-dimensional case.
A bar graph of the average function values for each schedule after 2500 evaluations is shown in Figure 5. Again, for the sake of clarity in this figure, high values are clipped for the fixed e -greedy schedule on the Sphere, Rosenbrock’s function, and Griewank’s function. The averaged values for this schedule are listed in Table 1.
Figures 6 and 7 shows a trace of the best-known function values for each number of function evaluations on each benchmark function.
This figure illustrates how poorly the fixed e-greedy scheduling function performs in high dimensions. This is understandable, since this schedule tends to be highly exploitative, whereas for high-dimensional optimization problems, a good deal of exploration is needed before exploitation yields fruitful results (Table 4).
Within population-based optimization algorithms, the updating of the search-space position of each individual at each iteration of the algorithm is generally assumed. However, this is often not inherently necessary. We have proposed a generalized approach in which scheduling functions are used to define the order of these position updates.
The main conclusion of this work is that there is great potential for improvement of the performance of population-based optimization algorithms through defining these scheduling functions. Even with the relatively simple scheduling functions examined in this paper, significant performance increases were shown in several contexts. Perhaps the most striking example of this was a performance increase of over 500% for the 2-dimensional set of benchmark problems, merely by using a simple adaptive e-greedy schedule instead of the standard round-robin schedule within the Standard Particle Swarm Optimization (SPSO-2011) algorithm [24-31].
We highlight the relative simplicity of these scheduling functions to emphasize the need for further research in this area, particularly to define schedules that provide the most advantage for the types of problems relevant to the contexts of various practitioners. There is much potential for improvement, both for general-purpose population-based algorithms through the use of more advanced scheduling function or hybrid approaches, and for more specific applications through the same type of comparison and analysis shown in this paper, but focused on the particular attributes of the problems and data sets under consideration within those more specific contexts.