Complex problems are entangled with our everyday life, usually without us noticing. We rely on fast-speed internet, where each package traverses several routers around the globe. We use transportation systems and rely on train and airport providers to reliably schedule several thousand journeys a day. We rely on logistic providers to deliver our Amazon order among several thousand others in time, especially before Christmas. These kind of problems are omnipresent in a world with ever-growing complexity. To master this complexity, two kind of techniques have been developed with the goal to generate optimal or near-optimal solutions and thereby support decision-making.

Exact methods use mathematical tools to find the optimal solution (or an upper boundary) for a problem. However, complex problems have the bad habit of an exponentially growing complexity. As an example, the computation of an optimal schedule for serving 40 customers already takes several minutes. By adding 20 customers to the problem we already need to wait several hours. In this respect, finding an optimal schedule for several hundred or thousand customers seems hopeless. This is where heuristics come into play. Whereas exact methods are mathematical elegant yet impractical for problems of bigger size, heuristics can find decent solutions for those problems quickly by setting aside mathematical rigorousness (with matheuristics there is a new field that tries to bring both worlds together). In a sense, a heuristic is comparable to human problem-solving; it tries to solve a problem in different ways; if something works well, it keeps doing it until it does not work anymore (which is mainly because it has reached a local optimum). If it has reached such a stagnant state, a heuristic (and humans as well) try to make some changes. These changes are usually stimulated by employing randomness.

The problem with randomness is that, even though it triggers change, the change is undirected. As an analogy, think of an mountaineer who wants to reach the highest peak in an extremely misty mountain range. Whenever he is on a peak, he is unable to see through the dense mist where he needs to go next to reach the next higher peak (or whether he already reached the highest one). Obviously, the more knowledge he has about the mountain range or his position by maps or guides, the faster he will reach the highest peak. Likewise, heuristics need to be guided by knowledge about the problem to find the best possible direction when performing a change. In academic literature, the importance of problem-specific knowledge in the design of heuristics is often underestimated or downright ignored. One possible explanation is that the problems for which heuristics are developed, are naturally so complex that it is fairly difficult to extract any knowledge. Otherwise, we would not need a heuristic in the first place. Here, with knowledge we mean quantifiable rules that hold for any instance of the problem (i.e. if the current solution looks like A, then change B has in general the best chance for improvement).

However, with the advance of data mining there is a way out of this dilemma. Data mining techniques are powerful when it comes to revealing complex pattern and relations in huge amount of data that are hidden to human perception. Thus, the application of data mining to dig for knowledge about complex problems seems to be a promising step.

We applied this idea for one of the most famous problems, the Vehicle Routing Problem. In a first step, we generated about 10.000 smaller problem instances with randomly-chosen number of customers, routes and locations. For each instance, we computed an optimal, or near-optimal solution O and a non-optimal solution N. Hereby, we conducted several experiments and varied the degree of ‘non-optimality’, i.e. the gap to the optimal solution. We developed several metrics to measure the properties of a solution and the ones of the respective instance (e.g., number of intersections, width of routes, distance between routes,…). Essentially, we thereby generated two data points per instance, one for O and one for N, both composed of the metrics defined. This adds up to 20.000 data points in total. We feed the whole dataset to a classification learner which tries to find differences between Os and Ns with respect to the metrics. If the learner can distinguish between Os and Ns to some degree, then some of our metrics can predict whether a solution is optimal or not. For a gap of 10% between Os and Ns, the leaner could classify them correctly in 91% of the cases. A lower gap makes classification more difficult, but we still observed a prediction accuracy of 72% at a gap of 4%.

Even better, we can extract rules that explain why certain solutions are not optimal. The most significant rules revealed that non-optimal solutions tend to have wider routes with more intersections. In other words, we can improve the solution by reducing the width of the routes and the number of intersections. However, this depends on the characteristics of the considered instance; for some instances it is more likely to improve the solution by increasing the distance between routes or increasing the depth of routes. We summarized these findings in a rule table, in which we list for which type of instances, which type of improvements are most promising. To show the practical benefits of this generated knowledge, we used the rules to guide random changes in a local search heuristic, i.e. we would perform changes that conform with one or more of the rules with a higher probability. With little implementation effort we thereby could improve its performance significantly.

You can find more information about these findings in the presentation given by Florian Arnold and Kenneth Sörensen at the last EU/MEeting (Antwerp, March 2016).

- F. Arnold, "Generating problem-specific knowledge (for the vehicle routing problem)," in EU/MEeeting on design and analysis of metaheuristics, Antwerp, Belgium, 2016.

[Bibtex]`@inproceedings{arnold2016generating_pres, author = {Arnold, Florian}, title = {Generating problem-specific knowledge (for the vehicle routing problem)}, booktitle = {{EU/ME}eeting on Design and Analysis of Metaheuristics}, year = {2016}, address = {Antwerp, Belgium}, keywords = {vehicle routing problem}, }`

In summary, first experiments on the use of data-mining to extract problem-specific knowledge seems a worthwhile direction of research. Not only does it generate a quantifiable set of rules which provides guidance about how to improve a certain solution, but it does also provide universal insights which can be used in any heuristics for this particular problem.