Dušan Teodorovic. Swarm Intelligence systems for transportation engineering: principles and applications. Transportation Research, 2008.
This is one of the most comprehensive articles I have found that attempts to connect swarm intelligence heuristics to transportation systems. Most of the article is dedicated to introducing four multi-agent systems that leverage information sharing models to optimize search techniques, and I will recap those here.
Ant Colony Optimization (ACO)
Ants leave pheromone trails, and an ant will use the strength of the signal to weight their choice of path, as well-trod paths traveled by ants heading to a food source have a stronger pheromone signal. Computational applications of this general approach can be observed in the ant system, ant colony system, and the max-min ant system. The ant system has been used to solve the traveling salesman problem (TSP), and the article includes details on the algorithm. The author also shares his research on a model that blends ACO with fuzzy logic, which he calls the Fuzzy Ant System (FAS). This approach takes into account gradations like visibility and pheromone intensity. FAS seems to give you more knobs to turn to sensitize your model. The author goes on to spell out applications of ACO in transportation.
Particle Swarm Optimization (PSO)
All the birds ("particles") start out flying randomly in search of food. They keep track of the best fitness value they have achieved thus far (pbest), while also memorizing the best fitness value of any other particle (gbest). In each moment, the particles adjust their flying to take into account pbest and gbest. Two promising transportation examples are given - PSO in highway incident detection (Srinivasan et al. 2003) and an application in a vehicle routing problem with time windows (Zhu et al. 2006).
Bee Colony Optimization (BCO)
"Bees incrementally add solution components to the current partial solution and communicate directly to generate feasible solutions". The artificial bees in this model either fly forward (exploration) or backward (back to the hive to participate in the decision-making process). They then go back out to reinforce viable partial paths.
Stochastic Diffusion Search (SDS)
I'm not sure why the author included this heuristic, since they didn't present the algorithm or any transportation applications. Maybe four seemed like a better number than three?
a poem is a secret the body knows
and tries to share through words
How it works (in a nutshell)
This ensemble method constructs decision trees during the training phase (searching over a random subset of available decisions) and outputs the mode or mean prediction of these individual trees. In this way, Random Forests build up a prediction by leveraging output from a suite of "weak" learners. The final estimate is gleaned from multiple models trained on random samples of the data, thereby reducing the risk of overfitting.
- Bioinformatics, including "classifying different types of samples using gene expression of microarrays data, identifying disease associated genes from genome wide association studies, recognizing the important elements in protein sequences, or identifying protein-protein interactions."
- Breast cancer prediction 
- "Forests of trees splitting with oblique hyperplanes, if randomly restricted to be sensitive to only selected feature dimensions, can gain accuracy as they grow without suffering from overtraining" 
- Reduces overfitting by training each tree independently, using a random sample of the data. 
- While fast to train, Random Forests can be slow to generate predictions.
- Will not perform well if we must extrapolate beyond the range of what is given by the dependent or independent variables.
- The produced results can be difficult to interpret.
- Performs less well than other algorithms (i.e. regressions) when the relationship between independent and dependent variables is linear.
 Random Forests and Boosting
 When is a random forest a poor choice relative to other algorithms?
 Random Forests for bioinformatics
 Prediction of Breast Cancer using Random Forest, Support Vector Machines and Naïve Bayes
How it works (in a nutshell)
One of a family of ensemble methods that combine predictions from a series of models to create a single robust estimator, boosted trees all under the larger umbrella of gradient descent algorithms which start off with naive predictions, and iterate by fitting to residuals between the target and the last prediction. The goal of the method is to minimize the average value of the loss function by honing in on a function from the weighted sums from a class of "weak" learners. Trees are built one at a time, where each new tree helps to correct errors made by previously trained tree .
History of boosting 
Adaptive Boosting (AdaBoost) was the first boosting algorithm to gain some notoriety. "AdaBoost works by weighting the observations, putting more weight on difficult to classify instances and less on those already handled well. New weak learners are added sequentially that focus their training on the more difficult patterns."
AdaBoost and its neighboring algorithms were recast as ARCing algorithms (Adaptive Reweighting and Combining), before being developed to Gradient Boosting Machines. The statistical objective of these machines is to minimize the loss of the model by adding weak learners using a gradient descent like procedure. This method is known as a stage-wise additive model because "one new weak learner is added at a time and existing weak learners in the model are frozen and left unchanged."
- Can deliver better accuracy with fewer trees than Random Forests
- Training can be time-consuming, as trees are built one at a time
- Are prone to overfitting
 A Gentle Introduction to the Gradient Boosting Algorithm
 When would you use Random Forests over Gradient Boosted Machines?
How it works (in a nutshell)
A family of probabilistic classifiers in which the features are assumed to be conditionally-independent (their joint distribution equals the product of their marginals). What's gaussian about it? GNB uses gaussian distributions to model features with continuous data.
- Spam filtering
- Face recognition
- Document classification (i.e. machine learning research, poetry, anthropology)
- Can perform well in small datasets
- Can be used when we know the relative ordering of the probabilities in question, but perhaps not the probabilities themselves
- Simple complexity O(classes X features)
- The conditionally independent assumption may mask patterns associated with co-varying features.