Loading...
Flash Player 9 (or above) is needed to view slideshows. We have detected that you do not have it on your computer.To install it, go here
-
Added to the group Estimation of distribution algorithms by pelikan
-
pelikan favorited this 2 years ago
Slideshow Transcript
- Slide 1: A Billion Bits or Bust David E. Goldberg1, Kumara Sastry1,2, Xavier Llorà1,3 Illinois Genetic Algorithms Laboratory (IlliGAL) 1 Materials Computation Center (MCC) 2 National Center for Super Computing Applications (NCSA) 3 University of Illinois at Urbana-Champaign Urbana, IL 61801 USA deg@uiuc.edu, ksastry@uiuc.edu, xllora@uiuc.edu http://www-illigal.ge.uiuc.edu Supported by AFOSR FA9550-06-1-0096 and NSF DMR 03-25939. Computational results were obtained using CSE’s Turing cluster.
- Slide 2: Billion-Bit Optimization? Strides w/ genetic algorithm (GA) theory/practice in 1990s. Solving large, hard problems in principled way. Moving to practice in important problem domains. Still GA boobirds claim (1) no theory, (2) too slow, and (3) just voodoo. How demonstrate results achieved so far in dramatic way? Billion bits or bust whitepaper (10/23/05). DEG lunch questions: A million? Sure. A billion? Maybe. Naïve GA implementation for a billion bits: ~100 terabytes memory for population storage. ~272 random number calls. Naïve approach goes nowhere. 2
- Slide 3: Roadmap What is a genetic algorithm? 1980s: Trouble in River City. 1990s: IlliGAL solves robustness problems. 2000s: Toward billion-variable optimization Why this matters in practice: Example, multiscale materials and chemistry modeling Challenges to using this in industry. Efficiency enhancement in 4-part harmony. Supermultiplicative speedups by mining evolved data. 3
- Slide 4: What is a Genetic Algorithm (GA)? Search procedure based on mechanics of natural selection and genetics. Representation: GAs operate on codes: Binary or Gray Permutation Integer or real Program Fitness function: Objective function, subjective function, ecological co-evolution Population: Candidate solutions (individuals) Genetic operators: Selection: “Survival of the fittest” Recombination: Combine parental traits to create offspring Mutation: Modify an offspring slightly 4
- Slide 5: 1980s: Trouble in River City 1980s: Promise of robustness not realized. First generation performance uncertain: Sometimes effective, sometimes not. Sometimes fast, sometimes slow. How can we make GAs scalable? Competence. How can we make them practical? Efficiency 5
- Slide 6: 1990s: Competent and Efficient GAs 90s: IlliGAL solves robustness puzzle. Competence: Solve hard problems quickly, reliably, and accurately (Intractable to tractable). Efficiency: Develop speedup procedures (tractability to practicality). Principled design: [Goldberg, 2002] Relax rigor, emphasize scalability/quality. Use problem decomposition. Use facetwise models, and patchquilt integration using dimensional analysis. 6
- Slide 7: Competent GAs Then: 1993, GA Kitty Hawk Early 90s the facetwise theory came together. Early 90s first competent GAs appeared. First competent GA in 1993: fast messy GA (fmGA). Original mGA complexity estimated: O(l5) [Goldberg, Deb, Kargupta, & Harik, 1993] fmGA subquadratic on deceptive problem. 7
- Slide 8: Competent GAs Now: hBOA Replace genetics with probabilistic model building: PMBGA or EDA. 3 main elements: Decomposition (structural learning) Learn what to mix and what to keep intact Representation of BBs (chunking) Means of representing alternative solutions Diversification of BBs (niching) Preserve alternative chunks of solutions Combines selectionist approach of GAs, statistical methods, and machine learning. [US utility patent # 7,047,169] 8
- Slide 9: Hierarchical BOA (hBOA) Bayesian New Current network population Selection population E p rim n xe et 6 O 1 3 lo (n (n .6 g )) 10 u b r f v lu tio s N me o E a a n 5 10 Facetwise model 4 10 27 81 23 4 79 2 9 P b mS e ro le iz
- Slide 10: hBOA on Hard Antenna Designs [Yu, Santarelli, & Goldberg, 2006] 10
- Slide 11: Aiming for a Billion Theory & algorithms in place. Needed the guts to try. Focus on key theory, implementation, & efficiency enhancements. Theory keys: Problem difficulty. Parallelism. Implementation key: compact GA. Efficiency keys: Various speedup. Memory savings. Results on a billion-variable noisy OneMax. 11
- Slide 12: Theory Key 1: Master-Slave Linear Speedup Speed-up: Max speed-up at Near linear speed-up until [Cantu-Paz & Goldberg, 1997; Cantú-Paz, 2000] 12
- Slide 13: Theory Key 2: Noise Covers Most Problems Adversarial problem design [Goldberg, 2002] Fluctuating R P Noise Deception Scaling Blind noisy OneMax 13
- Slide 14: Implementation Key: Compact GA Simplest probabilistic model building GA [Harik, Lobo & Goldberg, 1997; Baluja, 1994; Mühlenbein & Paaß, 1996] Represent population by probability vector Probability that ith bit is 1 Replace recombination with probabilistic sampling Selectionist scheme New population evolution through probability updates Equivalent to GA with steady-state tournament selection and uniform crossover 14
- Slide 15: Compact Genetic Algorithm (cGA) Random initialization: Set probabilities to 0.5 Model Sampling: Generate two candidate solutions by sampling the probability vector Evaluation: Evaluate the fitness of two sampled solutions Selection: Select the best among the sampled solutions Probabilistic model update: Increase the proportion of winning alleles by 1/n 15
- Slide 16: Parallel cGA Architecture Processor #1 Processor #2 Processor #np Sample bits 1- /np Sample bits 1- /np Sample bits 1- /np Collect partial sampled solutions and combine Parallel fitness evaluation of sampled solutions Broadcast fitness values of sampled solutions Select best individual Select best individual Select best individual Update probabilities Update probabilities Update probabilities 16
- Slide 17: cGA is Memory Efficient: Θ() vs. Θ(1.5) Simple GA: Compact GA: Frequencies instead of probabilities (4 bytes) Parallelization reduces memory per processor by factor of np Orders of magnitude memory savings via efficient GA Example: ~32 MB per processor on a modest 128 processors for billion-bit optimization 17
- Slide 18: Vectorization Yields Speedup of 4 Vectorize costly code segments with AltiVec/SSE2 Generate 4 random numbers at a time Sample 4 bits at a time Update 4 probabilities at a time SIMD instruction set allows vector operations on 128-bit registers Equivalent to 4 processors per processor 18
- Slide 19: Other Efficiencies Yield Speedup of 15 Bitwise operations Limited floating-point operations Inline functions Avoid using mod and division operations Precomputing bit sums and indexing Parallel, vectorized, and efficient GA: Memory scales as (/np); Speedup scales as 60np ~32 MB memory, and ~104 speedup with 128 processors 19
- Slide 20: GA Population Sizing Noise-to-fitness variance ratio Error tolerance # Components (# BBs) Signal-to-Noise ratio # Competing sub-components Additive Gaussian noise with variance 2N Population sizing scales: O(m0.5 log m) – O(m log m) [Harik, et al, 1997] 20
- Slide 21: GA Convergence Time Problem size (m·k ) Selection Intensity Convergence time scales: O(m0.5) – O(m) GA scales as: O(m log m) – O(m2 log m) [Miller & Goldberg, 1995; Goldberg, 2002; Sastry & Goldberg, 2002] 21
- Slide 22: GA Solves Billion-Variable Optimization Problem GA scales (¢log¢(1+2N/2f)) Solved 33 million (225) bit problem to optimality. Solved 1.1 billion (230) bit problem with relaxed, but guaranteed convergence 22
- Slide 23: Do Problems Like This Matter? Yes, for three reasons: Many GAs no more sophisticated than cGA. Inclusion of noise was important because it covers all difficulty (except deception). Know how to handle deception and other problems through PMBGAs like hBOA. Have experience solving tough problems with ordinary genetic and evolutionary algorithms: Material science. Chemistry. Complex versions of these kinds of problems need billion-bit optimization. 23
- Slide 24: Multiscale Nanomaterials Modeling Accuracy of modeling depends on accurate representation of potential energy surface (PES) Both values and shape matter Ab initio methods: Accurate, but slow (hours-days) Compute PES from scratch Faster methods: Fast (seconds-minutes), accuracy depends on PES accuracy Need direct/indirect knowledge of PES Known and unknown potential function/method Multiscaling quantum chemistry simulations [Sastry et al, 2006] Multi-timescaling alloy kinetics [Sastry et al, 2004; Sastry et al, 2005] 24
- Slide 25: Multi-timescale Modeling of Alloys Molecular dynamics (MD): (~10–9 secs) many realistic processes are inaccessible. Kinetic Monte Carlo (KMC): (~secs) need all diffusion barriers a priori. (God or compute) Real time Table Efficient Coupling of MD and KMC Lookup Symbolically KMC Regressed KMC Use MD to get some diffusion barriers. (sr-KMC) Use KMC to span time. On the fly KMC Use GP to regress all barriers from some barrier info. Complexity Span 10–15 seconds to seconds (15 orders of magnitude) chosen by the AIP editors as focused article of frontier research in Virtual Journal of Nanoscale Science & Technology, 12(9), 2005 25
- Slide 26: GP-Enhanced Kinetics Modeling Total 2nd n.n. Active configurations: 8192 Dramatic scaling over MD (109 at 300 K) 102 decrease in CPU time for calculating barriers 103-106 less CPU time than on-they-fly KMC E calculated: » 3% (256) configurations Low-energy events: <0.1% prediction error Overall events: <1% prediction error 26
- Slide 27: Chemistry: GA-Enhanced Reaction Dynamics Ab Initio Tune Semiempirical Quantum Chemistry Semiempirical Methods Methods Parameters Accurate but slow (hours-days) Fast (secs.-mins.), accuracy Can calculate excited states depends on parameters. Calculate integrals from fit parameters. Accurate excited-state surfaces with semiempirical methods Permits dynamics of larger systems: proteins, nanotubes, etc., Energy & shape of the PES matter Uknown PES functional form: Multi-timescaling alloy kinetics [ Sastry et al, 2004; Sastry et al, 2005] [Best paper and Silver “Humies” award. GECCO (ACM SIG conference)] 27
- Slide 28: MOGA Finds Physical and Accurate PES 102-105 increase in simulation time 10-103 times faster than current methods Produces transferable SE parameters Each point is a set of 11 Interpretable parameters semiempirical methods MOGA results have significantly lower errors than current results. Globally accurate PES yielding physical reaction dynamics 28
- Slide 29: Do You Make a Million/Billion Decisions? Materials and chemistry just examples: Increased complexity increases appetite for large optimization. Modern design increasingly complex. The Os all have many decisions to make: Nano Bio Info Generally systems increasingly complex: ~105 in a modern automobile ~107 parts in commercial jetliner. Will be driven toward routine million/billion variable problems. We get the warhead and then hold the world ransom for... 1 MILLION dollars! 29
- Slide 30: Challenges to Routine Billion-Bit Optimization What if you have large nonlinear solver (PDE, ODE, FEM, KMC, MD, whatever)? Need efficiency enhancement: Parallel Time continuation Hybridization Evaluation Relaxation Take 100-node cluster, and 26% speed increase of other effects: Combined effect is multiplicative: 100*1.26*1.26*1.26 ≈ 200 . Good, but not good enough. 30
- Slide 31: Data-Mined Evaluation Relaxation as Key Steps: 1. Collect evolutionary stream of data: Solution sets and function values. 2. Build structural model of key modules & relationships (e.g., hBOA). 3. Build fitness surrogate using structural model. 4. Substitute surrogate evaluations in place of expensive evaluations. Need sample small fraction of expensive evaluations. Sounds too good to be true: something for nothing? Not really, analog to human cognition. 31
- Slide 32: Supermultiplicative Speedups Synergistic integration of probabilistic models and efficiency enhancement techniques Evaluation relaxation Learn structural model Induce surrogate fitness from structural model Estimate coefficients using standard methods. Only 1-15% individuals need evaluation Speed-Up: 30–53 [Pelikan & Sastry, 2004; Sastry, Pelikan & Goldberg, 2004 32
- Slide 33: Summary What is a genetic algorithm? 1980s: Unrealized promise of robustness. 1990s: Increasing competence (scalability) & efficiency. 2000s: Toward billion-variable optimization. Why this matters in practice: Multiscale materials and chemistry modeling as two examples. Challenges to routine billion-bit optimization. Efficiency enhancement in 4-part harmony. Supermultiplicative speedups by mining evolved data & fitness surrogates. 33
- Slide 34: Conclusions Big hardware was a new frontier 20 years ago. Big optimization is a frontier today: Take extant cluster computing. Mix in robustness lessons of the 90s. Efficiency enhancement of the 2000s. Integrate into problems. This technology can create competitive advantage for industries visionary enough to grab hold: In the O’s (bio, nano, info) Large-scale complex systems. Find out more by engaging UIUC and NCSA work, today. 34
- Slide 35: More Information Billion-bit article in Complexity ( http://www3.interscience.wiley.com/cgi-bin/abstract/114068026/ABSTRACT?CRETRY=1&SRETRY=0 ) and tech report ( ftp://ftp-illigal.ge.uiuc.edu/pub/papers/IlliGALs/2007007.pdf ). Illinois Genetic Algorithms Laboratory: http://www-illigal.ge.uiuc.edu/. The Design of Innovation: Lessons from and for Competent Genetic Algorithms (Kluwer, 2002). Speakers: deg@uiuc.edu, ksastry@uiuc.edu. http://www.amazon.com/exec/obidos/tg/detail/-/1402070985/ref=pd_sl_aw_alx-jeb-9-1_book_5065571_2 35

