- Notes on Slide 1
- How do you put an event into context? And where is the next disease going to emerge from? That is the holy grail in this business... Possible early signals: dead crows on the streets of NYC; Pepto-Bismol disappearing from the shelves of grocery stores; phone calls from citizens and the media to the health department about increased absenteeism from schools and businesses; increased Internet search hits on certain terms per week. Image sources: Dead crow: http://www.birds.cornell.edu/crows/images/deadcrow.jpg Empty shelves: http://farm3.static.flickr.com/2029/2239605500_6ef2fd2295...
- This is how it is today… We are reactive, and often react late…
- But can we do better? Can we reduce the response cycle in order to have better, more effective control of events? This is important because of the burden of illness on the entire community. In the 1993 Milwaukee outbreak, almost half the population was affected, and with nearly half a million lost work days it cost businesses tens of millions of dollars over 3 weeks. Luckily, no one had to drink Pepto-Bismol for the rest of their life… but was it possible to prevent this event from becoming SO big? Our emphasis is on expanding beyond early detection to cover the entire evolution of an event: from very early indications, to classification and characterization, to detection, and finally to ensuring an effective response… and for this to be based on evidence, not just intuition. Basically, we need to 'change' the way we look for diseases… For instance, diseases often emerge as the population grows, as we move into new areas, as we introduce wildlife stock exchanges, as we introduce new drugs, etc. So we have to look for elements and touch points in order to make better sense of the information and better predict where the next hot spot is, before it becomes the next endemic or pandemic… Sidebar: Over the last 60 years there has been a significant rise in the number of diseases every decade, 20% of which are due to drug-resistant microbes. Zoonotic diseases (HIV/SARS) start from very bizarre origins in wildlife (SARS: bats, humans, civets!)
- It is not necessarily a lack of information… we have a lot of information… rather, can we turn the information into intelligence (or context) in a timely manner? Multiple streams include the following (say something about why you need to stitch multiple sources together...). Sidebar: the 5/50 rule: in 5 years' time, 50% of all content will be user-generated (Reference: The Podshow by Ron Bloom, http://www.ronbloom.com/?p=11); 60% of content has geo-spatial and temporal aspects… Image sources: Wikipedia: http://www.citris-uc.org/system/files/imce-u10/Wikipedia-... Blogger: http://z.about.com/d/weblogs/1/5/V/-/-/-/BloggerHomePage.PNG OpenMRS: http://ruddzw.files.wordpress.com/2007/05/openmrs_osx.png Remote sensing: http://www.medscape.com/content/2000/00/41/47/414717/art-... Cell phone/iPhone: http://healthinformaticsblog.files.wordpress.com/2008/03/... http://gmapsmania.googlepages.com/whosickgmm.JPG
- Proportion of infection detected… Control confounding effects by: including not only the demand side (Internet search queries) but also the supply side (e.g., information on news websites; link to Healthmap.org or GPHIN); including longitudinal data on health information supply; including accurate geographic distribution. Infodemiology: develop methodology and real-time measures (indices) to understand patterns and trends for general health information; understand the predictive value of what the community of practice is looking for (demand) for early detection of emerging diseases, infectious disease outbreaks, or bioterrorism; identify and quantify gaps between information supply and demand; discover and validate predictive metrics. Could an X number (threshold) of Internet search hits on fever per week signal a flu outbreak?
- Timeliness… We could potentially observe the progression of a disease outbreak within a population at multiple touch points (data). Some of these data may be collected before visits to the physician or hospital have actually happened: patients might search the Internet for the symptoms they're experiencing; patients might adjust their diet when they feel ill (such as drinking more water or juice, and resting more); if the symptoms become more severe, patients might seek over-the-counter (OTC) medicine and miss classes or work; in many cases, patients might go to work late or leave for home early; patients might also show subtle changes in their behavior at work; when the symptoms continue, patients might seek help from physicians (e.g., schedule appointments, present with chief complaints, have lab tests ordered and medicines prescribed). Similar models can also be established for pollution, non-infectious diseases, chronic diseases, injury, and natural disasters.
- This is an old idea… Crows were recognized for divination in Roman times; today they are a crucial component of the US West Nile Virus control program… However, in current systems… much less effort has gone towards interaction with humans (responders and domain experts)… and not just for the purpose of early detection, but also for working together… They often provide contradictory interpretations of ongoing events and not enough evidence to issue a response. Furthermore, we are more vulnerable to those threats we know about but have not faced on a major scale: imagine the regular flu season is twice as bad as it was last year; we need to prepare beforehand in order to make effective use of our already limited resources and ensure an effective response. We are even more vulnerable to those threats that we don't know about, such as emerging infectious diseases or a large release of an aerosol agent in an act of bioterrorism.
- To recap: human experts interacting with automated systems; a collaborative decision-making environment. I am sure one day soon we will have an EID impact assessment... just like there is an environmental impact assessment… Thank you VERY much for your time today…
Slideshow Transcript
- Slide 1: MACHINE LEARNING AND DISEASE SURVEILLANCE Taha Kass-Hout, MD, MS Nicolás di Tada October 2008
- Slide 2: Image source: http://www.birds.cornell.edu/crows/images/deadcrow.jpg Image source: http://farm3.static.flickr.com/2029/2239605500_6ef2fd2295.jpg?v=0
- Slide 3: LATE DETECTION – RESPONSE (chart: cases by day, with the opportunity for control marked)
- Slide 4: EARLY DETECTION AND RESPONSE (chart: cases by day, with the opportunity for control marked)
- Slide 5: INFORMATION SOURCES Event-based – ad-hoc unstructured reports issued by formal or informal sources Indicator-based – (number of cases, rates, proportion of strains…)
- Slide 6: PUBLIC HEALTH MEASURES Representativeness Completeness Predictive Value Timeliness
- Slide 7: PUBLIC HEALTH MEASURES Main attributes: representativeness, completeness, predictive value positive. Pyramid figure: 1000 malaria infections (100%) versus 50 malaria notifications (5%); axes labelled Specificity / Reliability and Sensitivity / Timeliness. Get as close to the bottom of the pyramid as possible. Urge frequent reporting: weekly, daily, immediately.
- Slide 8: PUBLIC HEALTH MEASURES Main attribute: timeliness. Signal as early as possible; analyze and interpret. (Figure: health care hotline signal over time, with automated analysis/thresholds.)
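The "automated analysis/thresholds" idea on Slide 8 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `flag_weeks`, the four-week baseline, the multiplier `k`, and the weekly counts are not from the slides.

```python
from statistics import mean, stdev

def flag_weeks(weekly_counts, baseline_weeks=4, k=2.0):
    """Return indices of weeks whose count exceeds baseline mean + k * stdev.

    The baseline for each week is the preceding `baseline_weeks` weeks.
    """
    flagged = []
    for week in range(baseline_weeks, len(weekly_counts)):
        history = weekly_counts[week - baseline_weeks:week]
        threshold = mean(history) + k * stdev(history)
        if weekly_counts[week] > threshold:
            flagged.append(week)
    return flagged

# Made-up weekly counts (e.g., hotline calls or searches for "fever").
counts = [100, 104, 98, 102, 101, 180, 99]
print(flag_weeks(counts))  # prints [5]
```

A fixed multiplier over a short baseline is only one possible rule; it ignores seasonality and day-of-week effects that a real system would need to handle.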
- Slide 9: THE PROBLEM SPACE Current systems design, analysis and evaluation have been geared towards specific data sources and detection algorithms – not humans. We have systems in place for those threats we have been faced with before.
- Slide 10: PUBLIC HEALTH – TWO PERSPECTIVES Case management Individual cases of notifiable diseases Relationship networks (contact tracing) Population surveillance Larger risk patterns
- Slide 11: CASE MANAGEMENT Questions/problems: Is a case due to recent transmission? If so, does the case share any feature with other, recent cases? Ways it's being done: Investigations/interviews Meeting with other investigators
- Slide 12: POPULATION SURVEILLANCE Questions/problems: Are more cases happening than expected? Does an excess suggest ongoing transmission in a specific region? Way it's being done: Semi-automated routine temporal and space-time statistical analysis
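The question "Are more cases happening than expected?" can be sketched as a simple temporal test. This is a hedged illustration, not the analysis the presenters used: it assumes case counts follow a Poisson distribution around a known expected value, and the names `poisson_tail`, `is_excess`, and the `alpha` cut-off are hypothetical.

```python
import math

def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below

def is_excess(observed, expected, alpha=0.01):
    """Flag a count as an excess if it would be this high by chance with probability < alpha."""
    return poisson_tail(observed, expected) < alpha

# Made-up numbers: 5 cases expected per week in a region.
print(is_excess(6, 5.0))   # a count of 6 is unremarkable
print(is_excess(12, 5.0))  # a count of 12 is unusually high
```

Space-time analysis extends the same idea by comparing observed and expected counts per region as well as per time window.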
- Slide 13: WHY LOCATION MATTERS – CASE MANAGEMENT If you are studying a case of a certain disease that was just declared, it is harder to picture the situation by looking at something like this..
- Slide 14: WHY LOCATION MATTERS – CASE MANAGEMENT
- Slide 15: WHY LOCATION MATTERS – CASE MANAGEMENT Than by looking at this..
- Slide 16: WHY LOCATION MATTERS – CASE MANAGEMENT
- Slide 17: WHY LOCATION MATTERS – POP SURVEILLANCE If you are studying the spatial distribution of a set of disease clusters This would seem more difficult..
- Slide 18: WHY LOCATION MATTERS – POP SURVEILLANCE
- Slide 19: WHY LOCATION MATTERS – POP SURVEILLANCE Than this..
- Slide 20: WHY LOCATION MATTERS – POP SURVEILLANCE
- Slide 21: MODERN DISEASE SURVEILLANCE In the past two decades, much disease surveillance research has focused on developing analytical methods for automatically detecting anomalous patterns in data Modern methods can achieve timely detection of anomalies by incorporating temporal, spatial, and multivariate information
- Slide 22: MODERN DISEASE SURVEILLANCE 9/20, 15213, cough/cold, … 9/21, 15207, antifever, … 9/22, 15213, CC = cough, ... 1,000,000 more records… Huge mass of data → Detection algorithm → Too many alerts: “What are we supposed to do with this?”
- Slide 23: MODERN DISEASE SURVEILLANCE 9/20, 15213, cough/cold, … 9/21, 15207, antifever, … 9/22, 15213, CC = cough, ... 1,000,000 more records… Huge mass of data Feedback loop
- Slide 24: ADVANTAGES OF MACHINE LEARNING P(malaria) = 22% P(influenza) = 13% P(other ILI) = 33%
- Slide 25: MACHINE LEARNING TECHNIQUES Classifiers Clustering Bayesian Statistics Neural Networks Genetic Algorithms
- Slide 26: HOW TO REPRESENT A DOCUMENT? “I had a flu last month. […] I had a flu early this week.” “This morning I woke up with fever, I might have a flu.” (Figure: documents plotted on two axes, flu and fever.)
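One common way to represent a document as a vector is to count vocabulary terms and then scale the counts to unit length. The sketch below assumes a two-word vocabulary taken from the slide's axes (flu, fever); the tokenizer and the function names `vectorize` and `normalize` are illustrative choices, not the presenters' implementation.

```python
import math
import re

VOCABULARY = ["flu", "fever"]  # assumed from the two axes on the slide

def vectorize(text, vocabulary=VOCABULARY):
    """Count how often each vocabulary term occurs in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tokens.count(term) for term in vocabulary]

def normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    length = math.sqrt(sum(v * v for v in vector))
    return [v / length for v in vector] if length else list(vector)

doc1 = "I had a flu last month. I had a flu early this week."
doc2 = "This morning I woke up with fever, I might have a flu."

print(vectorize(doc1))  # prints [2, 0]
print(vectorize(doc2))  # prints [1, 1]
print(normalize(vectorize(doc1)))  # prints [1.0, 0.0]
```

Normalizing removes the effect of document length, so a long post that mentions flu ten times is not automatically "further away" than a short one.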
- Slide 27: CLASSIFIERS – PROBLEM DEFINITION Goal: separate classes in feature space. Steps: map items to vectors (feature extraction); normalize those vectors; train the classifier; measure the results with new information; feed the results back to the classifier.
- Slide 28: CLASSIFIERS - SVM
- Slide 29: SVM – MARGIN MAXIMIZATION Support vectors define the separator
- Slide 30: SVM – NON LINEAR? Map to higher-dimension space Φ: x → φ(x)
- Slide 31: SVM – FILTERING OR CLASSIFYING (Diagram: training documents labelled as positives and negatives train a classifier, which is then applied to Document 1, Document 2, Document 3.)
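The "map to higher-dimension space" step (Φ: x → φ(x)) can be shown on toy data. This sketch does not train an SVM: the data, the particular map φ(x) = (x, x²), and the hand-picked weights are all assumptions, chosen only to show how a non-linear problem becomes linearly separable after mapping.

```python
def phi(x):
    """Map a 1D point into 2D feature space: x -> (x, x^2)."""
    return (x, x * x)

def separable_by_threshold(values, labels):
    """True if a single threshold on 1D values separates class +1 from class -1."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == -1]
    return max(neg) < min(pos) or max(pos) < min(neg)

# Toy data: the positive class sits on both sides of the negative class.
points = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]

# Hand-picked (not trained) linear separator in the mapped space: x^2 - 2 > 0.
WEIGHTS = (0.0, 1.0)
BIAS = -2.0

def classify(x):
    """Apply the linear rule sign(w . phi(x) + b) in the mapped space."""
    score = sum(w * f for w, f in zip(WEIGHTS, phi(x))) + BIAS
    return 1 if score > 0 else -1

print(separable_by_threshold(points, labels))                       # prints False
print(separable_by_threshold([phi(x)[1] for x in points], labels))  # prints True
print([classify(x) for x in points] == labels)                      # prints True
```

An actual SVM would choose the separator that maximizes the margin to the nearest points (the support vectors) rather than using fixed weights.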
- Slide 32: CLUSTERING – PROBLEM DEFINITION Map items to vectors (Feature extraction) Normalization Agglomerative and Partitional
- Slide 33: CLUSTERING - AGGLOMERATIVE
- Slide 34: CLUSTERING - PARTITIONAL
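Partitional clustering can be sketched with k-means, one widely used partitional method; the slides do not say which algorithm was shown, so treat this as an assumption. The toy points and the choice of initial centroids are made up, and agglomerative clustering is not covered here.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid update until centroids stop moving."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Toy feature vectors forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Unlike a classifier, no labelled training documents are needed; the groups emerge from the distances between vectors.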
- Slide 35: BAYESIAN STATISTICS $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$, where $P(A \mid B)$ is the probability of disease A (flu) once symptoms B (fever) are observed, $P(B \mid A)$ is the probability of fever once flu is confirmed, $P(A)$ is the probability of flu (prior or marginal), and $P(B)$ is the probability of fever (prior or marginal).
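A worked example of Bayes' rule for the flu and fever case on Slide 35. The three input probabilities are invented for illustration and are not from the presentation; the function name `posterior` is likewise hypothetical.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must be in (0, 1]")
    return p_b_given_a * p_a / p_b

# Assumed values: P(fever | flu) = 0.9, P(flu) = 0.05, P(fever) = 0.15.
p_flu_given_fever = posterior(p_b_given_a=0.9, p_a=0.05, p_b=0.15)
print(round(p_flu_given_fever, 2))  # prints 0.3
```

With these assumed numbers, observing fever raises the probability of flu from 5% to about 30%.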
- Slide 36: NEURAL NETWORKS Given a set of stimuli, train a system to produce a given output
- Slide 37: NEURAL NETWORKS - STRUCTURE Input layer {I0, I1, …, In} → (weights) → Hidden layer […] → Output layer {O0, O1, …, On}. Each hidden unit is a weighted sum of the inputs: $H_n = \sum_{i=0} (I_i \cdot w_{in})$
- Slide 38: NEURAL NETWORK - APPLICATION Event?
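The layer structure on Slide 37 can be sketched as a forward pass in which each unit computes a weighted sum of its inputs. The sigmoid activation, the layer sizes, and the weight values are assumptions added for illustration; the transcript only shows the weighted sum.

```python
import math

def sigmoid(z):
    """Squash a weighted sum into the range (0, 1); an assumed activation."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights):
    """Compute one layer; weights[n][i] links input i to unit n."""
    return [
        sigmoid(sum(value * w for value, w in zip(inputs, unit_weights)))
        for unit_weights in weights
    ]

def forward(inputs, hidden_weights, output_weights):
    """Input layer -> hidden layer -> output layer."""
    return layer(layer(inputs, hidden_weights), output_weights)

# Made-up weights: 2 inputs, 2 hidden units, 1 output ("event?" score).
hidden_weights = [[0.5, -0.25], [1.0, 1.0]]
output_weights = [[1.5, -1.0]]
score = forward([1.0, 2.0], hidden_weights, output_weights)[0]
print(0.0 < score < 1.0)  # prints True
```

Training would adjust the weights so that the output matches known outcomes for a set of example inputs; that step is omitted here.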
- Slide 39: GENETIC ALGORITHM - BASICS Define the model that you want to optimize Create the fitness function Evolve the gene pool testing against the fitness function. Select the best individual
- Slide 40: GENETIC ALGORITHM – MODEL Model the transmission process using a set of parameters: onset time between an infection and illness, latency period, incubation period, symptomatic period, infectious period. Example: (Onset, Latency, Incubation, Symptomatic, Infectious) = (2 days, 3 days, 1 day, 4 days, 3 days)
- Slide 41: GENETIC ALGORITHM – MODEL FITNESS Fitness = 1/Area
- Slide 42: GENETIC ALGORITHM – PROCESS 1. Create an initial population of candidates 2. Use operators to generate new candidates (mating and mutation) 3. Discard worst individuals or select best individuals in generation 4. Repeat from 2 until you find a candidate that satisfies the solution searched
- Slide 43: GENETIC ALGORITHM - PROCESS (5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2) (2,3,4,6,5) (4,3,6,2,5) (4,5,6,3,5) (3,4,5,2,6) (3,4,4,6,2) (5,3,2,6,5) (3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6) (4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4) (5,3,2,6,5) (3,4,4,6,2)
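The four-step process on Slide 42 can be sketched as follows. Several things here are assumptions: the transcript does not define "Area", so the sketch uses a stand-in (1 plus the squared distance to a known target tuple, which keeps the fitness finite); the target reuses the example tuple from Slide 40; and the population size, mutation rate, and gene range of 1 to 6 days are hypothetical.

```python
import random

# Assumed target: (Onset, Latency, Incubation, Symptomatic, Infectious) in days.
TARGET = (2, 3, 1, 4, 3)

def fitness(candidate):
    """Fitness = 1 / Area, with a stand-in 'area' (1 + squared distance to TARGET)."""
    area = 1 + sum((c - t) ** 2 for c, t in zip(candidate, TARGET))
    return 1 / area

def mate(a, b, rng):
    """Single-point crossover of two parent tuples."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]

def mutate(candidate, rng, rate=0.2):
    """Replace each gene with a random value with probability `rate`."""
    return tuple(rng.randint(1, 6) if rng.random() < rate else g for g in candidate)

def evolve(pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    # 1. Create an initial population of candidates.
    population = [tuple(rng.randint(1, 6) for _ in range(5)) for _ in range(pop_size)]
    initial_best = max(population, key=fitness)
    for _ in range(generations):
        # 3. Keep the best half of the generation (survivors are never lost).
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # 4. Stop once a candidate satisfies the solution searched.
        if fitness(survivors[0]) == 1.0:
            break
        # 2. Use mating and mutation to generate new candidates.
        children = [
            mutate(mate(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return initial_best, max(population, key=fitness)

initial_best, best = evolve()
print(fitness(best) >= fitness(initial_best))  # prints True
```

In a real application the fitness would be computed by running the transmission model with the candidate parameters and comparing its output with observed case data, rather than comparing against a known answer.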
- Slide 44: RESULTS – IMPROVED SURVEILLANCE
- Slide 45: Q&A
- Slide 46: THANK YOU! Taha Kass-Hout, MD, MS http://www.instedd.org kasshout@instedd.org http://taha.instedd.org Nicolás di Tada http://www.manas.com.ar nditada@manas.com.ar http://weblogs.manas.com.ar/ndt/
- Slide 47: BACKUP SLIDES
- Slide 48: REFERENCES Izadi, M. and Buckeridge, D., Decision Theoretic Analysis of Improving Epidemic Detection, AMIA 2007, Symposium Proceedings 2007 EpiNorth-Based material (http://www.epinorth.org): Mereckiene, J., Outbreak Investigation Operational Aspects. Jurmala, Latvia, 2006 Bagdonaite, J., and Mereckiene, J., Outbreak Investigation Methodological aspects. Jurmala, Latvia, 2006 Epidemic Intelligence: Signals from surveillance systems, Anne Mazick, Statens Serum Institut, Denmark, EpiTrain III, Jurmala, August 2006 Daniel Neil, Incorporating Learning into Disease Surveillance Systems
- Slide 49: REFERENCES Algorithms Complex Event Processing Over Uncertain Data in Wasserkrug (2008) Outbreak detection through automated surveillance A review of the determinants of detection in Buckeridge (2007) Approaches to the evaluation of outbreak detection methods in Watkins (2006) Algorithms for rapid outbreak detection a research synthesis Buckeridge (2004) Data mining in bioinformatics using Weka in Frank (2004)
- Slide 50: REFERENCES Automating Laboratory Reporting Automatic Electronic Laboratory-Based Reporting in Panackal (2002) Benefits and Barriers to Electronic Laboratory Results Reporting for Notifiable Diseases in Nguyen (2007) Using EMR Data for Disease Surveillance Using Electronic Medical Records to Enhance Detection and Reporting of Vaccine Adverse Events in Hinrichsen (2007) Electronic Medical Record Support for PH in Klompas (2007) A knowledgebase to support notifiable disease surveillance in Doyle (2005) Automated Detection of Tuberculosis Using Electronic Medical Record Data in Calderwood (2007) Misc Readings Breakthrough in modeling emerging disease hotspots in Jones (2008) Use of data mining techniques to investigate disease risk classification as a proxy for compromised biosecurity of cattle herds in Wales in Ortiz-Pelaez (2008)
- Slide 51: RELATED PROJECTS InSTEDD RNA (or Event Evolution): Collaborative Analytics and Environment for Linking Early Health-Related Event Detection to an Effective Response ( http://taha.instedd.org/2008/09/collaborative-analytics-and-environment.html ) ALPACA: "ALPACA Light Parsing And Classifying Application (ALPACA) is a classifying tool designed for use in community-oriented software as well as in Academia. The application consists of two parts: a parsing tool for transforming raw documents into readable data, and a classifying tool for categorizing documents into user-provided classes. The application provides a user-friendly interface and a Plug-in functionality to provide a simple way to add more parsers/classifiers to the application." http://2008.hfoss.org/ALPACA Surveillance Project: An Open Source R-package disease surveillance framework for "...the development and the evaluation of outbreak detection algorithms in univariate and multivariate routine collected public health surveillance data." http://surveillance.r-forge.r-project.org/ Weka: An open source "...collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes." http://www.cs.waikato.ac.nz/~ml/weka/

