    • How do you put an event into context? And where is the next disease going to emerge from? That is the holy grail in this business. Examples: dead crows on the streets of NYC; Pepto-Bismol disappearing from the shelves of grocery stores; phone calls from citizens and the media to the health department about increased absenteeism from schools and businesses; increased Internet search hits on certain terms per week. Image sources: Dead Crow: http://www.birds.cornell.edu/crows/images/deadcrow.jpg Empty Shelves: http://farm3.static.flickr.com/2029/2239605500_6ef2fd2295...
    • This is how it is today… We are reactive, and often react late…
    • But can we do better? Can we shorten the response cycle in order to have better and more effective control of events? This is important because of the burden of illness on the entire community. In the 1993 Milwaukee outbreak, almost half the population was affected, and with nearly half a million lost days at work it cost businesses tens of millions of dollars over 3 weeks. Luckily, no one had to drink Pepto-Bismol for the rest of their life … but was it possible to prevent this event from becoming SO big? Our emphasis is on expanding beyond early detection to cover the entire evolution of an event: from very early indications of an event, to classification and characterization, to detection, and finally to ensuring an effective response, all based on evidence, not just intuition. Basically, we need to 'change' the way we look for diseases. For instance, diseases often emerge as the population grows, as we move into new areas, as we introduce wildlife stock exchanges, as we introduce new drugs, etc. So we have to look for elements and touch points in order to make better sense of the information and better predict where the next hot spot is, before it becomes the next endemic or pandemic. Sidebar: over the last 60 years there has been a significant rise in the number of diseases every decade, 20% of which are due to drug-resistant microbes. Zoonotic diseases (HIV/SARS) start from very bizarre origins in wildlife (SARS: bats → human → civets!).
    • It is not necessarily a lack of information; we have a lot of information. Rather, can we turn the information into intelligence (or context) in a timely manner? Multiple streams include the following (say something about why you need to stitch multiple sources together). Sidebar: the 5/50 rule says that in 5 years' time, 50% of all content will be user-generated (Reference: The Podshow by Ron Bloom, http://www.ronbloom.com/?p=11). 60% of content has geospatial and temporal aspects. Image sources: Wikipedia: http://www.citris-uc.org/system/files/imce-u10/Wikipedia-... Blogger: http://z.about.com/d/weblogs/1/5/V/-/-/-/BloggerHomePage.PNG OpenMRS: http://ruddzw.files.wordpress.com/2007/05/openmrs_osx.png Remote Sensing: http://www.medscape.com/content/2000/00/41/47/414717/art-... Cell phone/iPhone: http://healthinformaticsblog.files.wordpress.com/2008/03/... http://gmapsmania.googlepages.com/whosickgmm.JPG
    • Proportion of infection detected… Control confounding effects by: including not only the demand side (Internet search queries) but also the supply side (e.g., information on news websites; link to Healthmap.org or GPHIN); including longitudinal data on health information supply; including accurate geographic distribution. Infodemiology: develop methodology and real-time measures (indices) to understand patterns and trends for general health information; understand the predictive value of what the community of practice is looking for (demand) for early detection of emerging diseases, infectious disease outbreaks, or bioterrorism; identify and quantify gaps between information supply and demand; discover and validate predictive metrics. Could an X number (threshold) of Internet search hits on fever per week signal a flu outbreak?
    • Timeliness… We could potentially observe the progression of a disease outbreak within a population at multiple touch points (data). Some of these data may be collected before visits to the physician or hospital have actually happened. Patients might search the Internet for the symptoms they're experiencing. Patients might adjust their diet when they feel ill (such as drinking more water or juice, and getting more rest). If the symptoms become more severe, patients might seek over-the-counter (OTC) medicine and miss classes or work. In many cases, patients might go to work late or leave for home early. Patients might also experience subtle changes in their behavior at work. When the symptoms continue, patients might seek help from physicians (e.g., schedule appointments, present with chief complaints, have lab tests ordered and medicines prescribed). Similar models can also be established for pollution, non-infectious diseases, chronic diseases, injury, and natural disasters.
    • This is an old idea… Crows were recognized for divination in Roman times, and they are a crucial component of the US West Nile Virus control program. However, in current systems, much less effort has been directed towards interaction with humans (responders and domain experts), and not just for the purpose of early detection, but also for working together. These systems often provide contradictory interpretations of ongoing events and not enough evidence to issue a response. Furthermore, we are more vulnerable to those threats we know about but have not faced on a major scale: imagine the regular flu season is twice as bad as it was last year; we need to prepare beforehand in order to make effective use of our already limited resources to ensure an effective response. We are even more vulnerable to those threats that we don't know about, such as emerging infectious diseases or a large release of an aerosol agent in an act of bioterrorism.
    • To recap: the human experts interacting with automated systems; the collaborative decision-making environment. I am sure one day soon we will have an EID impact assessment, just like there is an environmental impact assessment… Thank you VERY much for your time today…

    Biosurveillance: Machine Learning And Disease Surveillance by Kass-Hout Di Tada

    From kasshout, 1 month ago

    The majority of the designs, analyses and evaluations of early detection (or biosurveillance) systems have been geared towards specific data sources and detection algorithms. Much less effort has been focused on how these systems will "interact" with humans. For example, consider multiple domain experts working at different levels across different organizations in an environment where numerous biosurveillance algorithms may provide contradictory interpretations of ongoing events. We present a framework that consists of a collection of autonomous, machine learning-enabled analytic processes, services and tools that, for the first time, will seamlessly integrate surveillance and response systems with human experts.


    Slideshow Transcript

    1. Slide 1: MACHINE LEARNING AND DISEASE SURVEILLANCE Taha Kass-Hout, MD, MS Nicolás di Tada October 2008
    2. Slide 2: Image source: http://www.birds.cornell.edu/crows/images/deadcrow.jpg Image source: http://farm3.static.flickr.com/2029/2239605500_6ef2fd2295.jpg?v=0
    3. Slide 3: LATE DETECTION – RESPONSE [Figure: CASES vs. DAY, marking the opportunity for control]
    4. Slide 4: EARLY DETECTION AND RESPONSE [Figure: CASES vs. DAY, marking the opportunity for control]
    5. Slide 5: INFORMATION SOURCES • Event-based – ad-hoc unstructured reports issued by formal or informal sources • Indicator-based – (number of cases, rates, proportion of strains…)
    6. Slide 6: PUBLIC HEALTH MEASURES • Representativeness • Completeness • Predictive Value • Timeliness
    7. Slide 7: PUBLIC HEALTH MEASURES • Main attributes: representativeness, completeness, predictive value positive • [Pyramid figure] Top (specificity / reliability): 50 malaria notifications (5%); bottom (sensitivity / timeliness): 1000 malaria infections (100%) • Get as close to the bottom of the pyramid as possible • Urge frequent reporting: weekly → daily → immediately
    8. Slide 8: PUBLIC HEALTH MEASURES • Main attributes: timeliness • [Timeline figure, axis: Time] Health care hotline; automated analysis/thresholds; signal as early as possible; analyze and interpret
    9. Slide 9: THE PROBLEM SPACE • Current systems design, analysis and evaluation have been geared towards specific data sources and detection algorithms – not humans • We have systems in place for those threats we have been faced with before
    10. Slide 10: PUBLIC HEALTH – TWO PERSPECTIVES • Case management: individual cases of notifiable diseases; relationship networks (contact tracing) • Population surveillance: larger risk patterns
    11. Slide 11: CASE MANAGEMENT • Questions/problems: Is a case due to recent transmission? If so, does the case share any feature with other, recent cases? • Ways it's being done: investigations/interviews; meeting with other investigators
    12. Slide 12: POPULATION SURVEILLANCE • Questions/problems: Are more cases happening than expected? Does an excess suggest ongoing transmission in a specific region? • Way it's being done: semi-automated routine temporal and space-time statistical analysis
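
The first question on Slide 12 ("Are more cases happening than expected?") can be illustrated with a simple baseline comparison. This is only a minimal sketch, not the analysis the slides refer to: the function name `flag_excess`, the 7-day baseline window, the 2-standard-deviation threshold, and the daily counts are all assumptions made up for the example.

```python
from statistics import mean, stdev

def flag_excess(counts, window=7, k=2.0):
    """Return indices of days whose count exceeds baseline mean + k * sd.

    The baseline for day t is the `window` days immediately before it.
    `window` and `k` are illustrative assumptions, not values from the slides.
    """
    alerts = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]
        threshold = mean(baseline) + k * stdev(baseline)
        if counts[t] > threshold:
            alerts.append(t)
    return alerts

# Hypothetical daily case counts; day 10 carries an injected spike.
daily_cases = [4, 5, 3, 6, 4, 5, 4, 5, 4, 6, 15, 5]
print(flag_excess(daily_cases))  # [10]
```

A fixed threshold like this is easy to explain to responders, but it ignores seasonality and spatial structure, which is where the space-time methods mentioned on the slide come in.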
    13. Slide 13: WHY LOCATION MATTERS – CASE MANAGEMENT • If you are studying a case of a certain disease that was just declared • It is harder to picture the situation by looking at something like this...
    14. Slide 14: WHY LOCATION MATTERS – CASE MANAGEMENT
    15. Slide 15: WHY LOCATION MATTERS – CASE MANAGEMENT • Than by looking at this...
    16. Slide 16: WHY LOCATION MATTERS – CASE MANAGEMENT
    17. Slide 17: WHY LOCATION MATTERS – POP SURVEILLANCE • If you are studying the spatial distribution of a set of disease clusters • This would seem more difficult...
    18. Slide 18: WHY LOCATION MATTERS – POP SURVEILLANCE
    19. Slide 19: WHY LOCATION MATTERS – POP SURVEILLANCE • Than this...
    20. Slide 20: WHY LOCATION MATTERS – POP SURVEILLANCE
    21. Slide 21: MODERN DISEASE SURVEILLANCE • In the past two decades, much disease surveillance research has focused on developing analytical methods for automatically detecting anomalous patterns in data • Modern methods can achieve timely detection of anomalies by incorporating temporal, spatial, and multivariate information
    22. Slide 22: MODERN DISEASE SURVEILLANCE [Diagram] Huge mass of data (9/20, 15213, cough/cold, …; 9/21, 15207, antifever, …; 9/22, 15213, CC = cough, ...; 1,000,000 more records…) → Detection algorithm → Too many alerts → “What are we supposed to do with this?”
    23. Slide 23: MODERN DISEASE SURVEILLANCE [Diagram] Huge mass of data (9/20, 15213, cough/cold, …; 9/21, 15207, antifever, …; 9/22, 15213, CC = cough, ...; 1,000,000 more records…), now with a feedback loop
    24. Slide 24: ADVANTAGES OF MACHINE LEARNING P(malaria) = 22% P(influenza) = 13% P(other ILI) = 33%
    25. Slide 25: MACHINE LEARNING TECHNIQUES • Classifiers • Clustering • Bayesian Statistics • Neural Networks • Genetic Algorithms
    26. Slide 26: HOW TO REPRESENT A DOCUMENT? [Figure: documents placed in a feature space with axes "flu" and "fever"] “I had a flu last month. […] I had a flu early this week.” “This morning I woke up with fever, I might have a flu.”
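
One common way to answer the question on Slide 26 is a bag-of-words vector: count how often each vocabulary term occurs in a document. The sketch below assumes a two-term vocabulary ("flu", "fever") taken from the slide's example sentences; the names `VOCABULARY`, `to_vector`, and `normalize` are hypothetical, and the first document omits the part the slide elides.

```python
import math
import re
from collections import Counter

# Hypothetical vocabulary: the two terms that appear in the slide's examples.
VOCABULARY = ["flu", "fever"]

def to_vector(text, vocabulary=VOCABULARY):
    """Count how often each vocabulary term occurs in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

def normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors stay unchanged)."""
    length = math.sqrt(sum(v * v for v in vector))
    return [v / length for v in vector] if length else list(vector)

doc_a = "I had a flu last month. I had a flu early this week."
doc_b = "This morning I woke up with fever, I might have a flu."

print(to_vector(doc_a))             # [2, 0]
print(to_vector(doc_b))             # [1, 1]
print(normalize(to_vector(doc_a)))  # [1.0, 0.0]
```

Normalizing keeps a long document from dominating a short one simply because it repeats a term more often.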
    27. Slide 27: CLASSIFIERS – PROBLEM DEFINITION • Map items to vectors (Feature extraction) • Normalize those vectors • Train the classifier • Measure the results with new information • Feed the results back to the classifier • Separate classes in feature space
    28. Slide 28: CLASSIFIERS - SVM
    29. Slide 29: SVM – MARGIN MAXIMIZATION • Support vectors define the separator
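
Margin maximization can be sketched with a linear SVM trained by sub-gradient descent on the regularized hinge loss. This is one simple way to fit such a classifier, not necessarily what the presenters used; the function names, the hyperparameters (`lam`, `epochs`, `lr`), and the toy points are assumptions for illustration.

```python
def train_linear_svm(points, labels, lam=0.01, epochs=500, lr=0.01):
    """Fit w, b by sub-gradient descent on the regularized hinge loss.

    Labels must be +1 or -1. `lam`, `epochs`, and `lr` are illustrative defaults.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: the hinge term is active.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the regularizer acts, shrinking w to widen the margin.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Classify x by the side of the separating hyperplane it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical 2-D feature vectors for two well-separated classes.
points = [(3.0, 0.0), (4.0, 1.0), (3.5, 0.5), (0.0, 3.0), (1.0, 4.0), (0.5, 3.5)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print(predict(w, b, (6.0, 0.0)), predict(w, b, (0.0, 6.0)))
```

Only points with a margin below 1 pull on the separator during training, which mirrors the slide's statement that the support vectors define it.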
    30. Slide 30: SVM – NON LINEAR? Map to higher-dimension space Φ: x → φ(x)
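
The mapping Φ: x → φ(x) on Slide 30 can be illustrated with a one-dimensional toy case. The map φ(x) = (x, x²) and the data below are assumptions chosen for the example; the slide does not say which map is used.

```python
def phi(x):
    """Map a 1-D point to 2-D: phi(x) = (x, x^2). An illustrative choice of map."""
    return (x, x * x)

# Hypothetical 1-D data: the positive class sits between the negatives,
# so no single threshold on x separates the two classes.
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [-1, -1, 1, 1, 1, -1, -1]

def separable_by_threshold(values, labels):
    """True if one cut on a single coordinate puts each class on its own side."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == -1]
    return max(pos) < min(neg) or max(neg) < min(pos)

print(separable_by_threshold(xs, labels))                       # False
print(separable_by_threshold([phi(x)[1] for x in xs], labels))  # True
```

In the mapped space a horizontal line (a cut on the x² coordinate) separates the classes, so a linear separator works there even though none exists in the original space.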
    31. Slide 31: SVM – FILTERING OR CLASSIFYING [Diagram: training documents feed a classifier, which sorts Document 1, Document 2, and Document 3 into Positives and Negatives]
    32. Slide 32: CLUSTERING – PROBLEM DEFINITION • Map items to vectors (Feature extraction) • Normalization • Agglomerative and Partitional
    33. Slide 33: CLUSTERING - AGGLOMERATIVE
    34. Slide 34: CLUSTERING - PARTITIONAL
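
Partitional clustering can be sketched with k-means (Lloyd's algorithm), used here as one well-known partitional method; the slides do not name a specific algorithm. The function name `kmeans`, the starting centroids, and the toy points are assumptions for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment

# Hypothetical 2-D feature vectors forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignment = kmeans(points, centroids=[points[0], points[3]])
print(assignment)  # [0, 0, 0, 1, 1, 1]
```

Agglomerative clustering takes the opposite route: it starts with every item as its own cluster and repeatedly merges the two closest clusters.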
    35. Slide 35: BAYESIAN STATISTICS $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$, where P(A | B) is the probability of disease A (flu) once symptoms B (fever) are observed, P(B | A) is the probability of fever once flu is confirmed, P(A) is the probability of flu (prior or marginal), and P(B) is the probability of fever (prior or marginal)
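
A worked example of the rule on Slide 35, with A = flu and B = fever. The three input probabilities are hypothetical numbers picked to keep the arithmetic easy to follow; they are not estimates from the presentation.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

# Hypothetical inputs:
p_fever_given_flu = 0.9  # P(B | A): fever once flu is confirmed
p_flu = 0.05             # P(A): prior probability of flu
p_fever = 0.1            # P(B): prior probability of fever

# Probability of flu once fever is observed, about 0.45 with these inputs.
print(posterior(p_fever_given_flu, p_flu, p_fever))
```

With these inputs, observing fever raises the probability of flu from 5% to roughly 45%, which is the kind of update the formula is meant to capture.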
    36. Slide 36: NEURAL NETWORKS • Given a set of stimuli, train a system to produce a given output
    37. Slide 37: NEURAL NETWORKS - STRUCTURE Input Layer {I0, I1, …, In} → Hidden Layer […] → Output Layer {O0, O1, …, On}; each connection carries a weight, and each hidden unit sums its weighted inputs: $H_n = \sum_{i=0}^{I} (I_i \cdot w_{in})$
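
The weighted sum on Slide 37 can be sketched as a forward pass through one hidden layer. The slide gives only the sum itself; the sigmoid activation, the function names, and the weights below are assumptions added to make a complete, runnable example.

```python
import math

def layer_sums(inputs, weights):
    """Weighted sums H_n = sum_i inputs[i] * weights[i][n], one per unit n."""
    n_units = len(weights[0])
    return [
        sum(inputs[i] * weights[i][n] for i in range(len(inputs)))
        for n in range(n_units)
    ]

def sigmoid(z):
    """Squash a weighted sum into (0, 1). An assumed activation, not from the slide."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, w_hidden, w_output):
    """Input layer -> hidden layer -> output layer."""
    hidden = [sigmoid(h) for h in layer_sums(inputs, w_hidden)]
    return [sigmoid(o) for o in layer_sums(hidden, w_output)]

# Hypothetical weights: 2 inputs, 2 hidden units, 1 output.
w_hidden = [[0.5, -1.0], [0.25, 1.0]]
w_output = [[1.0], [-1.0]]

print(layer_sums([1.0, 2.0], w_hidden))  # [1.0, 1.0]
print(forward([1.0, 2.0], w_hidden, w_output))
```

Training then consists of adjusting the weights so that the outputs match the desired responses for the given stimuli.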
    38. Slide 38: NEURAL NETWORK - APPLICATION Event?
    39. Slide 39: GENETIC ALGORITHM - BASICS • Define the model that you want to optimize • Create the fitness function • Evolve the gene pool, testing against the fitness function • Select the best individual
    40. Slide 40: GENETIC ALGORITHM – MODEL • Model the transmission process using a set of parameters: onset time between an infection and illness; latency period; incubation period; symptomatic period; infectious period • Example: (Onset, Latency, Incubation, Symptomatic, Infectious) = (2 days, 3 days, 1 day, 4 days, 3 days)
    41. Slide 41: GENETIC ALGORITHM – MODEL FITNESS Fitness = 1/Area
    42. Slide 42: GENETIC ALGORITHM – PROCESS 1. Create an initial population of candidates 2. Use operators to generate new candidates (mating and mutation) 3. Discard worst individuals or select best individuals in generation 4. Repeat from step 2 until you find a candidate that is an acceptable solution
    43. Slide 43: GENETIC ALGORITHM - PROCESS (5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2) (2,3,4,6,5) (4,3,6,2,5) (4,5,6,3,5) (3,4,5,2,6) (3,4,4,6,2) (5,3,2,6,5) (3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6) (4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4) (5,3,2,6,5) (3,4,4,6,2)
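
The four-step process on Slide 42 can be sketched as follows. The slides define fitness as 1/Area without saying how the area is computed, so this sketch substitutes a stand-in: the total mismatch between a candidate and a known target tuple. `TARGET`, the +1 in the denominator, the population size, the mutation rate, and all function names are assumptions.

```python
import random

# Example tuple from Slide 40, used here as a stand-in "true" parameter set.
TARGET = (2, 3, 1, 4, 3)

def area(candidate):
    """Stand-in for the slide's Area: total mismatch against TARGET, in days."""
    return sum(abs(c - t) for c, t in zip(candidate, TARGET))

def fitness(candidate):
    """Fitness = 1 / Area, with +1 in the denominator to avoid dividing by zero."""
    return 1.0 / (1.0 + area(candidate))

def mate(a, b, rng):
    """Single-point crossover between two parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(candidate, rng, rate=0.2):
    """Nudge each gene by +/-1 day with probability `rate`, keeping it >= 1."""
    return tuple(
        max(1, g + rng.choice((-1, 1))) if rng.random() < rate else g
        for g in candidate
    )

def evolve(pop_size=20, generations=200, seed=0):
    rng = random.Random(seed)
    # Step 1: create an initial population of candidates.
    population = [tuple(rng.randint(1, 6) for _ in TARGET) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if area(population[0]) == 0:  # Step 4: stop once a candidate fits exactly.
            break
        survivors = population[: pop_size // 2]  # Step 3: discard the worst half.
        children = [  # Step 2: generate new candidates by mating and mutation.
            mutate(mate(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

Because the best half always survives unchanged, the best fitness never gets worse from one generation to the next. In a real application the area would come from comparing the model's epidemic curve with observed data.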
    44. Slide 44: RESULTS – IMPROVED SURVEILLANCE
    45. Slide 45: Q&A
    46. Slide 46: THANK YOU! Taha Kass-Hout, MD, MS http://www.instedd.org kasshout@instedd.org http://taha.instedd.org Nicolás di Tada http://www.manas.com.ar nditada@manas.com.ar http://weblogs.manas.com.ar/ndt/
    47. Slide 47: BACKUP SLIDES
    48. Slide 48: REFERENCES • Izadi, M. and Buckeridge, D., Decision Theoretic Analysis of Improving Epidemic Detection, AMIA 2007, Symposium Proceedings 2007 • EpiNorth-Based material (http://www.epinorth.org): • Mereckiene, J., Outbreak Investigation Operational Aspects. Jurmala, Latvia, 2006 • Bagdonaite, J., and Mereckiene, J., Outbreak Investigation Methodological aspects. Jurmala, Latvia, 2006 • Epidemic Intelligence: Signals from surveillance systems, Anne Mazick, Statens Serum Institut, Denmark, EpiTrain III, Jurmala, August 2006 • Daniel Neill, Incorporating Learning into Disease Surveillance Systems
    49. Slide 49: REFERENCES • Algorithms: Complex Event Processing Over Uncertain Data in Wasserkrug (2008); Outbreak detection through automated surveillance: A review of the determinants of detection in Buckeridge (2007); Approaches to the evaluation of outbreak detection methods in Watkins (2006); Algorithms for rapid outbreak detection: a research synthesis in Buckeridge (2004); Data mining in bioinformatics using Weka in Frank (2004)
    50. Slide 50: REFERENCES • Automating Laboratory Reporting: Automatic Electronic Laboratory-Based Reporting in Panackal (2002); Benefits and Barriers to Electronic Laboratory Results Reporting for Notifiable Diseases in Nguyen (2007) • Using EMR Data for Disease Surveillance: Using Electronic Medical Records to Enhance Detection and Reporting of Vaccine Adverse Events in Hinrichsen (2007); Electronic Medical Record Support for PH in Klompas (2007); A knowledgebase to support notifiable disease surveillance in Doyle (2005); Automated Detection of Tuberculosis Using Electronic Medical Record Data in Calderwood (2007) • Misc Readings: Breakthrough in modeling emerging disease hotspots in Jones (2008); Use of data mining techniques to investigate disease risk classification as a proxy for compromised biosecurity of cattle herds in Wales in Ortiz-Pelaez (2008)
    51. Slide 51: RELATED PROJECTS • InSTEDD RNA (or Event Evolution): Collaborative Analytics and Environment for Linking Early Health-Related Event Detection to an Effective Response (http://taha.instedd.org/2008/09/collaborative-analytics-and-environment.html) • ALPACA: "ALPACA Light Parsing And Classifying Application (ALPACA) is a classifying tool designed for use in community-oriented software as well as in Academia. The application consists of two parts: a parsing tool for transforming raw documents into readable data, and a classifying tool for categorizing documents into user-provided classes. The application provides a user-friendly interface and a Plug-in functionality to provide a simple way to add more parsers/classifiers to the application." http://2008.hfoss.org/ALPACA • Surveillance Project: An Open Source R-package disease surveillance framework for "...the development and the evaluation of outbreak detection algorithms in univariate and multivariate routine collected public health surveillance data." http://surveillance.r-forge.r-project.org/ • Weka: An open source "...collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes." http://www.cs.waikato.ac.nz/~ml/weka/