Latent Fault Detection in Large Scale Services

Moshe Gabel, Assaf Schuster, Ran Gilad-Bachrach, Nikolaj Bjørner

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Unexpected machine failures, with their resulting service outages and data loss, pose challenges to datacenter management. Existing failure detection techniques rely on domain knowledge, precious (often unavailable) training data, textual console logs, or intrusive service modifications.
Original language: Undefined/Unknown
Title of host publication: Proceedings of the 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
Place of Publication: USA
Publisher: IEEE Computer Society
Pages: 1–12
ISBN (Print): 9781467316248
State: Published - 2012
Externally published: Yes
Event: 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012 - Boston, MA, United States
Duration: 25 Jun 2012 – 28 Jun 2012

Publication series

Name: DSN '12
Publisher: IEEE Computer Society

Conference

Conference: 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012
Country/Territory: United States
City: Boston, MA
Period: 25/06/12 – 28/06/12

Keywords

  • web services
  • statistical learning
  • statistical analysis
  • distributed computing
  • fault detection
