L1-depth revisited: A robust angle-based outlier factor in high-dimensional space

Ninh Pham*

*Corresponding author for this work

Publication: Contribution to book/anthology/report; Conference contribution in proceedings; Research; Peer-reviewed

4 Citations (Scopus)

Abstract

Angle-based outlier detection (ABOD) has recently emerged as an effective method for detecting outliers in high dimensions. Instead of examining neighborhoods, as proximity-based concepts do, ABOD assesses the broadness of the angle spectrum of a point as an outlier factor. Despite being a parameter-free and robust measure in high-dimensional space, the exact solution of ABOD has cubic cost O(n^3) in the data size n and hence cannot be used on large-scale data sets. In this work we present a conceptual relationship between the ABOD intuition and the L1-depth concept in statistics, one of the earliest methods used for detecting outliers. Deriving from this relationship, we propose to use L1-depth as a variant of angle-based outlier factors, since it requires only quadratic computational time, like proximity-based outlier factors. Empirically, L1-depth is competitive with (and often superior to) proximity-based and other proposed angle-based outlier factors in detecting high-dimensional outliers, regarding both efficiency and accuracy. In order to avoid the quadratic computational time, we introduce a simple but efficient sampling method named SamDepth for estimating the L1-depth measure. We also present a theoretical analysis to guarantee the reliability of SamDepth. The empirical experiments on many real-world high-dimensional data sets demonstrate that SamDepth with √n samples often achieves very competitive accuracy and runs several orders of magnitude faster than other proximity-based and ABOD competitors. Data related to this paper are available at: https://www.dropbox.com/s/nk7nqmwmdsatizs/Datasets.zip. Code related to this paper is available at: https://github.com/NinhPham/Outlier.
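
As a rough illustration of the idea in the abstract, the following Python sketch computes an L1-depth score as one minus the norm of the average unit vector pointing from a point to the other data points, and then estimates it from a uniform random sample of about √n points. It is not taken from the paper or its repository: the function names, the handling of duplicate points, and the uniform sampling scheme are assumptions made for this example, and the paper's SamDepth may differ in its details. Lower scores indicate points that are more likely to be outliers.

```python
import math
import numpy as np


def mean_unit_vector_norm(x, others):
    """Norm of the average unit vector pointing from x to each row of others."""
    diffs = others - x
    norms = np.linalg.norm(diffs, axis=1)
    keep = norms > 0  # assumption: points identical to x are skipped
    if not np.any(keep):
        return 0.0
    units = diffs[keep] / norms[keep, None]
    return float(np.linalg.norm(units.mean(axis=0)))


def l1_depth_scores(X):
    """Exact scores for every row of X; quadratic in the number of rows."""
    n = X.shape[0]
    return np.array([
        1.0 - mean_unit_vector_norm(X[i], np.delete(X, i, axis=0))
        for i in range(n)
    ])


def sampled_l1_depth_scores(X, sample_size=None, seed=0):
    """Estimate the scores from one uniform sample (hypothetical scheme).

    By default the sample holds ceil(sqrt(n)) points, following the
    sample size mentioned in the abstract.
    """
    n = X.shape[0]
    m = sample_size or math.ceil(math.sqrt(n))
    rng = np.random.default_rng(seed)
    sample = X[rng.choice(n, size=m, replace=False)]
    return np.array([1.0 - mean_unit_vector_norm(x, sample) for x in X])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # 200 Gaussian points in 20 dimensions plus one far-away point (row 200).
    X = np.vstack([rng.normal(size=(200, 20)), np.full((1, 20), 50.0)])
    print("lowest exact depth at row:", int(np.argmin(l1_depth_scores(X))))
    print("lowest sampled depth at row:", int(np.argmin(sampled_l1_depth_scores(X))))
```

The sampled variant compares each point against the same √n sampled points, so scoring all n points costs on the order of n√n distance computations instead of n^2.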

Original language: English
Title: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2018, Proceedings
Editors: Francesco Bonchi, Thomas Gärtner, Neil Hurley, Georgiana Ifrim, Michele Berlingerio
Number of pages: 17
Volume: 1
Publisher: Springer
Publication date: 2019
Pages: 105-121
ISBN (Print): 9783030109240
DOI
Status: Published - 2019
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML-PKDD 2018 - Dublin, Ireland
Duration: 10 Sep 2018 to 14 Sep 2018

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML-PKDD 2018
Country/Territory: Ireland
City: Dublin
Period: 10/09/2018 to 14/09/2018
Name: Lecture Notes in Computer Science
Volume: 11051
ISSN: 0302-9743
