55[![Release](https://img.shields.io/github/v/release/linkedin/isolation-forest)](https://github.com/linkedin/isolation-forest/releases/)
66[![License](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](LICENSE)
77
8- ## Introduction
9-
10- This is a Scala/Spark implementation of the Isolation Forest unsupervised outlier detection
11- algorithm. This library was created by [James Verbus](https://www.linkedin.com/in/jamesverbus/) from
12- the LinkedIn Anti-Abuse AI team.
13-
14- The `isolation-forest` module supports distributed training and scoring in Scala using Spark data structures.
15- It inherits from the `Estimator` and `Model` classes in [Spark's ML library](https://spark.apache.org/mllib/)
16- in order to take advantage of machinery such as `Pipeline`s. Model persistence on HDFS is
17- supported.
8+ ## Table of contents
9+ - [Introduction](#introduction)
10+ - [Features](#features)
11+ - [Getting started](#getting-started)
12+ - [Building the library](#building-the-library)
13+ - [Add an isolation-forest dependency to your project](#add-an-isolation-forest-dependency-to-your-project)
14+ - [Usage examples](#usage-examples)
15+ - [Model parameters](#model-parameters)
16+ - [Training and scoring](#training-and-scoring)
17+ - [Saving and loading a trained model](#saving-and-loading-a-trained-model)
18+ - [ONNX conversion for portable inference](#onnx-conversion-for-portable-inference)
19+ - [Converting a trained model to ONNX](#converting-a-trained-model-to-onnx)
20+ - [Using the ONNX model for inference (example in Python)](#using-the-onnx-model-for-inference-example-in-python)
21+ - [Performance and benchmarks](#performance-and-benchmarks)
22+ - [Copyright and license](#copyright-and-license)
23+ - [Contributing](#contributing)
24+ - [References](#references)
1825
19- The `isolation-forest-onnx` module provides a Python-based converter to convert a trained model to ONNX format for broad
20- portability across platforms and languages. [ONNX](https://onnx.ai/) is an open format built to represent machine
21- learning models.
26+ ## Introduction
2227
23- ## Copyright
28+ This is a distributed Scala/Spark implementation of the Isolation Forest unsupervised outlier detection
29+ algorithm. It features support for ONNX export for easy cross-platform inference. This library was created
30+ by [James Verbus](https://www.linkedin.com/in/jamesverbus/) from the LinkedIn Anti-Abuse AI team.
2431
25- Copyright 2019 LinkedIn Corporation
26- All Rights Reserved.
32+ ## Features
2733
28- Licensed under the BSD 2-Clause License (the "License").
29- See [License](LICENSE) in the project root for license information.
34+ * **Distributed training and scoring:** The `isolation-forest` module supports distributed training and scoring in Scala
35+ using Spark data structures. It inherits from the `Estimator` and `Model` classes in [Spark's ML library](https://spark.apache.org/mllib/) in
36+ order to take advantage of machinery such as `Pipeline`s. Model persistence on HDFS is supported.
37+ * **Broad portability via ONNX:** The `isolation-forest-onnx` module provides a Python-based converter to convert a
38+ trained model to ONNX format for broad portability across platforms and languages. [ONNX](https://onnx.ai/) is an open format built
39+ to represent machine learning models.
3040
31- ## How to use
41+ ## Getting started
3242
3343### Building the library
3444
@@ -51,6 +61,11 @@ To force a rebuild of the library, you can use:
5161./gradlew clean build --no-build-cache
5262```
5363
64+ To just run the tests:
65+ ``` bash
66+ ./gradlew test
67+ ```
68+
5469### Add an isolation-forest dependency to your project
5570
5671Please check [Maven Central](https://repo.maven.apache.org/maven2/com/linkedin/isolation-forest/) for the latest
@@ -89,6 +104,8 @@ Here is an example for a recent Spark/Scala version combination.
89104</dependency>
90105```
91106
107+ ## Usage examples
108+
92109### Model parameters
93110
94111| Parameter | Default Value | Description |
@@ -104,6 +121,7 @@ Here is an example for a recent Spark/Scala version combination.
104121| predictionCol | "predictedLabel" | The predicted label. This column is appended to the input DataFrame upon scoring. |
105122| scoreCol | "outlierScore" | The outlier score. This column is appended to the input DataFrame upon scoring. |
106123
124+
107125### Training and scoring
108126
109127Here is an example demonstrating how to import the library, create a new `IsolationForest`
@@ -203,7 +221,7 @@ isolationForestModel.write.overwrite.save(path)
203221val isolationForestModel2 = IsolationForestModel.load(path)
204222```
205223
206- ## ONNX model conversion and inference
224+ ## ONNX conversion for portable inference
207225
208226### Converting a trained model to ONNX
209227
@@ -276,7 +294,7 @@ print('ONNX Converter outlier scores:')
276294print(np.transpose(actual_outlier_scores[:num_examples_to_print])[0])
277295```
278296
279- ## Validation
297+ ## Performance and benchmarks
280298
281299The original 2008 "Isolation forest" paper by Liu et al. published the AUROC results obtained by
282300applying the algorithm to 12 benchmark outlier detection datasets. We applied our implementation of
@@ -299,11 +317,19 @@ result. The quoted uncertainty is the one-sigma error on the mean.
299317| [Arrhythmia](http://odds.cs.stonybrook.edu/arrhythmia-dataset/) | 0.80 | 0.804 ± 0.002 |
300318| [Ionosphere](http://odds.cs.stonybrook.edu/ionosphere-dataset/) | 0.85 | 0.8481 ± 0.0002 |
301319
302- Our implementation provides AUROC values that are in very good agreement the results in the original
303- Liu et al. publication. There are a few very small discrepancies that are likely due the limited
320+ Our implementation provides AUROC values that are in very good agreement with the results in the original
321+ Liu et al. publication. There are a few very small discrepancies that are likely due to the limited
304322precision of the AUROC values reported in Liu et al.
305323
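For readers who want to reproduce this kind of comparison, here is a minimal Python sketch of the two quantities quoted in the table. It assumes the AUROC is estimated with the rank-based (Mann-Whitney) formula and that the one-sigma error on the mean is the sample standard deviation across trials divided by the square root of the number of trials. The helper names `auroc` and `mean_and_standard_error` are hypothetical and not part of this library, the inputs are toy values rather than benchmark results, and the benchmark code actually used for the table is not shown here.

```python
import math
from statistics import mean, stdev


def auroc(labels, scores):
    """Rank-based AUROC estimate.

    labels: 1 for outlier, 0 for inlier. scores: higher means more anomalous.
    Equals the probability that a random outlier scores above a random
    inlier, with ties counted as one half.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


def mean_and_standard_error(values):
    """Mean of per-trial values and the one-sigma error on that mean."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# Toy example (illustrative values only, not benchmark data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))  # 0.75
```

In practice, one would call `auroc` once per independently trained model and pass the resulting list of values to `mean_and_standard_error` to obtain a "mean ± error" entry.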
306- ## Contributions
324+ ## Copyright and license
325+
326+ Copyright 2019 LinkedIn Corporation
327+ All Rights Reserved.
328+
329+ Licensed under the BSD 2-Clause License (the "License").
330+ See [License](LICENSE) in the project root for license information.
331+
332+ ## Contributing
307333
308334If you would like to contribute to this project, please review the instructions [here](CONTRIBUTING.md).
309335