sgammon/benchmarks

Qri Benchmarks

20 entries
768 Bytes size
CSV format
latest commit: 8 days ago
Readme

Benchmarks for qri

This small dataset was prepared as part of an issue on GitHub, to benchmark the qri command-line tool against a few sample dataset sizes. Read on to learn how the data was prepared.

Test data

The raw data is a stream of tap events from a real-world retail menu application running on iOS, on kiosk iPad devices. The events are measured on the device, then relayed to a cloud service, which gathers them and places them in event storage. The full dataset contains slightly over 3 million events, with an uncompressed on-disk size of about 7.3 GB in JSON and 5.7 GB in CSV.

Each row in the dataset has 4 simple columns:

uuid,ingest,occurred,raw

A random sample row in the dataset, masked for privacy, looks like this:

BA882B47-B26A-4E29-BFB4-XXXXXXXXXXXX,2018-04-17 02:46:04.137 UTC,2018-04-17 02:46:04.019 UTC,"{""big-json-object"":true}"

In the real data, the big-json-object placeholder is a large JSON object, structured roughly as below but without the pretty formatting. Because these events come out of the pipeline before any processing occurs, they are rather repetitive:

{
  "uuid": "BA882B47-B26A-4E29-BFB4-XXXXXXXXXXXX",
  "timing": {
    "ingest": {"timestamp": 1523933164137},
    "occurred": {"timestamp": 1523933164019}
  },
  "context": {
    "collection": {"name": "products"},
    "other": "properties go here"
  }
}
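As a rough illustration of how a row like the one above could be read back, here is a minimal Python sketch. It assumes the CSV uses standard doubled-quote escaping (as the sample row does) and that the stream carries no header row, so the four column names are passed in explicitly; `parse_events` and `SAMPLE` are names made up for this example.

```python
import csv
import io
import json

# Sample row from this README (UUID masked, payload replaced by a stand-in object).
SAMPLE = (
    'BA882B47-B26A-4E29-BFB4-XXXXXXXXXXXX,'
    '2018-04-17 02:46:04.137 UTC,'
    '2018-04-17 02:46:04.019 UTC,'
    '"{""big-json-object"":true}"\n'
)

COLUMNS = ["uuid", "ingest", "occurred", "raw"]


def parse_events(stream):
    """Yield one dict per CSV row, with the `raw` column decoded from JSON."""
    # Assumption: no header row in the stream, so field names are supplied here.
    for row in csv.DictReader(stream, fieldnames=COLUMNS):
        row["raw"] = json.loads(row["raw"])
        yield row


events = list(parse_events(io.StringIO(SAMPLE)))
print(events[0]["uuid"], events[0]["raw"])
```

If the file does start with the `uuid,ingest,occurred,raw` header line, dropping the `fieldnames` argument lets `csv.DictReader` pick the names up from that line instead.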

Sample sizes

To benchmark how the qri tool scales, I generated samples of several sizes by taking the first n lines of the original dataset:

- First 10k records, at ~11 MB
- First 50k records, at ~55 MB
- First 100k records, at ~110 MB
- First 500k records, at ~525 MB
- First 1 million records, at ~1 GB
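The exact commands used are in the screenshot on the issue, not reproduced here. As a sketch of the same "first n lines" idea, the Python below copies the first n lines of a file to a new file. It assumes one record per physical line, which holds as long as the JSON payload is not pretty-printed; the function name and file names are hypothetical, and the demo runs on a tiny stand-in file rather than the real export.

```python
import itertools
import os
import tempfile


def write_sample(src_path, dst_path, n):
    """Copy the first n lines of src_path to dst_path; return the lines written."""
    written = 0
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # islice stops after n lines, so the rest of a large file is never read.
        for line in itertools.islice(src, n):
            dst.write(line)
            written += 1
    return written


# Demo on a 10-line stand-in file (hypothetical names, not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "events.csv")
    dst = os.path.join(tmp, "events-3.csv")
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.writelines(f"id-{i},a,b,c\n" for i in range(10))
    count = write_sample(src, dst, 3)
    with open(dst, encoding="utf-8") as f:
        sample_lines = f.read().splitlines()

print(count, sample_lines)
```

Whether a header line counts toward n depends on how the source file was exported, so the record counts above should be read as approximate under this sketch.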

A screenshot on the GitHub issue depicts this process.

Returned values from time

For more information about what the user, system, cpu, and total values mean, consult the man page for the *nix time command.
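In short, user and system are CPU seconds spent in user mode and in the kernel, and total is elapsed wall-clock time. Assuming the cpu column is derived the way shells such as zsh report it, it is the ratio of CPU time to elapsed time, so values above 100% mean more than one core was busy. A small sketch of that arithmetic, using the first row of the benchmark data (`cpu_percent` is a name made up for this example):

```python
def cpu_percent(user, system, total):
    """CPU seconds consumed per wall-clock second, expressed as a percentage."""
    return 100.0 * (user + system) / total


# 10k records, `init`: user 0.97 s, system 0.09 s, total 0.764 s.
print(round(cpu_percent(0.97, 0.09, 0.764), 1))  # 138.7
```

The dataset reports 138% for this row. Small differences of a point or two on other rows are likely because the user and system inputs are themselves rounded.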

Meta

Theme

none

Keywords

benchmarks
qri

Description

Dataset of one-off benchmarks for the Qri tool.

Body Preview
dataset,size_mb,command,user,system,cpu,total
10k,11,init,0.97,0.09,138%,0.764
10k,11,save,2.26,0.18,132%,1.84
10k,11,get body,0.7,0.11,24%,3.256
10k,11,get meta,0.03,0.03,4%,1.407
50k,55,init,4.79,0.38,155%,3.327
50k,55,save,11.35,1.01,118%,10.457
50k,55,get body,2.99,0.29,89%,3.675
50k,55,get meta,0.03,0.02,2%,1.937
100k,112,init,10.35,0.85,153%,7.288
100k,112,save,22.82,1.93,127%,19.352
100k,112,get body,5.87,0.38,90%,6.915
100k,112,get meta,0.04,0.03,5%,1.226
500k,544,init,53.19,5.86,152%,38.747
500k,544,save,119.26,11.23,136%,95.72
500k,544,get body,27.86,1.31,97%,30.039
500k,544,get meta,0.03,0.02,7%,0.767
1m,1120,init,109.26,26.28,156%,86.5
1m,1120,save,252.76,40.52,140%,212.67
1m,1120,get body,59.83,3.17,99%,63.61
1m,1120,get meta,0.03,0.02,9%,0.565
Previewing 20 of 20 entries
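
One way to read these numbers is as throughput: megabytes processed per second of total time. The sketch below does this for the save command only, with the size_mb and total values copied from the entries above; the helper name is made up, and this is a quick reading of one-off measurements rather than a rigorous analysis.

```python
# (dataset, size_mb, total_seconds) for the `save` command, copied from the data above.
SAVE_RUNS = [
    ("10k", 11, 1.84),
    ("50k", 55, 10.457),
    ("100k", 112, 19.352),
    ("500k", 544, 95.72),
    ("1m", 1120, 212.67),
]


def throughput_mb_per_s(size_mb, total_seconds):
    """Megabytes handled per second of elapsed (total) time."""
    return size_mb / total_seconds


rates = {name: throughput_mb_per_s(mb, secs) for name, mb, secs in SAVE_RUNS}
for name, rate in rates.items():
    print(f"{name:>5}: {rate:.2f} MB/s")
```

Under this reading, save stays in the range of roughly 5 to 6 MB/s from the 10k sample through the 1 million sample, which suggests its running time grows about linearly with input size.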