Compress and decompress data to make HTML files smaller

Important

Using compression: yes

Problem:

Whenever we “hand over” data from R to OJS with ojs_define, it gets “injected” into the head of the HTML as is. This blows up the file size of the HTML output.

Here, we use gzip to compress the data in R before handing it over, and decompress it again in OJS.

Whether the data is compressed is controlled by the compress parameter in the YAML header.

Compression in R

# quarto library and test data
library(quarto)


# https://feederwatch.org/explore/raw-dataset-requests/ 
# https://github.com/rfordatascience/tidytuesday/tree/master/data/2023/2023-01-10
#test_data <- read.csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-01-10/PFW_2021_public.csv')

test_data <- read.csv("06-test-data.csv")
 
# for interactive development, use penguins to speed up
# library(palmerpenguins)
# test_data <- palmerpenguins::penguins

str(test_data)
'data.frame':   100000 obs. of  23 variables:
 $ X                 : int  1 2 3 4 5 6 7 8 9 10 ...
 $ loc_id            : chr  "L981010" "L3161698" "L13210778" "L13258348" ...
 $ latitude          : num  52.1 43.8 39.7 42.2 32.7 ...
 $ longitude         : num  -122.1 -123.1 -75.9 -83.7 -79.9 ...
 $ subnational1_code : chr  "CA-BC" "US-OR" "US-MD" "US-MI" ...
 $ entry_technique   : chr  "POSTCODE LAT/LONG LOOKUP" "/GOOGLE_MAP/ZOOM:18" "/GOOGLE_MAP/ZOOM:15" "/GOOGLE_MAP/ZOOM:15" ...
 $ sub_id            : chr  "S83206450" "S78031190" "S81318993" "S79251313" ...
 $ obs_id            : chr  "OBS1092604618" "OBS1036509564" "OBS1073386105" "OBS1051702542" ...
 $ Month             : int  3 12 2 1 1 3 1 4 11 1 ...
 $ Day               : int  4 19 13 13 11 13 23 23 28 2 ...
 $ Year              : int  2021 2020 2021 2021 2021 2021 2021 2021 2020 2021 ...
 $ PROJ_PERIOD_ID    : chr  "PFW_2021" "PFW_2021" "PFW_2021" "PFW_2021" ...
 $ species_code      : chr  "amegfi" "moudov" "tuftit" "houspa" ...
 $ how_many          : int  20 11 2 2 10 2 5 2 6 9 ...
 $ valid             : int  1 1 1 1 1 1 1 0 1 1 ...
 $ reviewed          : int  0 0 0 0 0 0 0 0 0 0 ...
 $ day1_am           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ day1_pm           : int  0 1 1 1 1 1 1 1 1 1 ...
 $ day2_am           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ day2_pm           : int  0 1 1 1 1 1 1 1 1 1 ...
 $ effort_hrs_atleast: num  1 1 8 4 1 ...
 $ snow_dep_atleast  : num  5 0 5 0 0 ...
 $ Data_Entry_Method : chr  "PFW Web 4.1.4" "PFW Web 4.1.4" "PFW Web 4.1.4" "PFW Web 4.1.4" ...

First, we bring the data into a format that we can compress, i.e. a text format. Here, we use JSON. It is “row” JSON, i.e. a list of objects where each object represents one row. This is convenient because it is also what many JavaScript libraries work with by default.

test_data_json <- jsonlite::toJSON(test_data)
str(test_data_json)
 'json' chr "[{\"X\":1,\"loc_id\":\"L981010\",\"latitude\":52.1298,\"longitude\":-122.1355,\"subnational1_code\":\"CA-BC\",\"| __truncated__
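To make the “row” layout concrete, here is a minimal JavaScript sketch that contrasts it with a column-oriented layout. The two sample rows and the helper name toColumns are invented for illustration and are not part of the original pipeline.

```javascript
// Two hypothetical rows, invented for illustration (not the real test data).
const rows = [
  { species_code: "amegfi", how_many: 20 },
  { species_code: "moudov", how_many: 11 },
];

// Row-oriented JSON: an array of objects, one object per row.
const rowJson = JSON.stringify(rows);

// Column-oriented alternative: one array of values per column.
function toColumns(rowObjects) {
  const columns = {};
  for (const row of rowObjects) {
    for (const [key, value] of Object.entries(row)) {
      if (!(key in columns)) columns[key] = [];
      columns[key].push(value);
    }
  }
  return columns;
}
const columnJson = JSON.stringify(toColumns(rows));

console.log(rowJson);
// [{"species_code":"amegfi","how_many":20},{"species_code":"moudov","how_many":11}]
console.log(columnJson);
// {"species_code":["amegfi","moudov"],"how_many":[20,11]}
```

The row layout repeats every column name in every row, so it is larger before compression, but it can be passed straight to functions that expect an array of objects.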

We wrap the compression in a small helper function:

compress_for_ojs <- function(string) {
  # compress the string; the result is a raw vector (bytes, printed as hex)
  compressed_raw <- memCompress(charToRaw(string), "gzip")
  # convert each byte to its integer value (0-255)
  # needed because the decompression in js expects an array of numbers,
  # not R's hex representation of raw bytes
  # TODO: check whether the js side could also accept a hex string
  compressed_decimal <- as.integer(compressed_raw)
  return(compressed_decimal)
}
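The same idea can be sketched in JavaScript for readers who want to inspect what is handed over. This is only an illustration under assumptions: it uses Node's built-in zlib module as a stand-in for memCompress, and the function name compressToByteArray and the sample payload are hypothetical.

```javascript
const zlib = require("zlib");

// Hypothetical JSON payload, invented for illustration.
const json = JSON.stringify([{ species_code: "amegfi", how_many: 20 }]);

// Stand-in for compress_for_ojs(): compress the string, then represent
// every byte as a plain number between 0 and 255.
function compressToByteArray(text) {
  const compressed = zlib.deflateSync(Buffer.from(text, "utf8"));
  return Array.from(compressed);
}

const bytes = compressToByteArray(json);

// Every element is an integer byte value, so the array serializes as plain JSON numbers.
const allBytes = bytes.every((b) => Number.isInteger(b) && b >= 0 && b <= 255);
console.log(allBytes);

// Round trip: inflating the byte array gives back the original string.
const restored = zlib.inflateSync(Buffer.from(bytes)).toString("utf8");
console.log(restored === json);
```

Note that a payload this small does not shrink; compression only pays off for larger data such as the test data set used here.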

Now we either apply the compression function or, if compress = FALSE, hand the JSON over to OJS unchanged.

if (compress) {
  compressed <- compress_for_ojs(test_data_json)
  ojs_define(data_ojs = compressed)
} else {
  # no compression, just hand over json as is
  ojs_define(data_ojs = test_data_json)
}
[1] "Length of uncompressed JSON string: 43745953"
[1] "Length of compressed string 12394581"

Decompress in OJS

For decompression, we need to load two libraries: buffer and zlib. buffer creates the input that the decompression function of zlib expects. Neither library was originally designed for the browser, but “browserified” versions have been made available. The bundle.run service used below is a very useful tool for checking whether and how npm libraries can be used in OJS.

buffer = require('https://bundle.run/buffer@6.0.3') // ~8kb
zlib = require('https://bundle.run/browserify-zlib@0.2.0') // ~30kb here we could check whether we can just import inflateSync
data = {
  if (compress_ojs) {
    // if compression was done, decompress
    // data_ojs is an array of byte values, so no encoding argument is needed
    let decompressed = zlib.inflateSync(buffer.Buffer.from(data_ojs)).toString()
    return decompressed
  } else {
    return(data_ojs)
  }
}
Inputs.table(JSON.parse(data))
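The branching in the data cell can be tried outside the browser with a small Node sketch. It assumes Node's built-in zlib and Buffer in place of the browserified libraries; the helper name loadJson and the sample rows are hypothetical.

```javascript
const zlib = require("zlib");

// Hypothetical helper mirroring the `data` cell: returns the JSON string,
// either by inflating an array of byte values or by passing the string through.
function loadJson(dataOjs, compressed) {
  if (compressed) {
    return zlib.inflateSync(Buffer.from(dataOjs)).toString("utf8");
  }
  return dataOjs;
}

// Invented sample rows standing in for the data handed over from R.
const json = JSON.stringify([
  { species_code: "amegfi", how_many: 20 },
  { species_code: "moudov", how_many: 11 },
]);
const asBytes = Array.from(zlib.deflateSync(Buffer.from(json, "utf8")));

// Both paths should yield the same parsed rows.
const fromCompressed = JSON.parse(loadJson(asBytes, true));
const fromPlain = JSON.parse(loadJson(json, false));
console.log(JSON.stringify(fromCompressed) === JSON.stringify(fromPlain));
```

Because both branches return a JSON string, the rest of the document can call JSON.parse on the result without knowing whether compression was used.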