Making 2D Hilbert Curve

Zuguang Gu (z.gu@dkfz.de)

2022-04-26

Introduction

Hilbert curve is a type of space-filling curves that folds one dimensional axis into a two dimensional space, but still keeps the locality. It has advantages to visualize data with long axis in following two aspects:

  1. greatly improve resolution of the visualization fron \(n\) to \(\sqrt{n}\);
  2. easy to visualize clusters because generally data points in the axis will also be close in the 2D space.

This package aims to provide an easy and flexible way to visualize data through Hilbert curve. The implementation and example figures are based on following sources:

We first load the packages and set seed for random numbers.

library(HilbertCurve)
library(circlize)
set.seed(12345)

Following plots show Hilbert curves with level 2, 3, 4, 5:

## Warning in matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, : data length differs from
## size of matrix: [60 != 2 x 2]
## Warning in matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, : data length differs from
## size of matrix: [252 != 2 x 2]
## Warning in matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, : data length differs from
## size of matrix: [1020 != 2 x 2]

As shown in the above plots, as level increases, the length of the curve becomes longer and the curve folds more densely. The number of segments (one segment is marked in red) on the Hilbert curve is 4^level - 1. If a Hilbert curve with level 11 is used to map to human chromosome 1, the resolution would be 249250621/4^11 (approximately 59bp per segment).

Locality

Hilbert curve folds one-dimensional axis into a two-dimensional space while still preserves the locality of the data points. We visualize this attribute with a Hilbert curve with level 5. In following animation, the point moves with its natural order on the axis.

Next, we calculate the pairwise distance between data points in the 2D space and visualize it as heatmap. The numbers on x-axis (top) and y-axis (left) illustrate the position of data points in the original 1D axis.