Thoughts.toString() -- Shiki: Highlighting lines and phrases in Shiki

Blogging about and describing code, to me, is an act of irony. The point of having a logical, mathematical language to begin with is to avoid the vagueness of natural language. Yet it remains important to communicate and bridge that gap. And if a picture is worth a thousand words, then surely good highlighting is worth at least a hundred. Instead of telling readers what to look for in a block and where to find it, highlighting a line or phrase draws their attention there directly. This is another reason I had to go with a static site generator framework or build my own blog, because good luck trying to get all these features into WordPress or Wix!

For the purposes of this post, I assume the highlight instructions have already been parsed, whether by a complicated regular expression or by string splitting, into the following format.

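typescript

type HighlightSegment = {
    // Line range to highlight; endLine is exclusive and defaults to the line after startLine.
    startLine: number,
    endLine?: number,
    // Character range within the line.
    startChar?: number,
    endChar?: number,
    // Literal string or regular expression to search for within the line.
    termStr?: string,
    termRegexp?: RegExp,
    startMatch?: number,
    endMatch?: number,
    // Optional id emitted as a data attribute for styling.
    dataId?: string
};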

Then I will break down how to apply each piece of this information.
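As a rough illustration of where such an input might come from, the sketch below parses a made-up directive syntax, `<start>[-<end>][:<term>]`, into a segment. The syntax and the `parseDirective` name are assumptions for this example only, not the format this blog actually uses, and the type is trimmed to the fields the sketch fills in.

```typescript
// Trimmed copy of HighlightSegment with only the fields this sketch fills in.
type HighlightSegment = {
    startLine: number,
    endLine?: number,
    termStr?: string
};

// Hypothetical directive syntax: "<start>[-<end>][:<term>]", e.g. "3-5" or "7:foo".
const parseDirective = (directive: string): HighlightSegment | null => {
    const match = /^(\d+)(?:-(\d+))?(?::(.+))?$/.exec(directive.trim());
    if (!match) {
        return null;
    }
    const segment: HighlightSegment = { startLine: Number(match[1]) };
    if (match[2] !== undefined) {
        // The directive's end line is inclusive; the segment's endLine is exclusive.
        segment.endLine = Number(match[2]) + 1;
    }
    if (match[3] !== undefined) {
        segment.termStr = match[3];
    }
    return segment;
};

console.log(parseDirective('3-5'));
console.log(parseDirective('7:foo'));
```

Anything that does not fit the pattern yields `null`, so a caller can fall back to rendering the block without highlights.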

Highlighting entire lines

First, highlighting lines is as simple as applying a class or attribute to the line element that’s within range.

/src/shiki/transforms/highlights.ts (typescript)

import type { Element } from 'hast';
// ... Parse input.
const highlightEntireLine = (
    line: Element,
    segment: HighlightSegment
): void => {
    // Set an attribute for further styling.
    line.properties['data-highlighted-line'] = '';
    // Optionally carry the segment's id so individual highlights can be targeted.
    if (segment.dataId) {
        line.properties['data-highlighted-line-id'] = segment.dataId;
    }
};
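To see the effect on a concrete node, here is a small sketch that runs the same function against a hand-built line. The `HastText` and `HastElement` types are simplified stand-ins for the hast types (an assumption so the snippet runs on its own), and the sample line and the `intro` id are made up.

```typescript
// Minimal stand-ins for the hast types so the sketch runs without the hast package.
type HastText = { type: 'text', value: string };
type HastElement = {
    type: 'element',
    tagName: string,
    properties: Record<string, string>,
    children: (HastElement | HastText)[]
};
// Trimmed to the fields this function reads.
type HighlightSegment = { startLine: number, dataId?: string };

const highlightEntireLine = (line: HastElement, segment: HighlightSegment): void => {
    line.properties['data-highlighted-line'] = '';
    if (segment.dataId) {
        line.properties['data-highlighted-line-id'] = segment.dataId;
    }
};

// A made-up line element holding a single Text node.
const line: HastElement = {
    type: 'element',
    tagName: 'span',
    properties: { class: 'line' },
    children: [{ type: 'text', value: 'const x = 1;' }]
};

highlightEntireLine(line, { startLine: 1, dataId: 'intro' });
console.log(line.properties);
```

The existing `class` property is left untouched; the function only adds the two data attributes.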

Utility functions

Highlighting parts of lines is a little more involved, especially when it’s possible to highlight from and to any arbitrary character, even if they’re in the middle of spans. The strategy here is to traverse the HAST tree of each line (with the line as the root), then split Element nodes so that the highlighted parts can be grouped under a <mark>.
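To make the target shape concrete, here is a hand-built sketch of one line before and after marking a single character that sits in the middle of a span. The token boundaries and colors are made up, and the `Hast*` types are simplified stand-ins for the hast types; the real trees come from Shiki.

```typescript
// Minimal stand-ins for the hast types so the sketch runs without the hast package.
type HastText = { type: 'text', value: string };
type HastElement = {
    type: 'element',
    tagName: string,
    properties: Record<string, string>,
    children: HastNode[]
};
type HastNode = HastElement | HastText;

// Helper: a token span with one Text child (colors are illustrative only).
const span = (value: string, color: string): HastElement => ({
    type: 'element',
    tagName: 'span',
    properties: { style: `color:${color}` },
    children: [{ type: 'text', value }]
});
const wrap = (tagName: string, children: HastNode[]): HastElement => ({
    type: 'element', tagName, properties: {}, children
});

// Before: `const x = 1;` as four token spans under the line.
const before = wrap('span', [
    span('const', 'red'), span(' x = ', 'black'), span('1', 'blue'), span(';', 'black')
]);

// After marking "x" (character 6): the second span is split in three, and the
// highlighted piece is grouped under <mark> while keeping the span's styling.
const after = wrap('span', [
    span('const', 'red'),
    span(' ', 'black'),
    wrap('mark', [span('x', 'black')]),
    span(' = ', 'black'),
    span('1', 'blue'),
    span(';', 'black')
]);

const textOf = (node: HastNode): string =>
    node.type === 'text' ? node.value : node.children.map(textOf).join('');

// The visible text is unchanged; only the grouping differs.
console.log(textOf(before) === textOf(after)); // prints true
```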

With that in mind, let’s write some functions that make tree traversal easier. The first task is to traverse all the Text nodes in document order; the function below visits each node before its children. Since these lines are only about 3 levels deep, we won’t have to worry about any stack issues, so a simple recursive function will do.

/src/shiki/utils.ts (typescript)

import type { Element, ElementContent } from 'hast';

// In HAST, an Element has access to its children, but not its siblings, so it's
// helpful if the parent node is accessible. The TraversalFunction returns false
// when we want to stop traversing.
export type TraversalFunction = (
    node: ElementContent,
    parent: Element | null,
    siblingIndex: number
) => boolean;

export const inOrderTraversal = (
    node: ElementContent,
    parent: Element | null,
    siblingIndex: number,
    func: TraversalFunction
): boolean => {
    // Call TraversalFunction on the current node. The function should save any
    // data it needs before telling whether it wants traversal to proceed.
    if (!func(node, parent, siblingIndex)) {
        return false;
    }
    if (node.type === 'element') {
        // Traverse each child recursively.
        for (let i = 0; i < node.children.length; i++) {
            const child = node.children[i];
            if (!inOrderTraversal(child, node, i, func)) {
                return false;
            }
        }
    }
    return true;
};
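A short usage sketch may help show the early-stop behavior. It repeats the traversal against simplified stand-in types (an assumption so it runs without the hast package) and stops at the first Text node containing `x` in a made-up line.

```typescript
// Minimal stand-ins for the hast types so the sketch runs without the hast package.
type HastText = { type: 'text', value: string };
type HastElement = {
    type: 'element',
    tagName: string,
    properties: Record<string, string>,
    children: HastNode[]
};
type HastNode = HastElement | HastText;
type TraversalFunction = (
    node: HastNode,
    parent: HastElement | null,
    siblingIndex: number
) => boolean;

// Same shape as inOrderTraversal above, retyped against the stand-in types.
const inOrderTraversal = (
    node: HastNode,
    parent: HastElement | null,
    siblingIndex: number,
    func: TraversalFunction
): boolean => {
    if (!func(node, parent, siblingIndex)) {
        return false;
    }
    if (node.type === 'element') {
        for (let i = 0; i < node.children.length; i++) {
            if (!inOrderTraversal(node.children[i], node, i, func)) {
                return false;
            }
        }
    }
    return true;
};

const span = (value: string): HastElement => ({
    type: 'element', tagName: 'span', properties: {},
    children: [{ type: 'text', value }]
});
const line: HastElement = {
    type: 'element', tagName: 'span', properties: { class: 'line' },
    children: [span('const'), span(' x = '), span('1'), span(';')]
};

// Stop at the first Text node containing "x" and count how many nodes were visited.
let visited = 0;
const hits: string[] = [];
const completed = inOrderTraversal(line, null, 0, (node) => {
    visited++;
    if (node.type === 'text' && node.value.includes('x')) {
        hits.push(node.value);
        return false;
    }
    return true;
});
console.log(completed, visited, hits);
```

The tree has nine nodes, but the traversal visits only five (the line, two spans, and their Text nodes) before the callback returns false, which propagates up as the overall return value.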

With this, we could easily get all the text of a node.

/src/shiki/utils.ts (typescript)

// ElementContent covers the node types allowed as children of an Element:
// Element, Text, and Comment.
import type { ElementContent } from 'hast';
export const getNodeText = (node: ElementContent): string => {
    if (node.type !== 'element') {
        return node.type === 'text' ? node.value : '';
    }

    // Aggregate all text from Text nodes.
    const textParts: string[] = [];
    inOrderTraversal(node, null, 0, (node, _parent, _index) => {
        if (node.type === 'text') {
            textParts.push(node.value);
        }
        return true;
    });
    return textParts.join('');
};
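Since highlight positions are character indices into this aggregated text, it can help to see how each Text node maps onto it. The sketch below is an illustration only, not one of this post's utilities: `collectTextSlices` is a hypothetical helper, and the `Hast*` types are simplified stand-ins for the hast types.

```typescript
// Minimal stand-ins for the hast types so the sketch runs without the hast package.
type HastText = { type: 'text', value: string };
type HastElement = {
    type: 'element',
    tagName: string,
    properties: Record<string, string>,
    children: HastNode[]
};
type HastNode = HastElement | HastText;

type TextSlice = { start: number, end: number, value: string };

// Hypothetical helper: record where each Text node starts and ends (end exclusive)
// within the concatenated text of the tree.
const collectTextSlices = (root: HastNode): TextSlice[] => {
    const slices: TextSlice[] = [];
    let offset = 0;
    const visit = (node: HastNode): void => {
        if (node.type === 'text') {
            slices.push({ start: offset, end: offset + node.value.length, value: node.value });
            offset += node.value.length;
            return;
        }
        for (const child of node.children) {
            visit(child);
        }
    };
    visit(root);
    return slices;
};

const span = (value: string): HastElement => ({
    type: 'element', tagName: 'span', properties: {},
    children: [{ type: 'text', value }]
});
const line: HastElement = {
    type: 'element', tagName: 'span', properties: { class: 'line' },
    children: [span('const'), span(' x = '), span('1'), span(';')]
};

const slices = collectTextSlices(line);
console.log(slices.map((s) => s.value).join(''));
console.log(slices);
```

A highlight starting at character 6 therefore lands inside the second span, whose text covers characters 5 through 9, which is exactly the case that requires splitting a node.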

For the last utility, we’ll make a split within an Element based on the character index of its text.

/src/shiki/utils.ts (typescript)

import type { Element, ElementContent, Text } from 'hast';

export enum KeepSide { Left, Right };
type SiblingElement = {
    node: ElementContent,
    index: number
};
// Shallow clone: copies the properties; callers replace the children array.
export const cloneElement = (node: Element): Element => ({ ...node, properties: { ...node.properties }});
export const splitElement = (
    node: Element,
    splitIndex: number,
    keepSide: KeepSide
): Element | null => {
    // Maintain a hierarchical list of nodes from root to the current node. Once the
    // index where the split will be made is found in a Text node, this list will
    // contain all the nodes that will be considered for splitting.
    const splitChain: SiblingElement[] = [];
    // If the split index is in the middle of a Text or Element node in a given tree
    // level, then the node will have to be split and the new node stored and
    // returned. If the split index is between two Text or Element nodes, then the
    // split node will remain null.
    let splitNode: Element | null = null;
    // Declared outside the callback because it is read again after the traversal.
    // It stays true if the traversal ends without reaching the split index.
    let resume = true;
    inOrderTraversal(node, null, 0, (node, parent, index) => {
        let lastChainIndex = splitChain.length - 1;
        if (node.type === 'element' && node.children.length > 0) {
            // Remove nodes until the parent of current element is the last node in the chain.
            while (parent && lastChainIndex >= 0 && parent !== splitChain[lastChainIndex].node) {
                splitChain.pop();
                lastChainIndex--;
            }
            // There is only one Text node per Element. If the first child isn't text,
            // then there are only Element children.
            const firstChild = node.children[0];
            if (firstChild.type === 'text') {
                if (splitIndex < firstChild.value.length) {
                    // Split is in the middle of current Text node. The side that is
                    // "kept" remains on the original nodes. The other side is moved
                    // onto the split node.
                    if (splitIndex > 0) {
                        const leftText = firstChild.value.substring(0, splitIndex);
                        const rightText = firstChild.value.substring(splitIndex);
                        const newText: Text = { type: 'text', value: '' };
                        if (keepSide === KeepSide.Left) {
                            firstChild.value = leftText;
                            newText.value = rightText;
                        }
                        else {
                            newText.value = leftText;
                            firstChild.value = rightText;
                        }
                        splitNode = cloneElement(node);
                        splitNode.children = [newText];
                    }
                    resume = false;
                }
                else {
                    // If split has not been reached, subtract index by the length of
                    // the text that has been seen.
                    splitIndex -= firstChild.value.length;
                }
            }

            splitChain.push({node, index});
        }
        return resume;
    });
    // From the bottom of the split chain, use the parent and sibling index where the
    // split occurs at each level to split half of the tree onto a new tree to be returned.
    for (let i = splitChain.length - 2; i >= 0; i--) {
        const parent = splitChain[i].node as Element;
        let splitChildIndex = splitChain[i + 1].index;
        // Split was never made, so the split chain should be on the left.
        if (resume) {
            splitChildIndex++;
        }
        // Suppose split is made on index s. If the left side is kept, then children
        // represents the right side nodes: [splitNode, ...[s+1, s+2, ..., n-1]]. If
        // the right side is kept, then children represents the left side nodes:
        // [...[0, 1, ..., s], splitNode].
        const children: ElementContent[] = [];
        if (splitNode && keepSide === KeepSide.Left) {
            children.push(splitNode);
            splitChildIndex++;
        }
        if (keepSide === KeepSide.Left) {
            children.push(...parent.children.splice(splitChildIndex));
        }
        else {
            children.push(...parent.children.splice(0, splitChildIndex));
        }
        if (splitNode && keepSide === KeepSide.Right) children.push(splitNode);

        // Aggregate children under new parent, which becomes the split node for the level above.
        if (children.length > 0) {
            splitNode = cloneElement(parent);
            splitNode.children = children;
        }
    }

    return splitNode;
};
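The general function handles arbitrary nesting, which makes the core idea a little hard to see. As a deliberately simplified sketch, assuming a flat line whose spans each hold exactly one Text node, the same split can be written without mutation. `splitFlatLine` and `withText` are hypothetical names for this illustration, not part of the utilities above, and the `Hast*` types are stand-ins for the hast types.

```typescript
// Minimal stand-ins for the hast types; each span holds exactly one Text node.
type HastText = { type: 'text', value: string };
type HastSpan = {
    type: 'element',
    tagName: string,
    properties: Record<string, string>,
    children: [HastText]
};

// Copy a span's tag and properties onto a new span with different text.
const withText = (span: HastSpan, value: string): HastSpan => ({
    ...span,
    properties: { ...span.properties },
    children: [{ type: 'text', value }]
});

// Hypothetical flat-line split: returns the spans left and right of splitIndex,
// cutting one span in two if the index falls in the middle of its text.
const splitFlatLine = (spans: HastSpan[], splitIndex: number): [HastSpan[], HastSpan[]] => {
    const left: HastSpan[] = [];
    const right: HastSpan[] = [];
    let remaining = splitIndex;
    for (const span of spans) {
        const value = span.children[0].value;
        if (remaining >= value.length) {
            // Entire span is before the split.
            left.push(span);
            remaining -= value.length;
        }
        else if (remaining <= 0) {
            // Entire span is after the split.
            right.push(span);
        }
        else {
            // Split lands inside this span: both halves keep its properties.
            left.push(withText(span, value.substring(0, remaining)));
            right.push(withText(span, value.substring(remaining)));
            remaining = 0;
        }
    }
    return [left, right];
};

const span = (value: string, color: string): HastSpan => ({
    type: 'element', tagName: 'span', properties: { style: `color:${color}` },
    children: [{ type: 'text', value }]
});
const spans = [span('const', 'red'), span(' x = ', 'black'), span('1', 'blue'), span(';', 'black')];

const [left, right] = splitFlatLine(spans, 6);
console.log(left.map((s) => s.children[0].value));
console.log(right.map((s) => s.children[0].value));
```

Unlike `splitElement`, this version leaves the input untouched and returns both sides, which is easier to follow but does not cover spans nested under an existing `<mark>`.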

Parse and mark

With the heavy lifting out of the way, the only thing left to do is to parse the text of each line to find the indices where the highlighting will occur, and then make the necessary splits to mark those Element nodes. To find the indices, we aggregate the text values of the leaves of the tree, then search that text with a string or a regular expression.

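/src/shiki/transforms/highlights.ts (typescript)

import type { Element } from 'hast';
import { getNodeText } from '../utils';

type HighlightRange = {
    start: number,
    end: number
};
const getRangesInLine = (
    line: Element,
    segment: HighlightSegment
): HighlightRange[] => {
    const lineText = getNodeText(line);
    const highlightRanges: HighlightRange[] = [];
    if (segment.termStr) {
        // Collect every non-overlapping occurrence of the literal term.
        const length = segment.termStr.length;
        let i = lineText.indexOf(segment.termStr);
        while (i >= 0) {
            highlightRanges.push({ start: i, end: i + length });
            i = lineText.indexOf(segment.termStr, i + length);
        }
    }
    else if (segment.termRegexp) {
        // exec() only advances lastIndex on a global regex, so add the g flag
        // if it is missing; otherwise this loop would never end.
        const flags = segment.termRegexp.flags;
        const regexp = new RegExp(
            segment.termRegexp.source, flags.includes('g') ? flags : flags + 'g');
        let match: RegExpExecArray | null;
        while ((match = regexp.exec(lineText)) !== null) {
            // Step past empty matches, which would otherwise repeat forever.
            if (match[0].length === 0) {
                regexp.lastIndex++;
                continue;
            }
            highlightRanges.push({
                start: match.index,
                end: match.index + match[0].length
            });
        }
    }
    else {
        // Assumed defaults: a missing bound falls back to the start or end of the line.
        highlightRanges.push({
            start: segment.startChar ?? 0,
            end: segment.endChar ?? lineText.length
        });
    }
    return highlightRanges;
};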
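The search itself does not depend on the tree, so it can be tried on plain strings. This is a minimal sketch under that assumption: `findRanges` is a hypothetical name, and it takes the line text directly instead of an Element and a segment.

```typescript
type HighlightRange = { start: number, end: number };

// Hypothetical string-only variant of the range search: returns every
// non-overlapping match of a literal term or regular expression.
const findRanges = (text: string, term: string | RegExp): HighlightRange[] => {
    const ranges: HighlightRange[] = [];
    if (typeof term === 'string') {
        if (term.length === 0) {
            return ranges;
        }
        let i = text.indexOf(term);
        while (i >= 0) {
            ranges.push({ start: i, end: i + term.length });
            i = text.indexOf(term, i + term.length);
        }
        return ranges;
    }
    // exec() needs the g flag to advance between matches.
    const flags = term.flags.includes('g') ? term.flags : term.flags + 'g';
    const regexp = new RegExp(term.source, flags);
    let match: RegExpExecArray | null;
    while ((match = regexp.exec(text)) !== null) {
        if (match[0].length === 0) {
            // Step past empty matches so the loop always makes progress.
            regexp.lastIndex++;
            continue;
        }
        ranges.push({ start: match.index, end: match.index + match[0].length });
    }
    return ranges;
};

console.log(findRanges('foo(bar, foo)', 'foo'));
console.log(findRanges('a1 b22', /\d+/));
```

Each range uses an exclusive `end`, so `end - start` is the number of highlighted characters.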

Finally, for each one of these ranges, we make a mark.

/src/shiki/transforms/highlights.ts (typescript)

import type { Element, ElementContent } from 'hast';
import type { ShikiTransformer } from 'shiki';
import { KeepSide, splitElement } from '../utils';

const createElement = (tagName: string): Element => {
    return { type: 'element', tagName, properties: {}, children: []};
};
const transformer: ShikiTransformer = {
    line(line: Element, index: number) {
        // ... Parse input.
        const segment: HighlightSegment = parseInput(line);
        // Only match startLine if endLine is not specified.
        const endLine = segment.endLine ?? segment.startLine + 1;
        if (index >= segment.startLine && index < endLine) {
            // If no ranges or search terms given, highlight entire line.
            const hasCharRange =
                segment.startChar !== undefined || segment.endChar !== undefined;
            if (!(hasCharRange || segment.termStr || segment.termRegexp)) {
                highlightEntireLine(line, segment);
            }
            else {
                const ranges: HighlightRange[] = getRangesInLine(line, segment);
                for (const highlight of ranges) {
                    // We don't want to split the line Element itself, only its
                    // children. So we make a new <mark> Element under it.
                    const marked = createElement('mark');
                    marked.children = line.children;
                    marked.properties['data-highlighted'] = '';
                    if (segment.dataId) {
                        marked.properties['data-highlighted-id'] = segment.dataId;
                    }
                    // Each highlight makes two potential splits: one at its start
                    // and one at its end. For the start, we want the <mark> to hold
                    // the right side and another element to hold the left.
                    const tempLeftElement = splitElement(
                        marked, highlight.start, KeepSide.Right);

                    // Since this second split is on the <mark> Element, that becomes
                    // the kept side on the left. The <mark> Element only holds the
                    // children after the first split at the start index, so the
                    // characters before that aren't counted.
                    const tempRightElement = splitElement(
                        marked, highlight.end - highlight.start, KeepSide.Left);
                    // Combine overlapping marks.
                    const markedSpans: ElementContent[] = [];
                    for (let i = 0; i < marked.children.length; i++) {
                        const child = marked.children[i];
                        if (child.type === 'element' && child.tagName === 'mark') {
                            markedSpans.push(...child.children);
                        }
                        else {
                            markedSpans.push(child);
                        }
                    }
                    marked.children = markedSpans;
                    // Combine temporary Element nodes back into line. The <mark>
                    // Element always exists since it's the kept side on both splits.
                    // The left and right elements, however, are split nodes, which
                    // are null if nothing is selected.
                    line.children = [];
                    if (tempLeftElement) {
                        line.children.push(...tempLeftElement.children);
                    }
                    line.children.push(marked);
                    if (tempRightElement) {
                        line.children.push(...tempRightElement.children);
                    }
                }
            }
        }
        return line;
    }
};

And with this, the line now holds the original children spans, but with the highlighted segments grouped under <mark> elements, with attributes if specified. These mark elements can be styled by the tag. Both the mark and line can also be styled by the specified attributes. For example:

css

.astro-code {
    [data-highlighted-line] {
        background-color: grey;
    }
    mark {
        background-color: grey;
        padding: 2px;
        border: 1px solid black;
        border-radius: 2px;
    }
}