Thoughts.toString() -- Shiki: Creating a new transformer for line skips

I found myself wanting to analyze a long file of code in which only a few parts are really relevant. To do this, I could use redundant phrasing to direct the reader to particular lines, or use multiple blocks of code to present the various chunks of interest. Below are roughly three methods that came to mind, ordered from worst to best.

Three examples of ways to discuss long code

One long block

Reading one long block of code, even with highlights to call attention to key lines, requires multiple passes of scanning back and forth to establish context and to connect the descriptions to the code. Presenting code this way is tiresome for the reader.

Below is an example of a Rehype plugin. On line 233, the visit function from the unist-util-visit package is called; it traverses a HAST (tree) and calls a function for every node that matches the second argument, in this case nodes with type “element”. The first pass retrieves the languages, on line 239 for inline code and line 249 for blocks. They are aggregated on lines 241 and 256, and then dynamically loaded on line 265. Then on line 278, a second pass through the tree strips escape characters from the code before parsing it with a cached singleton of the Shiki highlighter object.

rehype-pretty-code/packages/core/src/index.ts (typescript)

227  return async (tree) => {
228    const langsToLoad = new Set<string>();
229    const highlighter = await cachedHighlighter;
230    if (!highlighter) return;
231
232    // biome-ignore lint/complexity/noExcessiveCognitiveComplexity: <explanation>
233    visit(tree, 'element', (element, _, parent) => {
234      if (isInlineCode(element, parent, bypassInlineCode)) {
235        const textElement = element.children[0];
236        if (!isText(textElement)) return;
237        const value = textElement.value;
238        if (!value) return;
239        const lang = getInlineCodeLang(value, defaultInlineCodeLang);
240        if (lang && lang[0] !== '.') {
241          langsToLoad.add(lang);
242        }
243      }
244
245      if (isBlockCode(element)) {
246        const codeElement = element.children[0];
247        if (!isElement(codeElement)) return;
248
249        const { lang } = parseBlockMetaString(
250          codeElement,
251          filterMetaString,
252          defaultCodeBlockLang,
253        );
254
255        if (lang) {
256          langsToLoad.add(lang);
257        }
258      }
259    });
260
261    try {
262      await Promise.allSettled(
263        Array.from(langsToLoad).map((lang) => {
264          try {
265            return highlighter.loadLanguage(
266              lang as Parameters<typeof highlighter.loadLanguage>[0],
267            );
268          } catch (e) {
269            return Promise.reject(e);
270          }
271        }),
272      );
273    } catch (e) {
274      console.error(e);
275    }
276
277    // biome-ignore lint/complexity/noExcessiveCognitiveComplexity: <explanation>
278    visit(tree, 'element', (element, _, parent) => {
279      if (isInlineCode(element, parent, bypassInlineCode)) {
280        const textElement = element.children[0];
281        if (!isText(textElement)) return;
282        const value = textElement.value;
283        if (!value) return;

Rehype inline code plugin

Multiple blocks

Breaking up the blocks allows more in-depth commentary without wasting effort on directing the reader’s attention. It is much easier to focus and grasp the context when the code is limited to the vicinity of interest.

Below is an example of a Rehype plugin. First, the visit function from the unist-util-visit package is called; it traverses a HAST (tree) and calls a function for every node that matches the second argument, in this case nodes with type “element”. The first pass retrieves the languages, aggregating them in a langsToLoad Set (to eliminate duplicates).

rehype-pretty-code/packages/core/src/index.ts (typescript)

227  return async (tree) => {
228    const langsToLoad = new Set<string>();
229    const highlighter = await cachedHighlighter;
230    if (!highlighter) return;
231
232    // biome-ignore lint/complexity/noExcessiveCognitiveComplexity: <explanation>
233    visit(tree, 'element', (element, _, parent) => {
234      if (isInlineCode(element, parent, bypassInlineCode)) {
235        const textElement = element.children[0];
236        if (!isText(textElement)) return;
237        const value = textElement.value;
238        if (!value) return;
239        const lang = getInlineCodeLang(value, defaultInlineCodeLang);
240        if (lang && lang[0] !== '.') {
241          langsToLoad.add(lang);
242        }
243      }

Rehype inline code plugin

The retrieved languages are then dynamically loaded using a resolved singleton of a cached Shiki highlighter object.

rehype-pretty-code/packages/core/src/index.ts (cont'd, typescript)

261    try {
262      await Promise.allSettled(
263        Array.from(langsToLoad).map((lang) => {
264          try {
265            return highlighter.loadLanguage(
266              lang as Parameters<typeof highlighter.loadLanguage>[0],
267            );
268          } catch (e) {
269            return Promise.reject(e);
270          }

Rehype inline code plugin

A second pass through the tree strips escape characters from the code before parsing it with Shiki. The elements are checked for validity and the languages are tested to see if loading was successful.

rehype-pretty-code/packages/core/src/index.ts (cont'd, typescript)

277    // biome-ignore lint/complexity/noExcessiveCognitiveComplexity: <explanation>
278    visit(tree, 'element', (element, _, parent) => {
279      if (isInlineCode(element, parent, bypassInlineCode)) {
280        const textElement = element.children[0];
281        if (!isText(textElement)) return;
282        const value = textElement.value;
283        if (!value) return;

Rehype inline code plugin

Multiple chunks in a block

Although content-wise this version is nearly identical to the second one, describing specific lines of code right next to them maintains a single continuous context instead of forcing the reader to endure multiple subconscious context switches. It also allows for the separation of more general statements when context switches are necessary. In my opinion, this makes it much easier to read.

Below is an example of a Rehype plugin.

rehype-pretty-code/packages/core/src/index.ts (typescript)

227  return async (tree) => {
228    const langsToLoad = new Set<string>();
229    const highlighter = await cachedHighlighter;
230    if (!highlighter) return;
231    
The 'visit' function from the 'unist-util-visit' package traverses a HAST (tree) and calls a function for every node that matches the second argument, in this case nodes with type 'element'.
232    // biome-ignore lint/complexity/noExcessiveCognitiveComplexity: <explanation>
233    visit(tree, 'element', (element, _, parent) => {
234      if (isInlineCode(element, parent, bypassInlineCode)) {
235        const textElement = element.children[0];
236        if (!isText(textElement)) return;
237        const value = textElement.value;
238        if (!value) return;
The first iteration retrieves the language from each block, storing unique languages into a Set.
239        const lang = getInlineCodeLang(value, defaultInlineCodeLang);
240        if (lang && lang[0] !== '.') {
241          langsToLoad.add(lang);
242        }
243      }
skip to line 261
261    try {
262      await Promise.allSettled(
263        Array.from(langsToLoad).map((lang) => {
264          try {
After 'visit' is complete, the retrieved languages are then dynamically loaded using a resolved singleton of a cached Shiki highlighter object.
265            return highlighter.loadLanguage(
266              lang as Parameters<typeof highlighter.loadLanguage>[0],
267            );
268          } catch (e) {
269            return Promise.reject(e);
270          }
skip to line 277
277    // biome-ignore lint/complexity/noExcessiveCognitiveComplexity: <explanation>
A second pass through the tree strips escape characters from the code before parsing it with Shiki. The elements are checked for validity and the languages are tested to see if loading was successful.
278    visit(tree, 'element', (element, _, parent) => {
279      if (isInlineCode(element, parent, bypassInlineCode)) {
280        const textElement = element.children[0];
281        if (!isText(textElement)) return;
282        const value = textElement.value;
283        if (!value) return;

Rehype inline code plugin
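
All three presentations describe the same two-pass pattern. Stripped of the plugin's details, it might be sketched as follows. This is only an illustration: `ElementNode`, `visitElements`, `loadLanguage`, and `collectAndLoad` are hypothetical stand-ins written for this sketch, not the plugin's or Shiki's actual API.

```typescript
// Minimal stand-in for a HAST element; the real plugin uses types from 'hast'.
interface ElementNode {
  tagName: string;
  lang?: string; // hypothetical shortcut; the real plugin parses this from the node
  children: ElementNode[];
}

// Hypothetical depth-first walker standing in for unist-util-visit's `visit`.
function visitElements(node: ElementNode, visitor: (el: ElementNode) => void): void {
  visitor(node);
  for (const child of node.children) visitElements(child, visitor);
}

// Hypothetical loader standing in for the highlighter's language loading.
async function loadLanguage(lang: string): Promise<string> {
  if (lang === 'not-a-language') throw new Error(`Unknown language: ${lang}`);
  return lang;
}

async function collectAndLoad(tree: ElementNode): Promise<Set<string>> {
  // Pass 1: collect each distinct language once.
  const langsToLoad = new Set<string>();
  visitElements(tree, (el) => {
    if (el.tagName === 'code' && el.lang) langsToLoad.add(el.lang);
  });

  // allSettled waits for every load and never rejects, so one unknown
  // language does not stop the others from loading.
  const results = await Promise.allSettled(Array.from(langsToLoad).map(loadLanguage));

  const loaded = new Set<string>();
  for (const result of results) {
    if (result.status === 'fulfilled') loaded.add(result.value);
  }
  // Pass 2 (not shown) would highlight only elements whose language is in `loaded`.
  return loaded;
}

const tree: ElementNode = {
  tagName: 'div',
  children: [
    { tagName: 'code', lang: 'typescript', children: [] },
    { tagName: 'code', lang: 'typescript', children: [] },
    { tagName: 'code', lang: 'not-a-language', children: [] },
    { tagName: 'code', lang: 'css', children: [] },
  ],
};

// Logs the languages that loaded; the duplicate and the unknown one are gone.
collectAndLoad(tree).then((loaded) => console.log(Array.from(loaded)));
```

Collecting into a Set first means each language is requested once, no matter how many code elements use it.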

Another reason for a customized code block

It’s a relief to pass Groundhog Day, I’m sure, but I hope I’ve impressed upon you the importance of a customized syntax highlighting environment. Even if you disagree with my ideas about what is easier to read, the more important point is that each blog author could advance their own ideas, making each blog stand out with a unique style.

What’s more, it’s not that hard to implement!

For the line skips shown above, first recall the overall form of a Transformer object.

/src/shiki/transforms/linebreaks.ts (typescript)

import type { ShikiTransformer, CodeOptionsMeta } from 'shiki';

const transformer: ShikiTransformer = {
    // Hook functions...
};

export default transformer;

At each point where a skip occurs, we place a comment containing a directive that tells the transformer which line number to resume from. For example,

typescript

1  // Line 1
2  // [!code skipto 261]
3  // Line 261

causes

typescript

1    // Line 1
skip to line 261
261  // Line 261

This means that we should iterate through the lines of the raw code before tokenization to find these directives, and keep a map from each line index to the line number it should display.

/src/shiki/transforms/linebreaks.ts (typescript)

const transformer: ShikiTransformer = {
    preprocess(rawCode: string, options: CodeOptionsMeta) {
        const regexp = /\[!code skipto (\d+)\]/;
        const lines = rawCode.split('\n');
        let lineNumber = 1;
        // We use the meta object to pass data between hooks. The second
        // argument is the options object; meta is an optional property of it.
        const skipMap = new Map<number, number | null>();
        options.meta ??= {};
        options.meta['skipMap'] = skipMap;
        for (let i = 0; i < lines.length; i++) {
            const match = lines[i].match(regexp);
            if (match) {
                const skipTo = parseInt(match[1], 10);
                if (!Number.isNaN(skipTo)) {
                    // Line index is 1-based. The skip itself doesn't have a
                    // line number. Set the next line to the skip-to line.
                    skipMap.set(i + 1, null);
                    lineNumber = skipTo;
                }
            }
            else {
                // Post-increment after assigning.
                skipMap.set(i + 1, lineNumber++);
            }
        }
    },
    // ...
};
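
To see what the hook records, here is a minimal sketch of the same mapping logic pulled out into a standalone function and run on the example from earlier, extended by one line. `buildSkipMap` is a name made up for this illustration; it is not part of Shiki.

```typescript
// Hypothetical helper: the same mapping logic as the preprocess hook,
// extracted so it can be run without Shiki.
function buildSkipMap(rawCode: string): Map<number, number | null> {
  const directive = /\[!code skipto (\d+)\]/;
  const skipMap = new Map<number, number | null>();
  let lineNumber = 1;
  rawCode.split('\n').forEach((text, i) => {
    const target = text.match(directive)?.[1];
    if (target !== undefined) {
      // Directive lines get no number; numbering resumes at the target.
      // The capture group is all digits, so parseInt cannot yield NaN here.
      skipMap.set(i + 1, null);
      lineNumber = parseInt(target, 10);
    } else {
      // Ordinary lines take the current number, then advance it.
      skipMap.set(i + 1, lineNumber++);
    }
  });
  return skipMap;
}

const example = [
  '// Line 1',
  '// [!code skipto 261]',
  '// Line 261',
  '// Line 262',
].join('\n');

const exampleSkipMap = buildSkipMap(example);
// Entries: 1 -> 1, 2 -> null, 3 -> 261, 4 -> 262
console.log(Array.from(exampleSkipMap.entries()));
```

The keys are positions in the raw code (1-based, matching the index passed to the line hook), while the values are the numbers to display, with null marking the skip itself.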

Now we just need to add several elements to each line that matches our criteria.

/src/shiki/transforms/linebreaks.ts (typescript)

import type { Element } from 'hast';

const createElement = (tagName: string): Element => ({
    type: 'element', tagName, properties: {}, children: []
});

const transformer: ShikiTransformer = {
    line(line: Element, index: number) {
        // Read the map stored on the meta object during preprocess.
        const skipMap: Map<number, number | null> | undefined =
            this.options.meta?.['skipMap'];
        if (!skipMap) return line;
        const lineNumber = skipMap.get(index) ?? null;
        if (lineNumber !== null) {
            // Create div to hold line number.
            const lineNumberDiv = createElement('div');
            lineNumberDiv.properties['data-line-number'] = '';
            lineNumberDiv.children = [{ type: 'text', value: String(lineNumber) }];
            // Put code tokens under sibling div.
            const lineCodeDiv = createElement('div');
            lineCodeDiv.properties['data-line-code'] = '';
            lineCodeDiv.children = line.children;
            line.children = [lineNumberDiv, lineCodeDiv];
        }
        else {
            // Lines without a number are skips in this example.
            const lineSkip = createElement('div');
            lineSkip.properties['data-line-break-skip'] = '';
            const topSide = createElement('div');
            topSide.properties['data-line-break-top'] = '';
            const bottomSide = createElement('div');
            bottomSide.properties['data-line-break-bottom'] = '';
            line.properties['data-line-break'] = '';
            line.children = [topSide, lineSkip, bottomSide];
        }
        return line;
    },
    // ...
};
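
The transformer still has to be registered somewhere. Since the styles in this post target `.astro-code`, one plausible setup is Astro's `markdown.shikiConfig.transformers` option. The config file name and the import path below are assumptions derived from the file path shown above, so treat this as a sketch and adjust it to your project.

```typescript
// astro.config.ts (assumed file name and location)
import { defineConfig } from 'astro/config';
// Assumed path: the transformer module from this post, relative to the project root.
import linebreaks from './src/shiki/transforms/linebreaks';

export default defineConfig({
  markdown: {
    shikiConfig: {
      // Custom transformers run for every highlighted code block.
      transformers: [linebreaks],
    },
  },
});
```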

Finally, we can give some basic styling that approximates what it looks like above.

css

.astro-code {
    & span[data-line-break] {
        /* We want [topSide, lineSkip, bottomSide] to flow vertically. */
        display: flex;
        flex-direction: column;
        justify-content: center;
        align-items: center;
        width: 100%;
        height: 4rem;

        & [data-line-break-top],
        & [data-line-break-bottom] {
            /* The sides should have the same color as the theme background. */
            @media (prefers-color-scheme: light) {
                background-color: var(--shiki-light-bg);
            }
            @media (prefers-color-scheme: dark) {
                background-color: var(--shiki-dark-bg);
            }
            /* Keep size fixed. */
            flex-basis: 1rem;
            flex-grow: 0;
            flex-shrink: 0;
        }

        & [data-line-break-top] {
            /* Add some drop shadow from the top. Realistically the colors
               here need to be styled by theme as well. */
            box-shadow: 0px 5px 0.75rem 0.4rem #999;
        }

        & [data-line-break-skip] {
            background-color: grey;
            flex-basis: 2rem;
            flex-grow: 0;
            flex-shrink: 0;
        }
    }
}

Obviously, the styles could use some tuning, but the code here took about half an hour to write. I’ve spent more time just trying to get the font right in WordPress.