Thoughts .toString()

Shiki: Syntax highlighting for inline code in Astro


Note: Starting at some Astro version between 5.1.1 and 5.1.7, getSingletonHighlighterCore no longer works. It fails with a “must invoke loadWasm first” error, but loading the WebAssembly module makes the Vite module runner crash. Luckily, the compatible alternative createHighlighter has mostly the same interface and methods, except perhaps that it doesn’t handle caching out of the box. See this post for more details.

Astro ships with Shiki, which can be configured directly through Astro’s built-in defineConfig function.

/astro.config.ts (typescript)
import { defineConfig } from 'astro/config';

export default defineConfig({
    // ...
    markdown: {
        shikiConfig: {
            themes: {
                light: 'github-light-default',
                dark: 'github-dark-dimmed',
            },
            defaultColor: false,
            transformers: [],
            // ...
        },
        rehypePlugins: [],
        remarkPlugins: [],
        // ...
    }
});

Reading through the documentation, I noticed that there is no option to also apply Shiki to anything besides code blocks in Markdown content. If we wanted to add syntax highlighting elsewhere, how do we do it? The answer lies in the rehypePlugins that are applied to HTML (as opposed to remarkPlugins that are applied to Markdown), and we could gather some inspiration from the Rehype Pretty Code library for how to build one.
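
As a rough sketch of the shape such a plugin takes: a Rehype plugin is a function that returns a transformer, and the transformer receives the HTML syntax tree and may mutate it. The HtmlNode type, the markCode plugin and the sampleTree below are hypothetical stand-ins (the real node types come from the hast package), chosen so the example runs without installing anything.

```typescript
// Minimal stand-in for the hast node types (assumption: the real ones come from 'hast').
interface HtmlNode {
    type: 'root' | 'element' | 'text';
    tagName?: string;
    value?: string;
    properties?: Record<string, unknown>;
    children?: HtmlNode[];
}

// Hypothetical plugin: tags every <code> element so we can see the transformer ran.
const markCode = () => {
    // The returned function is the transformer that receives the HTML tree.
    return (tree: HtmlNode): void => {
        const walk = (node: HtmlNode): void => {
            if (node.type === 'element' && node.tagName === 'code') {
                node.properties = { ...node.properties, 'data-seen': '' };
            }
            for (const child of node.children ?? []) walk(child);
        };
        walk(tree);
    };
};

// Hand-built tree for <p><code>x</code></p>.
const sampleTree: HtmlNode = {
    type: 'root',
    children: [{
        type: 'element', tagName: 'p', properties: {},
        children: [{
            type: 'element', tagName: 'code', properties: {},
            children: [{ type: 'text', value: 'x' }],
        }],
    }],
};
markCode()(sampleTree);
```

In a real Astro project the plugin function itself (not its result) would be listed in rehypePlugins, and the tree would be supplied by the Markdown pipeline.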

Figuring out the Rehype plugin

The specific goal is to apply syntax highlighting to inline code (fenced by a single backtick in Markdown). Inspecting the elements in a browser, we see that the generated code looks like this.

html
<!-- Block code. -->
<pre class="astro-code astro-code-themes">
    <code>
        <span class="line">
            <span style="--shiki-dark:#F47067">export default </span>
            <span style="--shiki-dark:#DCBDFB">defineConfig</span>
            <span style="--shiki-dark:#F69D50">({</span>
        </span>
        <!-- more lines... -->
    </code>
</pre>
<!-- Inline code. -->
<code>export default defineConfig({</code>

So the pattern we have to match in the tree is a <code> element that is not inside a <pre> parent carrying the .astro-code class, and whose first child is text. We can define a function that looks like this.

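typescript
import type { Element, ElementContent, Root } from 'hast';

// True for a <code> element whose first child is text and whose parent
// is not Astro's highlighted <pre class="astro-code">.
const isInlineCode = (node: ElementContent, parent: Element | Root | undefined): boolean => {
    return (
        node.type === 'element' && parent?.type === 'element' &&
        (
            parent.tagName !== 'pre' ||
            !('class' in parent.properties) ||
            // The class may be a string or an array, so compare it as a string.
            !String(parent.properties['class']).includes('astro-code')
        ) &&
        node.tagName === 'code' &&
        node.children[0]?.type === 'text'
    );
};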

We’ll see something similar in Rehype Pretty Code’s code base. The following is an excerpt from the GitHub repository of Rehype Pretty Code, specifically from the file /packages/core/src/index.ts.

rehype-pretty-code/packages/core/src/index.ts (typescript, excerpt starting at line 227)
  return async (tree) => {
    const langsToLoad = new Set<string>();
    // The cachedHighlighter is an instance of Shiki. It's meant to be a singleton,
    // such that all languages and themes are loaded and processed only once,
    // instead of having to do so at every code block.
    const highlighter = await cachedHighlighter;
    if (!highlighter) return;

    // This function comes from the 'unist-util-visit' package. It's an in-order
    // traversal through a HAST (tree) representing an HTML element (1st argument)
    // that runs a given function (3rd argument) on nodes that match a criteria
    // (2nd argument). In this case, it runs for all node.type === 'element'.
    // biome-ignore lint/complexity/noExcessiveCognitiveComplexity: <explanation>
    visit(tree, 'element', (element, _, parent) => {
      // Current node matches the inline code pattern described above.
      if (isInlineCode(element, parent, bypassInlineCode)) {
        const textElement = element.children[0];
        if (!isText(textElement)) return;
        const value = textElement.value;
        if (!value) return;
        // Code either has a language denoted by a suffix containing {:language},
        // or it's 'plaintext'. All languages are added into a Set.
        const lang = getInlineCodeLang(value, defaultInlineCodeLang);
        if (lang && lang[0] !== '.') {
          langsToLoad.add(lang);
        }
      }
    // ... (lines 244 to 260 omitted)
    try {
      await Promise.allSettled(
        Array.from(langsToLoad).map((lang) => {
          try {
            // Load all languages found into the singleton instance of Shiki so it
            // can be processed once for all code blocks.
            return highlighter.loadLanguage(
              lang as Parameters<typeof highlighter.loadLanguage>[0],
            );
          } catch (e) {
            return Promise.reject(e);
          }
        }),
      );
    } catch (e) {
      console.error(e);
    }

    // biome-ignore lint/complexity/noExcessiveCognitiveComplexity: <explanation>
    visit(tree, 'element', (element, _, parent) => {
      if (isInlineCode(element, parent, bypassInlineCode)) {
        const textElement = element.children[0];
        if (!isText(textElement)) return;
        const value = textElement.value;
        if (!value) return;

        // This part looks identical to the implementation of getInlineCodeLang,
        // which was also called in the first iteration, except that it also
        // strips the language part from the code.
        const keepLangPart = /\\{:[a-zA-Z.-]+}$/.test(value);
        const strippedValue = keepLangPart
          ? value.replace(/\\({:[a-zA-Z.-]+})$/, '$1')
          : value.replace(/{:[a-zA-Z.-]+}$/, '');
        textElement.value = strippedValue;
        const lang = keepLangPart
          ? ''
          : getInlineCodeLang(value, defaultInlineCodeLang);
        const isLang = lang[0] !== '.';
        if (!lang) return;

        let codeTree: Root;

        // In this second iteration through the tree, run the syntax highlighting
        // on the code using its language, which should have been loaded from the
        // first iteration.
        if (isLang) {
          try {
            codeTree = hastParser.parse(
              highlighter.codeToHtml(strippedValue, getOptions(lang)),
            );
          } catch {
            codeTree = hastParser.parse(
              highlighter.codeToHtml(strippedValue, getOptions('plaintext')),
            );
          }
        } else {
        // ... (excerpt ends here)

In my opinion, that was quite redundant. To be fair, since we’re not publishing our blog on npm, we can get away with narrowing the scope a bit. Specifically, the code we’ll adapt will only work with bundled languages and themes, and we assume the themes are global. In other words, there won’t be one code block with one theme and another block with some other theme. If that is desired, though, it can be achieved by naming different theme options or by caching multiple highlighters.
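
If caching multiple highlighters were ever needed, one possible approach is a small promise cache keyed by theme name. This is only a sketch under stated assumptions: createCache, getHighlighterFor and the object returned by the factory are hypothetical stand-ins, and a real factory would call Shiki’s highlighter constructor instead.

```typescript
// Generic promise cache: one in-flight or finished result per key.
const createCache = <T>(factory: (key: string) => Promise<T>) => {
    const cache = new Map<string, Promise<T>>();
    return (key: string): Promise<T> => {
        let entry = cache.get(key);
        if (!entry) {
            // Store the promise itself so concurrent callers share one construction.
            entry = factory(key);
            cache.set(key, entry);
        }
        return entry;
    };
};

// Hypothetical stand-in for an expensive highlighter constructor.
let constructed = 0;
const getHighlighterFor = createCache(async (themeName: string) => {
    constructed += 1;
    return { themeName };
});

// Two requests for the same theme share one promise; a new theme builds a new one.
const first = getHighlighterFor('github-light-default');
const second = getHighlighterFor('github-light-default');
const third = getHighlighterFor('github-dark-dimmed');
```

Caching the promise rather than the resolved value means a second request that arrives while the first is still loading does not trigger a second construction.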

The most significant change I propose is that instead of adding languages to a Set in the first traversal, we store the language, the code, and maybe even a reference to the node containing the code in an object. That way we only need a single traversal, since we’ll have every inline code element on the page in an Array. I’m not sure why Rehype Pretty Code found it necessary to be so wordy, but maybe that’s my inexperience talking?
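
The single-traversal idea can be sketched without any dependencies. The SketchNode and FoundInline types, the hand-rolled walk and the sample document below are all hypothetical stand-ins for hast and unist-util-visit, and the inline check is simplified to "a text-only <code> whose parent is not <pre>".

```typescript
// Minimal stand-in for hast nodes (assumption: real code would use the 'hast' types).
interface SketchNode {
    type: 'root' | 'element' | 'text';
    tagName?: string;
    value?: string;
    children?: SketchNode[];
}
interface FoundInline {
    node: SketchNode;   // reference to the <code> element, for later replacement
    code: string;       // the code with the {:language} suffix removed
    language: string;   // parsed language, or 'plaintext' when there is no suffix
}

// One depth-first pass that records every text-only <code> element outside <pre>.
const collectInlineCode = (root: SketchNode): FoundInline[] => {
    const results: FoundInline[] = [];
    const walk = (node: SketchNode, parent?: SketchNode): void => {
        const only = node.children?.length === 1 ? node.children[0] : undefined;
        if (node.tagName === 'code' && parent?.tagName !== 'pre' && only?.type === 'text') {
            const value = only.value ?? '';
            const match = value.match(/^(.+)\{:(\w+)\}$/);
            results.push({
                node,
                code: match ? match[1] : value,
                language: match ? match[2] : 'plaintext',
            });
        }
        for (const child of node.children ?? []) walk(child, node);
    };
    walk(root);
    return results;
};

// <p><code>const x = 5;{:js}</code> and <code>ls -la</code></p><pre><code>block</code></pre>
const sampleDoc: SketchNode = {
    type: 'root',
    children: [
        { type: 'element', tagName: 'p', children: [
            { type: 'element', tagName: 'code', children: [{ type: 'text', value: 'const x = 5;{:js}' }] },
            { type: 'text', value: ' and ' },
            { type: 'element', tagName: 'code', children: [{ type: 'text', value: 'ls -la' }] },
        ] },
        { type: 'element', tagName: 'pre', children: [
            { type: 'element', tagName: 'code', children: [{ type: 'text', value: 'block' }] },
        ] },
    ],
};
const inlineInstances = collectInlineCode(sampleDoc);
```

Because each entry keeps a reference to its node, the highlighting step can later work from the array alone instead of walking the tree a second time.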

Figuring out the Shiki highlighter

Since this is being done for a single blog, we will leverage the existing config Astro has so that we don’t have to define themes in two different places. So in astro.config.ts, we could move our Shiki themes to a common file so that they could be referenced. We’ll also add a Rehype plugin at this time.

/astro.config.ts (typescript)
import { defineConfig } from 'astro/config';
import { shikiThemes } from './src/markdown/settings';
import inlineCodePlugin from './src/markdown/inline-code';

export default defineConfig({
    // ...
    markdown: {
        shikiConfig: {
            themes: shikiThemes,
            // ...
        },
        rehypePlugins: [inlineCodePlugin],
        // ...
    }
});

/src/markdown/settings.ts (typescript)
import type { ThemePresets } from '@astrojs/markdown-remark';
import type { ThemeRegistration, ThemeRegistrationRaw } from 'shiki';
import type { ShikiConfig } from 'astro';

// The theme types that Astro's shikiConfig accepts.
export type ThemeTypes =
    ThemeRegistration | ThemeRegistrationRaw | ThemePresets;
export type ShikiThemes = Record<string, ThemeTypes>;

export const shikiThemes: ShikiThemes = {
    light: 'github-light-default',
    dark: 'github-dark-dimmed',
};
export const shikiConfig: Partial<ShikiConfig> = {
    themes: shikiThemes,
    defaultColor: false,
    transformers: []
};
// ... etc.

Astro’s config expects certain types, like the ones in the Record above. It throws an error if we try to directly assert a theme “as string”. However, the Shiki highlighter expects yet another type. Most blogs will probably just use the bundled themes and languages in Shiki instead of defining their own, so that is what we’ll assume. This time, we take a hint from Shiki’s bundle-factory.ts.

shiki/packages/core/src/constructors/bundle-factory.ts (typescript, excerpt starting at line 68)
export function createdBundledHighlighter<BundledLangs extends string, BundledThemes extends string>(
  arg1: Record<BundledLangs, LanguageInput> | CreatedBundledHighlighterOptions<BundledLangs, BundledThemes>,
  arg2?: Record<BundledThemes, ThemeInput>,
  arg3?: HighlighterCoreOptions['loadWasm'],
): CreateHighlighterFactory<BundledLangs, BundledThemes> {
  let bundledLanguages: Record<BundledLangs, LanguageInput>
  let bundledThemes: Record<BundledThemes, ThemeInput>
  let engine: () => Awaitable<RegexEngine>
  // ... (lines 76 to 104 omitted)
  function resolveTheme(theme: ThemeInput | BundledThemes | SpecialTheme): ThemeInput | SpecialTheme {
    if (isSpecialTheme(theme))
      return 'none'
    // Since the themes in Astro's config are all defined by string, the
    // bundledThemes lookup here is the only branch that's relevant. This object
    // is declared above (line 74 of the source file) and takes a BundledThemes
    // type as key, returning a ThemeInput as value.
    if (typeof theme === 'string') {
      const bundle = bundledThemes[theme]
      if (!bundle)
        throw new ShikiError(`Theme \`${theme}\` is not included in this bundle. You may want to load it from external source.`)
      return bundle
    }
    return theme
  }
  // ... (excerpt ends here)

Therefore, our own simplified “resolve” functions should take the types Astro expects (plain strings, in our case) as input, and return the bundled ThemeInput or LanguageInput that Shiki expects, as described above.

/src/markdown/inline-code.ts (typescript)
import { type ThemeTypes, shikiThemes } from './settings';
import {
    type ThemeInput, type LanguageInput,
    type BundledTheme, type BundledLanguage,
    type HighlighterCore,
    // These three are values, so they cannot sit behind `import type`.
    bundledThemes, bundledLanguages,
    getSingletonHighlighterCore
} from 'shiki';
import type { Element } from 'hast';

// Look up a bundled grammar by name; undefined when Shiki does not bundle it.
const resolveLanguage = (languageText: string): LanguageInput | undefined => {
    return bundledLanguages[languageText as BundledLanguage];
};
// Look up a bundled theme by name (our config only uses theme names).
const resolveTheme = (themeText: ThemeTypes): ThemeInput => {
    return bundledThemes[(themeText as string) as BundledTheme];
};

With the highlighter settings ready, we can initialize the cached highlighter and write a function to load languages, which seems to be a large consideration in the Rehype Pretty Code example. We’ll also define an interface for the data extracted from each inline code instance that the visit tree traversal encounters.

/src/markdown/inline-code.ts (typescript, continued)
interface InlineCodeInstance {
    node: Element,
    code: string,
    language: string
}
// At some version between Astro 5.1.1 and 5.1.7, getSingletonHighlighterCore is
// no longer compatible. Use "import { createHighlighter } from 'shiki';" instead.
const getCachedHighlighter = (): Promise<HighlighterCore> => {
    return getSingletonHighlighterCore({
        // 'plaintext' is built into Shiki and has no bundled grammar to load,
        // so no languages are preloaded here.
        langs: [],
        themes: Object.values(shikiThemes).map(resolveTheme)
    });
};
const loadLanguage = async (
    highlighter: HighlighterCore,
    instance: InlineCodeInstance,
    loadedLanguages: Set<string>
): Promise<void> => {
    if (!loadedLanguages.has(instance.language)) {
        const lang = resolveLanguage(instance.language);
        if (lang) {
            await highlighter.loadLanguage(lang);
        }
        else {
            // If the language is not recognized by Shiki as one of its bundled
            // languages, then interpret it as plain text.
            console.error(
                "Invalid language on inline code, using 'plaintext' instead.",
                instance.code);
            instance.language = 'plaintext';
        }
        // Add loaded languages to a set to prevent redundant loading.
        loadedLanguages.add(instance.language);
    }
};

Putting it all together

Now let’s actually create the plugin, find all instances of inline code with a tree traversal, then run syntax highlighting on those nodes. Rather than using an in-house traversal for this one, we’ll just go ahead and npm install unist-util-visit.

/src/markdown/inline-code.ts (typescript, continued)
// These two imports sit with the others at the top of the file.
import type { Root, Text } from 'hast';
import { visit } from 'unist-util-visit';

const inlineCodePlugin = () => {
    const cachedHighlighter = getCachedHighlighter();
    return async (tree: Root) => {
        const instances: InlineCodeInstance[] = [];
        const loadedLanguages: Set<string> = new Set(['plaintext']);
        // Retrieve all instances of inline code in the document.
        visit(tree, 'element', (node: Element,
                                _index: number | undefined,
                                parent: Element | Root | undefined) => {

            if (isInlineCode(node, parent)) {
                const textNode = node.children[0] as Text;
                const value: string = textNode.value;
                // Match code{:language}, where language consists of one or more
                // word characters. Example: for "const x = 5;{:js}",
                // code = "const x = 5;" and language = "js".
                const match: RegExpMatchArray | null = value.match(/^(.+)\{:(\w+)\}$/);
                if (match) {
                    const [_matchText, code, language] = match;
                    // Store a reference to the Element, and separate the code from
                    // the language (from {:language}). The example from Rehype
                    // Pretty Code only extracted the language, and had to take a
                    // second iteration to strip the language from the code. Since
                    // we get both from the regular expression, we can strip the
                    // language by replacing the contents with just the code.
                    instances.push({ node, code, language });
                }
                else {
                    // If no language is passed, then interpret it as plain text.
                    // Shiki will still color it with the theme font and
                    // background, maintaining a uniform look.
                    instances.push({ node, code: value, language: 'plaintext' });
                }
            }
        });

        const highlighter = await cachedHighlighter;
        for (const instance of instances) {
            await loadLanguage(highlighter, instance, loadedLanguages);
            // Create a new tree containing all the syntax highlighting for each
            // inline code instance. The themes and languages are already loaded,
            // but we can tell Shiki which we want it to interpret with. The output
            // will be a HAST since it's a Rehype plugin.
            const newRoot = highlighter.codeToHast(instance.code, {
                lang: instance.language,
                themes: shikiThemes,
                // Shiki adds CSS variables for both light and dark themes if
                // there's no default.
                defaultColor: false
            });
            // Replace the contents of the original <code> element with the syntax
            // highlighted ones. The node is mutated in place, because reassigning
            // instance.node would leave the tree itself untouched.
            const newPre = newRoot.children[0] as Element;
            const newCode = newPre.children[0] as Element;
            instance.node.children = newCode.children;
            instance.node.properties = { ...newPre.properties, 'data-inline-code': '' };

            // Replace shiki with astro-code by convention. The class may be a
            // string or an array, so normalize it to a string first.
            const classes = instance.node.properties['class'];
            instance.node.properties['class'] =
                (Array.isArray(classes) ? classes.join(' ') : String(classes ?? ''))
                    .replace('shiki', 'astro-code');
        }
    }
};
export default inlineCodePlugin;

When Astro interprets Markdown content, it will now call the Rehype plugin we just built, then convert the output back to HTML. To show that this works, `const x = 5;{:js}` (written without escape characters) now renders as: const x = 5;