Blogging about and describing code is, to me, an act of irony. The point of having a logical, mathematical language in the first place is to avoid the vagueness of natural language. Yet it remains important to communicate and bridge that gap. If a picture is worth a thousand words, then surely good highlighting is worth at least a hundred. Instead of telling readers what to look for in a block and where to look for it, highlighting a certain line or phrase focuses their attention directly. This is another reason I had to go with a static site generator framework or build my own blog, because good luck trying to get all these features into WordPress or Wix!
For the purposes of this post, I assume that the input has already been parsed, using a complicated regular expression or string splitting, into the following format.
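The format listing itself is not reproduced in this copy of the post. As a rough sketch only, inferred from the fields that the later snippets read (`startLine`, `endLine`, `startChar`, `termStr`, `termRegexp`, `dataId`), the parsed input might look something like this. The `endChar` field and every type annotation here are assumptions, not the original definition.

```typescript
// Hypothetical shape, reconstructed from how later snippets use the segment.
interface HighlightSegment {
  startLine?: number;  // first line to highlight
  endLine?: number;    // exclusive end line; later code defaults it to startLine + 1
  startChar?: number;  // character offset where a partial highlight starts
  endChar?: number;    // assumed counterpart to startChar (not shown in the post)
  termStr?: string;    // literal term to search for within the line
  termRegexp?: RegExp; // pattern to search for within the line (type assumed)
  dataId?: string;     // optional id copied onto data-* attributes for styling
}

// Example value: highlight every "splitElement" on line 3, tagged with id "a".
const exampleSegment: HighlightSegment = {
  startLine: 3,
  termStr: 'splitElement',
  dataId: 'a',
};
```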
Then I will break down how to apply each piece of this information.
Highlighting entire lines
First, highlighting lines is as simple as applying a class or attribute to the line element that’s within range.
/src/shiki/transforms/highlights.ts (typescript)

    import { Element } from 'hast';
    // ... Parse input.
    const highlightEntireLine = (
      line: Element,
      segment: HighlightSegment
    ): void => {
      // Set an attribute for further styling.
      line.properties['data-highlighted-line'] = '';
      if (segment.dataId) {
        line.properties['data-highlighted-line-id'] = segment.dataId;
      }
    };
Utility functions
Highlighting parts of lines is a little more involved, especially when it's possible to highlight from and to any arbitrary character, even in the middle of a span. The strategy is to traverse the HAST tree of each line (with the line as the root) and split Element nodes so that the highlighted parts can be grouped under a <mark>.
With that in mind, let’s write some functions that make tree traversal easier. The first task is to be able to traverse all the Text nodes in-order. Since these lines are about 3 levels deep, we won’t have to worry about any stack issues. To keep this simple, a recursive function shall do.
/src/shiki/utils.ts (typescript)

    // In HAST code, an Element has access to its children, but not its
    // siblings. So it's helpful if the parent node is accessible. The
    // TraversalFunction returns false when we want to stop traversing.
    export type TraversalFunction = (
      node: ElementContent,
      parent: Element | null,
      siblingIndex: number
    ) => boolean;

    export const inOrderTraversal = (
      node: ElementContent,
      parent: Element | null,
      siblingIndex: number,
      func: TraversalFunction
    ): boolean => {
      // Call TraversalFunction on current node. The function should save
      // some data before telling whether it wants traversal to proceed.
      if (!func(node, parent, siblingIndex)) {
        return false;
      }
      if (node.type === 'element') {
        // Traverse each child recursively.
        for (let i = 0; i < node.children.length; i++) {
          const child = node.children[i];
          if (!inOrderTraversal(child, node, i, func)) {
            return false;
          }
        }
      }
      return true;
    };
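To show how the early-stop return value can be used, here is a small self-contained sketch that finds the Text node containing a given character offset. It uses minimal local stand-ins for the hast types so it runs without the `hast` package, and the helper names `span` and `findTextAt` are hypothetical, not part of the original utilities.

```typescript
// Minimal stand-ins for the hast node types (assumption: only what the demo needs).
type Text = { type: 'text'; value: string };
type Element = {
  type: 'element';
  tagName: string;
  properties: Record<string, unknown>;
  children: ElementContent[];
};
type ElementContent = Element | Text;
type TraversalFunction = (
  node: ElementContent,
  parent: Element | null,
  siblingIndex: number
) => boolean;

// Same traversal as above: visit the node, then its children, stopping on false.
const inOrderTraversal = (
  node: ElementContent,
  parent: Element | null,
  siblingIndex: number,
  func: TraversalFunction
): boolean => {
  if (!func(node, parent, siblingIndex)) return false;
  if (node.type === 'element') {
    for (let i = 0; i < node.children.length; i++) {
      if (!inOrderTraversal(node.children[i], node, i, func)) return false;
    }
  }
  return true;
};

// Hypothetical helper: build a <span> holding a single Text node.
const span = (text: string): Element => ({
  type: 'element',
  tagName: 'span',
  properties: {},
  children: [{ type: 'text', value: text }],
});

type Hit = { text: Text; parent: Element | null; localOffset: number };

// Hypothetical helper: locate the Text node that contains `offset`.
const findTextAt = (root: Element, offset: number): Hit | null => {
  let remaining = offset;
  let found: Hit | null = null;
  inOrderTraversal(root, null, 0, (node, parent) => {
    if (node.type !== 'text') return true;
    if (remaining < node.value.length) {
      found = { text: node, parent, localOffset: remaining };
      return false; // Stop: the offset falls inside this Text node.
    }
    remaining -= node.value.length; // Skip past the text seen so far.
    return true;
  });
  return found;
};

const demoLine: Element = {
  type: 'element',
  tagName: 'span',
  properties: { class: 'line' },
  children: [span('const'), span(' x'), span(' = 1;')],
};
const hit = findTextAt(demoLine, 6);
```

Because the callback receives the parent, the caller can go straight from the located Text node to the span that owns it, which is the piece of information a split needs.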
With this, we could easily get all the text of a node.
/src/shiki/utils.ts (typescript)

    // In hast, ElementContent is the union Comment | Element | Text.
    import type { Element, ElementContent, Text } from 'hast';

    export const getNodeText = (node: ElementContent): string => {
      if (node.type !== 'element') {
        return node.type === 'text' ? node.value : '';
      }

      // Aggregate all text from Text nodes.
      const textParts: string[] = [];
      inOrderTraversal(node, null, 0, (node, _parent, _index) => {
        if (node.type === 'text') {
          textParts.push(node.value);
        }
        return true;
      });
      return textParts.join('');
    };
For the last utility, we’ll make a split within an Element based on the character index of its text.
/src/shiki/utils.ts (typescript)

    export enum KeepSide { Left, Right }
    type SiblingElement = {
      node: ElementContent;
      index: number;
    };
    // Shallow copy of an Element with its own properties object.
    export const cloneElement = (node: Element): Element =>
      ({ ...node, properties: { ...node.properties } });

    export const splitElement = (
      node: Element,
      splitIndex: number,
      keepSide: KeepSide
    ): Element | null => {
      // Maintain a hierarchical list of nodes from root to the current node.
      // Once the index where the split will be made is found in a Text node,
      // this list will contain all the nodes that will be considered for
      // splitting.
      const splitChain: SiblingElement[] = [];
      // If the split index is in the middle of a Text or Element node in a
      // given tree level, then the node will have to be split and the new
      // node stored and returned. If the split index is between two Text or
      // Element nodes, then the split node will remain null.
      let splitNode: Element | null = null;
      // Declared outside the callback because the loop after the traversal
      // also reads it. It stays true if the split index was never reached.
      let resume = true;
      inOrderTraversal(node, null, 0, (node, parent, index) => {
        let lastChainIndex = splitChain.length - 1;
        if (node.type === 'element' && node.children.length > 0) {
          // Remove nodes until the parent of current element is the last
          // node in the chain.
          while (parent && parent !== splitChain[lastChainIndex].node) {
            splitChain.pop();
            lastChainIndex--;
          }
          // There is only one Text node per Element. If the first child
          // isn't text, then there are only Element children.
          const firstChild = node.children[0];
          if (firstChild.type === 'text') {
            if (splitIndex < firstChild.value.length) {
              // Split is in the middle of current Text node. The side that
              // is "kept" remains on the original nodes. The other side is
              // moved onto the split node.
              if (splitIndex > 0) {
                const leftText = firstChild.value.substring(0, splitIndex);
                const rightText = firstChild.value.substring(splitIndex);
                const newText: Text = { type: 'text', value: '' };
                if (keepSide === KeepSide.Left) {
                  firstChild.value = leftText;
                  newText.value = rightText;
                }
                else {
                  newText.value = leftText;
                  firstChild.value = rightText;
                }
                splitNode = cloneElement(node);
                splitNode.children = [newText];
              }
              resume = false;
            }
            else {
              // If split has not been reached, subtract index by the length
              // of the text that has been seen.
              splitIndex -= firstChild.value.length;
            }
          }

          splitChain.push({ node, index });
        }
        return resume;
      });
      // From the bottom of the split chain, use the parent and sibling index
      // where the split occurs at each level to split half of the tree onto
      // a new tree to be returned.
      for (let i = splitChain.length - 2; i >= 0; i--) {
        const parent = splitChain[i].node as Element;
        let splitChildIndex = splitChain[i + 1].index;
        // Split was never made, so the split chain should be on the left.
        if (resume) {
          splitChildIndex++;
        }
        // Suppose split is made on index s. If the left side is kept, then
        // children represents the right side nodes:
        // [splitNode, ...[s+1, s+2, ..., n-1]]. If the right side is kept,
        // then children represents the left side nodes:
        // [...[0, 1, ..., s], splitNode].
        const children: ElementContent[] = [];
        if (splitNode && keepSide === KeepSide.Left) {
          children.push(splitNode);
          splitChildIndex++;
        }
        if (keepSide === KeepSide.Left) {
          children.push(...parent.children.splice(splitChildIndex));
        }
        else {
          children.push(...parent.children.splice(0, splitChildIndex));
        }
        if (splitNode && keepSide === KeepSide.Right) children.push(splitNode);

        // Aggregate children under new parent, which becomes the split node
        // for the level above.
        if (children.length > 0) {
          splitNode = cloneElement(parent);
          splitNode.children = children;
        }
      }

      return splitNode;
    };
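To make the effect of a split concrete, here is a deliberately simplified sketch. It is not the `splitElement` above: it assumes a flat line in which every child is a span holding exactly one piece of text, and it returns two new arrays instead of mutating a tree. The names `Span` and `splitFlat` are hypothetical.

```typescript
// Simplified stand-in for a styled <span> with a single Text child.
type Span = { className: string; text: string };

// Split a flat list of spans at a character index into [left, right].
const splitFlat = (spans: Span[], splitIndex: number): [Span[], Span[]] => {
  const left: Span[] = [];
  const right: Span[] = [];
  let remaining = splitIndex;
  for (const s of spans) {
    if (remaining <= 0) {
      // The split point has already been passed.
      right.push(s);
    } else if (remaining >= s.text.length) {
      // The whole span lies before the split point.
      left.push(s);
      remaining -= s.text.length;
    } else {
      // The split falls inside this span: both halves keep its styling.
      left.push({ ...s, text: s.text.substring(0, remaining) });
      right.push({ ...s, text: s.text.substring(remaining) });
      remaining = 0;
    }
  }
  return [left, right];
};

const flatSpans: Span[] = [
  { className: 'kw', text: 'const' },
  { className: 'id', text: ' value' },
  { className: 'op', text: ' = 1;' },
];
const [before, after] = splitFlat(flatSpans, 8);
```

The important detail carried over from the tree version is that a span cut in the middle is cloned, so both halves keep the same styling and the split is invisible until a <mark> is wrapped around one side.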
Parse and mark
With the heavy lifting out of the way, the only thing left to do is to parse the text of each line to find the indices where the highlighting will occur, and then make the necessary splits to mark those Element nodes. To find the indices, we aggregate the text values of the leaves of the tree, then use a regular expression.
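The range-finding code is not shown in this copy of the post, so the following is only a minimal sketch of the idea, assuming the line's text has already been aggregated into one string. The name `findRanges`, the `escapeRegExp` helper, and the exclusive `end` convention are assumptions; the real `getRangesInLine` may differ.

```typescript
// Assumed shape: the later code reads `start` and `end` from each range.
type HighlightRange = { start: number; end: number }; // end is exclusive

// Escape a literal search term so it can be used inside a RegExp.
const escapeRegExp = (s: string): string =>
  s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: find every occurrence of a term or pattern in a line.
const findRanges = (lineText: string, term: string | RegExp): HighlightRange[] => {
  const source = typeof term === 'string' ? escapeRegExp(term) : term.source;
  const baseFlags = typeof term === 'string' ? '' : term.flags;
  // The global flag is required so that exec() advances through the string.
  const flags = baseFlags.indexOf('g') === -1 ? baseFlags + 'g' : baseFlags;
  const pattern = new RegExp(source, flags);

  const ranges: HighlightRange[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(lineText)) !== null) {
    if (match[0].length === 0) {
      // Skip empty matches and step forward to avoid an infinite loop.
      pattern.lastIndex++;
      continue;
    }
    ranges.push({ start: match.index, end: match.index + match[0].length });
  }
  return ranges;
};

const literalRanges = findRanges('const a = foo(a);', 'a');
const patternRanges = findRanges('const a = foo(a);', /fo+/);
```

Since the offsets are relative to the aggregated text of the line, they can be passed directly to a character-index split without knowing which span each one lands in.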
Finally, for each one of these ranges, we make a mark.
/src/shiki/transforms/highlights.ts (typescript)

    import type { Element, ElementContent } from 'hast';
    import type { ShikiTransformer } from 'shiki';
    import { KeepSide, splitElement } from '../utils';

    const createElement = (tagName: string): Element => {
      return { type: 'element', tagName, properties: {}, children: [] };
    };
    const transformer: ShikiTransformer = {
      line(line: Element, index: number) {
        // ... Parse input.
        const segment: HighlightSegment = parseInput(line);
        // Only match startLine if endLine is not specified.
        if (segment.startLine && !segment.endLine) {
          segment.endLine = segment.startLine + 1;
        }
        if (index >= segment.startLine && index < segment.endLine) {
          // If no ranges or search terms given, highlight entire line.
          if (!(segment.startChar || segment.termStr || segment.termRegexp)) {
            highlightEntireLine(line, segment);
          }
          else {
            const ranges: HighlightRange[] = getRangesInLine(line, segment);
            for (const highlight of ranges) {
              // We don't want to split the line Element itself, only its
              // children. So we make a new <mark> Element under it.
              const marked = createElement('mark');
              marked.children = line.children;
              marked.properties['data-highlighted'] = '';
              if (segment.dataId) {
                marked.properties['data-highlighted-id'] = segment.dataId;
              }
              // Each highlight makes two potential splits: one at its start
              // and one at its end. For the start, we want the <mark> to
              // hold the right side and another element to hold the left.
              const tempLeftElement = splitElement(
                marked, highlight.start, KeepSide.Right);

              // Since this second split is on the <mark> Element, that
              // becomes the kept side on the left. The <mark> Element only
              // holds the children after the first split at the start index,
              // so the characters before that aren't counted.
              const tempRightElement = splitElement(
                marked, highlight.end - highlight.start, KeepSide.Left);
              // Combine overlapping marks.
              const markedSpans: ElementContent[] = [];
              for (let i = 0; i < marked.children.length; i++) {
                const child = marked.children[i];
                if (child.type === 'element' && child.tagName === 'mark') {
                  markedSpans.push(...child.children);
                }
                else {
                  markedSpans.push(child);
                }
              }
              marked.children = markedSpans;
              // Combine temporary Element nodes back into line. The <mark>
              // Element always exists since it's the kept side on both
              // splits. The left and right elements, however, are split
              // nodes, which are null if nothing is selected.
              line.children = [];
              if (tempLeftElement) {
                line.children.push(...tempLeftElement.children);
              }
              line.children.push(marked);
              if (tempRightElement) {
                line.children.push(...tempRightElement.children);
              }
            }
          }
        }
        return line;
      }
    };
And with this, the line now holds the original children spans, but with the highlighted segments grouped under <mark> elements, with attributes if specified. These mark elements can be styled by the tag. Both the mark and line can also be styled by specified attributes. For example,