tsearch first public version
Originally posted on dev.to/gillchristian.
Last week on the first entry of this devlog I shared how I came up with the idea of writing a search engine for TypeScript functions.
Since then I’ve been working on releasing the first version of the PoC. And I'm glad to say that tsearch.io is live since this Monday.
In this entry I want to share how I built this PoC and the limitations it has.
There are three main parts:
- A cli that extracts type information from exported functions of the TS files (
.ts
&.tsx
) from a directory. - A node server that performs searches on the extracted functions.
- A web client to interact with the server.
Extracting type information
To extract the types I used ts-simple-ast:
TypeScript Compiler API wrapper. Provides a simple way to navigate and manipulate TypeScript and JavaScript code.
The idea is simple, load a directory with ts-simple-ast
and output an array of all the exported functions.
Let’s say we have module, strs
, with only one exported function:
// ~/code/strs/src/index.ts
export function replace(search: RegExp, replace: string, str: string) {
return str.replace(searchValue, replaceValue)
}
The output would be something like:
[
{
name: 'replace',
parameters: [
{
name: 'search',
type: 'RegExp',
},
{
name: 'replace',
type: 'string',
},
{
name: 'str',
type: 'string',
},
],
returnType: 'string',
module: 'some-module',
}
]
I did not include any extra information of the types, like generics or symbol. For more "elaborate" types (e.g. arrays, functions, unions, generics, etc) this is important because matching cannot be done by comparing strings of type names. For example, string | number
is the same type as number | string
but if compared by names, like I'm doing now, they do not match.
The average time for extracting the types of one project is around ~10s. This is fine for one directory, but I wanted to run it on all the packages available in DefinitelyTyped (DT), 5639 at the moment of writing this 🎉
For that I wrote a Go program that runs the extractor on each of those packages and then writes all the output together into a JSON file. I chose Go because of how easy it is to run things in parallel and sync everything.
In my machine (8 core i7 7th gen, 16GB RAM) it takes almost 3 hours running at 100% CPU usage to extract all the types from DT resulting in a 5.8mb JSON file. Not bad at all, but clearly not doable on the 500mb of RAM and 1 core DigitalOcean droplet I have 😂
Searching
Instead of supporting actual TS type signatures and having to get into parsing it I came up with this very simple query syntax that is trivial to parse. A comma-separated list for the types of the parameters, an arrow (=>
) and then the return type. The downside is it's not so powerful or even useful for all cases, but hey it's a PoC after all!
(search: RegExp, replace: string, str: string) => string
// would be
RegExp, string, string => string
<A>(xs: A[]) => A
// would be
A[] => A
I still think some kind of syntax as well as a few logical defaults (e.g. single capital letters as generics) could result in a much cleaner and easier way to search than actual valid TS signatures.
<A, B>(array: A[], fn: ((a: A) => B)) => B[]
// vs
A[], (A => B) => B[]
But using valid TS has its merits too. It means one thing less to learn. Also pasting a signature would just work :tm:
As per the search algorithm, this first iteration is pretty simple, even naive. I came up with a weighing function that assigns a value to the compared
if the return type matches as well as how many of the types of parameters match against the query
.
weight = 0
n = query.parameters.length
if n === compared.parameters.length
weight += m/n // m: matches by position
if weight > 0
weight += m/n + 1 // m/n added twice
else
weight += m/n // m: matches in any position
if query.returnType === compared.returnType
weight += 2
And then it picks the top hundred ones that scored more than 0. As you can see is not a fancy search algorithm but it already provides some matches for simple searches (e.g. A[] => A
, number => number[]
).
One extra thing I did not mention, if the search query is a single word it searches by function and types names instead. That idea came directly from Hoogle.
What's next?
Here are some of the challenges I want to tackle next:
- Subscribe to DT repo and run the extractor only on the packages that changed.
- Create a curated list of TS projects that publish their own types and update the extractor to run them.
- Extract more information of the functions types.
- Improve the query syntax and/or support valid TS signatures.
- Make TS semantics part the search algorithm.
There is A LOT of room for improvement for sure but I'm happy that people is already interested in the idea and giving it a try. That was exactly the goal of making the project public in such early stage.
For those that would like to contribute I created help wanted
issues in the repo and will be adding more soon.
I look forward get your feedback, comments and suggestions.