Christian Gill

tsearch first public version

Originally posted on dev.to/gillchristian.

tsearch first public version

Last week on the first entry of this devlog I shared how I came up with the idea of writing a search engine for TypeScript functions.

Since then I’ve been working on releasing the first version of the PoC. And I'm glad to say that tsearch.io is live since this Monday.

In this entry I want to share how I built this PoC and the limitations it has.

There are three main parts:

  1. A cli that extracts type information from exported functions of the TS files (.ts & .tsx) from a directory.
  2. A node server that performs searches on the extracted functions.
  3. A web client to interact with the server.

Extracting type information

To extract the types I used ts-simple-ast:

TypeScript Compiler API wrapper. Provides a simple way to navigate and manipulate TypeScript and JavaScript code.

The idea is simple, load a directory with ts-simple-ast and output an array of all the exported functions.

Let’s say we have module, strs, with only one exported function:

// ~/code/strs/src/index.ts

export function replace(search: RegExp, replace: string, str: string) {
  return str.replace(searchValue, replaceValue)
}

The output would be something like:

[
  {
    name: 'replace',
    parameters: [
      {
        name: 'search',
        type: 'RegExp',
      },
      {
        name: 'replace',
        type: 'string',
      },
      {
        name: 'str',
        type: 'string',
      },
    ],
    returnType: 'string',
    module: 'some-module',
  }
]

I did not include any extra information of the types, like generics or symbol. For more "elaborate" types (e.g. arrays, functions, unions, generics, etc) this is important because matching cannot be done by comparing strings of type names. For example, string | number is the same type as number | string but if compared by names, like I'm doing now, they do not match.

The average time for extracting the types of one project is around ~10s. This is fine for one directory, but I wanted to run it on all the packages available in DefinitelyTyped (DT), 5639 at the moment of writing this 🎉

For that I wrote a Go program that runs the extractor on each of those packages and then writes all the output together into a JSON file. I chose Go because of how easy it is to run things in parallel and sync everything.

In my machine (8 core i7 7th gen, 16GB RAM) it takes almost 3 hours running at 100% CPU usage to extract all the types from DT resulting in a 5.8mb JSON file. Not bad at all, but clearly not doable on the 500mb of RAM and 1 core DigitalOcean droplet I have 😂

Searching

Instead of supporting actual TS type signatures and having to get into parsing it I came up with this very simple query syntax that is trivial to parse. A comma-separated list for the types of the parameters, an arrow (=>) and then the return type. The downside is it's not so powerful or even useful for all cases, but hey it's a PoC after all!

(search: RegExp, replace: string, str: string) => string
// would be
RegExp, string, string => string
<A>(xs: A[]) => A
// would be
A[] => A

I still think some kind of syntax as well as a few logical defaults (e.g. single capital letters as generics) could result in a much cleaner and easier way to search than actual valid TS signatures.

<A, B>(array: A[], fn: ((a: A) => B)) => B[]
// vs
A[], (A => B) => B[]

But using valid TS has its merits too. It means one thing less to learn. Also pasting a signature would just work :tm:

As per the search algorithm, this first iteration is pretty simple, even naive. I came up with a weighing function that assigns a value to the compared if the return type matches as well as how many of the types of parameters match against the query.

weight = 0
n = query.parameters.length

if n === compared.parameters.length
  weight += m/n // m: matches by position
  if weight > 0
    weight += m/n + 1 // m/n added twice
else
  weight += m/n // m: matches in any position

if query.returnType === compared.returnType
  weight += 2

And then it picks the top hundred ones that scored more than 0. As you can see is not a fancy search algorithm but it already provides some matches for simple searches (e.g. A[] => A, number => number[]).

One extra thing I did not mention, if the search query is a single word it searches by function and types names instead. That idea came directly from Hoogle.

What's next?

Here are some of the challenges I want to tackle next:

  • Subscribe to DT repo and run the extractor only on the packages that changed.
  • Create a curated list of TS projects that publish their own types and update the extractor to run them.
  • Extract more information of the functions types.
  • Improve the query syntax and/or support valid TS signatures.
  • Make TS semantics part the search algorithm.

There is A LOT of room for improvement for sure but I'm happy that people is already interested in the idea and giving it a try. That was exactly the goal of making the project public in such early stage.

For those that would like to contribute I created help wanted issues in the repo and will be adding more soon.

I look forward get your feedback, comments and suggestions.

Until the next time!


almost 4 years ago

Christian Gill