Baron Schwartz, Peter Zaitsev, Vadim Tkachenko: High Performance MySQL



Apart from using Cloud SQL, I haven’t touched MySQL for a while, so I decided to freshen things up and read High Performance MySQL by Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko. The book feels solid: it explains how MySQL works inside (and how it worked before), what problems the storage engines, parser, optimizer, etc. have, and how to work around them. It’s kind of nice that a big part of the book is about scaling MySQL. It’s also good that the book has a lot of information about troubleshooting, debugging, profiling, and MySQL-related tools.

The book is probably a bit outdated, though: it covers MySQL versions up to 5.5, while nowadays the latest version is 5.7.

A year in Prague



Prague

From September 2016 to August 2017, almost a year, I lived in Prague. I was studying the Czech language there because I was planning to continue my education and finish university there. But I decided that studying and working full time simultaneously is too much for me, so I’m moving to the Netherlands for work. I did get a B2 in Czech and was accepted to two universities in Prague, though.

I was living in the country on a student visa. A Czech student visa is fairly easy to obtain: it just requires a confirmation from a university and around €5000 in a bank account. With this kind of visa, you can enter the Czech Republic multiple times and travel all over the Schengen Area.

Because I was studying, I was able to live on a student campus, in something like an apartment but with a shared kitchen, for 7500 CZK (€300). The only problem was that it was far away from the city center. But the city is not too big, and as a student I got a travel card almost for free, so the location wasn’t a big problem.

Internet was included in my rent; I had something around 50 Mbit/s. Mobile internet and mobile operators in the Czech Republic are the worst. I was using Vodafone, and initially I was paying 519 CZK (€20) for 4GB, but somehow in June this option disappeared and I started to pay 399 CZK (€15) for just 1.5GB.

Food is cheap there; I was spending around 800 CZK (€30) per week on everyday chicken/meat/fish, although Prague has a very small selection of fish and almost no seafood. Booze is very cheap: 15-25 CZK (€0.5-1) for a fairly good beer in a supermarket and around 30 CZK (€1.2) in a pub.

Summing up, Prague is a very nice city with a lot of culture, attractions, and events. It’s in the middle of Europe, and it’s very easy to travel from there. I’m kind of glad that I lived there.

AR* Geolocation Snake



instruction     gameplay

I like rollerskating, but sometimes it’s kind of boring to skate the same routes. I was using Pokemon GO to make routes more fun, but pokestops have fixed locations, and catching pokemons gets boring after a few months too. So I thought it could be interesting to randomly select places to skate to. And a snake game makes it even more interesting and challenging, because I need to choose a more complex route to avoid the snake’s tail and not lose the game.

Sometimes the app puts candies on the other side of the road or requires riding on a sidewalk of intolerable quality; I solved that with an option to regenerate the candy.

TLDR: the app, source code.

What’s inside

The app is written in JavaScript with Flow, React Native, and Redux with redux-thunk. For the map I used react-native-maps, which is nice because it works almost out of the box. So the game is mostly very simple.

The first challenging part is candy generation. As a first attempt, the app uses nearby search from the Google Places API (hah, it’s already deprecated) with a specified radius, filters out places closer than the minimal distance, and selects a random place. As we can’t just use raw coordinates for filtering by distance, I used the node-geopoint library.

// GeoPoint comes from the node-geopoint library, sample from lodash
import GeoPoint from 'geopoint';
import { sample } from 'lodash';

const generateCandyFromPlacesNearby = async (
  position: Position,
): Promise<?Position> => {
  const positionPoint = new GeoPoint(position.latitude, position.longitude);

  // note: a real Places API request also needs a key parameter
  const response = await fetch(
    "https://maps.googleapis.com/maps/api/place/nearbysearch/json?" +
      `location=${position.latitude},${position.longitude}` +
      `&radius=${config.CANDY_MAX_DISTANCE}`,
  );
  const { results } = await response.json();

  const availablePositions = results.filter(({ geometry }) => {
    const point = new GeoPoint(geometry.location.lat, geometry.location.lng);

    return positionPoint.distanceTo(point, true) > constants.CANDY_MIN_DISTANCE;
  });

  return sample(availablePositions);
};

If there’s no place at an appropriate distance within the specified radius, the app just chooses a random latitude and longitude offset within the specified bounds.

const generateCandyFromRandom = (position: Position): Position => {
  const point = new GeoPoint(position.latitude, position.longitude);
  const [minNE, minSW] = point.boundingCoordinates(
    constants.CANDY_MIN_DISTANCE,
    undefined,
    true,
  );
  const [maxNE, maxSW] = point.boundingCoordinates(
    constants.CANDY_MAX_DISTANCE,
    undefined,
    true,
  );

  // random(3) yields an integer from 0 to 3, picking one of the four
  // combinations of the bounding box corners
  switch (random(3)) {
    case 0:
      return {
        latitude: random(minNE.latitude(), maxNE.latitude()),
        longitude: random(minNE.longitude(), maxNE.longitude()),
      };
    case 1:
      return {
        latitude: random(minSW.latitude(), maxSW.latitude()),
        longitude: random(minNE.longitude(), maxNE.longitude()),
      };
    case 2:
      return {
        latitude: random(minNE.latitude(), maxNE.latitude()),
        longitude: random(minSW.longitude(), maxSW.longitude()),
      };
    default:
      return {
        latitude: random(minSW.latitude(), maxSW.latitude()),
        longitude: random(minSW.longitude(), maxSW.longitude()),
      };
  }
};

And the last complicated part is detecting whether the player touches the snake’s tail. As we store the tail as a list of coordinates, the game just checks if the head is within a specified radius of any of the tail parts.

export const isTouched = (
  a: Position,
  b: Position,
  radius: number,
): boolean => {
  const aPoint = new GeoPoint(a.latitude, a.longitude);
  const bPoint = new GeoPoint(b.latitude, b.longitude);

  return aPoint.distanceTo(bPoint, true) <= radius;
};

export const isSnakeTouchedHimself = (positions: Position[]): boolean =>
  some(positions.slice(2), position =>
    isTouched(positions[0], position, constants.SNAKE_TOUCH_RADIUS),
  );

Play Store, Github.

* like Pokemon GO without a camera.

Martin Kleppmann: Designing Data-Intensive Applications



Recently I wanted to read something about application design and distributed systems, so I found and read Designing Data-Intensive Applications by Martin Kleppmann. Overall it’s a nice book with a mix of theoretical and practical information. It explains how a lot of things work inside: databases, messaging systems, and batch/stream processing. The book is a bit high-level, but I guess that’s why it’s easy and interesting to read.

The last chapter is a bit strange though, a tinfoil-hat kind of strange.

Kevin R. Fall and W. Richard Stevens: TCP/IP Illustrated, Volume 1



Recently I became interested in how networks work, and everywhere I found recommendations to read TCP/IP Illustrated by Kevin R. Fall and W. Richard Stevens. It’s an educative book that explains networks down to the physical level. Almost every chapter also contains information about possible problems and vulnerabilities, which is nice and interesting.

Although reading about the same stuff twice is kind of boring: almost everything is explained for both IPv4 and IPv6. And a not-so-small part of the book reads like a reference of packet layouts and headers.

But I guess it’s nice to have at least basic knowledge about networks, and this book is more than enough for that.

How I was Full Stack and wrote a mediocre service for searching reaction gifs



screenshot

A while ago I noticed that people often use MRW gifs as replies in messengers. Not knowing that nowadays even the Android keyboard has an option to search gifs, I decided to write a service, with a mobile app, a web app, and even a Telegram bot, that would do that.

The idea was pretty simple: just parse Reddit, somehow index the gifs, and allow users to search and share reaction gifs in all possible ways. The result was mediocre at best.

MRW result is mediocre at best

The first part is the parser. It’s pretty simple and can do just two things:

  • index the top n all-time posts from r/MRW on the initial run;
  • index the top n posts of the day every 12 hours.

While indexing, it takes the appropriate links from Reddit’s own image hosting or Imgur, gets additional information from the nlp-service, and puts everything into ElasticSearch.

I decided to write it in Clojure because I wanted to. The only problem was that Elastisch, a Clojure client for ElasticSearch, didn’t (still doesn’t?) work with the latest version of ElasticSearch. But the ElasticSearch REST API is neat, so I just used it directly.
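
Just to illustrate the shape of those REST calls, here is a sketch in Python with requests rather than Clojure; the index and field names are made up for the example:

import requests

# index one parsed post
doc = {
    'title': 'MRW my code works on the first try',
    'url': 'https://i.imgur.com/example.gif',
    'sentiment': 'happiness',
}
requests.put('http://localhost:9200/mrw/posts/1', json=doc)

# and search for it
response = requests.get('http://localhost:9200/mrw/posts/_search',
                        json={'query': {'match': {'title': 'works'}}})
print(response.json()['hits']['hits'])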

MRW library doesn’t work with the latest elasticsearch

The next, and the most RAM-consuming, part is the nlp-service. It’s written in Python with NLTK and Flask. It can also do just two things:

  • sentiment analysis of a sentence, like {"sentiment": "happiness"} for “someone congratulates me”;
  • VADER, which is another sort of sentiment analysis, returning something like {"pos": 0.9, "neg": 0.1, "neu": 0.2} for the same sentence (see the sketch below).
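
Roughly how the VADER part can be done with NLTK, as a minimal sketch; the /vader route name here is my assumption, not the actual service code:

from flask import Flask, jsonify, request
from nltk.sentiment.vader import SentimentIntensityAnalyzer

app = Flask(__name__)
# requires a one-time nltk.download('vader_lexicon')
analyzer = SentimentIntensityAnalyzer()


@app.route('/vader')
def vader():
    # returns scores like {"pos": ..., "neg": ..., "neu": ..., "compound": ...}
    return jsonify(analyzer.polarity_scores(request.args['sentence']))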

It doesn’t work very well, because I’m an amateur at best in NLP and had too small a dataset. I was, and still am, planning to build a better dataset with Mechanical Turk in the future.

MRW I have too small dataset

The last non-client part is the public-facing API. It’s also very simple, written in Clojure with Ring and Compojure. It has just one endpoint, /api/v1/search/?query=query, which requests additional information about the query from the nlp-service and searches for appropriate gifs in ElasticSearch. Nothing interesting.
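
The flow, sketched once more in Python with hypothetical service URLs and index names (the real handler is Clojure):

import requests


def search(query):
    # ask the nlp-service for the sentiment of the query
    sentiment = requests.get('http://nlp-service/sentiment',
                             params={'sentence': query}).json()['sentiment']
    # then search gifs in ElasticSearch by the query text and the sentiment
    body = {'query': {'bool': {'should': [
        {'match': {'title': query}},
        {'match': {'sentiment': sentiment}},
    ]}}}
    response = requests.get('http://localhost:9200/mrw/posts/_search', json=body)
    return [hit['_source'] for hit in response.json()['hits']['hits']]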

MRW public facing api is boring

The first client is the web app (source). It’s neat, has just one text input for the query, and is written with ClojureScript and reagent. It’s so small that I didn’t even use re-frame here.

MRW the web app is neat

The second client is the mobile app (source). It can search for reaction gifs and share the found gifs to other apps. It’s written with React Native in JavaScript and works only on Android. Yep, I managed to write a non-cross-platform RN app, but at least I’m planning to make it cross-platform and publish it to the App Store.

MRW I managed to write non-cross-platform RN app

And the last and most hipsterish client is the Telegram bot (source). It has three types of responses:

  • to /mrw query, with an appropriate reaction gif;
  • to just /mrw, with the famous Travolta gif;
  • to /help with, obviously, a help message.

And it’s written in JavaScript with the Node.js Telegram Bot API.

MRW I can’t find Travolta gif

The last part is deployment. Everything is deployed on docker-cloud. I somehow managed to configure everything a few days before the swarm mode announcement, so it’s just stacks, but it wouldn’t be a problem to migrate to the new swarm mode. The service is deployed as eight containers:

  • ElasticSearch;
  • nginx proxy;
  • letsencrypt nginx proxy companion;
  • the nlp-service;
  • the public API;
  • the parser;
  • the web app (data container);
  • the Telegram bot.

Almost everything worked out of the box; I only changed the nginx proxy image to simplify serving the web app’s assets. And it’s more than nice: when I push changes to GitHub, docker-cloud rebuilds the images and redeploys the containers.

MRW almost everything works out of the box

Summing up, it was a totally full stack experience, from the cluster on docker-cloud with microservices to the Telegram bot and the mobile app. And the result isn’t even the worst part: the worst part is that, as part of my studies, I gave a presentation about this service to future Software Engineers, in Czech.

MRW I’ve made a presentation

Sources on github, the web app, the mobile app, the Telegram bot.

Import packages depending on Python version



While working on py-backwards and its setuptools integration, I ran into an interesting problem: how to transparently let people on different Python versions use different variants of a package. So, imagine we have a package with this structure:

package
  __init__.py
  _compiled_2_7
    package
      __init__.py
      main.py
  _compiled_3_3
    package
      __init__.py
      main.py
  _compiled_3_4
    package
      __init__.py
      main.py
  _compiled_3_5
    package
      __init__.py
      main.py
  _compiled_3_6
    package
      __init__.py
      main.py

And, for example, on Python 2.7 we need to use the package from _compiled_2_7, on 3.3 the one from _compiled_3_3, and so on.

First of all, we know that if we import package.main, package.__init__ will be imported first:

# __init__.py
print('init')

# main.py
print('main')

# REPL
>>> import package.main
init
main

>>> from package import main
init
main

# shell
 python -m package.main
init
main

So __init__.py is executed in every case. Now, if we put a special bootstrap script in it that changes sys.path depending on the current Python version, we can easily import the package that we need:

import sys
import os

VERSIONS = {'_compiled_2_7': (2, 7),
            '_compiled_3_0': (3, 0),
            '_compiled_3_1': (3, 1),
            '_compiled_3_2': (3, 2),
            '_compiled_3_3': (3, 3),
            '_compiled_3_4': (3, 4),
            '_compiled_3_5': (3, 5),
            '_compiled_3_6': (3, 6)}
root = os.path.abspath(os.path.dirname(__file__))


def _get_available_versions():
    for version in os.listdir(root):
        if version in VERSIONS:
            yield version


def _get_version():
    available_versions = sorted(_get_available_versions())[::-1]
    for version in available_versions:
        if VERSIONS[version] <= sys.version_info:
            return version


# We should pass `__name__` as an argument, because
# we can't access `__name__` after module deletion
def _import_module(name):
    version = _get_version()
    version_path = os.path.join(root, version)
    sys.path.insert(0, version_path)
    del sys.modules[name]
    __import__(name)


_import_module(__name__)

Let’s try it on our package, but first, we need to fill _compiled_*/package/__init__.py with:

print('init 2.7')  # replace with required version

And _compiled_*/package/main.py with:

print('main 2.7')  # replace with required version

So now we can try it with Python 2.7:

# REPL
>>> import package
init 2.7

>>> import package.main
init 2.7
main 2.7

>>> from package import main
init 2.7
main 2.7

# shell
 python2 -m package.main
init 2.7
main 2.7

And with Python 3.5:

# REPL
>>> import package
init 3.5

>>> import package.main
init 3.5
main 3.5

>>> from package import main
init 3.5
main 3.5

# shell
 python3 -m package.main
init 3.5
main 3.5

It works!

Guessing complexity of algorithms



Recently, while having fun with algorithm exercises, I wondered: am I solving them efficiently? So I thought it would be neat to somehow determine the complexity of an algorithm.

Gathering the data

The first idea was to use timeit and estimate the complexity based on consumed time:

import timeit
from numpy.random import randint


def _get_data_set(size):
    return list(randint(1000, size=size))


def main(alist):  # bubble sort
    for passnum in range(len(alist) - 1, 0, -1):
        for i in range(passnum):
            if alist[i] > alist[i + 1]:
                temp = alist[i]
                alist[i] = alist[i + 1]
                alist[i + 1] = temp


for size in range(1000, 10000, 1000):
    print(size, '=>', timeit.timeit('main(xs)',
                                    f'xs = _get_data_set({size})',
                                    globals=globals()))
1000 => 0.25852540601044893
2000 => 1.9087769229663536
3000 => 3.760379843064584
4000 => 5.552884255070239
5000 => 9.784064659965225
6000 => 10.014859249000438
7000 => 21.16908495896496
8000 => 19.5277360730106
9000 => 24.729382600053214

But it wasn’t very accurate or useful for small functions: it required too big a data set and multiple runs of the function to get meaningful data.

The second idea was to use cProfile:

from cProfile import Profile


profiler = Profile()
profiler.runcall(main, _get_data_set(1000))
profiler.print_stats()
         3 function calls in 0.269 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.269    0.269    0.269    0.269 t.py:43(main)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

But it tracks only function calls and is useless for small algorithmic functions. So I found line_profiler, a line-by-line profiler:

from line_profiler import LineProfiler


profiler = LineProfiler(main)
profiler.runcall(main, _get_data_set(1000))
profiler.print_stats()
Total time: 2.21358 s
File: /home/nvbn/exp/complexity/t.py
Function: main at line 43

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    43                                           def main(alist):
    44      1000         1232      1.2      0.1      for passnum in range(len(alist) - 1, 0, -1):
    45    500499       557623      1.1     25.2          for i in range(passnum):
    46    499500       729304      1.5     32.9              if alist[i] > alist[i + 1]:
    47    243075       278005      1.1     12.6                  temp = alist[i]
    48    243075       334224      1.4     15.1                  alist[i] = alist[i + 1]
    49    243075       313194      1.3     14.1                  alist[i + 1] = temp

Which works fairly well for this task.

And since for guessing complexity we almost always need only the order of growth, we can just sum up all the line hits, like:

def _get_hits(fn, *args, **kwargs):
    profiler = LineProfiler(fn)
    profiler.runcall(fn, *args, **kwargs)
    hits = 0
    for timing in profiler.get_stats().timings.values():
        for _, line_hits, _ in timing:
            hits += line_hits
    return hits


for size in range(100, 1000, 100):
    print(size, '=>', _get_hits(main, _get_data_set(size)))
100 => 17257
200 => 69872
300 => 159227
400 => 279847
500 => 429965
600 => 622175
700 => 852160
800 => 1130426
900 => 1418903

To verify that it’s correct, we can visualize it with matplotlib:

import matplotlib.pyplot as plt


result = [(size, _get_hits(main, _get_data_set(size)))
          for size in range(100, 1000, 100)]

plt.style.use('ggplot')
plt.plot([size for size, _ in result],
         [hits for _, hits in result])
plt.title('Bubble sort: O(n^2)')
plt.show()

On the plot we can clearly see that the complexity is quadratic:

Bubble sort: O(n^2)

Finding complexity

To find the most appropriate complexity, we need a scoring function for each complexity class, like def get_score(hits: List[Tuple[int, int]]) -> int. The complexity with the lowest score will be the most appropriate.

Let’s start with constant complexity:

def constant(xs):
    result = 0
    for n in range(5):
        result += n
    return result
    
    
hits = [(size, _get_hits(constant, _get_data_set(size)))
        for size in range(100, 1000, 100)]
[(100, 13), (200, 13), (300, 13), (400, 13), (500, 13), (600, 13), (700, 13), (800, 13), (900, 13)]

Constant: O(1)

We can see that the hit count doesn’t depend on the size of the data set at all, so we can compute the score by just summing the differences from the first hit count:

def get_constant_score(hits):
    return sum(abs(hits[0][1] - hit) for _, hit in hits[1:])
    
    
get_constant_score(hits)
0

The next is linear complexity:

def linear(xs):
    result = 0
    for x in xs:
        result += x
    return result
    

hits = [(size, _get_hits(linear, _get_data_set(size)))
        for size in range(100, 1000, 100)]
[(100, 203), (200, 403), (300, 603), (400, 803), (500, 1003), (600, 1203), (700, 1403), (800, 1603), (900, 1803)]

Linear: O(n)

We can notice that the growth is linear, so we can use a similar approach as with constant complexity, but on the growth (the differences between consecutive hit counts) instead of the raw hits:

def _get_growth(hits):
    for hit_a, hit_b in zip(hits[:-1], hits[1:]):
        yield abs(hit_a - hit_b)


def get_linear_score(hits):
    hits = [hit for _, hit in hits]
    growth = list(_get_growth(hits))
    return sum(abs(growth[0] - grow) for grow in growth[1:])
    
    
get_linear_score(hits)
0

Now it’s time for quadratic complexity:

def quadratic(xs):
    result = 0
    for x in xs:
        for y in xs:
            result += x + y
    return result


hits = [(size, _get_hits(quadratic, _get_data_set(size)))
        for size in range(100, 1000, 100)]
[(100, 20203), (200, 80403), (300, 180603), (400, 320803), (500, 501003), (600, 721203), (700, 981403), (800, 1281603), (900, 1621803)]

Quadratic: O(n^2)

It’s a bit more complicated in this case: the growth of the growth (like a second derivative) should be constant:

def get_quadratic_score(hits):
    hits = [hit for _, hit in hits]
    growth = list(_get_growth(hits))
    growth_of_growth = list(_get_growth(growth))
    return sum(abs(growth_of_growth[0] - grow)
               for grow in growth_of_growth[1:])


get_quadratic_score(hits)
0

With a similar approach, detection of O(log n), O(n log n), etc. can be implemented, for example as below.
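
A possible sketch (my assumption, not code from the original post): if hits grow as c*log(n) or c*n*log(n), dividing them by that factor should give a roughly constant sequence, so we can normalize and reuse the constant-complexity score:

import math


def _get_normalized_score(hits, factor):
    # divide each hit count by the assumed growth factor and check
    # that the result is roughly constant
    normalized = [(size, hit / factor(size)) for size, hit in hits]
    return get_constant_score(normalized)


def get_logarithmic_score(hits):
    return _get_normalized_score(hits, math.log2)


def get_linearithmic_score(hits):
    return _get_normalized_score(hits, lambda n: n * math.log2(n))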

Usable public API

The code used above is only usable in a REPL or a similar interactive environment, so let’s define a simple and extensible API. First of all, complexity assumptions:

from abc import ABC, abstractmethod
from typing import List, NamedTuple

ComplexityLogEntry = NamedTuple('ComplexityLogEntry', [('size', int),
                                                       ('hits', int)])


class BaseComplexityAssumption(ABC):
    title = ''

    def __init__(self, log: List[ComplexityLogEntry]) -> None:
        self.log = log
        self.hits = [hit for _, hit in self.log]

    @abstractmethod
    def get_score(self) -> int:
        ...

So, for example, the assumption for constant complexity looks like:

class ConstantComplexityAssumption(BaseComplexityAssumption):
    title = 'O(1)'

    def get_score(self) -> int:
        return sum(abs(self.hits[0] - hit) for hit in self.hits[1:])

The next part of the public API is the analyzer:

from typing import Generic, Iterable, TypeVar


class Complexity:
    def __init__(self, title: str, log: List[ComplexityLogEntry]) -> None:
        ...

    def show_plot(self) -> 'Complexity':
        ...


T = TypeVar('T')


class BaseComplexityAnalyzer(Generic[T], ABC):
    title = ''

    @abstractmethod
    def get_data_set(self, size: int) -> T:
        ...

    @abstractmethod
    def run(self, data_set: T) -> None:
        ...
        
    @classmethod
    def calculate(cls, sizes: Iterable[int]) -> Complexity:
        ...
     
    ...

So we can analyze some quadratic algorithm very easily:

class SomethingQuadratic(BaseComplexityAnalyzer[List[int]]):
    title = 'Something quadratic'

    def get_data_set(self, size: int) -> List[int]:
        from numpy.random import randint
        return list(randint(1000, size=size))

    def run(self, data_set: Iterable[int]) -> None:
        result = []
        for x in data_set:
            for y in data_set:
                result.append((x, y))
                
                
SomethingQuadratic \
    .calculate(range(100, 1000, 100)) \
    .show_plot()

And it works:

Assumption result

Gist with sources.

Robert Love: Linux Kernel Development



Lately I’ve been interested in Linux internals, so I decided to read Linux Kernel Development by Robert Love. It’s an interesting book: it explains things from interrupts to the VFS and memory management. It’s easy to read and explains why, historically, things were done the way they were.

The book is aimed more at beginner kernel developers and contains a lot of code samples from the kernel internals, but it’s worth reading even if you’re just interested in how things work inside.

Telegram bot that speaks like a gopnik



TLDR: the bot – http://t.me/swear_bot (speaks only in Russian)

On Saturday night I thought it would be nice to write a Telegram bot that speaks like a gopnik and can sometimes be used in a conversation (inline mode) to find a relevant reply. So, a few beers later, I implemented it. As the language I chose JavaScript, though Python would probably be better for this kind of project because of its wide variety of NLP-related libraries. For Russian in JS there is Az.js, though, which I used in the bot; it has almost everything the bot needs.

Worth mentioning: as the bot speaks Russian, the examples below are in Russian too. I’ve added translations in brackets, but sometimes the translation doesn’t make much sense.

The bot/Telegram API part isn’t very interesting; it’s straightforward and implemented with node-telegram-bot-api.

Moving to the interesting part: how does it generate replies? The first option is to find a trigger word and reply appropriately, like:

user: Хочу новую машину! (I want a new car!)
bot: Хотеть не вредно! (It doesn’t harm to wish)

First of all, we have a list of trigger-word regexps and replies, like:

export const TRIGGERS = [
  // triggers on “короче” (“in short”); replies “Whoever’s is shorter stays at home!”
  [/^к[оа]роч[ье]?$/i, 'У кого короче, тот дома сидит!'],
  // triggers on “хочу”/“хотим”/… (“I want”); replies “It doesn’t harm to wish!”
  [/^хо(чу|тим|тят|тел|тела)$/i, 'Хотеть не вредно!'],
];

Now we need to extract words from the text, which is easy with Az.Tokens, and check whether any word matches a trigger regexp:

const getWords = (text: string): string[] =>
  Az.Tokens(text)
    .tokens
    .filter(({type}) => type === Az.Tokens.WORD)
    .map(({st, length}) => text.substr(st, length).toLowerCase());
  
const getByWordTrigger = function*(text: string): Iterable<string> {
  for (const word of getWords(text)) {
    for (const [regexp, answer] of constants.TRIGGERS) {
      if (word.match(regexp)) {
        yield answer;
      }
    }
  }
};

That little generator yields all the appropriate trigger-word replies for a text.

The second option is to reply to a question with an amusing question, like:

user: Когда мы уже пойдём домой? (When are we finally going home?)
bot: А тебя ебёт? (Why the fuck you care?)

It’s very easy to implement: we just need to find a question mark at the end of the text, or question words such as “когда” (when), “где” (where), etc.:

const getAnswerToQuestion = (text: string): string[] => {
  if (text.trim().endsWith('?')) {
    return [constants.ANSWER_TO_QUESTION];
  }

  const questionWords = getWords(text)
    .map((word) => Az.Morph(word))
    .filter((morphs) => morphs.length && morphs[0].tag.Ques);

  if (questionWords.length) {
    return [constants.ANSWER_TO_QUESTION];
  } else {
    return [];
  }
};

It returns an array with the amusing constants.ANSWER_TO_QUESTION if the message is a question, and an empty array otherwise.

The third and most interesting option is to reply with a vulgar rhyme, like:

user: хочу в Австрию! (I want to go to Austria)
bot: хуявстрию (dickstria)
user: у него есть трактор (he has a tractor)
bot: хуяктор (dicktor)

How does it work? It just replaces the first syllable of nouns and adjectives with “ху” plus a transformed version of the syllable’s vowel, like “о” → “ё”, “а” → “я”, etc.

So, first of all, we need to extract the first syllable. That’s very easy: we just take all the characters before the first consonant that comes after the first vowel:

const getFirstSyllable = (word: string): string => {
  const result = [];

  let readVowel = false;

  for (const letter of word) {
    const isVowel = constants.VOWELS.indexOf(letter) !== -1;

    if (readVowel && !isVowel) {
      break;
    }

    if (isVowel) {
      readVowel = true;
    }

    result.push(letter);
  }

  return result.join('');
};

Then we need a function that builds a rhyme by replacing the first syllable of a word:

const getRhyme = (word: string): ?string => {
  const morphs = Az.Morph(word);
  if (!morphs.length) {
    return;
  }

  const {tag} = morphs[0];
  if (!tag.NOUN && !tag.ADJF) {
    return;
  }

  const syllable = getFirstSyllable(word);
  if (!syllable || syllable === word) {
    return;
  }

  // VOWEL_TO_RHYME maps the syllable’s vowel to a “ху…” prefix, like “а” → “хуя”
  const prefix = constants.VOWEL_TO_RHYME[last(syllable)];
  const postfix = word.substr(syllable.length);

  return `${prefix}${postfix}`;
};

And then we need a function that extracts words from a text and returns possible rhymes:

const getRhymes = (text: string): string[] =>
  getWords(text)
    .map(getRhyme)
    .filter(Boolean)
    .reverse();

It’s pretty easy, right?

The last option is to reply confused and aggressive when the previous options don’t produce anything, like:

user: wtf
bot: Чё? (What?)

It’s straightforward, implemented in a function that aggregates all the options:

import {uniq} from 'lodash';

export default (text: string): string[] => {
  const answers = uniq([
    ...getByWordTrigger(text),
    ...getAnswerToQuestion(text),
    ...getRhymes(text),
  ]);

  if (answers.length) {
    return answers;
  } else {
    return constants.NO_ANSWERS;
  }
}

That’s all. Telegram bot, sources on github.