Jim Webber, Ian Robinson, Emil Eifrem: Graph Databases



Recently I wanted to read something about graph databases, and in the Humble Book Bundle I found Graph Databases by Jim Webber, Ian Robinson and Emil Eifrem. The book is mostly focused on neo4j but has a bit of information about other databases. It has examples of graph data models and real-world use cases, and it also covers the theory and neo4j internals. And there’s a lot about the Cypher language.

Although some chapters of the book can be summed up as “anything’s a graph if you’re brave enough”.

Building a graph of flights from airport codes in tweets



A lot of people (at least me) tweet airport codes like PRG ✈ AMS before flights. So I thought it would be interesting to draw a directed graph of flights and airports, where airports are nodes and flights are edges.

First of all, I created a twitter application, authorized my account within it and got all necessary credentials:

TWITTER_CONSUMER_KEY = ''
TWITTER_CONSUMER_SECRET = ''
TWITTER_ACCESS_TOKEN = ''
TWITTER_ACCESS_TOKEN_SECRET = ''
USER_ID = ''

As a special marker I chose the airplane emoji:

MARKER = '✈'

Then I tried to fetch all my tweets with that marker but got stuck on a huge problem: the Twitter REST API doesn’t work with emojis in a search query. So I decided to fetch the whole timeline and filter it manually, which means that only the last 3200 tweets will be parsed. Working with the Twitter API is very easy with tweepy:

import tweepy


def get_tweets():
    auth = tweepy.OAuthHandler(TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET)
    auth.set_access_token(TWITTER_ACCESS_TOKEN, TWITTER_ACCESS_TOKEN_SECRET)
    api = tweepy.API(auth)
    cursor = tweepy.Cursor(api.user_timeline,
                           user_id=USER_ID,
                           exclude_replies='true',
                           include_rts='false',
                           count=200)
    return cursor.items()
>>> for tweet in get_tweets():
...     print(tweet)
... 
Status(_api=<tweepy.api.API object at 0x7f876a303ac8>, ...)

Then I kept only the tweets with ✈ in their text:

flight_texts = (tweet.text for tweet in get_tweets()
                if MARKER in tweet.text)
>>> for text in flight_texts:
...     print(text)
...
ICN ✈️ IKT
IKT ✈️ ICN
DME ✈️ IKT

As some tweets may contain more than one flight, like LED ✈ DME ✈ AUH, it’s convenient to extract all three-letter parts and build flights from consecutive pairs, like LED ✈ DME and DME ✈ AUH:

def get_flights(text):
    parts = [part for part in text.split(' ') if len(part) == 3]
    if len(parts) < 2:
        return []

    return zip(parts[:-1], parts[1:])


flights = [flight for text in flight_texts
           for flight in get_flights(text)]
uniq_flights = list(set(flights))
>>> uniq_flights
[('ICN', 'IKT'), ('IKT', 'ICN'), ('DME', 'IKT')]
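
One caveat: a plain length check also accepts ordinary three-letter words such as "see" or "you". A slightly stricter variant is sketched below, assuming airport codes are always written as three uppercase ASCII letters. The name extract_legs and the regular expression are illustrative, not part of the original script.

```python
import re

# Assumption: airport codes appear as exactly three uppercase ASCII letters.
IATA_CODE = re.compile(r'\b[A-Z]{3}\b')


def extract_legs(text):
    """Return consecutive (origin, destination) pairs found in a tweet."""
    codes = IATA_CODE.findall(text)
    # Pair every code with the one that follows it: A, B, C -> (A, B), (B, C).
    return list(zip(codes[:-1], codes[1:]))


print(extract_legs('LED ✈ DME ✈ AUH'))  # [('LED', 'DME'), ('DME', 'AUH')]
print(extract_legs('see you all ✈'))    # []
```

Any uppercase three-letter word (like USA) would still be picked up, so this only narrows the false positives.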

From edges in uniq_flights it’s very easy to get all nodes:

airports = [airport for flight in flights
            for airport in flight]
uniq_airports = list(set(airports))
>>> uniq_airports
['ICN', 'IKT', 'DME']

So now it’s possible to create a graph with networkx and draw it with matplotlib:

import networkx
from matplotlib import pyplot


graph = networkx.DiGraph()
graph.add_nodes_from(uniq_airports)
graph.add_edges_from(uniq_flights)
networkx.draw(graph, with_labels=True, node_size=1000)
pyplot.draw()
pyplot.show()

The graph is very ugly:

But it’s simple to improve it by coloring nodes and edges according to their weights, and by using graphviz for the layout:

from collections import Counter
from matplotlib import cm


def get_colors(all_records, uniq_records):
    counter = Counter(all_records)
    max_val = max(counter.values())
    return [counter[record] / max_val
            for record in uniq_records]


networkx.draw(graph, 
              with_labels=True,
              node_size=1000,
              width=1.5,
              pos=networkx.nx_pydot.graphviz_layout(graph, prog='neato'),
              cmap=cm.get_cmap('Pastel1'),
              edge_cmap=cm.get_cmap('Pastel2'),
              edge_color=get_colors(flights, uniq_flights),
              node_color=get_colors(airports, uniq_airports))
pyplot.draw()
pyplot.show()
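
To make the weights more concrete, here is a small worked example of the same normalisation on made-up data (the sample flights are not from my timeline). Each unique record gets its number of occurrences divided by the largest count, and those numbers are then mapped to colours through cmap and edge_cmap.

```python
from collections import Counter


def get_colors(all_records, uniq_records):
    # Same normalisation as above: occurrences divided by the maximum count.
    counter = Counter(all_records)
    max_val = max(counter.values())
    return [counter[record] / max_val for record in uniq_records]


# Made-up sample data: ICN -> IKT flown twice, IKT -> ICN flown once.
sample_flights = [('ICN', 'IKT'), ('IKT', 'ICN'), ('ICN', 'IKT')]
sample_uniq = [('ICN', 'IKT'), ('IKT', 'ICN')]

weights = get_colors(sample_flights, sample_uniq)
print(weights)  # [1.0, 0.5]
```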

So now it’s much nicer:

Gist with sources.

Baron Schwartz, Peter Zaitsev, Vadim Tkachenko: High Performance MySQL



Apart from using Cloud SQL, I haven’t touched MySQL for a while, so I decided to freshen things up and read High Performance MySQL by Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko. The book feels solid: it explains how MySQL works (and worked before) inside, what problems storage engines/parser/optimizer/etc have and how to leverage them. It’s kind of nice that a big part of the book is about MySQL scaling. And it’s also good that the book has a lot of information about troubleshooting, debugging, profiling and some MySQL-related tools.

Although the book is probably a bit outdated: it covers MySQL versions up to 5.5, while nowadays the latest version is 5.7.

A year in Prague




From September 2016 to August 2017 I lived in Prague, so almost a year. I was studying the Czech language there because I was planning to continue my education and finish university there. But I decided that studying and working full time simultaneously would be too much for me, so I’m moving to the Netherlands for work. I did get a B2 in Czech and was accepted to two universities in Prague, though.

I was living in the country on a student visa. A Czech student visa is fairly easy to obtain: it just requires confirmation from a university and around €5000 in a bank account. With this kind of visa, you can enter the Czech Republic multiple times and travel all over the Schengen Area.

Because I was studying, I managed to live on a student campus, in something like an apartment with a shared kitchen, for 7500 CZK (€300). The only problem was that it was far away from the city center. But the city is not too big, and as a student I got a travel card almost for free, so the location wasn’t a big problem.

Internet was included in my rent, I had something around 50 Mbit/s. Mobile internet and mobile operators in the Czech Republic are the worst. I was using Vodafone and initially, I was paying 519 CZK (€20) for 4GB, but somehow in June this option disappeared and I started to pay 399 CZK (€15) for just 1.5GB.

Food is cheap there: I was spending around 800 CZK (€30) per week on everyday chicken/meat/fish. Although in Prague there’s a very small selection of fish and almost no seafood. The booze is very cheap too: 15-25 CZK (€0.5-1) for a fairly good beer in a supermarket and around 30 CZK (€1.2) in a pub.

Summing up, Prague is a very nice city with a lot of cultures, attractions, and events. It’s in the middle of Europe and it’s very easy to travel from there. And I’m kind of glad that I lived there.

AR* Geolocation Snake



instruction     gameplay

I like rollerskating, but sometimes it’s kind of boring to skate the same routes. I was using Pokemon GO to make the routes more fun, but pokestops have fixed locations, and catching pokemons gets kind of boring too after a few months. So I thought that it could be interesting to randomly select places to skate to. And the snake game makes it even more interesting and challenging, because I need to pick a more complex route to avoid the snake’s tail and not lose the game.

Although sometimes the app puts candies on the other side of the road or requires me to ride on a sidewalk with intolerable quality, I solved it with an option to regenerate the candy.

TLDR: the app, source code.

What’s inside

The app is written in JavaScript with flow, React Native and redux with redux-thunk. For the map, I used react-native-maps, which is nice because it works almost out of the box. So mostly the game is very simple.

The first challenging part is candy generation. As a first attempt, the app uses nearby search from the Google Places API (hah, it’s already deprecated) with a specified radius, keeps only the places that are farther away than the minimal distance and selects a random one. As we can’t just compare raw coordinates to filter by distance, I used the node-geopoint library.

const generateCandyFromPlacesNearby = async (
  position: Position,
): Promise<?Position> => {
  const positionPoint = new GeoPoint(position.latitude, position.longitude);

  const response = await fetch(
    "https://maps.googleapis.com/maps/api/place/nearbysearch/json?" +
      `location=${position.latitude},${position.longitude}` +
      `&radius=${config.CANDY_MAX_DISTANCE}`,
  );
  const { results } = await response.json();

  const availablePositions = results.filter(({ geometry }) => {
    const point = new GeoPoint(geometry.location.lat, geometry.location.lng);

    return positionPoint.distanceTo(point, true) > constants.CANDY_MIN_DISTANCE;
  });

  return sample(availablePositions);
};
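
Why raw coordinates don’t work for this: a degree of longitude gets shorter the farther you are from the equator, so the same coordinate difference means a different distance in Prague and in Singapore. Below is a minimal sketch of the distance filter in Python (not the app’s JavaScript), using the haversine formula. node-geopoint may use a different formula internally, and the names distance_km and far_enough are made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def far_enough(position, places, min_distance_km):
    """Keep only the places farther from position than min_distance_km."""
    return [place for place in places
            if distance_km(position[0], position[1],
                           place[0], place[1]) > min_distance_km]


# Two candidates on the equator: about 0.11 km and about 111 km away.
print(far_enough((0.0, 0.0), [(0.0, 0.001), (0.0, 1.0)], 1.0))
```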

If there’s no place with appropriate distance in the specified radius, the app just chooses a random latitude and longitude offset within specified bounds.

const generateCandyFromRandom = (position: Position): Position => {
  const point = new GeoPoint(position.latitude, position.longitude);
  const [minNE, minSW] = point.boundingCoordinates(
    constants.CANDY_MIN_DISTANCE,
    undefined,
    true,
  );
  const [maxNE, maxSW] = point.boundingCoordinates(
    constants.CANDY_MAX_DISTANCE,
    undefined,
    true,
  );

  switch (random(3)) {
    case 0:
      return {
        latitude: random(minNE.latitude(), maxNE.latitude()),
        longitude: random(minNE.longitude(), maxNE.longitude()),
      };
    case 1:
      return {
        latitude: random(minSW.latitude(), maxSW.latitude()),
        longitude: random(minNE.longitude(), maxNE.longitude()),
      };
    case 2:
      return {
        latitude: random(minNE.latitude(), maxNE.latitude()),
        longitude: random(minSW.longitude(), maxSW.longitude()),
      };
    default:
      return {
        latitude: random(minSW.latitude(), maxSW.latitude()),
        longitude: random(minSW.longitude(), maxSW.longitude()),
      };
  }
};
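
The same idea can be sketched in Python without bounding boxes, assuming a flat-earth approximation where one degree of latitude is roughly 111.32 km and a degree of longitude shrinks with the cosine of the latitude. This is not what node-geopoint does internally, random_candy is a hypothetical name, and the approximation falls apart near the poles.

```python
import math
import random

KM_PER_DEGREE_LAT = 111.32  # rough length of one degree of latitude


def random_candy(latitude, longitude, min_km, max_km, rng=random):
    """Offset the point by min_km..max_km along each axis, in a random quadrant."""
    km_per_degree_lon = KM_PER_DEGREE_LAT * math.cos(math.radians(latitude))
    # Independent offsets per axis, each with a random sign: this plays the
    # role of the four switch cases (NE/SW combinations) in the JavaScript.
    lat_offset = rng.uniform(min_km, max_km) / KM_PER_DEGREE_LAT
    lon_offset = rng.uniform(min_km, max_km) / km_per_degree_lon
    return (latitude + rng.choice((-1, 1)) * lat_offset,
            longitude + rng.choice((-1, 1)) * lon_offset)


# Seeded generator so the example is repeatable.
candy = random_candy(50.0, 14.0, 0.2, 1.0, random.Random(0))
print(candy)
```

Because the offsets are drawn per axis, the straight-line distance to the candy ends up between roughly 1.4 times min_km and 1.4 times max_km, not strictly between the two limits.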

And the last complicated part is detecting if the player touches the snake’s tail. As we store the tail as a list of coordinates, the game just checks if the head is within a specified radius of any of the tail parts.

export const isTouched = (
  a: Position,
  b: Position,
  radius: number,
): boolean => {
  const aPoint = new GeoPoint(a.latitude, a.longitude);
  const bPoint = new GeoPoint(b.latitude, b.longitude);

  return aPoint.distanceTo(bPoint, true) <= radius;
};

export const isSnakeTouchedHimself = (positions: Position[]): boolean =>
  some(positions.slice(2), position =>
    isTouched(positions[0], position, constants.SNAKE_TOUCH_RADIUS),
  );
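
For comparison, here is roughly the same check in Python. It is only a sketch: it assumes positions are (latitude, longitude) tuples and the radius is in metres, and it swaps node-geopoint’s distanceTo for a simple equirectangular approximation, which should be good enough for distances of a few metres.

```python
import math

METERS_PER_DEGREE_LAT = 111320.0  # rough length of one degree of latitude


def is_touched(a, b, radius_m):
    """True when points a and b, given as (lat, lon), are within radius_m metres."""
    mean_lat = math.radians((a[0] + b[0]) / 2)
    dy = (a[0] - b[0]) * METERS_PER_DEGREE_LAT
    dx = (a[1] - b[1]) * METERS_PER_DEGREE_LAT * math.cos(mean_lat)
    return math.hypot(dx, dy) <= radius_m


def snake_touches_itself(positions, radius_m):
    """positions[0] is the head; like slice(2), skip the first two entries."""
    head, tail = positions[0], positions[2:]
    return any(is_touched(head, part, radius_m) for part in tail)


# The last tail part is about one metre away from the head.
snake = [(50.0, 14.0), (50.0001, 14.0), (50.0002, 14.0), (50.00001, 14.0)]
print(snake_touches_itself(snake, 5))      # True
print(snake_touches_itself(snake[:3], 5))  # False
```

Skipping the entry right behind the head matters: it is always close to the head, so without that the game would end immediately.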

Play Store, Github.

* like Pokemon GO without a camera.

Martin Kleppmann: Designing Data-Intensive Applications



Recently I wanted to read something about application design and distributed systems, so I found and read Designing Data-Intensive Applications by Martin Kleppmann. Overall it’s a nice book with a bit of theoretical and practical information. It explains how a lot of things work inside, like databases, messaging systems, and batch/stream processing. The book is a bit high-level, but I guess because of that it’s easy and interesting to read.

Although the last chapter is a bit strange, a tinfoil hat kind of strange.

Kevin R. Fall and W. Richard Stevens: TCP/IP Illustrated, Volume 1



Recently I got interested in how networks work, and everywhere I found recommendations to read TCP/IP Illustrated by Kevin R. Fall and W. Richard Stevens. And it’s an educational book, which explains networks even at the physical level. Almost every chapter also contains information about possible problems and vulnerabilities, which is nice and interesting.

Although reading about the same stuff twice is kind of boring: almost everything is explained for both IPv4 and IPv6. And a not-so-small part of the book looks like a reference of packet layouts and headers.

But I guess it’s nice to have at least a basic knowledge about networks and this book is more than enough.

How I was Full Stack and wrote a mediocre service for searching reaction gifs



screenshot

A while ago I discovered that people often use MRW as a reply in messengers. And not knowing that nowadays even the Android keyboard has an option to search gifs, I decided to write a service with a mobile app, a web app and even a Telegram bot that would do that.

The idea was pretty simple, just parse Reddit, somehow index gifs and allow users to search and share reaction gifs in all possible ways. The result was mediocre at best.

MRW result is mediocre at best

The first part is the parser, it’s pretty simple and can do just two things:

  • index n top of all time posts from r/MRW on an initial run;
  • index n top today posts every 12 hours.

While indexing, it gets appropriate links from Reddit’s own image hosting or Imgur, gets additional information from the nlp-service and puts everything in ElasticSearch.

I decided to write it in Clojure because I wanted to. The only problem was that Elastisch, a Clojure client for ElasticSearch, didn’t (doesn’t?) work with the latest version of ElasticSearch. But the ElasticSearch REST API is neat, so I just used it directly.

MRW library doesn’t work with the latest elasticsearch

The next and the most RAM-consuming part is the nlp-service. It’s written in Python with NLTK and Flask, and it also can do just two things:

  • sentiment analysis of a sentence, like {"sentiment": "happiness"} for “someone congrats me”;
  • VADER, which is some sort of sentiment analysis too, like {"pos": 0.9, "neg": 0.1, "neu": 0.2} for the same sentence.

It doesn’t work very well, because I’m an amateur at best in NLP and my dataset was too small. I was, and still am, planning to make a better dataset with Mechanical Turk in the future.

MRW I have too small dataset

The last non-client part is the public-facing API. It’s also very simple, written in Clojure with Ring and Compojure. It has just one endpoint, /api/v1/search/?query=query. It just requests additional information for the query from the nlp-service and searches for appropriate gifs in ElasticSearch. Nothing interesting.

MRW public facing api is boring

The first client is the web app (source). It’s neat, has just one text input for the query and is written with ClojureScript and reagent. And it’s so small that I don’t even use re-frame here.

MRW the web app is neat

The second client is the mobile app (source). It can search for reaction gifs and share found gifs to other apps. It’s written with React Native in JavaScript and works only on Android. Yep, I managed to write a non-cross-platform RN app, but at least I’m planning to make it cross-platform and publish it to the App Store.

MRW I managed to write non-cross-platform RN app

And the last and the most hipsterish client is the Telegram bot (source). It has three types of responses:

  • to /mrw query with an appropriate reaction gif;
  • to just /mrw with the famous Travolta gif;
  • to /help with, obviously, a help message.

And it’s written in JavaScript with Node.js Telegram Bot API.
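
The three response types boil down to a tiny dispatch on the message text. A rough sketch of that logic in Python (the real bot is JavaScript, and the function name route and the returned labels are made up for this example):

```python
def route(message):
    """Map an incoming message to one of the reply kinds listed above."""
    command, _, argument = message.strip().partition(' ')
    argument = argument.strip()
    if command == '/help':
        return ('help', None)
    if command == '/mrw':
        # With a query: search for a gif. Without one: the Travolta fallback.
        return ('search', argument) if argument else ('travolta', None)
    return ('ignore', None)


print(route('/mrw my code works'))  # ('search', 'my code works')
print(route('/mrw'))                # ('travolta', None)
print(route('/help'))               # ('help', None)
```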

MRW I can’t find Travolta gif

The last part is deployment. Everything is deployed on docker-cloud. I somehow managed to configure everything a few days before the swarm mode announcement, so it’s just stacks. But it wouldn’t be a problem to migrate to the new swarm mode. The service is deployed as eight containers:

  • ElasticSearch;
  • nginx proxy;
  • letsencrypt nginx proxy companion;
  • the nlp-service;
  • the public API;
  • the parser;
  • the web app (data container);
  • the Telegram bot.

Almost everything worked out of the box, I only changed nginx proxy image to simplify serving assets of the web app. And it’s more than nice, when I push changes to github, docker-cloud rebuilds images and redeploys containers.

MRW almost everything works out of the box

Summing up everything, it was a totally full stack experience from the cluster on docker-cloud with microservices to the Telegram bot and the mobile app. And the result isn’t the worst part, the worst part is that as a part of my studying I’ve made a presentation for future Software Engineers about that service in Czech.

MRW I’ve made a presentation

Sources on github, the web app, the mobile app, the Telegram bot.

Import packages depending on Python version



While working on py-backwards and its setuptools integration I found an interesting problem: how to transparently allow people with different Python versions to use different packages. So, imagine we have a package with this structure:

package
  __init__.py
  _compiled_2_7
    package
      __init__.py
      main.py
  _compiled_3_3
    package
      __init__.py
      main.py
  _compiled_3_4
    package
      __init__.py
      main.py
  _compiled_3_5
    package
      __init__.py
      main.py
  _compiled_3_6
    package
      __init__.py
      main.py

And, for example, on Python 2.7 we need to use the package from _compiled_2_7, on 3.3 from _compiled_3_3, and so on.

First of all, we know that if we import package.main, package.__init__ would be imported first:

# __init__.py
print('init')

# main.py
print('main')

# REPL
>>> import package.main
init
main

>>> from package import main
init
main

# shell
$ python -m package.main
init
main
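
The same check can be scripted. The sketch below builds a throwaway package in a temporary directory (the name demo_import_order is made up to avoid clashing with real modules), imports its submodule and captures what gets printed:

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

# Hypothetical throwaway package name, used only for this demonstration.
PACKAGE = 'demo_import_order'

root = tempfile.mkdtemp()
package_dir = os.path.join(root, PACKAGE)
os.mkdir(package_dir)
with open(os.path.join(package_dir, '__init__.py'), 'w') as f:
    f.write("print('init')\n")
with open(os.path.join(package_dir, 'main.py'), 'w') as f:
    f.write("print('main')\n")

# Make the temporary directory importable and import the submodule.
sys.path.insert(0, root)
importlib.invalidate_caches()
output = io.StringIO()
with contextlib.redirect_stdout(output):
    importlib.import_module(PACKAGE + '.main')

lines = output.getvalue().split()
print(lines)  # ['init', 'main']
```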

So it works all the time. And now if we put a special bootstrap script into package/__init__.py, which changes sys.path depending on the current Python version, we can easily import the package that we need:

import sys
import os

VERSIONS = {'_compiled_2_7': (2, 7),
            '_compiled_3_0': (3, 0),
            '_compiled_3_1': (3, 1),
            '_compiled_3_2': (3, 2),
            '_compiled_3_3': (3, 3),
            '_compiled_3_4': (3, 4),
            '_compiled_3_5': (3, 5),
            '_compiled_3_6': (3, 6)}
root = os.path.abspath(os.path.dirname(__file__))


def _get_available_versions():
    for version in os.listdir(root):
        if version in VERSIONS:
            yield version


def _get_version():
    available_versions = sorted(_get_available_versions())[::-1]
    for version in available_versions:
        if VERSIONS[version] <= sys.version_info:
            return version


# We should pass `__name__` as an argument, because
# we can't access `__name__` after module deletion
def _import_module(name):
    version = _get_version()
    version_path = os.path.join(root, version)
    sys.path.insert(0, version_path)
    del sys.modules[name]
    __import__(name)


_import_module(__name__)
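
One thing to keep in mind: _get_version sorts directory names as strings. That is fine for the versions listed here, but as strings '_compiled_3_10' would sort before '_compiled_3_2'. A possible variation that compares version tuples instead is sketched below; pick_version is a hypothetical helper, not part of py-backwards.

```python
def pick_version(available, current):
    """Return the newest compiled directory whose version is <= current."""
    candidates = [(version, name) for name, version in available.items()
                  if version <= current]
    if not candidates:
        return None
    # Compare (major, minor) tuples rather than directory names.
    return max(candidates)[1]


available = {'_compiled_2_7': (2, 7),
             '_compiled_3_3': (3, 3),
             '_compiled_3_5': (3, 5)}

print(pick_version(available, (3, 4, 2)))  # _compiled_3_3
print(pick_version(available, (2, 6, 9)))  # None
```

In the bootstrap script the second argument would be sys.version_info, which compares with plain tuples the same way.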

Let’s try it on our package, but first, we need to fill _compiled_*/package/__init__.py with:

print('init 2.7')  # replace with required version

And _compiled_*/package/main.py with:

print('main 2.7')  # replace with required version

So now we can try it with Python 2.7:

# REPL
>>> import package
init 2.7

>>> import package.main
init 2.7
main 2.7

>>> from package import main
init 2.7
main 2.7

# shell
$ python2 -m package.main
init 2.7
main 2.7

And with Python 3.5:

# REPL
>>> import package
init 3.5

>>> import package.main
init 3.5
main 3.5

>>> from package import main
init 3.5
main 3.5

# shell
$ python3 -m package.main
init 3.5
main 3.5

It works!