Daniel Kumor

Scientist in Training

Visualizing Causal Models

In my research, I found it pretty annoying that there was no quick way to generate pictures of causal models for inclusion in presentations and LaTeX documents.
To fix this problem, I’ve used the visjs library to build a tool that generates publication-quality PNGs of causal models.

You can try it yourself here.
The code, as always, is on GitHub.

ConnectorDB

For the past 2 years, I have been working with Joseph Lewis on a platform for quantified self data. Today, it is finally ready for use.

There are many apps that gather data for you. On your laptop, there is selfspy. For phones, there are several options, each integrating with a different set of fitness trackers. Unfortunately, this means that if you want to track the details of your life, you need to find several services and periodically download your data from each one. That’s not even mentioning the security implications of letting companies have a detailed view of your every move.

I personally wanted something with the following properties:

  • Self-Hosted - This is very personal data. I don’t want anyone but me to have access to it, and I am willing to set up a server to keep it that way!
  • Open Source - Closed-source solutions are ultimately tied to their creators. Once the maintainers move on, you’re out of luck unless the code is open source.
  • Multi-Device - While single-focus apps exist for almost every device, combining their data is not a fun task. The “ultimate quantified self solution” must be able to integrate data from multiple sources - from all of my computers, my phone, and any sensors I might get.
  • Automated - Two years ago, I built a web app with which I manually tracked details of my life. It was tedious work, and I quickly stopped adding data. A real solution to data gathering must be automated - most data must be gathered without any input from me, so that I can forget about it for weeks at a time.
  • Easy analysis - Data is useless if it isn’t actually being used. A real solution must make it easy to perform custom analysis in my programming language of choice. The goal here is not just to do analysis by hand, but to figure out a way to do machine learning on my data!

With these goals in mind, we’ve created ConnectorDB. It isn’t much yet - it doesn’t do any analysis or visualization for you. But it gathers data. A lot of data.

And it is easy to add devices, so that it gathers even more data.

Let me show you ConnectorDB!

I have an Ubuntu server set up on DigitalOcean, where I have ConnectorDB running.

After logging in, ConnectorDB shows me my input screen. On this page, I can rate how well my life is going - these ratings are all saved, and can be accessed when performing analysis, so that my data can be correlated with my actual mood!

You can set up any ratings you like by clicking the star icon - these are the ones I found to be useful. Of course, star ratings are just one form of input - you can manually track however many things you’d like by creating custom streams (the + icon).
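Under the hood, each stream is just a series of timestamped datapoints whose type is described by a small JSON schema (you’ll see one in the Python example later on). As a rough sketch - the exact schemas the GUI generates are my own guess, not something I’ve verified - a 0-10 star rating and a simple manual counter might boil down to something like this:

# Rough guesses at the schemas behind manually tracked streams - the schemas
# ConnectorDB's GUI actually generates may differ; this is just illustrative.
rating_schema = {
    "type": "integer",
    "minimum": 0,
    "maximum": 10,
}

coffee_cups_schema = {"type": "number"}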

As I mentioned before, though, manual data entry quickly becomes tedious, so it is important to set up ConnectorDB so that it gathers data for you!

ConnectorDB comes with a Python laptop-logger app and an Android app. The laptop logger works on both Linux and Windows. It gathers data in the background on my laptop, saving the number of keys I pressed, the titlebar of the currently active window, and other such metrics.

The laptop logger syncs with ConnectorDB every hour, and the data shows up in ConnectorDB immediately after each sync.

Notice that this page is fairly empty. This is because ConnectorDB does not yet have any visualization! In the future, this page will offer simple ways to see and analyze your data.

ConnectorDB also has an associated Android app, which gathers all sorts of goodies from my phone:

Admittedly, this isn’t much. Many existing apps do visualization much better and show far more detail.
Hopefully this can be fixed quickly - the GUI was only recently created, and still has a lot of work ahead of it.

Python API

Of course, this would be useless if I couldn’t access my data. I am particularly proud of the Python API, since I think it allows really easy access and analysis of data.

For example, this is all the code you need to create a custom device that syncs to ConnectorDB once an hour:

import time
from connectordb.logger import Logger

def getTemperature():
    #Your code here
    pass

# Called once, when the logger's local cache is first created
def initlogger(l):
    l.apikey = raw_input("apikey:")
    # The stream's datapoints are numbers, as given by this JSON schema
    l.addStream("temperature", {"type": "number"})
    # Push the cached data to the ConnectorDB server once an hour
    l.syncperiod = 60 * 60

# Datapoints are cached locally in cache.db and synced in the background
l = Logger("cache.db", on_create=initlogger)
l.start()

# Gather a temperature reading every minute
while True:
    time.sleep(60)
    l.insert("temperature", getTemperature())

All of your IoT devices can be included in one large dataset with only a couple of lines of code!
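Reading the data back should be just as brief. The sketch below is only an illustration: the top-level ConnectorDB client class, the path-style stream lookup, and the datapoint fields are my assumptions about the API rather than verified calls, so check the documentation on the website for the exact usage.

# Illustrative sketch only: the ConnectorDB class, the "user/device/stream"
# lookup syntax, and the "t"/"d" datapoint fields are assumptions about the
# API, not verified calls - see the documentation for the real thing.
from connectordb import ConnectorDB

cdb = ConnectorDB("your-apikey", url="https://your-server.com")

# Grab the temperature stream filled in by the device example above
temperature = cdb["myuser/mydevice/temperature"]

# Average the 100 most recent readings
recent = temperature[-100:]
values = [dp["d"] for dp in recent]
print("Mean of the last %d readings: %f" % (len(values), sum(values) / len(values)))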

What do you think?

ConnectorDB is definitely a work in progress. But hopefully, it will quickly gain the missing features and become a one-stop location for quantified-self data!
You can set up your own copy of ConnectorDB by following the installation instructions on the website.

An Update

In mid-2014, I started thinking about a data-gathering tool for quantified-self info, with the explicit purpose of treating life as a machine learning problem. I am happy to say that for the past year, Joseph Lewis and I have been working on something I think is quite incredible.

We are very close to having the first version of our software available. I just wanted to say that I can’t wait to show off what we have accomplished. Our hope is to have a nice demo running by January!

Milestone 1

In the last week of September I started on a very interesting personal project. I was finished with homework for the week, and decided that it couldn’t hurt to play around with an idea that had been bothering me for a while.

“It should take me no more than a couple days to get something up and running”, I thought.

One month later…

Nothing is up and running yet.

For the past 4 weeks, I have spent every second of my free time (and some not-so-free time stolen from other obligations) working on this project.

It wasn’t all for nothing, though. I have finally reached the first milestone. What does that mean? Well, I have the code that underlies the thing that will underlie the actual code that does stuff. In effect, I have finished a very convoluted and complex boilerplate. So about 500 more milestones, and I might have the equivalent of “hello world”.

Nevertheless, I am very proud of myself, despite not having much to show.

Now it is time for a day or two of break, so that I can catch up with my obligations, and then on to milestone 2!

The Great Balancing Act

There are several methods for reading papers. There is the “read-on-a-laptop” method, which gives a good idea of what is going on, and then there is the “print-the-paper-and-spend-6-hours-going-through-it-word-by-word” method. The second method seems to inevitably require referencing textbooks to look up the derivation of key concepts. Frequently, it also requires constructing a dependency graph based upon the paper’s citations, such that to truly understand one paper, one has to read 10 others.

The same can be said of anything, really. Doing something right requires a huge dedication of time. And therein lies the problem.

I have never really learned how to balance my time - if I focus on classes, then I spend my waking hours immersed in the class materials, searching the internet and textbooks for supplemental content. If I have a personal project, I disappear from civilization for weeks, skipping classes and assignments. When I get to certain parts of my research projects, I forget that the sun exists, and my sleep schedule rotates around the clock as I come home from the office later and later each day (…just let me finish this one last thing!).

Such a mindset is great when all I have in my life is research, or classes. It becomes less so with several things competing for my attention. It becomes horrible once deadlines get involved (my undergraduate transcript can attest to that).

A couple of days ago, I realized that I was spending all of my time on homework and obligations. Classes are important to me now, especially since I have switched disciplines from Physics to Machine Learning. Ultimately, though, it is not classes that truly excite me. I am here to do research!

I found myself falling extremely quickly into a “deadline-chasing” mindset - working on assignment after assignment, going through one thing on my to-do list after another, focusing only on the short-term, but fixed due-dates.

It is clear to me that this is not the optimal way to go about spending my time. There exists a point at which long-term goals without fixed deadlines (like research) trump the circle of recurring “things-I-should-do”. I have already learned the hard way that I can’t simply ignore my duties whenever it suits me, but on the other hand, there will always be something waiting to be done. So how do I fulfill my duties adequately, while still dedicating time to long-term goals?

My claim is that there exists an “optimal schedule” of work. There exists a balance for each person - an optimization point where duties are fulfilled, but time spent on the “important things” is maximized.

Such a schedule is not easy to calculate. Perhaps it takes 3 hours to shift attention from one subject to another. Perhaps it is best to take care of light duties right after eating, or attack the difficult problems after midnight? And maybe not starting the day off with something intellectually stimulating leads to motivational difficulties for the rest of the day? Perhaps leaving an assignment unfinished until near a deadline allows me to work with much greater efficiency? Or maybe it is the opposite? And what about sleep? Will a nap during the day allow greater focus in the afternoon? How will it affect my efficiency the next day? What about tuning motivation? Perhaps it is best to intersperse the hard stuff with the simple things, such that I get some motivation from feeling that I am getting somewhere?

How exactly do I schedule things around my deadlines to both maximize my efficiency and fit in as much research and reading of papers and books as possible?

All of these are extremely difficult questions to answer. But answering them in a rigorous, data-driven way promises a pathway to immense efficiency.

meDB

Over the summer I created a simple web app called “meDB”, which I have been using for the past few months to gather data about myself: how much sleep I am getting, when and what I eat, and how much time I spend on specific tasks, along with subjective ratings such as my mood and progress toward goals. The goal of the app was to find specific correlations in the data - how much does sleep affect my performance? Does eating fatty food lower my mood? How long do the effects of an all-nighter last?
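As a toy illustration of the kind of analysis I have in mind - the file name and column names below are made up, meDB has no such export format - a few lines of pandas would already answer the same-day and day-after versions of these questions:

# Toy sketch: "medb_export.csv" and its columns are hypothetical, purely to
# illustrate the kind of correlations I would like to pull out of the data.
import pandas as pd

df = pd.read_csv("medb_export.csv", parse_dates=["date"])

# Same-day relationships between sleep, mood, and productivity ratings
print(df[["sleep_hours", "mood", "productivity"]].corr())

# Lagged effect: does last night's sleep predict today's productivity?
df["sleep_prev_night"] = df["sleep_hours"].shift(1)
print(df[["sleep_prev_night", "productivity"]].corr())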

I have recently been thinking that it might be a good idea to spend some of my (yet-to-be-found) time attempting to create a scheduling algorithm that uses this data to optimize both my well-being and the amount I can accomplish.

Sounds like the perfect use case for some machine learning!


For the foreseeable future, I still need to learn how to balance my obligations manually… So inefficient.

But what can ya do ¯\_(ツ)_/¯

Might as well get started.