Original plan - Oct 2010

The first question from someone who sees the e-Log system might be: "What can we do with the archived data? Is it worth the effort? Is your system smart enough to answer my questions from the archived data?" To be clear, just accumulating everything in digitized form can never answer such questions, nor is it an exciting job to do every day. We, as researchers, should make it clear to the user that their gain will far exceed their investment by any means.

Case study

e-Log case study using two dimensions.

As of October 2010, I am collecting 10 types of sensor streams (in the sense that they are digitized events occurring from time to time). The table below shows the first 100 questions that e-Log can answer for the user; it lists the question set generated from each pair of input streams taken from the sensor stream table on the left. The questions in the table are not trivial ones, however; they can be very meaningful in providing important answers for the user. In fact, by extending this approach, e-Log can answer 10^10, or n^n, questions, where n is the number of input event streams.

Research topic

The question, then, is whether we have a proper information management system that can handle streams of heterogeneous data, detect events of user interest, and retrieve the answers. This is the research topic of the e-Log project.

Technical implementation

Sensor data processing

iPhone image data

The EXIF information attached at the head of an image file gives great detail on the context of a photo. iPhone image data processing describes how to extract the complete EXIF information from iPhone photos.
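For example, the capture time and GPS section can be pulled out with PHP's standard exif_read_data() function; a minimal sketch (the file name is a placeholder):

<?php
// Read EXIF metadata from an iPhone photo (file name is a placeholder).
$exif = exif_read_data('IMG_0001.JPG', 0, true);
if ($exif !== false && isset($exif['EXIF']['DateTimeOriginal'])) {
    echo $exif['EXIF']['DateTimeOriginal'] . "\n"; // capture time
    if (isset($exif['GPS'])) {
        print_r($exif['GPS']); // latitude/longitude as EXIF rationals
    }
}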

Time synchronization over multiple time zones

In a life-logging application that collects or monitors heterogeneous sensor streams, time synchronization is a critical issue in managing the data and events. The issue becomes even bigger when a user travels around the world across multiple time zones, which easily breaks the temporal relations unless corrected. So I apply a common time zone (UTC) to all timestamps attached to the sensing data (UTC is the time zone used in flight schedule management). UTC timezone provides the technical explanation of the server-side local timestamp management that I actually use in my e-Log system.
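As a minimal sketch of this normalization, PHP's DateTime class can rewrite a device-local timestamp into UTC before it is stored (the device time zone Europe/Rome is a hypothetical example):

<?php
// Convert a device-local timestamp to UTC before storing it.
$local = new DateTime('2010-10-12 10:05:07', new DateTimeZone('Europe/Rome'));
$local->setTimezone(new DateTimeZone('UTC'));
echo $local->format('Y-m-d H:i:s'); // prints 2010-10-12 08:05:07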

Import contacts

I use Google Contacts as the way to identify people, their e-mail addresses, and/or face recognition results. For off-line processing, export the contacts from the Gmail contacts interface, selecting "All contacts" and "Google CSV format". Then import the CSV file into Google Documents and export it again:

  • In Excel format if using Microsoft Windows. Finally, import the file into the MySQL table using EMS MySQL Manager or a similar tool that supports the XLS format (note: UTF-8 encoded strings were broken when using EMS MySQL Manager).
  • Or in CSV format for Mac. Use Sequel Pro to import the CSV; there is no UTF-8 problem here.

This somewhat complex procedure is due to the lack of "MEMO" field support when Google exports a user's contacts. Google even changes the contact fields from time to time, so it is mandatory to check the fields in the CSV file and modify the contact table structure accordingly (a quick header check is sketched below).
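A minimal PHP sketch of that check simply prints the header fields of the export (the file name google_contacts.csv is a placeholder):

<?php
// Print the header of a Google CSV export so the contact table
// structure can be verified before import.
$fh = fopen('google_contacts.csv', 'r');
$header = fgetcsv($fh);
fclose($fh);
foreach ($header as $i => $field) {
    echo $i . "\t" . $field . "\n";
}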

String comparison

Obviously, life logging needs to handle heterogeneous data, heterogeneous in source, format, and encoding, as noted many times already. For the technical implementation, this heterogeneity shows up at the very lowest data level, where it can even be difficult to directly compare the string components of data sets coming from different sources.

I have suffered from this problem when comparing UTF-8 encoded strings: one from a Google contact and the other from a Mac directory name (that is, the person name exported by the Picasa face detector on the Mac). The problem is that one identical-looking character can be encoded in UTF-8 in several different ways, which invalidates any byte-level text comparison. For example, "é" can be a single code point (U+00E9) or an "e" followed by a combining acute accent (U+0301); Mac OS X file names use the decomposed form.

So here we need the concept of Unicode normalization, which MySQL does not support yet (before we plug in an extension).
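A tiny demonstration of the issue with PHP's intl Normalizer class:

<?php
// The same visible character "é" in composed and decomposed UTF-8
// differs byte-wise but matches after normalization to NFC.
$composed   = "\xC3\xA9";   // U+00E9: é as a single code point
$decomposed = "e\xCC\x81";  // U+0065 + U+0301: e + combining accent

var_dump($composed === $decomposed);                       // bool(false)
var_dump(Normalizer::normalize($decomposed, Normalizer::FORM_C)
         === $composed);                                   // bool(true)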

As a meantime solution for my simple case, I wrote a PHP script using the PHP Normalizer class, which externally reads the Unicode strings and updates them in normalized form (see the code snippet below). It works like a charm for now, though server-side support would be much better. For PostgreSQL, there is a workaround.

while ($obj = $result->fetch_object()) {
    $person_name = $obj->name;
    // Normalize to NFC so composed and decomposed forms become identical.
    $person_name_normalized = Normalizer::normalize($person_name, Normalizer::FORM_C);

    $person_name = $mysqli_update->real_escape_string($person_name);
    $person_name_normalized = $mysqli_update->real_escape_string($person_name_normalized);

    // Write the normalized name back (table name "contact" is illustrative).
    $mysqli_update->query("UPDATE contact SET name = '$person_name_normalized'
                           WHERE name = '$person_name'");
}

Data structure

See the e-Log data structure.

Sensor calibration

Time sync

In this experiment, I mainly use two devices: the ViconRevue and the iPhone. The ViconRevue, which is still in an early stage of development, embeds multiple sensors whose outputs need calibration and complex equations for semantically meaningful event interpretation. The iPhone, which I use to log locations and (sometimes) capture images, works well, but its clock is not synchronized with that of the ViconRevue. It is technically not feasible to sync the iPhone clock because of the lack of API support, whereas the ViconRevue clock is synchronized with the computer to which it is connected to download the image data. Since most computer clocks are synchronized with an Internet time server, the actual GPS locations captured with the iPhone do not match what the ViconRevue actually sees. So we did a simple calibration using the iPhone alarm clock app to measure the exact offset between the two devices.

iPhone clock calibration

Sensecam and iPhone clock sync error

As a result, the iPhone clock runs 22 seconds earlier, on average, than the ViconRevue clock. Let us apply this result to match the ViconRevue images with the actual GPS locations.

ViconRevue image captured at 10:05:07 AM

The GPS points captured using the iPhone. The later blue point (46.071597, 11.120559) is the correct position

GPS time-sync calibration result

The matched GPS point (46.071597, 11.120559, in blue) is 22 seconds earlier than the ViconRevue clock, which exactly matches the calibration result.

In a similar manner, we tested on the old data set captured before manually adjusting the iPhone clock; there the gap was 48 seconds on average, with the iPhone clock running behind that of the SenseCam.

-- Correct the 48-second offset measured on the pre-adjustment data set.
UPDATE iphone_gps
SET gps_timestamp = SUBTIME(gps_timestamp, "00:00:48");

Magnetometer

ViconRevue magnetometer samples for calibration: the device heading was changed through north, east, south, and west, matched against the iPhone Compass app.

MAG,2011/01/07 23:54:00,-921,302,520
MAG,2011/01/07 23:54:01,-924,290,532
MAG,2011/01/07 23:54:13,-929,300,525
MAG,2011/01/07 23:54:14,-926,299,525
MAG,2011/01/07 23:54:25,-1229,328,303
MAG,2011/01/07 23:54:27,-1223,325,317
MAG,2011/01/07 23:54:38,-1243,330,329
MAG,2011/01/07 23:54:49,-1231,309,324
MAG,2011/01/07 23:55:00,-1033,389,-52
MAG,2011/01/07 23:55:11,-1014,343,-39
MAG,2011/01/07 23:55:22,-1014,349,-41
MAG,2011/01/07 23:55:24,-1012,342,-37
MAG,2011/01/07 23:55:36,-702,320,179
MAG,2011/01/07 23:55:47,-703,316,184
MAG,2011/01/07 23:55:58,-716,341,176
MAG,2011/01/07 23:56:00,-709,320,186
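From these samples, the first and third sensor columns vary with heading while the second stays roughly constant, so those two axes appear to span the horizontal plane, with opposite headings lying symmetrically around a common center. A minimal PHP sketch of a heading computation under that assumption (the axis mapping, signs, and averaged cluster values below are inferred from the log above, not confirmed):

<?php
// Rough heading computation from the four calibration clusters above.
// Cluster averages are eyeballed from the logged samples.
$samples = array(
    'N' => array(-925, 525),
    'E' => array(-1230, 318),
    'S' => array(-1018, -42),
    'W' => array(-707, 181),
);

// Hard-iron offset: midpoint of the north/south pair.
$cx = ($samples['N'][0] + $samples['S'][0]) / 2;
$cz = ($samples['N'][1] + $samples['S'][1]) / 2;

foreach ($samples as $dir => $s) {
    // Axis signs chosen so the four headings land near 0/90/180/270.
    $heading = rad2deg(atan2($cx - $s[0], $s[1] - $cz));
    if ($heading < 0) { $heading += 360; }
    printf("%s: %.0f deg\n", $dir, $heading);
}
// Prints roughly N: 351, E: 74, S: 171, W: 257 -- close to the cardinal
// directions, as expected for a coarse calibration.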

Event detection

Detecting events is the process of converting raw data streams into a sequence of meaningful information. This conversion is a kind of filtering designed to detect moments of interest and related content from the source. In life logging, for instance, such filtering is essential for data regularly sampled from sensors like GPS or an accelerometer, which without filters may contain overly long stationary stretches with no changes. Such filters may be designed separately for each sensor stream, or we may introduce a generalized approach based on entropy theory, which this work will use for sensor-stream event detection.
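As a minimal sketch of the entropy-based idea (the window size, quantization, and threshold are all assumptions for illustration): the Shannon entropy of a sliding window stays near zero over stationary stretches and jumps when the readings start changing.

<?php
// Flag windows of a quantized sensor stream whose entropy is high.
function window_entropy(array $window) {
    $counts = array_count_values($window);
    $n = count($window);
    $h = 0.0;
    foreach ($counts as $c) {
        $p = $c / $n;
        $h -= $p * log($p, 2);
    }
    return $h;
}

$stream = array(3, 3, 3, 3, 3, 3, 7, 9, 2, 8, 3, 3, 3, 3); // quantized samples
$size = 6;
for ($i = 0; $i + $size <= count($stream); $i++) {
    $h = window_entropy(array_slice($stream, $i, $size));
    if ($h > 1.5) { // threshold chosen for illustration
        echo "event candidate at sample $i (H = " . round($h, 2) . " bits)\n";
    }
}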

Privacy protection

Using the ViconRevue means that I am taking photos of people around me. Thus it is mandatory to protect their privacy before we release any data to the public. See Privacy protection for our approach.

Reverse geocoding

From the collected GPS points, we perform reverse geocoding using the Google Geocoding API. The problem is that Google limits the service to 2,500 geolocation requests per day, whereas we had collected 33,920 points by February 9th, 2011. So we reduced the number of lookups by clustering the GPS points with the K-means method. See K-mean GPS points classification.
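A minimal K-means sketch of that reduction, so that only the k cluster centers need a Geocoding API request (plain Euclidean distance on latitude/longitude is an approximation that only holds over small areas; k, the initialization, and the iteration count are assumptions):

<?php
// Cluster GPS points so only cluster centers are reverse-geocoded.
function kmeans(array $points, $k, $iterations = 20) {
    $centers = array_slice($points, 0, $k); // naive initialization
    for ($it = 0; $it < $iterations; $it++) {
        $sum = array_fill(0, $k, array(0.0, 0.0, 0)); // lat, lng, count
        foreach ($points as $p) {
            // Assign the point to its nearest center.
            $best = 0; $bestDist = PHP_INT_MAX;
            foreach ($centers as $j => $c) {
                $d = pow($p[0] - $c[0], 2) + pow($p[1] - $c[1], 2);
                if ($d < $bestDist) { $bestDist = $d; $best = $j; }
            }
            $sum[$best][0] += $p[0];
            $sum[$best][1] += $p[1];
            $sum[$best][2]++;
        }
        // Move each center to the mean of its assigned points.
        foreach ($sum as $j => $s) {
            if ($s[2] > 0) { $centers[$j] = array($s[0] / $s[2], $s[1] / $s[2]); }
        }
    }
    return $centers;
}

$points = array(array(46.0716, 11.1206), array(46.0720, 11.1210), array(46.0665, 11.1503));
print_r(kmeans($points, 2));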

Event timeline

People on the timeline (Timezone: UTC)

Research collaboration

Build the elifelog.org community.