Sunday, June 29, 2014

Approximating data in Gnuplot with the fit function and time as xdata.

Gnuplot is the shit. Normally, the biggest problem is finding out (i.e., Google-hunting for examples) how to do what you want, not determining whether it is possible at all. However, I just ran into a problem where Gnuplot was buggy. I had a data series with time-stamps (full dates) on the x-axis and my measurements (two different ones) on the y-axis, and I wanted to approximate the trend with a simple least-squares line f(x) = a*x + b. This was seemingly pretty easy to do:

#!/usr/bin/gnuplot -persist
set xdata time

set xlabel "Time"
set ylabel "Measurements" font "Arial,12"

set autoscale
set timefmt "%Y-%m-%d"

set term png xFFFFFF size 1200,700
set output "/tmp/measurements.png"
set datafile separator ";"

f(x) = a*x + b
fit f(x) './measurements.dat' using 1:2 via a, b
g(x) = c*x + d
fit g(x) './measurements.dat' using 1:3 via c, d
plot \
"./measurements.dat" using 1:3 title "Thingie A" with linespoints, \
"./measurements.dat" using 1:6 title "Thingie B" with linespoints, \
f(x) title "Least-squares f(x) = a*x + b approximation", \
g(x) title "Least-squares g(x) = c*x + d approximation"


For the first data set ("Thingie A"), I got a believable approximation line - i.e., the line looked like it approximated the general trend of the data points in the set. However, where the first data set had, on average, increasing data points, the second data set had generally decreasing ones. Yet the approximation line (g(x)) was increasing. How could that be? Was my impression of a generally decreasing trend an optical illusion? Or was Gnuplot's fit function somehow buggy? It turned out to be the latter.

The fit function uses "an implementation of the nonlinear least-squares (NLLS) Marquardt-Levenberg algorithm" (cited from the built-in help) and, apparently, that algorithm (or implementation) doesn't do well when the x and y values differ greatly in magnitude. My y-points in the second data set were in a narrow range, from circa 105 down to 100 - but what were the x-values?


Unlike Unix, Gnuplot doesn't use an epoch starting at 00:00 on January 1st 1970 - it actually starts at January 1st 2000:

gnuplot> print strftime("%Y-%m-%d %H:%M", 0)
         2000-01-01 00:00


Thus, since I have collected my data during this year, all my x-values were around 450,000,000 - and, yes, that is quite a lot bigger than y-values around 100...
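You can, by the way, check what magnitudes fit actually sees. A quick sanity check (the date is just an example from my collection period; strptime is available since gnuplot 4.6):

gnuplot> print strptime("%Y-%m-%d", "2014-06-29")    # => roughly 457 million seconds since 2000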

What to do? Well, the simplest way to get the x-values closer to the y-values is to divide them by the number of seconds in a day (86400), so that the fit implementation works with x-values in the 5000 range instead.

To do this, I changed the following lines:

fit g(x/86400) './measurements.dat' using 1:3 via c, d
...
plot \
...
g(x/86400) title "Least-squares g(x) = c*x + d approximation", \
... 

And that was enough: now the approximation line was declining, just like my impression of the data points. Success!

Thursday, May 8, 2014

About that virtual factory reset...

... it did work out in the end, after a few problems. The DVD images worked great all the way to the subsequent reboot. Then all one got was a black screen with a solid, non-blinking cursor in the upper-left corner of the screen. Thanks a lot - I guess the factory reset never bothered to install an MBR...

Oh well, let's boot up the virtual machine again, but this time with a Linux live CD ISO image attached (Linux Mint 16 "Petra"), so that we can - from the trustworthy Linux prompt - install the mbr package ('sudo apt-get install mbr') and use it to install an MBR on the hard drive of the virtual machine (/dev/sda in my case, checked in the output of dmesg): 'sudo install-mbr -i n -p D -t 0 /dev/sda'. (Check the man page of install-mbr for the details of those options; they basically mean no interaction, use the first bootable partition, and use a timeout of zero.) We could have used other Linux means as well, for example the mbr.bin image provided by the syslinux package. The whole detour boils down to the few commands below - now, let's try another reboot.
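A sketch for reference (/dev/sda is from my setup - verify yours in the output of dmesg before writing anything):

# inside the Linux Mint live session
sudo apt-get install mbr
# -i n = no interaction, -p D = first bootable partition, -t 0 = zero timeout
sudo install-mbr -i n -p D -t 0 /dev/sda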

Well, it was a so-called "fall forward". Instead of the solid cursor, it now said "MBR" in the upper-left corner, and then we got to an error screen complaining about winload.exe being missing or corrupt ("Status 0xc000000e", "Windows failed to start. A recent hardware or software change might be the cause.", "File: \Windows\System32\winload.exe", "Info: The selected entry could not be loaded because the application is missing or corrupt."). Bugger... Out of ideas, I turned to Google and found this eminent guide. By luck, I already had a Windows XP virtual guest system I could attach the new machine's virtual hard drive to - otherwise, I would probably not have been able to get any further. Now I could follow the guide from XP: in my case, the new hard drive was mapped as E:, so I could run E:\Windows\System32\bcdedit /store F:\Boot\BCD /enum to check whether I suffered from the same situation as the guide's author. I sure did, with three "unknown" entries. I changed them all to "boot" according to his recipe (E:\Windows\System32\bcdedit /store F:\Boot\BCD /set {X} Y boot, where (X, Y) were (bootmgr, device), (default, device), and (default, osdevice), respectively).
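Spelled out, that makes three commands (with the drive letters as mapped in my XP guest):

E:\Windows\System32\bcdedit /store F:\Boot\BCD /set {bootmgr} device boot
E:\Windows\System32\bcdedit /store F:\Boot\BCD /set {default} device boot
E:\Windows\System32\bcdedit /store F:\Boot\BCD /set {default} osdevice boot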

Upon next boot, the new machine started up and began configuring itself, completing the factory reset. Cool!

Wednesday, May 7, 2014

I Stand Humbly Corrected...

So, I've been dabbling with trying to boot a set of "XYZ" recovery DVDs in a VirtualBox guest, to see if I can get a virtual factory reset running.

The first problem I ran into is that my USB hub seemingly cannot support both the external USB hard drive the virtual host resides on and the USB DVD reader at the same time - and, of course, I have too little space left on my internal drive and too few USB ports to attach them both directly (well, I could attach one directly, with the inconvenience of unplugging the external mouse and keyboard, but we can't have that, can we?).

Thus, I needed to convert the 4.2 GB DVD to an ISO image - but both Copy Disc in Brasero and good old, honest dd only gave me a 120 MB image - probably just the actual boot image on the DVD or something - while mkisofs gave me a full 4.2 GB image, but a non-bootable one. How annoying!

Out of despair (or to be systematic), I also tried a tip I had up to that point considered too naive and fault-prone: to just cat the DVD-reader device to a file (i.e., 'cat /dev/sr0 > /tmp/recovery1.iso') - and do you know what? It worked! VirtualBox booted the virtual host right off it and let me choose to do a factory reset. (It is currently at 65%, so I don't know if it ultimately will work yet.)
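For the record, the whole "conversion" is just this (/dev/sr0 is the reader on my machine; the checksum comparison is an extra sanity check of my own, done with the disc unmounted):

cat /dev/sr0 > /tmp/recovery1.iso
md5sum /dev/sr0 /tmp/recovery1.iso    # the two sums should be identical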

Who would have thought that simple cat would be so true to the underlying block device that it rendered such a perfect copy? I am sure that, given the right options, mkisofs could produce a bootable image (it has a ton of different options for setting this-and-that type of bootable disk booting from a certain file), but I would rather not have to learn how to analyse the configuration of the DVD at hand well enough to pick all the right mkisofs options for a one-to-one copy of it. Shouldn't mkisofs really be able to do that itself?

Video Woes

In McGuff's and Little's excellent "Body by Science", they advocate strength training with slow lifts (to remove any helping momentum) and measuring Time Under Load rather than just the number of repetitions, to get more fine-grained control over your progress (or lack thereof). However, concentrating on the weights and the clock at the same time is, of course, a drag. Better to film oneself and analyse the video after the workout - and which is the most ubiquitous movie camera around these days? Right, your mobile phone.

This scheme worked well for a number of workouts, until the camera app in my phone crashed while recording, leaving me with a video no player would play. Why? Because the 3gp format of my phone (basically an mp4 variant with more compression) writes the frame index and codec information (the "moov" atom) at the end of the file - and when the app crashes mid-recording, that index is never written and the file ends up an inaccessible pile of junk data...

So, would I be forced to write off the workout as a session without any data to track? Not without a fight. Some googling revealed that Federico Ponchio's Untrunc and Grau's Video Repair Software were the two most promising candidates. Both produced something from my broken video. Unfortunately, Untrunc picked some codec all my players were missing (and that wasn't easily installable), but, luckily, Grau's program worked well - at least for the video; the audio became unsynced.
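For reference, Untrunc is pointed at an intact clip from the same camera, which it uses as a template for repairing the broken one, along the lines of (file names hypothetical):

untrunc working-reference.3gp broken-workout.3gp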

Alas, not only were the video and audio out of sync; the video also seemed to play somewhat in slow motion, so it turned out to be unusable for my post-workout analysis anyway. Too bad, but at least I got a story to tell out of it...

Saturday, April 19, 2014

Fun with OpenWRT

I wanted to extend my network at home, with one router by the fibre node downstairs and one by our desks upstairs. Also, I wanted to try out DD-WRT. However, since there are two different versions of the TP-Link TL-WR1043ND floating around and I happened to get the black v2.1 version, which is currently not supported by DD-WRT, I opted for OpenWRT instead.

It was quite scary to flash the new router with the OpenWRT image, but it worked flawlessly, and within minutes I had configured it to do my bidding - despite OpenWRT seeming overwhelming at first with all its choosable bells and whistles.

Wednesday, January 22, 2014

Scraping Fund Data off the Internet - an Orgy in Different Tools

Previously, I've used a combination of Bash and curl to scrape pages with fund statistics from investment sites, and then Perl scripts to data-mine the scraped pages and analyse the data. So far so good - small, customized, targeted Bash and Perl scripts.

However, recently I also wanted to use the Norman value (an estimate of how much a fund will cost over 10 years of keeping it in an investment portfolio). The only source for these values was one popular fund comparison site, so I set out to bash and curl my way to a loop that would scrape the data off it - only to run into serious trouble...

Using the Firebug plugin in Mozilla Firefox, the built-in Developer Tools in Google Chrome, and eventually the Charles web debugging proxy to really see what was going on, I was able to pin-point exactly what my browser sent to the site when I navigated the pages - but I was unable to mimic the requests in my scripts. Unfortunately, the site relied just too much on Javascript dynamically building the form data posted to the site with every request. So I had to think of something else.

Enter Selenium, the web automation tool. Since it uses an ordinary browser to perform the surfing, dynamic Javascript mumbo-jumbo is no match for it. But I still wanted to script my usage, and eventually chose Watir for the job, once again being surprised at how much joy it is to program in Ruby!

So, by writing a short Ruby program that uses the Watir framework to drive Selenium, I was able to get the data I wanted from the site in question.
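The skeleton of the script looked roughly like this (a sketch from memory: the URL, the "Next" link text, and the scrape_funds_on_page helper are hypothetical placeholders; the gem of the day was watir-webdriver):

require 'watir-webdriver'    # drives a real browser via Selenium WebDriver

browser = Watir::Browser.new :firefox
browser.goto 'http://fund-site.example/funds'    # hypothetical URL

loop do
  scrape_funds_on_page(browser)              # harvest the 20 funds shown
  next_link = browser.link(text: 'Next')     # follow the pagination link
  break unless next_link.exists?
  next_link.click
end

browser.close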

However, I began by restricting the funds to just those of the bank I am a customer of (it was my funds there that were most important to re-evaluate and perhaps exchange for others at the turn of the year). When I, for kicks, tried to scrape all of the 2,500+ funds of the site, I ran into new problems. The site was simply not stable enough. Now and then, one would either end up on the very first fund presentation page with a subtle "An error occurred" message, or on an actual full-fledged "An error occurred" page, or one would seemingly be on the right track, getting to the next of the 100+ pages of 20 funds at a time, only to realize that the site had silently thrown you from the target tab back to the default tab. To battle this, I had to go heavy on error handling, turning my short and elegant Ruby script into a less nice collection of rescue blocks for this and that exception (some of which I defined and raised myself, for example when browser.text.include?("An error occurred") is true).
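The error handling boiled down to a retry wrapper along these lines (a sketch; FundSiteError and the attempt limit are my own hypothetical naming):

class FundSiteError < StandardError; end    # raised when the page text reveals a silent failure

def with_retries(max_attempts = 5)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue FundSiteError, Watir::Wait::TimeoutError
    retry if attempts < max_attempts
    raise                                   # give up after max_attempts tries
  end
end

# usage: re-check and re-scrape a page until it sticks
with_retries do
  raise FundSiteError if browser.text.include?("An error occurred")
  scrape_funds_on_page(browser)             # the hypothetical helper from above
end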

All in all, a quite educational and rewarding exercise.