How to get a time difference of two log entries

Posted on March 23, 2021 by Adrian Wyssmann ‐ 5 min read

I recently was tasked with a small problem on how to get a time difference of two log entries

Problem

Considering we have the following log and we are interesting in the amount of seconds the demoengine process until it’s up and running:

Mar  2 06:00:00 demoengine[2335]: [DEMO|CallbackManager.cpp:102|0x59280b40] 1.5GB of virtual memory used
Mar  2 07:00:00 demoengine[2335]: [DEMO|CallbackManager.cpp:102|0x59280b40] 1.5GB of virtual memory used
Mar  2 08:00:00 demoengine[2335]: [DEMO|CallbackManager.cpp:102|0x59280b40] 1.5GB of virtual memory used
Mar  2 09:00:00 demoengine[2335]: [DEMO|CallbackManager.cpp:102|0x59280b40] 1.5GB of virtual memory used
Mar  2 10:32:57 demoengine[1287]: [DEMO|DemoEngineMain.cpp:122|0xb88ec60] starting demoengine 1.0.0.0 (x86) from [/usr/bin/demoengine]
Mar  2 10:33:00 demoengine[1287]: [DEMO|DataStorage.cpp:1370|0xb88ec60] loading database [/var/demoengine/demoengine.db]
Mar  2 10:33:00 demoengine[1287]: [DEMO|DataStorage.cpp:2043|0xb88ec60] database loaded in 38ms
Mar  2 10:33:00 demoengine[1287]: [DEMO|CallbackManager.cpp:102|0x594a4fc0] 1.4GB of virtual memory used
Mar  2 10:33:00 demoengine[1287]: [DEMO|DemoEngineMain.cpp:318|0xb88ec60] demoengine is running
Mar  2 11:00:01 demoengine[1287]: [DEMO|CallbackManager.cpp:102|0x594a4fc0] 1.4GB of virtual memory used
Mar  2 12:00:00 demoengine[1287]: [DEMO|CallbackManager.cpp:102|0x594a4c60] 1.4GB of virtual memory used
...
Mar  4 09:32:58 demoengine[1287]: [DEMO|DemoEngineMain.cpp:122|0xb88ec60] starting demoengine 1.0.0.0 (x86) from [/usr/bin/demoengine]
Mar  4 09:33:00 demoengine[1287]: [DEMO|DataStorage.cpp:1370|0xb88ec60] loading database [/var/demoengine/demoengine.db]
Mar  4 09:33:00 demoengine[1287]: [DEMO|DataStorage.cpp:2043|0xb88ec60] database loaded in 38ms
Mar  4 09:33:02 demoengine[1287]: [DEMO|DemoEngineMain.cpp:318|0xb88ec60] demoengine is running
...
Mar  6 09:32:58 demoengine[1287]: [DEMO|DemoEngineMain.cpp:122|0xb88ec60] starting demoengine 1.0.0.0 (x86) from [/usr/bin/demoengine]

Two things which are important:

  • starting demoengine indicates that the process is starting.
  • demoengine is running indicates that the process has started to run.

The result should be displayed in the following format:

Start	Cycle: 1 Duration: HH:MM:SS
Start	Cycle: 2 Duration: HH:MM:SS
...
Start	Cycle: n Duration: HH:MM:SS

How to solve?

I want to solve this using bash and default tools which you usually have available on Linux. So first we are interested in the respective lines, as mentioned above. This can be easily achieved with grep.

grep -E '.*starting|running.*' demoengine.log
Mar  2 10:32:57 demoengine[1287]: [DEMO|DemoEngineMain.cpp:122|0xb88ec60] starting demoengine 1.0.0.0 (x86) from [/usr/bin/demoengine]
Mar  2 10:33:00 demoengine[1287]: [DEMO|DemoEngineMain.cpp:318|0xb88ec60] demoengine is running
Mar  4 09:32:58 demoengine[1287]: [DEMO|DemoEngineMain.cpp:122|0xb88ec60] starting demoengine 1.0.0.0 (x86) from [/usr/bin/demoengine]
Mar  4 09:33:02 demoengine[1287]: [DEMO|DemoEngineMain.cpp:318|0xb88ec60] demoengine is running
Mar  6 09:32:58 demoengine[1287]: [DEMO|DemoEngineMain.cpp:122|0xb88ec60] starting demoengine 1.0.0.0 (x86) from [/usr/bin/demoengine]

Ultimately we only care about the time stamps, so another grep -o extracts them:

grep -E '.*starting|running.*' demoengine.log | grep -Eo '^.*[0-9]{2}\:[0-9]{2}\:[0-9]{2}'
Mar  2 10:32:57
Mar  2 10:33:00
Mar  4 09:32:58
Mar  4 09:33:02
Mar  6 09:32:58

Now as whe have the respective information, we can use awk to process the remaining log statements - one can probably write the whole thing in awk without using grep, but still my solution works fine. We have to differentiate two cases

  1. the app is started and running i.e. both statements are present
  2. the app is started but not yet running i.e. only started statement is present, but not the running (see also last line of the example log)

For 2. as we only have the lines necessary in our input for awk we can easily say that when the total line of numbers is odd, the last line cannot have a duration:

END {
    if((NR%2)) print "Start Cycle: " (NR +1)/2 " Duration: inconclusive "
} 

The END will be executed at the end, when all input is exhausted. Now awk processes line by line using statements which are encapsulated in { }. Now we care on each odd line and it’s subsequent next line. Using the first field $0 we can show the starting and running times as follows

!(NR%2) {
    print "Start Cycle: " NR/2 " Starting from " $0 " to " p
}
{ p = $0 }`

Which would output

Start Cycle: 1 Starting from Mar  2 10:33:00 to Mar  2 10:32:57
Start Cycle: 2 Starting from Mar  4 09:33:00 to Mar  4 09:32:58

However, what we care of is the difference between starting and running thus we would need to calculate the time difference. awk has time functions built in, but none of these is really suitable. First of all they work with timestamp which is Seconds since Jan 01 1970. We however have something like MM DD HH:MM:SS which has to be converted in a format we can calculate the difference. We can use date function but how? Looking at the documentation of awk we can find this

`system(cmd-line)

Execute the command cmd-line, and return the exit status. (This may not be available on non-POSIX systems.)

I had a look but I find it very hard to use for my purpose, so I looked further and found this stackoverflow post which relates to the awk-documentation:

expression | getline [var]

Read a record of input from a stream piped from the output of a command (see Input/Output and General Functions)

So I can use a bash function which handles the calculatoion of the time differences, assuming we pass both times:

getStartupTime() {
    startTime=$2
    endTime=$1
    let DIFF=`date -d "$endTime" +%s`-`date -d "$startTime" +%s`
    h=$(( DIFF / 3600 ))
    m=$(( ( DIFF / 60 ) % 60 ))
    s=$(( DIFF % 60 ))
    printf "%02d:%02d:%02d\n" $h $m $s
}

export -f getStartupTime

The function transforms the date/time to seconds and subtracts startTime from endTime and then convert it to HH:MM:SS. Important is also the export -f statement so that the function is available to awk. We can the use the output from the function getStartupTime() using the pipe and getline as mentioned above

cmd = "getStartupTime \"" $0 "\" \"" p "\""
cmd | getline var1

Final solution

Now that we have the bits and pieces, we have to bring this all together in a script get-startup-time.sh as follows:

#!/usr/bin/env bash
LOG_FILE="$1"

getStartupTime() {
    startTime=$2
    endTime=$1
    #echo "$startTime - $endTime"
    let DIFF=`date -d "$endTime" +%s`-`date -d "$startTime" +%s`
    h=$(( DIFF / 3600 ))
    m=$(( ( DIFF / 60 ) % 60 ))
    s=$(( DIFF % 60 ))
    printf "%02d:%02d:%02d\n" $h $m $s
}

export -f getStartupTime

function printStartupTimes {
    grep -E '.*starting|running.*' "$LOG_FILE" | grep -Eo '^.*[0-9]{2}\:[0-9]{2}\:[0-9]{2}' | \
    gawk '
        END {
            if((NR%2)) print "Start Cycle: " (NR +1)/2 " Duration: inconclusive "
        }
        !(NR%2) {
            cmd = "getStartupTime \"" $0 "\" \"" p "\""
            cmd | getline var1
            close(cmd)
            print "Start Cycle: " NR/2 " Duration: " var1 "s"
        }
        { p = $0 }
    '
}

printStartupTimes

Running the script against our example log file will output what we initially expected:

./get-startup-time.sh ./example.log
Start Cycle: 1 Duration: 00:00:03s
Start Cycle: 2 Duration: 00:00:04s
Start Cycle: 3 Duration: inconclusive