Sunday, January 15, 2012

Regular Expressions: Learning by doing

Why
Regular expressions are a behemoth of syntactical nightmares; at least, that is the opinion I have always had of them. The problem with learning them, in my opinion, is the lack of guided examples that resemble what I actually want to accomplish. Tutorials tend to be very generic explanations of the syntax, and when you try to bend it to your own purposes it all collapses on your head.
So I want to solve a scenario where I need the help of regular expressions, and hopefully it will be helpful for someone else down the line.
This is just a very basic guide to a simple scenario, and I will not go into depth on exactly what everything does. The hope is just that it will bring a little clarity to other lost souls out there.

What
I needed to rework the logging logic of a system I am working on, since it was very basic and very outdated (timestamps didn't record milliseconds, the output had no severity and there was little to no context for what was printed). With that done, I was curious about developing a small Python script that could extract useful information from the massive amounts of logs, to aid the people who actually have to read them when problems arise.
Some example situations might be as follows.

Only show entries for
  • A certain time/date interval.
  • Specific processes.
  • A specific process id.
So we need to be able to extract the different elements of a log entry so that the script then can utilize them in any way we might need them.

The log format I want to process is similar to the following:
[2012-01-13 13:00:00.250,    Error <   MyServer:8071  > [ContextInProcess|LocalInfo] Gosh! Something went wrong around here
[2012-01-13 13:00:00:270,     Info <otherserver:2871  > [BullshitDetector] MyServer was just joking.
Go ahead and ignore it.

What we can see here is that the information is padded to help with readability (since these logs are made for humans, so they need to be easy to read) and multi-line entries are allowed as well.

So this post will be focused on parsing out the different elements of the log entries so they can be processed later.
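To give an idea of what that later processing could look like, here is a minimal sketch of the filters listed above. It assumes each entry has already been parsed into a dictionary with timestamp, process and pid keys; the function name filter_entries, the key names and the two sample entries are made up for illustration.

```python
from datetime import datetime

# Assumed timestamp layout: date, time and milliseconds separated by a dot.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

def filter_entries(entries, start=None, end=None, processes=None, pid=None):
    """Return the entries that pass every filter that was supplied."""
    selected = []
    for entry in entries:
        when = datetime.strptime(entry["timestamp"].strip(), TIMESTAMP_FORMAT)
        if start is not None and when < start:
            continue
        if end is not None and when > end:
            continue
        if processes is not None and entry["process"].strip() not in processes:
            continue
        if pid is not None and entry["pid"].strip() != str(pid):
            continue
        selected.append(entry)
    return selected

# Hypothetical, already parsed entries.
entries = [
    {"timestamp": "2012-01-13 13:00:00.250", "process": "MyServer", "pid": "8071"},
    {"timestamp": "2012-01-13 13:00:00.270", "process": "otherserver", "pid": "2871"},
]
only_myserver = filter_entries(entries, processes={"MyServer"})
```

Filters that are left as None are simply skipped, so the same function covers a time interval, a set of processes, a process id or any combination of them.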

How
Since we have some different elements to the log entries we want to be able to extract these and put names to them so we can easily utilize them later.

Since this is quite a daunting task for the regex-phobic, we take it one step at a time and start very simple.
First we create a simple Python script which just reads in the log sample from above and runs each line through a pattern built with re.compile, and then we continue building from there.
import re

def main():
    log_file = open("Log.txt")
    com = re.compile(r'^(?P<everything>.*)', re.X)
    for line in log_file:
        match = com.match(line)
        if match:
            # print() with one argument works on both Python 2 and 3
            print(match.groupdict())
    log_file.close()

if __name__ == '__main__':
    main()

The Python code itself should be quite straightforward.
We open Log.txt, read it one line at a time and run each line through the regular expression we set up. If it matches we print the resulting dictionary, and at the end we close the file.
One interesting thing in the Python code is re.X, which enables verbose mode for the regex compiler.
This basically means that it will ignore any whitespace we insert into the expression (with some exceptions) and lets us comment the expression with #, which makes the expression readable in our code.
So this is what it would look like if we tried to structure the above expression a little bit better.

re.compile(r'''
    ^                   # from the start of the line
    (?P<everything>     # we create a group called "everything"
        .*              # the group will match anything
    )                   # end of group     
    ''', re.X)
It will give the exact same result, but now it is so readable that I doubt I will have to explain the elements of this expression.
So now let's tackle the expression itself. After running the script you would get the following output.


{'everything': '[2012-01-13 13:00:00.250,    Error <   MyServer:8071  > [ContextInProcess|LocalInfo] Gosh! Something went wrong around here'}
{'everything': '[2012-01-13 13:00:00:270,     Info <otherserver:2871  > [BullshitDetector] MyServer was just joking.'}
{'everything': 'Go ahead and ignore it.'}

We got three prints, one for every line in the file. In every print we have a dictionary with only one key (everything), and it contains the entire row from the log. The commented version of the expression should make it pretty obvious why.

So this is all fine and dandy, but it is not very usable in this manner. We would like to group the different elements in the log entries so we get a nicely formatted dictionary at the end with all the relevant keys that we would need.
We also want to ensure that the entries are correctly formatted, so that if a multi-line log entry like this were ever printed:
[2012-01-13 15:00:00.980,    Error <   MyServer:8071  > [ContextInProcess|LocalInfo] Interesting stuff on the next line.
[something, other < whatever: nice > [ hmm ] ok
The second line should fail to match, since we expect the first field to be a timestamp and not arbitrary text. We then know that this line is actually part of the last log entry we parsed and should be appended to it.

Since we have delimiters in the format of the log it is quite simple to extract the different groups we want.
In this next version we have just extracted the different elements of our log entry and given them appropriate names.

com = re.compile(r'''
        ^\[                 # the line starts with a [
        (?P<timestamp>      # timestamp group
            .*
        )
        , # delimiter
        (?P<severity>       # severity group
            .*
        ) 
        < # delimiter
        (?P<process>        # process group
            .*
        )
        : # delimiter 
        (?P<pid>            # process id group
            .*
        )
        >\ \[ # delimiter for "> ["
        (?P<context>        # context group
            .*
        )
        \] # delimiter
        (?P<entry>          # the log entry group
        .*
        )
    ''', re.X)
This gives us:
{'severity': '    Error ', 'process': '   MyServer', 'timestamp': '2012-01-13 13:00:00.250', 'pid': '8071  ', 'context': 'ContextInProcess|LocalInfo', 'entry': ' Gosh! Something went wrong around here'}
{'severity': '     Info ', 'process': 'otherserver', 'timestamp': '2012-01-13 13:00:00:270', 'pid': '2871  ', 'context': 'BullshitDetector', 'entry': ' MyServer was just joking.'}
Now we have all the groups we need, but no validation for any of the fields. This code is also quite simple: we just specify our groups and then put the delimiters between them.
Note that we have to escape special characters and spaces. So at line 18 you see ">\ \[", which means "> [", the three-character delimiter between the process id group and the context group in our format. The script has not trimmed the excess whitespace from the elements, but that is outside the scope of this post since it is very easily handled in Python later.
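For completeness, the trimming could be done along these lines. This is just a sketch; the helper name strip_fields is made up, and the sample dictionary is a shortened copy of the output above.

```python
def strip_fields(fields):
    """Return a copy of a groupdict() result with the padding removed from every value."""
    return {name: value.strip() for name, value in fields.items()}

raw = {'severity': '    Error ', 'process': '   MyServer', 'pid': '8071  '}
clean = strip_fields(raw)
```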

We could of course stop here since now we do have the data that we want in the format that we want.
However, the special case mentioned earlier, the multi-line entry, would give a false positive with the code we currently have. To avoid this we can specify the expected contents of the different groups. If a line doesn't match what we expect, the expression won't return anything, and we take that as a hint that the line is a continuation of the last match we had.

So here comes a final version of our expression.
com = re.compile(r'''
        ^\[                 # the line starts with a [
        (?P<timestamp>      # timestamp group
            [0-9]{4}    # year
            -
            [0-9]{2}    # month
            -
            [0-9]{2}    # day
            \ # whitespace
            [0-9]{2}    # hour
            :
            [0-9]{2}    # minute
            :
            [0-9]{2}    # second
            [.:]        # separator before the milliseconds (the sample log uses both . and :)
            [0-9]{3}    # millisecond
        )
        , # delimiter
        (?P<severity>       # severity group
            [A-Za-z ]+
        ) 
        < # delimiter
        (?P<process>        # process group
            [A-Za-z ]+
        )
        : # delimiter 
        (?P<pid>            # process id group
            [0-9 ]+
        )
        >\ \[ # delimiter for "> ["
        (?P<context>        # context group
            .*
        )
        \] # delimiter
        (?P<entry>          # the log entry group
            .*
        )
    ''', re.X)
What we have done here is set up some rules for some of the groups (timestamp, severity, process and pid) which makes it harder to get false positives.
If we take a look at the timestamp group we see the following type of statements:
[0-9]{4}
[0-9] means that we want to match a single digit, 0 through 9.
{4} means that we want exactly 4 such digits in a row.
So the above one would match a year, since it has four digits.
In this manner we have defined the entire timestamp piece by piece, with the appropriate delimiters in between. This does not affect the group itself, since the outer group remains unchanged; we have only changed the pattern that decides what counts as a valid timestamp.
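A quick way to convince yourself of what these pieces match is to try them in isolation. The snippet below is only a sketch with made-up sample strings. It also shows the difference between an unescaped . which matches any single character, and \. which only matches a literal dot.

```python
import re

year = re.compile(r'[0-9]{4}')
four_digits = bool(year.fullmatch('2012'))    # exactly four digits
three_digits = bool(year.fullmatch('201'))    # too few digits
five_digits = bool(year.fullmatch('20120'))   # too many digits

any_char = re.compile(r'[0-9]{2}.[0-9]{3}')      # . accepts any separator
literal_dot = re.compile(r'[0-9]{2}\.[0-9]{3}')  # \. accepts only a dot

colon_with_any_char = bool(any_char.fullmatch('00:270'))
colon_with_literal_dot = bool(literal_dot.fullmatch('00:270'))
dot_with_literal_dot = bool(literal_dot.fullmatch('00.250'))
```

Only the first year test and the first and last separator tests come out as matches.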

In the severity group we have:
[A-Za-z ]+
[A-Za-z ] This is just a variation of the one from the timestamp group. It matches any single upper-case or lower-case letter from A to Z. Also notice the space inside the brackets: since we are not trimming whitespace, the padding in the log would otherwise make the match fail. Note that we do not have to escape the space here, even in verbose mode, because whitespace inside a character class is kept.
+ means one or more. So if one of the fields were blank, the match would fail. If we didn't care whether they were empty, we could use * instead, which means zero or more.
The context and entry groups we leave as they were since they can be very diverse and we don't want any restrictions there.
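With a pattern that rejects anything that does not start with a proper timestamp, folding multi-line entries together becomes straightforward. The following is a minimal sketch: the pattern is a condensed, non-verbose variant of the one above (it accepts both . and : before the milliseconds, since the sample log shows both), and parse_entries is a name made up for the example.

```python
import re

ENTRY = re.compile(
    r'^\[(?P<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}[.:][0-9]{3}),'
    r'(?P<severity>[A-Za-z ]+)<(?P<process>[A-Za-z ]+):(?P<pid>[0-9 ]+)> \['
    r'(?P<context>.*)\](?P<entry>.*)')

def parse_entries(lines):
    """Parse log lines into dicts, appending continuation lines to the previous entry."""
    entries = []
    for line in lines:
        line = line.rstrip('\n')
        match = ENTRY.match(line)
        if match:
            entries.append({k: v.strip() for k, v in match.groupdict().items()})
        elif entries:
            # No match: treat the line as a continuation of the last entry.
            entries[-1]['entry'] += '\n' + line
    return entries

sample = [
    '[2012-01-13 13:00:00.250,    Error <   MyServer:8071  > [ContextInProcess|LocalInfo] Gosh! Something went wrong around here',
    '[2012-01-13 13:00:00:270,     Info <otherserver:2871  > [BullshitDetector] MyServer was just joking.',
    'Go ahead and ignore it.',
    '[something, other < whatever: nice > [ hmm ] ok',
]
entries = parse_entries(sample)
```

The last two sample lines do not start with a valid timestamp, so both end up appended to the entry from otherserver instead of becoming entries of their own.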

Afterthought
With the discovery of the verbose flag for regex it sure became a lot easier to read these kinds of expressions, so that was an encouraging discovery.
There are still some things that could be done with the expression. For example the context group could be entirely optional which would impact the delimiters etc, so that might be material for a second post in the future if I end up going there.
My opinions of regular expressions have slightly improved after doing this, but I still find it rather crude when you want to do some more "complex" things so there is still a long way to go before I become a convert.
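As a rough idea of where the optional context could go, here is one possible sketch and not a finished solution. It wraps the brackets and the context in an optional non-capturing group and, as an assumption, stops the context at the first ] instead of using .* as above. The sample lines in the usage part are made up.

```python
import re

com = re.compile(r'''
    ^\[
    (?P<timestamp> [0-9]{4}-[0-9]{2}-[0-9]{2} \ [0-9]{2}:[0-9]{2}:[0-9]{2} [.:] [0-9]{3} )
    ,
    (?P<severity> [A-Za-z ]+ )
    <
    (?P<process>  [A-Za-z ]+ )
    :
    (?P<pid>      [0-9 ]+ )
    >\                              # delimiter "> "
    (?:                             # optional, non-capturing wrapper
        \[ (?P<context> [^\]]* ) \] # context without the brackets
    )?
    (?P<entry> .* )
    ''', re.X)

with_context = com.match('[2012-01-13 13:00:00.250,     Info <otherserver:2871  > [BullshitDetector] MyServer was just joking.')
without_context = com.match('[2012-01-13 13:00:00.250,    Error <   MyServer:8071  > No context here')
```

When the context is missing, its group comes back as None. One thing this sketch does not solve is an entry without context whose text itself starts with a [ and contains a ], since that would be read as a context.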


Let me know in the comments what you thought. Did it give you anything even though it was very basic?

Thursday, December 16, 2010

Portfolio Entry: Mind Sway

About

Mind Sway is a prototype Facebook game that I developed for xDelia, my employer from May 2010 to February 2011.


Purpose
Simply put, the goal of this game is to help the person who plays it become more financially responsible. To facilitate this goal, Mind Sway is developed as a platform for testing and improving financial capability in people. The desire is that the game will contain a semi-autonomous avatar which can be controlled by the player, but which can end up making horrible decisions due to high impulsiveness and other factors. The player can then opt to participate in minigames aimed at the factors the avatar is poor at. These minigames implement scientifically tested tasks which have the desired results, so the player improves the same factor in themselves at the same time.


Technology
  • Implemented in Unity3d
  • Utilizes the Behave library for behavior trees used by all characters in the game.
  • A* Pathfinding library for all pathing in the game.
  • MySQL for the storing of all user data.
  • Facebook API for identifying users and getting any pertinent information (such as finding out which of the player's friends are playing the game as well)
Media

Monday, December 6, 2010

Cooking for UDK

I attempted to cook an alpha version of my UDK game, Zombie Road, this weekend but soon ran into problems with missing scripts and packages. So here is a small checklist of things that need to be done to fix it, beyond adding things to the .ini files.

  1. Compile game in frontend.
  2. Cook game, don't forget adding your maps to the maplist.
  3. Go to your UDKGame\Scripts folder and copy your game's .u file, in my case ZombieRoad.u, to UDKGame\CookedPC
  4. Copy all used packages from UDKGame\Content to UDKGame\CookedPC which would be things like maps, weapons, vehicles etc.
  5. Package game in Frontend.
  6. Done

Up to step 2 is easy to find. Step 3 you can find on the UDK forums if you do the correct searches. Step 4 is trickier.
Hopefully this saves someone else the trouble of finding this out the hard way.
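Steps 3 and 4 in the list above are plain file copies, so they can be scripted. The following Python sketch is only an illustration under assumptions: the function name copy_to_cooked, passing the package file names explicitly and the recursive search below Content are made up for this example and are not part of UDK. The folder names are the ones from the list.

```python
import shutil
from pathlib import Path

def copy_to_cooked(udk_game, script_name, package_names):
    """Copy the game's .u file and the listed content packages into CookedPC."""
    udk_game = Path(udk_game)
    cooked = udk_game / "CookedPC"
    cooked.mkdir(exist_ok=True)

    # Step 3: the compiled script package, e.g. ZombieRoad.u
    sources = [udk_game / "Scripts" / script_name]
    # Step 4: every used package, wherever it lives below Content
    for name in package_names:
        sources.extend((udk_game / "Content").rglob(name))

    copied = []
    for source in sources:
        shutil.copy2(source, cooked / source.name)
        copied.append(source.name)
    return copied

# Hypothetical usage; the install path and the map name are placeholders:
# copy_to_cooked(r"C:\UDK\UDKGame", "ZombieRoad.u", ["MyMap.udk"])
```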

Tuesday, November 23, 2010

Portfolio Entry: Zombie Road [WIP]


Foreword
A collaborative project by me and Rasmus Welin done with UDK, Unreal Development Kit. I mainly handle all the code and Rasmus is in charge of modelling, level editing etc. This post will represent the current state of the game and will be updated with time as new features are added to the game. For posts about the continued development on this game see the Zombie Road category.


Purpose
The purposes for doing this project are several.
  • Having a proper game in my portfolio
  • Gaining experience with Unreal Script
  • Gaining more experience in general game programming, since prior experiences have been with low level implementations.
Concept
Zombie Road is a co-op first person shooter. It is inspired by Killing Floor, Left 4 Dead and other first person zombie shooters, but also by games like Ultima Online when it comes to character progression. Rather than using a level system, progression will be based upon improvement of individual skill sets. It will not contain player vs player gameplay and will be purely focused on co-op survival.

Features & Technology
This is a list of features that currently exists in the game and any particular technologies used to implement them.
  • Scaleform UI
    • Menus
    • HUD
  • Persistent user data
    • User data is stored on GAE, Google App Engine (character progress etc). GAE is also used for the server listings for the server browser.
    • UDK's steam integration is used to fetch the player's account ID. This is what is used to keep track of unique players without the need of our own account logic.
  • Server Browser
    • Implemented my own server browser with the help of GAE and scaleform. The servers are posted to and fetched from a GAE datastore and presented in a scaleform menu.
  • Wave based gameplay
    • Waves are easily configured in the UDK editor so you can specify things like:
      • Wave count
      • Amount of mobs for each wave (goal condition)
      • How many mobs are allowed to be spawned at any one time
      • Types of mobs to spawn
      • Frequency of mobs, either by percent or as a fixed number (for example 30% for certain mobs, or only 1 of a certain type of mob, such as a boss, allowed to be spawned)
Media


Editing wave information in editor









Feel free to comment below.

Friday, April 23, 2010

Compiling a run-time generated DirectX 11 shader

I recently got the idea to try to create a procedural shader system for DirectX 11. This basically entails that one simply throws a mesh at the shader generator and it generates a customized shader for this mesh that suits it perfectly.
I knew it was being done in high-end game engines but I had no idea how one actually does it. I spent quite a lot of time on Google trying to find this, but it basically looks like some universal secret even though it actually is quite simple. So I played around with it a bit and I managed to get it to work in the end. The following code snippet is the procedure I used for getting it to work; there are probably a whole load of other ones out there.
Just hope this saves other people the same frustrations I had for a while.
void GenerateShader()
{
    ostringstream shader;

    // Basic Vertex Shader
    shader << "float4 BasicVS( float4 Pos : POSITION ) : SV_POSITION" << endl;
    shader << "{" << endl;
    shader << " return Pos;" << endl;
    shader << "}" << endl;

    // Basic Pixel Shader
    shader << "float4 BasicPS( float4 Pos : SV_POSITION ) : SV_Target" << endl;
    shader <<"{" << endl;
    shader <<" return float4( 1.0f, 1.0f, 1.0f, 1.0f );" << endl;
    shader <<"}" << endl;

    // A standard DirectX 10 technique
    shader << "technique10 DefaultTechnique" << endl;
    shader << "{" << endl;
    shader << " pass p0" << endl;
    shader << " {" << endl;
    shader << "  SetGeometryShader(NULL);" << endl;
    shader << "  SetVertexShader(CompileShader(vs_4_0, BasicVS()));" << endl;
    shader << "  SetPixelShader(CompileShader(ps_4_0, BasicPS()));" << endl;
    shader << " }" << endl;
    shader << "}" << endl;

    // This is where the "magic" is at. Grab the char* from your stringstream
    // and feed it into D3DCompile() along with whatever other parameters you usually send it.

    // Keep the generated source alive in a named string while it is compiled.
    const string source = shader.str();
    ID3DBlob* errorBlob = NULL;
    HRESULT hr = D3DCompile(source.c_str(), source.size(), "none", 0, 0, "DefaultTechnique", "fx_4_0", D3D10_SHADER_ENABLE_STRICTNESS, 0, &blob_, &errorBlob);

    // The error blob is only filled in when the compiler has something to report.
    if( FAILED(hr) && errorBlob )
    {
        OutputDebugStringA( (char*)errorBlob->GetBufferPointer() );
    }

    SAFE_RELEASE( errorBlob );
}

So now all that is left is 99.9% of the work to actually switch case together a shader that supports all the things your different meshes might need.
Maybe I will post something on that later when I manage to make a dent in that daunting task :p
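As a rough illustration of what that stitching could look like, here is a Python sketch that only assembles the HLSL text; the result would still have to go through D3DCompile() as above. The function name build_shader_source, its flags and the VSInput struct are made up for the example.

```python
def build_shader_source(has_normal=False, has_uv=False):
    """Assemble HLSL source text with only the vertex inputs a mesh provides."""
    inputs = ["float4 Pos : POSITION;"]
    if has_normal:
        inputs.append("float3 Normal : NORMAL;")
    if has_uv:
        inputs.append("float2 Tex : TEXCOORD0;")

    lines = ["struct VSInput", "{"]
    lines += ["    " + field for field in inputs]
    lines += ["};", ""]
    lines += [
        "float4 BasicVS( VSInput input ) : SV_POSITION",
        "{",
        "    return input.Pos;",
        "}",
    ]
    return "\n".join(lines) + "\n"

# Source for a mesh that has normals but no texture coordinates.
source = build_shader_source(has_normal=True)
```

The real work is of course in making the shader bodies use the extra inputs, which this sketch leaves out entirely.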

Wednesday, February 3, 2010

Portfolio Entry: Relief & Parallax Mapping



I implemented this during a weekend, mostly because relief mapping has always interested me. Decided to toss in parallax mapping as well when I was "in the neighborhood", and a little normal mapping for good measure as well. This is the basic implementation of both techniques with no optimizations or artifact corrections.

Here is a small code snippet. Nothing special, just don't like to not show code.
float ray_intersect_relief(in float2 orig, in float2 dir)
{
    const float max_num_linear_steps = 20;
    const float max_num_binary_steps = 5;
    float depth = 0.0f;
    float size = 1.0f / max_num_linear_steps;

    // Attempt to find the first intersection against the relief map with a linear search.
    // This is because binary search can easily miss topography.
    for(int i = 0; i < max_num_linear_steps - 1; i++)
    {
        float4 t = gTexRelief.Sample(textureSampler, orig + dir * depth);
        if(depth < t.r)
            depth += size;
    }

    // A binary search to try and find the "edge" of the closest intersection point
    for(int j = 0; j < max_num_binary_steps; j++)
    {
        size *= 0.5f;
        float4 t = gTexRelief.Sample(textureSampler, orig + dir * depth);
        if(depth < t.r)
            depth += 2*size;
        depth -= size;
    }
    return depth;
}
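To see the two searches in isolation, here is a small Python port of the function above. It is only a sketch: the texture lookup is replaced by a made-up callable depth_at, which stands in for sampling the relief map along the ray, and the step counts are turned into parameters.

```python
def ray_intersect_relief(depth_at, linear_steps=20, binary_steps=5):
    """Find the depth at which a ray first dips below the relief surface.

    depth_at(d) is a stand-in for the texture sample: it returns the
    surface depth found at ray depth d.
    """
    depth = 0.0
    size = 1.0 / linear_steps

    # Linear search: step forward while the ray is still above the surface.
    for _ in range(linear_steps - 1):
        if depth < depth_at(depth):
            depth += size

    # Binary search: halve the step and move back or forth around the hit.
    for _ in range(binary_steps):
        size *= 0.5
        if depth < depth_at(depth):
            depth += 2 * size
        depth -= size
    return depth

# A flat surface at depth 0.5, just to exercise the function.
hit = ray_intersect_relief(lambda d: 0.5)
```

For the flat test surface the result lands within a small fraction of a linear step of 0.5, and more binary steps tighten it further.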

Saturday, January 9, 2010

100 days later (give or take a few)

So... It's been a while.

Life is good, excluding the torment of a dying molar, and not a whole lot has happened in this time of silence. Work has been good, but has not really been related to what I wanted to write about in this blog so I opted to stay quiet for a while.

Lately I started looking at available positions in various game companies, and it was somewhat upsetting to notice that I was not really suited for filling any position other than a junior one. So I basically decided that I really needed to do something about my portfolio. It was a while back that I decided this, but there was Christmas and stuff in the way so I haven't really started doing stuff until now. Now it has begun though.

The general idea is that I will simply implement all the various things I have always wanted to try and then post the results in the portfolio section (hopefully with some binaries as well). The types of things I will be implementing will mostly be 3D graphics related, since this is the area I always seem to find myself drooling over. But there will also be some other stuff like resource management and so on.

I have already implemented the first thing I will be putting on the portfolio (relief mapping) and it will be going up there shortly (probably not today though). Other things to be expected, in the very near future, are things like, just to name a few: Parallax mapping, Screen Space Ambient Occlusion, Deferred rendering.

If one refuses to work for any other industry one can only blame themselves for not doing everything in their power to be worthy of working there.