17 November 2011

WordWhacker V0.3

Station                 Published  Actual
Birmingham New Street   07:10      07:11
London Euston           08:30      08:26

Today is a strange one: there were about 14 inspectors/observers on the train platform watching trains arrive and leave. The funny thing is that most of today’s trains have arrived and left late, so I’ve been trying to decide whether, at some paranormal level, one has influenced the other.

Anyway, as previously covered I have been trying to prove that there are some words and phrases that are equivalent to others in some way.

v0.3

The WWW is a wonderful resource for finding information and resources that can be used freely. After some quick searching I found the FreeBSD wordlist, which is a simple list of words.

My idea now was to generate the hash for every word in the word list. I would then compare the hash of the company name against the hashed word list to find matching words.

I transformed the file into a usable state. After a quick addition to the Java code I had a method which read in the text file, generated the numeric value for each word and stored it in a Map ready for use.

package words;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

/**
 * Class to generate numerical values for words and compare equivalence to other words.
 *
 * @author a
 */
public class WordWhackerV03 {

    public enum Charset {
        ASCII, UNICODE, POSITIONAL
    }

    public static Map<Character,Integer> letters = new HashMap<Character,Integer>();
    public static Map<Character,Integer> asciiletters = new HashMap<Character,Integer>();
    static {
        asciiletters.put('A', 65);asciiletters.put('B', 66);asciiletters.put('C', 67);
        asciiletters.put('D', 68);asciiletters.put('E', 69);asciiletters.put('F', 70);
        asciiletters.put('G', 71);asciiletters.put('H', 72);asciiletters.put('I', 73);
        asciiletters.put('J', 74);asciiletters.put('K', 75);asciiletters.put('L', 76);
        asciiletters.put('M', 77);asciiletters.put('N', 78);asciiletters.put('O', 79);
        asciiletters.put('P', 80);asciiletters.put('Q', 81);asciiletters.put('R', 82);
        asciiletters.put('S', 83);asciiletters.put('T', 84);asciiletters.put('U', 85);
        asciiletters.put('V', 86);asciiletters.put('W', 87);asciiletters.put('X', 88);
        asciiletters.put('Y', 89);asciiletters.put('Z', 90);

        asciiletters.put('a', 97); asciiletters.put('b', 98); asciiletters.put('c', 99);
        asciiletters.put('d', 100);asciiletters.put('e', 101);asciiletters.put('f', 102);
        asciiletters.put('g', 103);asciiletters.put('h', 104);asciiletters.put('i', 105);
        asciiletters.put('j', 106);asciiletters.put('k', 107);asciiletters.put('l', 108);
        asciiletters.put('m', 109);asciiletters.put('n', 110);asciiletters.put('o', 111);
        asciiletters.put('p', 112);asciiletters.put('q', 113);asciiletters.put('r', 114);
        asciiletters.put('s', 115);asciiletters.put('t', 116);asciiletters.put('u', 117);
        asciiletters.put('v', 118);asciiletters.put('w', 119);asciiletters.put('x', 120);
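        asciiletters.put('y', 121);asciiletters.put('z', 122);
        // NOTE: the ASCII code for a space is 32 (0x20 in hex); the value 20 is kept
        // because the results listed after this program were generated with it.
        asciiletters.put(' ', 20);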

        letters.put('a', 1); letters.put('b', 2); letters.put('c', 3);
        letters.put('d', 4); letters.put('e', 5); letters.put('f', 6);
        letters.put('g', 7); letters.put('h', 8); letters.put('i', 9);
        letters.put('j', 10);letters.put('k', 11);letters.put('l', 12);
        letters.put('m', 13);letters.put('n', 14);letters.put('o', 15);
        letters.put('p', 16);letters.put('q', 17);letters.put('r', 18);
        letters.put('s', 19);letters.put('t', 20);letters.put('u', 21);
        letters.put('v', 22);letters.put('w', 23);letters.put('x', 24);
        letters.put('y', 25);letters.put('z', 26);letters.put(' ', 0);
    }

    public Map<String,Integer> words = new HashMap<String,Integer>();

    BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
    public Charset useCharset = Charset.ASCII;

    /**
     * @param args a {@link java.lang.String}[] of program arguments
     */
    public static void main(String[] args) {
        WordWhackerV03 whacker = new WordWhackerV03();
        whacker.driveApp();
    }

    /**
     * Utility method to control the flow of the application
     */
    public void driveApp() {
        this.createWordlist();
        String[] strings = new String[]{"Happiness", "Eternal Happiness",
                                        "Perpetual Happiness", "Happy Employees",
                                        "Motivational Happiness", "Creative Happiness",
                                        "Boundless Creativity"};
        for(String val : strings) {
            int value = this.getWordValue(val);
            Set<String> matchingWords = this.getKeysByValue(words, value);
            for(String word : matchingWords) {
                System.out.println(val + " = " + word);
            }
        }
    }

    /**
     * Read in a file and store in a Map
     */
    private void createWordlist() {
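        // Build the path from its parts so it works on any OS, not just Windows
        File dictFile = new File(new File(System.getProperty("user.dir"), "dict"), "dict.txt");
        if(dictFile.exists()) {
            createWordList(dictFile);
        } else {
            System.err.println("Word list not found: " + dictFile.getAbsolutePath());
        }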
    }

    /**
     * Utility method to get all the matching Keys of a Map by the given value
     *
     * @param <K> the key object
     * @param <V> the value object
     * @param map a Map to search through
     * @param value the value to search for
     *
     * @return a set of keys which match the given value
     */
    private <K, V> Set<K> getKeysByValue(Map<K, V> map, V value) {
         Set<K> keys = new HashSet<K>();
         for (Entry<K, V> entry : map.entrySet()) {
             if (entry.getValue().equals(value)) {
                 keys.add(entry.getKey());
             }
         }
         return keys;
    }

    /**
     * Method to populate a Map of words and values from a given word list file
     *
     * @param file pointer to the word list file
     */
    private void createWordList(File file) {
        BufferedReader bufferedStream = null;
        try {
            bufferedStream = new BufferedReader(
                             new InputStreamReader(
                             new FileInputStream(file)));
            String line = "";
            while((line = bufferedStream.readLine()) != null) {
                words.put(line, getWordValue(line));
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if(bufferedStream != null) {
                try {
                    bufferedStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    /**
     * Method to return the numeric value for a given word
     *
     * @param word a {@link java.lang.String} containing the word
     * @return an int representing the word's numeric value
     */
    private int getWordValue(String word) {
        int returnable = 0;
        char[] chars = word.toCharArray();
        for(char theChar : chars) {
            Integer charValue = null;
            switch(useCharset) {
                case ASCII:
                    charValue = asciiletters.get(theChar);
                break;
                case UNICODE:
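                    // Use the char's Unicode code unit; Character.getNumericValue would
                    // return digit-style values (10 for 'a') or -1 instead
                    charValue = (int) theChar;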
                break;
                case POSITIONAL:
                    charValue = letters.get(Character.toLowerCase(theChar));
                break;
                default:
                break;
            }
            if(charValue != null) {
                returnable = returnable + charValue;
            }
        }
        return returnable;
    }
}

I had created a program that would produce a list of words that are ‘numerically equivalent’ to a given input. This was good but had a couple of downsides. As the input length grew the number of matches decreased, because it was only matching against single words. The appropriateness of the results also decreased, to the point where the results list was just scientific terms. To demonstrate the downsides, here are some of the words generated using the code above:

Happiness = advisable
Happiness = firebreak
Happiness = mechanics
Happy Employees = encephaloscope
Happy Employees = modificability
Happy Employees = archaeogeology
Eternal Happiness = archecclesiastic
Eternal Happiness = hypophosphorous
Eternal Happiness = anatomicomedical
Creative Happiness = hystricomorphous
Creative Happiness = hysteroproterize
Perpetual Happiness = micromineralogical
Boundless Creativity = facioscapulohumeral
Boundless Creativity = bacteriotherapeutic

Feeling that the idea was good but the results so far weren’t, I delivered the word lists to my friend and went on holiday…

09 November 2011

WordWhacker V0.2

Station                 Published  Actual
Birmingham New Street   07:10      07:10
London Euston           08:30      08:30

As previously covered, I have been trying to prove that there are some words and phrases that are equivalent to others in some way.

v0.2

The update to the process would be to generate the hash based on the ASCII charset. This added some extra complexity, as spaces have a value and a capitalised letter is 32 smaller than its lower-case equivalent. The ideal solution would be to write a bit of code: bring on the Java. The program I created was functional; in no way was it meant to be elegant or to live up to OO design. It’s a quick and dirty script.

package words;

import java.util.HashMap;
import java.util.Map;

/**
 * Class to generate numerical values for words and compare equivalence to other words.
 *
 * @author a
 */
public class WordWhackerV02 {

    public enum Charset {
        ASCII, UNICODE, POSITIONAL
    }

    public static Map<Character,Integer> letters = new HashMap<Character,Integer>();
    public static Map<Character,Integer> asciiletters = new HashMap<Character,Integer>();
    static {
        asciiletters.put('A', 65);asciiletters.put('B', 66);asciiletters.put('C', 67);
        asciiletters.put('D', 68);asciiletters.put('E', 69);asciiletters.put('F', 70);
        asciiletters.put('G', 71);asciiletters.put('H', 72);asciiletters.put('I', 73);
        asciiletters.put('J', 74);asciiletters.put('K', 75);asciiletters.put('L', 76);
        asciiletters.put('M', 77);asciiletters.put('N', 78);asciiletters.put('O', 79);
        asciiletters.put('P', 80);asciiletters.put('Q', 81);asciiletters.put('R', 82);
        asciiletters.put('S', 83);asciiletters.put('T', 84);asciiletters.put('U', 85);
        asciiletters.put('V', 86);asciiletters.put('W', 87);asciiletters.put('X', 88);
        asciiletters.put('Y', 89);asciiletters.put('Z', 90);

        asciiletters.put('a', 97); asciiletters.put('b', 98); asciiletters.put('c', 99);
        asciiletters.put('d', 100);asciiletters.put('e', 101);asciiletters.put('f', 102);
        asciiletters.put('g', 103);asciiletters.put('h', 104);asciiletters.put('i', 105);
        asciiletters.put('j', 106);asciiletters.put('k', 107);asciiletters.put('l', 108);
        asciiletters.put('m', 109);asciiletters.put('n', 110);asciiletters.put('o', 111);
        asciiletters.put('p', 112);asciiletters.put('q', 113);asciiletters.put('r', 114);
        asciiletters.put('s', 115);asciiletters.put('t', 116);asciiletters.put('u', 117);
        asciiletters.put('v', 118);asciiletters.put('w', 119);asciiletters.put('x', 120);
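        asciiletters.put('y', 121);asciiletters.put('z', 122);
        // NOTE: the ASCII code for a space is 32 (0x20 in hex), not decimal 20;
        // the value is left as originally written.
        asciiletters.put(' ', 20);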

        letters.put('a', 1); letters.put('b', 2); letters.put('c', 3); letters.put('d', 4);
        letters.put('e', 5); letters.put('f', 6); letters.put('g', 7); letters.put('h', 8);
        letters.put('i', 9); letters.put('j', 10);letters.put('k', 11);letters.put('l', 12);
        letters.put('m', 13);letters.put('n', 14);letters.put('o', 15);letters.put('p', 16);
        letters.put('q', 17);letters.put('r', 18);letters.put('s', 19);letters.put('t', 20);
        letters.put('u', 21);letters.put('v', 22);letters.put('w', 23);letters.put('x', 24);
        letters.put('y', 25);letters.put('z', 26);letters.put(' ', 0);
    }

    public Charset useCharset = Charset.ASCII;

    /**
     * @param args a {@link java.lang.String}[] of program arguments
     */
    public static void main(String[] args) {
        WordWhackerV02 whacker = new WordWhackerV02();
        whacker.useCharset = Charset.POSITIONAL;
        String[] strings = new String[]{"Happiness", "Eternal Happiness",
                            "Perpetual Happiness", "Happy Employees",
                            "Motivational Happiness", "Creative Happiness",
                            "Boundless Creativity"};
        for(String val : strings) {
            System.out.println(val + " " + whacker.getWordValue(val));
        }
    }

    /**
     * Method to return the numeric value for a given word
     *
     * @param word a {@link java.lang.String} containing the word
     * @return an int representing the word's numeric value
     */
    private int getWordValue(String word) {
        int returnable = 0;
        char[] chars = word.toCharArray();
        for(char theChar : chars) {
            Integer charValue = null;
            switch(useCharset) {
                case ASCII:
                    charValue = asciiletters.get(theChar);
                break;
                case UNICODE:
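                    // Use the char's Unicode code unit; Character.getNumericValue would
                    // return digit-style values (10 for 'a') or -1 instead
                    charValue = (int) theChar;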
                break;
                case POSITIONAL:
                    charValue = letters.get(Character.toLowerCase(theChar));
                break;
                default:
                break;
            }
            if(charValue != null) {
                returnable = returnable + charValue;
            }
        }
        return returnable;
    }
}

This script was useful for generating the values of input strings quickly, but it meant that I still had to think of phrases to compare against. Surely an improvement would be to use a dictionary to search for numerically equivalent words.
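
One way that dictionary idea could be sketched is to group every dictionary word under its numeric value, so that all the words sharing a value end up in the same bucket. Everything here is an assumption for illustration: the class name `ValueIndex`, the method names, and the tiny in-memory word list standing in for a real dictionary file. It also sums raw character codes, which only matches the ASCII scheme for single words made of ASCII letters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an index from numeric value to the words that share it.
class ValueIndex {

    private final Map<Integer, List<String>> wordsByValue =
            new HashMap<Integer, List<String>>();

    // Sum of character codes; assumes ASCII letters only (no spaces)
    static int valueOf(String word) {
        int total = 0;
        for (char c : word.toCharArray()) {
            total += c;
        }
        return total;
    }

    // File a dictionary word under its value
    void add(String word) {
        int value = valueOf(word);
        List<String> bucket = wordsByValue.get(value);
        if (bucket == null) {
            bucket = new ArrayList<String>();
            wordsByValue.put(value, bucket);
        }
        bucket.add(word);
    }

    // All indexed words with the same value as the input, in insertion order
    List<String> matchesFor(String input) {
        List<String> bucket = wordsByValue.get(valueOf(input));
        return bucket == null ? Collections.<String>emptyList() : bucket;
    }

    public static void main(String[] args) {
        ValueIndex index = new ValueIndex();
        // Stand-in for a real word list
        for (String word : Arrays.asList("advisable", "firebreak", "mechanics",
                                         "train", "platform")) {
            index.add(word);
        }
        System.out.println(index.matchesFor("Happiness"));
    }
}
```

With this stand-in list the program prints `[advisable, firebreak, mechanics]`. The design choice is that each lookup is a single map access rather than a scan over every dictionary entry, at the cost of building the index once up front.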

08 November 2011

Dave Gorman's Powerpoint Presentation

Station                 Published  Actual
London Euston           17:03      17:05
Birmingham New Street   18:27      19:45

Oh dear, I was very late last night and almost missed Dave Gorman. The show was due to start at 20:00 and I walked through the door with 5 minutes to spare, just enough time to find my seat.

The warm-up act was a guy called Jay Foreman. He was very funny, with some witty songs, and he was right: moon chavs has stuck in my head. The main event was amazing. If Dave Gorman’s Powerpoint Presentation is coming to a theatre near you, go and see it; it’s been a long time since I laughed so much at a stand-up comic.

I suppose the delay did have one upside: it gave me time to start another post, which will be finished and appear in the coming week.

07 November 2011

MarkDown

Station                 Published  Actual
Birmingham New Street   07:10      07:10
London Euston           08:30      08:33

I was speaking to one of my current colleagues the other day about writing documents and how we approach it. In most cases my instinct has been to write plain text in a text editor. This is simple and clean, and means that I can concentrate on the content rather than how it looks. If I wanted to format at the same time then I would mark the text up using HTML, but this means that there is a temptation to prettify rather than write. My colleague told me that, like me, he writes plain text and faced similar frustrations until he started using MarkDown.

MarkDown allows a text document to be annotated in a way which, when processed, creates a well-formatted and valid HTML document. There are other packages that I have used in the past, like LaTeX and Xilize, that do a similar job to MarkDown, but I have found them to be too feature-rich. This is a problem because, instead of writing, my inquisitive mind becomes distracted by trying out the different features. This is where MarkDown is ideal, as it has a small set of mark-up. Another benefit of using MarkDown over other systems and HTML is that the mark-up syntax is lightweight, meaning a MarkDown document is still readable in raw format.

MarkDown Example


MarkDown source...

# Level 1 Heading #
This is some MarkDown with *italic* and **bold** text

## Level 2 heading ##
This section contains some `text that is source code related` and also a [link to the BBC](http://www.bbc.co.uk)

Generated HTML...

<h1>Level 1 Heading</h1>
<p>
  This is some MarkDown with <em>italic</em> and <strong>bold</strong> text
</p>
<h2>Level 2 heading</h2>
<p>
  This section contains some <code>text that is source code related</code> and also a <a href="http://www.bbc.co.uk">link to the BBC</a>
</p>

How it actually looks...

Level 1 Heading

This is some MarkDown with italic and bold text

Level 2 heading

This section contains some text that is source code related and also a link to the BBC


As you can see, the MarkDown source is cleaner and requires fewer keystrokes to type. To actually turn MarkDown into an output document you will need to install a transformer program. As previously posted, I use jEdit to write my text documents; it has a handy plug-in that allows the MarkDown text to be previewed in a browser or transformed into HTML for saving.

Another tool I use for MarkDown editing, on my iPad, is NOCS. This is a text editor that displays the document as rendered MarkDown by default but reverts to raw source for editing. It also integrates with Dropbox, so that I can easily pick up my MarkDown documents wherever I am.

The final option for MarkDown processing is to install a tool like MultiMarkDown, which can be used from the command line to generate output in a number of different formats. MultiMarkDown is a superset of MarkDown and introduces extra features.
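
To give a feel for what a transformer program does with the example above, here is a toy sketch in Java. It is not how jEdit's plug-in or MultiMarkDown work, and `TinyMarkDown`, `inline` and `toHtml` are names made up for illustration. It assumes only the handful of constructs used in the example (headings, italic, bold, inline code and links), treats every non-blank line as its own paragraph, and does no HTML escaping, so it is nowhere near a full MarkDown processor.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy converter covering only the constructs in the example above.
class TinyMarkDown {

    // "# Title #" or "## Title": 1-6 hashes, the text, optional closing hashes
    private static final Pattern HEADING =
            Pattern.compile("^(#{1,6})\\s+(.*?)\\s*#*\\s*$");

    // Convert inline mark-up; bold is handled before italic so that
    // "**bold**" is not mistaken for two italic markers
    static String inline(String text) {
        return text
            .replaceAll("\\*\\*(.+?)\\*\\*", "<strong>$1</strong>")
            .replaceAll("\\*(.+?)\\*", "<em>$1</em>")
            .replaceAll("`(.+?)`", "<code>$1</code>")
            .replaceAll("\\[(.+?)\\]\\((.+?)\\)", "<a href=\"$2\">$1</a>");
    }

    // Convert line by line: headings become <hN>, other non-blank lines become <p>
    static String toHtml(String markdown) {
        StringBuilder html = new StringBuilder();
        for (String line : markdown.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // blank lines only separate blocks
            }
            Matcher heading = HEADING.matcher(line);
            if (heading.matches()) {
                int level = heading.group(1).length();
                html.append("<h").append(level).append(">")
                    .append(inline(heading.group(2)))
                    .append("</h").append(level).append(">\n");
            } else {
                html.append("<p>").append(inline(line.trim())).append("</p>\n");
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        String source = "# Level 1 Heading #\n"
            + "This is some MarkDown with *italic* and **bold** text\n"
            + "\n"
            + "## Level 2 heading ##\n"
            + "This section contains some `text that is source code related` "
            + "and also a [link to the BBC](http://www.bbc.co.uk)\n";
        System.out.print(toHtml(source));
    }
}
```

Run against the example source, this produces the same headings, paragraphs and inline elements as the generated HTML shown earlier, just without the indentation. A real transformer handles far more (lists, block quotes, nested emphasis, escaping), which is the reason to install one rather than roll your own.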