17 November 2011

WordWhacker V0.3

                       Published  Actual
Birmingham New Street  07:10      07:11
London Euston          08:30      08:26

Today is a strange one: there were about 14 inspectors/observers on the platform watching trains arrive and leave. The funny thing is that most of today’s trains have arrived and left late, so I’ve been trying to decide whether, at some paranormal level, one has influenced the other.

Anyway, as previously covered, I have been trying to prove that some words and phrases are equivalent to others in some way.

v0.3

The WWW is a wonderful resource for finding information and resources that can be used freely. After some quick searching I found the FreeBSD wordlist, which is a simple list of words.

My idea now was to generate the hash for every word in the word list. I would then compare the hash of the company name against the hashed word list to find matching words.

I transformed the file into a usable state. After a quick addition to the Java code I had a method which read in the text file, generated the numeric value for each word and stored it in a Map ready for use.

package words;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

/**
 * Class to generate numerical values for words and compare equivalence to other words.
 *
 * @author a
 */
public class WordWhackerV03 {

    public enum Charset {
        ASCII, UNICODE, POSITIONAL
    }

    public static Map<Character,Integer> letters = new HashMap<Character,Integer>();
    public static Map<Character,Integer> asciiletters = new HashMap<Character,Integer>();
    static {
        asciiletters.put('A', 65);asciiletters.put('B', 66);asciiletters.put('C', 67);
        asciiletters.put('D', 68);asciiletters.put('E', 69);asciiletters.put('F', 70);
        asciiletters.put('G', 71);asciiletters.put('H', 72);asciiletters.put('I', 73);
        asciiletters.put('J', 74);asciiletters.put('K', 75);asciiletters.put('L', 76);
        asciiletters.put('M', 77);asciiletters.put('N', 78);asciiletters.put('O', 79);
        asciiletters.put('P', 80);asciiletters.put('Q', 81);asciiletters.put('R', 82);
        asciiletters.put('S', 83);asciiletters.put('T', 84);asciiletters.put('U', 85);
        asciiletters.put('V', 86);asciiletters.put('W', 87);asciiletters.put('X', 88);
        asciiletters.put('Y', 89);asciiletters.put('Z', 90);

        asciiletters.put('a', 97); asciiletters.put('b', 98); asciiletters.put('c', 99);
        asciiletters.put('d', 100);asciiletters.put('e', 101);asciiletters.put('f', 102);
        asciiletters.put('g', 103);asciiletters.put('h', 104);asciiletters.put('i', 105);
        asciiletters.put('j', 106);asciiletters.put('k', 107);asciiletters.put('l', 108);
        asciiletters.put('m', 109);asciiletters.put('n', 110);asciiletters.put('o', 111);
        asciiletters.put('p', 112);asciiletters.put('q', 113);asciiletters.put('r', 114);
        asciiletters.put('s', 115);asciiletters.put('t', 116);asciiletters.put('u', 117);
        asciiletters.put('v', 118);asciiletters.put('w', 119);asciiletters.put('x', 120);
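        asciiletters.put('y', 121);asciiletters.put('z', 122);
        // NOTE: the ASCII code for a space is 32 (0x20 in hex), not 20. The value 20 is
        // kept here because the results listed below were generated with it.
        asciiletters.put(' ', 20);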

        letters.put('a', 1); letters.put('b', 2); letters.put('c', 3);
        letters.put('d', 4); letters.put('e', 5); letters.put('f', 6);
        letters.put('g', 7); letters.put('h', 8); letters.put('i', 9);
        letters.put('j', 10);letters.put('k', 11);letters.put('l', 12);
        letters.put('m', 13);letters.put('n', 14);letters.put('o', 15);
        letters.put('p', 16);letters.put('q', 17);letters.put('r', 18);
        letters.put('s', 19);letters.put('t', 20);letters.put('u', 21);
        letters.put('v', 22);letters.put('w', 23);letters.put('x', 24);
        letters.put('y', 25);letters.put('z', 26);letters.put(' ', 0);
    }

    public Map<String,Integer> words = new HashMap<String,Integer>();

    BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
    public Charset useCharset = Charset.ASCII;

    /**
     * @param args a {@link java.lang.String}[] of program arguments
     */
    public static void main(String[] args) {
        WordWhackerV03 whacker = new WordWhackerV03();
        whacker.driveApp();
    }

    /**
     * Utility method to control the flow of the application
     */
    public void driveApp() {
        this.createWordlist();
        String[] strings = new String[]{"Happiness", "Eternal Happiness",
                                        "Perpetual Happiness", "Happy Employees",
                                        "Motivational Happiness", "Creative Happiness",
                                        "Boundless Creativity"};
        for(String val : strings) {
            int value = this.getWordValue(val);
            Set<String> matchingWords = this.getKeysByValue(words, value);
            for(String word : matchingWords) {
                System.out.println(val + " = " + word);
            }
        }
    }

    /**
     * Read in a file and store in a Map
     */
    private void createWordlist() {
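        // Build the path portably rather than hard-coding Windows separators
        File dictFile = new File(new File(System.getProperty("user.dir"), "dict"), "dict.txt");
        if(dictFile.exists()) {
            createWordList(dictFile);
        } else {
            System.err.println("Word list not found: " + dictFile.getAbsolutePath());
        }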
    }

    /**
     * Utility method to get all the matching Keys of a Map by the given value
     *
     * @param <K> the key object
     * @param <V> the value object
     * @param map a Map to search through
     * @param value the value to search for
     *
     * @return a set of keys which match the given value
     */
    private <K, V> Set<K> getKeysByValue(Map<K, V> map, V value) {
         Set<K> keys = new HashSet<K>();
         for (Entry<K, V> entry : map.entrySet()) {
             if (entry.getValue().equals(value)) {
                 keys.add(entry.getKey());
             }
         }
         return keys;
    }

    /**
     * Method to populate a Map of words and values from a given word list file
     *
     * @param file pointer to the word list file
     */
    private void createWordList(File file) {
        BufferedReader bufferedStream = null;
        try {
            bufferedStream = new BufferedReader(
                             new InputStreamReader(
                             new FileInputStream(file)));
            String line = "";
            while((line = bufferedStream.readLine()) != null) {
                words.put(line, getWordValue(line));
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if(bufferedStream != null) {
                try {
                    bufferedStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    /**
     * Method to return the numeric value for a given word
     *
     * @param word a {@link java.lang.String} containing the word
     * @return an int representing the word's numeric value
     */
    private int getWordValue(String word) {
        int returnable = 0;
        char[] chars = word.toCharArray();
        for(char theChar : chars) {
            Integer charValue = null;
            switch(useCharset) {
                case ASCII:
                    charValue = asciiletters.get(theChar);
                break;
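                case UNICODE:
                    // Use the character's UTF-16 code unit. Character.getNumericValue
                    // returns a digit value (e.g. 10 for 'a') or -1, not a character code.
                    charValue = (int) theChar;
                break;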
                case POSITIONAL:
                    charValue = letters.get(Character.toLowerCase(theChar));
                break;
                default:
                break;
            }
            if(charValue != null) {
                returnable = returnable + charValue;
            }
        }
        return returnable;
    }
}
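
As an aside, getKeysByValue scans the whole word map for every input phrase. One possible refinement, sketched here and not part of the program above, is to group the words by value once so that each lookup becomes a single map access. The class name ValueIndex and its methods are hypothetical. The sum uses raw character codes for letters only, which agrees with the ASCII table above for single words; spaces are ignored here, whereas the original counts them as 20.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of WordWhackerV03: indexes words by their letter sum.
class ValueIndex {

    private final Map<Integer, List<String>> wordsByValue =
            new HashMap<Integer, List<String>>();

    // Sum of character codes for letters A-Z and a-z; other characters are ignored.
    static int valueOf(String word) {
        int sum = 0;
        for (char c : word.toCharArray()) {
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
                sum += c;
            }
        }
        return sum;
    }

    // Store the word in the bucket for its value, creating the bucket if needed.
    void add(String word) {
        int value = valueOf(word);
        List<String> bucket = wordsByValue.get(value);
        if (bucket == null) {
            bucket = new ArrayList<String>();
            wordsByValue.put(value, bucket);
        }
        bucket.add(word);
    }

    // All indexed words whose value equals that of the given phrase.
    List<String> matches(String phrase) {
        List<String> bucket = wordsByValue.get(valueOf(phrase));
        return bucket == null ? Collections.<String>emptyList() : bucket;
    }

    public static void main(String[] args) {
        ValueIndex index = new ValueIndex();
        index.add("advisable");
        index.add("firebreak");
        index.add("cat");
        System.out.println(index.matches("Happiness"));
    }
}
```

The trade-off is a little extra memory for the buckets in exchange for avoiding a full scan per phrase, which only matters once the word list or the number of phrases grows.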

I had created a program that produces a list of words that are ‘numerically equivalent’ to a given input. This was good but had a couple of downsides. Because it only matches single words, the number of matches falls as the input gets longer. The appropriateness of the results also declines, to the point where the list is just scientific terms. To demonstrate the downsides, here are some of the words generated by the code above:

Happiness = advisable
Happiness = firebreak
Happiness = mechanics
Happy Employees = encephaloscope
Happy Employees = modificability
Happy Employees = archaeogeology
Eternal Happiness = archecclesiastic
Eternal Happiness = hypophosphorous
Eternal Happiness = anatomicomedical
Creative Happiness = hystricomorphous
Creative Happiness = hysteroproterize
Perpetual Happiness = micromineralogical
Boundless Creativity = facioscapulohumeral
Boundless Creativity = bacteriotherapeutic
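
The drift towards long scientific terms in the list above follows from the arithmetic. In ASCII mode every lowercase letter contributes between 97 and 122, so a given sum can only be reached by words within a narrow range of lengths. The sketch below works this out; the class LengthBounds and its method names are hypothetical, and it assumes the candidate words are all lowercase.

```java
// Hypothetical helper, not from the original program: bounds on match length.
class LengthBounds {

    // Letter sum as in the ASCII mode above: letters use their ASCII codes and,
    // as in the original table, a space counts as 20. Other characters count as 0.
    static int asciiSum(String text) {
        int sum = 0;
        for (char c : text.toCharArray()) {
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
                sum += c;
            } else if (c == ' ') {
                sum += 20;
            }
        }
        return sum;
    }

    // Fewest lowercase letters that can reach the sum (every letter is 'z' = 122).
    static int minLetters(int sum) {
        return (sum + 121) / 122;
    }

    // Most lowercase letters that can reach the sum (every letter is 'a' = 97).
    static int maxLetters(int sum) {
        return sum / 97;
    }

    public static void main(String[] args) {
        String[] inputs = {"Happiness", "Happy Employees", "Boundless Creativity"};
        for (String input : inputs) {
            int sum = asciiSum(input);
            System.out.println(input + ": sum " + sum + ", match length "
                    + minLetters(sum) + " to " + maxLetters(sum) + " letters");
        }
    }
}
```

For "Happiness" the sum is 939, which allows 8 to 9 letters, and for "Happy Employees" it is 1481, which allows 13 to 15. The matches listed above have 9 and 14 letters respectively, and everyday words of that length are rare.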

Feeling that the idea was good but the current results weren’t, I delivered the word lists to my friend and went on holiday…