# The Earley Parser in NLTK

Recently I timed the earley parser included in NTLK. The results are quite surprising. This might have to do with the various ways to measure time: timeit, cProfile and datetime are solutions that are used all over the web.

This article is about parsing and measuring time. In the end I used datetime without timedelta, I only substracted two datetime objects.

I might however write a post about the other timing-tools as well.

## Data

The data I generated is very simple: I used the sentence the man saw the girl. Then I appended with the telescope. All the later executions append with the telescope, so the length of the sentence grows by 3 tokens. In addition I used a simple CFG, as shown in the code.

## Timedelta

timedelta is described as “A duration expressing the difference between two date, time, or datetime instances to microsecond resolution.”

That sounds nice and I created a script which generates the following output:

Run Microseconds SentenceLength NormalizedMicroseconds
1 1567 5 313.4
2 2507 8 313.375
3 3781 11 343.727272727
4 5553 14 396.642857143
5 8172 17 480.705882353
6 10476 20 523.8
7 15129 23 657.782608696
8 23250 26 894.230769231
9 43597 29 1503.34482759
10 141894 32 4434.1875
11 434989 35 12428.2571429
12 669095 38 17607.7631579
13 428897 41 10460.902439
14 940391 44 21372.5227273


Where NormalizedMicroseconds might be visualized like this:

## Discussion

• find a polynom that fits the data
• analyze algorithm
• strace, cprofile, and friends

• profile python code with python -m cProfile myscript.py
• generate callgraphs with pycallgraph graphviz -- ./myscript.py