Wednesday 2 April 2014

Week 12 SLOG

  So, this is the final week of lectures for CSC148. Time sure passes really quickly.

  From what a few of my friends told me about upper-year compsci courses, it seems that this semester will probably be the last time I do an assignment in Python. Since this will be the final entry of my course blog, I think it would be fitting to share my thoughts about the Python programming language.

  My first impressions of Python were not exactly favourable. I remember that when I first used the Python IDE, I remarked that it runs at the pace of a tortoise compared to the programming language I am used to - C++. Over time, however, I began to really appreciate the simplicity and the intuitive features of Python - no #include <iostream> and #include <string> here - which lets the coder concentrate on the code itself instead of the technicalities. (And I would like to warn anyone without prior knowledge of C-based languages who is planning on taking 2nd-year csc courses: those technicalities can be a huge pain.)

  Of course, that is not all - even though the standard Python interpreter is itself written in C, the language adds a plethora of safety features that prevent, say, faulty code from trampling over arbitrary memory (something C is particularly infamous for allowing). This aspect of Python really makes life so much simpler.

  However, when all is said and done, Python is still not fast enough to be used for performance-critical coding. That said, Python is a very good scripting language that can be used in conjunction with other programming languages (when the host language supports embedding an interpreter).

Wednesday 26 March 2014

Week 11 SLOG

  Since Timsort is not covered in class, this blog seems like the perfect place to write a few lines about the built-in sorting algorithm in Python. Timsort is a relatively new sorting algorithm, invented by Tim Peters for the Python programming language, and is arguably one of the best sorting algorithms invented so far.

  The first thing to note about Timsort is that it is a hybrid sorting algorithm, utilizing both merge sort and insertion sort and exploiting their respective strengths to achieve a worst-case runtime of O(nlogn), which is the best any comparison-based sorting algorithm can achieve (only non-comparison sorts such as radix sort, which exploit special structure in the keys, can do better).

  One of the central concepts of Timsort is the idea that any random list has a high probability of containing at least one already-sorted sublist, called a 'run', which can be either non-descending or strictly descending. And then there is 'minrun', the minimum run length required by the implementation of Timsort.

  First, the algorithm scans for a run (reversing it in place if it is descending). If the run is shorter than minrun, the next few items from the remaining list are inserted into their correct positions using insertion sort - which is best for short lists - until the run reaches minrun length. This process is repeated until the whole list has been divided into runs. As the runs are found, merge sort's merging step is used to combine them, until a single sorted list remains.

  Of course, using merge sort and insertion sort in the conditions that suit them will not by itself guarantee the efficiency of the Timsort algorithm. Much of its speed comes from the choice of minrun and from the careful management of the stack of pending runs waiting to be merged.
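  The run-handling idea described above can be sketched in a few lines. This is my own simplified illustration, not CPython's actual implementation (which lives in C and is far more clever): find a non-descending run, and if it is shorter than minrun, use insertion sort to extend it.

```python
def count_run(a, start):
    """Return the length of the non-descending run beginning at start."""
    end = start + 1
    while end < len(a) and a[end - 1] <= a[end]:
        end += 1
    return end - start

def insertion_sort(a, lo, hi):
    """Sort a[lo:hi] in place with insertion sort."""
    for i in range(lo + 1, hi):
        item = a[i]
        j = i - 1
        while j >= lo and a[j] > item:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = item

def extend_run(a, start, minrun):
    """Ensure the run starting at start has at least minrun items."""
    run_len = count_run(a, start)
    if run_len < minrun:
        end = min(start + minrun, len(a))
        insertion_sort(a, start, end)
        run_len = end - start
    return run_len

data = [5, 1, 4, 2, 8, 3, 7, 6]
n = extend_run(data, 0, 4)
print(data[:n])  # [1, 2, 4, 5] - the first n items are now a sorted run
```

  The real algorithm would keep calling extend_run on the rest of the list and merge the resulting runs.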

Wednesday 19 March 2014

Week 10 SLOG

  Regular Expressions (regexes) are found everywhere in computer interfaces, even if they seem to be obscured from the typical user. Our CSC148H1 assignment 2 is all about working with them, and since starting it, I have developed a deep respect for the people involved in coding regex engines, because while the programming process seems easy at first glance, it is actually very complicated. There are always cases you have not considered, and even when you do consider them, you might get them wrong. For example, the star '*' case might seem very easy to code at first, but it actually involves creating slices of the original string to be compared against the remaining regex.
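  To show the slicing idea without giving away any a2 code, here is a hypothetical helper of my own: to match a starred character followed by some literal remainder, try every split point between the starred part and the rest.

```python
def match_star(ch, rest, s):
    """Return True if s consists of ch repeated zero or more times,
    followed by the literal string rest."""
    # try every possible length i for the starred prefix
    for i in range(len(s) + 1):
        if all(c == ch for c in s[:i]) and s[i:] == rest:
            return True
    return False

print(match_star('a', 'bc', 'aaabc'))  # True
print(match_star('a', 'bc', 'abac'))   # False
```

  A full regex matcher does the same thing recursively, with the 'rest' being another regex instead of a literal - which is exactly where the slicing gets tricky.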

  When I mentioned that regexes are sometimes obscured, I'm talking about how search engines such as Google Search parse strings of the form 'r1 r2 r3' as r1.r2.r3. When you think about it, there is a lot going on underneath the shiny interfaces that we encounter in our daily virtual lives.

Thursday 13 March 2014

Week 9 SLOG

  This week's class was primarily about the concept, implementation, and search-performance advantage of the binary search tree data structure. But while the O(logn) search time of a balanced tree is impressive (an unbalanced tree can degrade to O(n)), there is actually another search method that eclipses the search power of the binary search tree - searching by means of a hash table.
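  For reference, here is a quick sketch of the tree search itself (the class name is my own, not the lecture's): each comparison discards one subtree, which is what gives the O(logn) behaviour on a balanced tree.

```python
class BSTNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def bst_search(node, target):
    """Return True if target is in the tree rooted at node."""
    if node is None:
        return False
    if target == node.value:
        return True
    if target < node.value:
        return bst_search(node.left, target)   # target is smaller: go left
    return bst_search(node.right, target)      # target is larger: go right

root = BSTNode(8, BSTNode(3, BSTNode(1), BSTNode(6)),
                  BSTNode(10, None, BSTNode(14)))
print(bst_search(root, 6))   # True
print(bst_search(root, 7))   # False
```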

  First, a hash table is an array-like data structure that stores keys and their associated values (similar to the dict type in Python). The nature of the value varies depending on the usage of the hash table. In the case of storage of raw data, the value is the memory address of the file/data. For the purpose of, say, storing a list of numbers, the value *can* be the numbers themselves. So how are the keys used? A hash function turns each key into an index into the array (occasionally two keys collide on the same index, which the implementation has to handle). So, when a user is searching for a particular key, the table only has to apply the hash function to get the index, which then leads directly to the value.

  With hash tables, the average-case runtime is O(1) - constant time. (The worst case, when many keys collide, degrades to O(n), but a good hash function makes that rare.) Of course, the implementation is not easy and I would not be surprised if I only get to learn it properly in my third year.
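  A toy version makes the idea concrete. This is only an illustration (using chaining to handle collisions), not how Python's dict actually works internally; a lookup only ever inspects the one bucket the hash function points at, which is why the average case is constant time.

```python
class ToyHashTable:
    def __init__(self, n_buckets=8):
        # each bucket is a list of (key, value) pairs that collided there
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # the hash function maps a key to a bucket index
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

table = ToyHashTable()
table.put('apple', 3)
table.put('pear', 5)
print(table.get('apple'))  # 3
```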

Wednesday 5 March 2014

Week 8 SLOG

During the reading week, I stumbled upon metaclasses while looking for some ideas/inspiration for the second assignment. What began as a misunderstanding (as always) of the requirements of the first part of a2 (make public attributes read-only) became a really valuable learning experience when I found out that not only are all classes in Python objects, they are instances of metaclasses, the most common being the metaclass type. type(), for the uninitiated, actually has two functions.

The first, of course, is to return the type of an object (note that since Python is dynamically typed, 'data type' does not apply in quite the traditional sense). To do that, only one argument - the object itself - has to be passed to type().

What most Python users don't know is that type can alternately act as a class 'factory', accepting three arguments and returning a class (which is itself an object). The first is a str with the desired name of the class, the second is a tuple of the classes to inherit from, and the third is a dict of the desired class attributes. Note that class attributes include both the variables that can be accessed as <class name>.<variable name> and the methods defined in the class.
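The three-argument form looks like this in practice (the names here are my own made-up examples):

```python
def greet(self):
    return 'hello from ' + self.name

# name, bases, and the attribute dict, in that order
Greeter = type('Greeter', (object,), {'name': 'Greeter', 'greet': greet})

g = Greeter()
print(type(g).__name__)  # Greeter
print(g.greet())         # hello from Greeter
```

The result is indistinguishable from a class written with an ordinary class statement.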

So what exactly do metaclasses do? Well, when used outside of a class definition, type just returns a class built from the arguments passed. The interesting part is when a metaclass argument is passed to a class definition. To do so, Python users can write a function (or a class - very complicated) which returns a class, and then add the argument metaclass = <desired metaclass> to the class definition. When the class statement is evaluated, the metaclass is called in place of the default metaclass (which is type itself) to construct the class object, thus modifying the class before instances of the new class can be created.
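Here is a small, hypothetical illustration of the function form (Python 3 syntax). The "metaclass" receives the same name/bases/dict arguments that type() takes, and gets a chance to modify the dict before the class is built:

```python
def shouty(name, bases, namespace):
    # sneak an extra method into the class before it is created
    namespace['shout'] = lambda self: self.word.upper() + '!'
    return type(name, bases, namespace)

class Word(metaclass=shouty):
    word = 'hello'

print(Word().shout())  # HELLO!
```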

Of course, that is not all Python users can do with metaclasses. Should the metaclass be defined as a class itself, there are many other things that can be done. But as I said, doing that is extremely complicated (no kidding!) and I'm not necessarily confident enough to write it here yet.

Wednesday 26 February 2014

Week 7 SLOG

  This week, I am supposed to talk about recursion. Since I have already done an entry on the basics of that particular coding technique, this week I shall discuss a question that is frequently asked in computer science - are there problems that can only be solved recursively?

  Well, it turns out that the answer is yes and no.

  No - in practice, a programmer can always make use of a last-in-first-out (LIFO) stack that keeps a running list of the remaining tasks, handling the whole recursive process in a while loop which continues to push and pop tasks until the stack is empty. This is the most common of all the workarounds, and it comes from the observation that recursion in programming is really just a special case of iteration - the language's own call stack is itself a LIFO stack.

  On the other hand, in mathematical terms, yes - some functions are inherently recursive in their definition. A simple example is the Newton-Raphson method (learned in mat137), where each approximation is defined in terms of the previous one. Another famous example is the Ackermann function, which is not primitive recursive - it cannot be computed with bounded for-loops alone. Of course, as mentioned earlier, in coding one can always manage an explicit stack to emulate the recursive calls on the function itself.
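  The stack-based workaround is easy to demonstrate. Here is my own sketch that sums a nested list (a tree-like structure) without any recursive calls - a plain Python list serves as the LIFO stack of remaining tasks:

```python
def tree_sum_iterative(tree):
    """Sum a nested list like [1, [2, 3], [4, [5]]] without recursion,
    using an explicit stack of pending items."""
    total = 0
    stack = [tree]            # the LIFO stack replaces the call stack
    while stack:
        item = stack.pop()
        if isinstance(item, list):
            stack.extend(item)  # push children to be processed later
        else:
            total += item
    return total

print(tree_sum_iterative([1, [2, 3], [4, [5]]]))  # 15
```

  The recursive version would be shorter, but this one never risks hitting Python's recursion limit on deeply nested input.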

Thursday 13 February 2014

Week 6 SLOG

  This week we finally learned about trees. So why do we need to learn trees and other recursively defined data structures? Well, there are actually quite a few reasons why people went through all this trouble to come up with the idea. Here I've compiled a short list of the benefits of using trees and other recursive data structures:


  1. Simpler and more readable code.
  2. No need to manually preserve state on each iteration - the recursive calls keep track of it for you.
  3. Flexible memory use - nodes can be added or removed without reallocating the whole structure.
  4. Easier to search for items, since tree traversals can be easily coded.
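  Point 4 in action: a preorder traversal takes only a few lines when the data structure is defined recursively. (The Tree class here is a generic sketch of my own, not necessarily the exact one from lecture.)

```python
class Tree:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def preorder(t):
    """Return a list of values: the root first, then each subtree."""
    result = [t.value]
    for child in t.children:
        result.extend(preorder(child))  # the recursion mirrors the structure
    return result

t = Tree(1, [Tree(2, [Tree(4)]), Tree(3)])
print(preorder(t))  # [1, 2, 4, 3]
```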