
SQL Clearly Explained 3rd Edition – Jan L. Harrington


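Learning to build a website has to be done slowly, one step at a time; there is no point dreaming of getting there in a single leap. After finishing Ruby I had planned to go straight on to Rails, and then I discovered that I don't know SQL. Without basic SQL skills you cannot build a website effectively, because every website needs a database to store its data. Speaking of database programs, I learned dBase III many years ago, when I was forced to study it for a semester in my secondary school computer class. At the time I felt that learning an obsolete piece of software was a waste of time. I never expected that, twenty-odd years later, the most basic concepts of database programming would be dug out of my dusty memory and turn out to be useful.

I picked out several books on learning SQL and spent a while deciding which one to use. In the end I chose SQL Clearly Explained as the backbone, with two O'Reilly books as supplementary references. Many SQL books teach only one particular SQL server, so it is easy to miss the forest for the trees and lose track of which parts are the SQL language itself and which are one server's syntax. This book starts from the SQL standard. Its first chapter does not teach you how to install software; it explains relational database theory systematically. Once the fundamentals are clear, learning the syntax is simple: you can learn as you go, look things up in the documentation, and get the hang of it by experimenting. Of the two O'Reilly books, one covers SQLite and the other MySQL, two widely used databases. Neither is a good introductory book, since they concentrate too heavily on syntax, but both are very useful as references.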


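Relational database theory is neither hard nor easy; once it clicks, everything else follows. What is a database? Nothing more than a bunch of tables. A table is like an Excel spreadsheet, laid out in rows and columns, and each cell holds one item of data. Every row has one cell that is the primary key, which is used to look up that row. A row can also have a foreign key, which points to the primary key of another table and so expresses the relationship between rows in one table and rows in another. (Strictly speaking, relational theory uses the word "relation" for the table itself, but it is these key-to-key links that tie the tables together.) Querying a database comes down to just six basic operations, and everything else is a variation on them: filter rows (WHERE), filter columns (SELECT), add the rows of two sets together (UNION), find the rows they have in common (INTERSECT), find the rows that differ (EXCEPT), and, most important of all, JOIN, which follows the link between a primary key and a foreign key.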
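
As a rough illustration of the six operations, here is a minimal sketch using Python's built-in sqlite3 module. The `author` and `book` tables, their columns, and the rows are made up for this example; they are not the book's sample database.

```python
import sqlite3


def demo_basic_operations():
    """Run one query per basic operation against a tiny in-memory database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and data, invented for this sketch.
    conn.executescript("""
        CREATE TABLE author (
            author_id INTEGER PRIMARY KEY,
            name      TEXT
        );
        CREATE TABLE book (
            book_id   INTEGER PRIMARY KEY,
            title     TEXT,
            price     REAL,
            author_id INTEGER REFERENCES author(author_id)  -- foreign key
        );
        INSERT INTO author VALUES (1, 'Austen'), (2, 'Dickens');
        INSERT INTO book VALUES
            (10, 'Emma',        8.0,  1),
            (11, 'Persuasion',  12.0, 1),
            (12, 'Bleak House', 15.0, 2);
    """)
    # Two sets of rows to combine: authors with a pricey book, authors with a cheap one.
    pricey = "SELECT author_id FROM book WHERE price > 10"
    cheap = "SELECT author_id FROM book WHERE price < 10"
    queries = {
        # WHERE keeps only the rows that match a condition.
        "where": "SELECT * FROM book WHERE price > 10 ORDER BY book_id",
        # The SELECT list keeps only the named columns.
        "select": "SELECT title FROM book ORDER BY book_id",
        # UNION: rows that appear in either set (duplicates removed).
        "union": f"{pricey} UNION {cheap} ORDER BY author_id",
        # INTERSECT: rows that appear in both sets.
        "intersect": f"{pricey} INTERSECT {cheap} ORDER BY author_id",
        # EXCEPT: rows in the first set that are not in the second.
        "except": f"{pricey} EXCEPT {cheap} ORDER BY author_id",
        # JOIN: follow the foreign key in book to the primary key in author.
        "join": (
            "SELECT b.title, a.name FROM book AS b "
            "JOIN author AS a ON b.author_id = a.author_id "
            "ORDER BY b.book_id"
        ),
    }
    results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
    conn.close()
    return results
```

Calling `demo_basic_operations()` returns a dictionary with one list of result rows per operation, which makes it easy to compare what each operation keeps or discards.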

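Part two of the book teaches SQL syntax and puts the theory from part one into practice. Another strength of this book is that it comes with a ready-made database, the inventory and sales records of a second-hand bookstore, unlike other books that only discuss syntax in the abstract. I entered the book's examples into a SQLite file and attached a CSV file for each table; anyone interested can download the SQL Clearly Explained Sample Database here. The order in which to learn SQL is: first querying, that is, how to find the data you want in a database; then updating data; and only last how to create and design the tables. With a ready-made database, following the book's worked examples makes learning to query very convenient and gets you twice the result for half the effort. The syntax for updating data is simple, and there are many helper functions that save you a few lines of code. The most important concept is the ACID transaction (atomicity, consistency, isolation, durability), which at its core is largely the multi-threaded synchronization problem dressed up with a nicer-sounding acronym. The chapter on designing tables teaches the CREATE syntax, but I doubt many people build tables directly in SQL when there are GUI tools that are easier to use. Designing an efficient table is a deep subject, and an introductory book that teaches the syntax only scratches the surface of table design.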
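
To make the transaction idea concrete, here is a small sketch of an all-or-nothing update, again with Python's sqlite3 module. The `stock` and `sale` tables and the `make_store` and `sell_book` helpers are hypothetical names invented for this illustration, not taken from the book.

```python
import sqlite3


def make_store():
    """Create a tiny in-memory bookstore (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stock (title TEXT PRIMARY KEY, qty INTEGER);
        CREATE TABLE sale  (title TEXT, copies INTEGER);
        INSERT INTO stock VALUES ('Emma', 3);
    """)
    return conn


def sell_book(conn, title, copies):
    """Record a sale and reduce the stock as a single transaction."""
    # Used as a context manager, the connection commits if the block
    # finishes normally and rolls back if an exception escapes it.
    with conn:
        conn.execute(
            "UPDATE stock SET qty = qty - ? WHERE title = ?", (copies, title)
        )
        (remaining,) = conn.execute(
            "SELECT qty FROM stock WHERE title = ?", (title,)
        ).fetchone()
        if remaining < 0:
            # Raising here undoes the UPDATE above: nothing is half-applied.
            raise ValueError("not enough copies in stock")
        conn.execute("INSERT INTO sale VALUES (?, ?)", (title, copies))


def stock_level(conn, title):
    """Return the current quantity in stock for one title."""
    return conn.execute(
        "SELECT qty FROM stock WHERE title = ?", (title,)
    ).fetchone()[0]
```

If a sale asks for more copies than are in stock, the exception triggers a rollback, so the stock count and the sales table both stay exactly as they were before the attempt. That is the atomicity part of ACID; this single-connection sketch does not exercise isolation between concurrent writers.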

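I only skimmed the remaining parts three and four. SQL is itself a Turing-complete programming language, and part three teaches the syntax for writing general-purpose programs in it, but I don't see who would use it. SQL's PROCEDURE and TRIGGER are only auxiliary tools for search filters; for real heavy-lifting computation, why not read the data out and process it in a more powerful host programming language? Part four covers SQL's XML features and the Object Relational Data Model, but apart from commercial-grade servers such as Oracle or Microsoft, neither SQLite nor MySQL supports these features. Besides, if you need to search and update XML, isn't DOM more convenient? I can't think of a reason to store raw XML as data in a database; why not parse the data out of the XML first and then store it systematically in tables? Among the open source servers, only PostgreSQL supports the Object Relational Data Model, which in simple terms means storing OOP objects in the database. Anyone who knows OOP will find the object model easy to learn. Object-relational means that a single cell of a table can hold an object, an array, a set and so on, whereas originally a cell could hold only a single value. In addition, one row can use a pointer to link directly to another row, which is more efficient than a foreign key because no search is needed.

Learning SQL unexpectedly sparked my interest in databases. I am putting Rails aside for now (I have to wait for the new Rails 5 books to come out anyway, and there is no reason to go and learn Rails 4) and have decided to look at NoSQL databases first, to see what other ways of storing data exist, before deciding which kind of database suits the website best.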

Visual Quickstart Guide Ruby – Larry Ullman


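I bought this book several years ago but could never work up the energy to read it seriously. When it was published Ruby was still at version 1.9; now it is at 2.3. It was only because I need to build a website that I started cramming at the last minute: I have to learn the basic Ruby language before I can move on to the Ruby on Rails web framework. Why not build the site with Node.js, which has been popular lately? First, I have no fondness for JavaScript, which is a jack of all trades and master of none. Second, I am told that writing back-end code is more comfortable in Ruby than in JavaScript. Most important of all, besides getting the website done, I want to learn a new programming language along the way. In the computer business, if you don't keep up with the times you will soon be weeded out.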

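Is Ruby easy to learn? I found it very easy: it took me only about two weeks to get going, reading for roughly one or two hours every evening. Then again, my C/C++ skills are solid, and for work I have used Tcl, a very peculiar programming language, for more than ten years. The object-oriented part of Ruby is easy to follow with a C++ background, and as for the dynamic programming part, I believe no programming language is more dynamic than Tcl. Ruby's biggest feature is that everything is an object, including every variable, every literal, and even class and module definitions. Because everything is an object, you can rewrite any method and add or remove variables in a class while the program is running. That makes writing code very convenient, but it is also easy to make mistakes if you are not careful. Because all binding happens at run time, Ruby goes beyond the traditional static versus dynamic typing distinction and relies on duck typing: as long as an object has a method of the right name, you can call it regardless of what kind of object it is.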
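
The two ideas above, duck typing and rewriting a method at run time, can be sketched in a few lines. The sketch is in Python rather than Ruby, purely to keep the examples on this page in one language, and the `Duck`, `Robot`, `make_it_speak` and `patch_duck` names are invented for illustration; Ruby's open classes allow the same thing with different syntax.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    # Unrelated to Duck: no shared base class or declared interface.
    def speak(self):
        return "Beep"


def make_it_speak(thing):
    # Duck typing: no type check, any object with a speak() method will do.
    return thing.speak()


def patch_duck():
    # Replace a method on the class while the program is running.
    # Every Duck, including ones created earlier, sees the new behavior.
    Duck.speak = lambda self: "QUACK!"
```

The convenience and the danger are the same thing here: nothing stops `patch_duck()` from being called somewhere far away from the code that depends on the old `speak`.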

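The book is written in very plain language. Its examples are run live in irb: the author teaches the syntax and uses irb's output to explain why a given command produces a given result. However, I don't think Ruby suits people who are new to programming, because it has too much convenient built-in magic, which does not help students see how the computer actually executes a program. The book skips the explanation of some of Ruby's more advanced magic and simply tells students to memorize the syntax and usage by rote, for example symbols, class attr, and module include/extend. I could not make sense of these while reading and had to go online to read the official textbook on ruby.org and the online tutorials from Ruby Monk, so I ended up reading three Ruby textbooks in one go. What really made things click, and showed me the essence of Ruby, was the chapter of the official textbook on ruby.h. The whole Ruby language is written in C, and ruby.h is the C header that lets developers link Ruby programs with programs written in other languages. When I saw in ruby.h how Ruby creates an object and how objects relate to one another, a familiar feeling came over me: isn't ruby.h the twin brother of tcl.h?

The last chapter covers Rails, but Rails evolves even faster than Ruby itself. The old 2.x version taught in the book is completely out of date, and a dozen or so pages is nowhere near enough to teach Rails. Reading it is like water off a duck's back; it is better to read the user guide on the official Rails site. In fact, you really don't need to pay for classes to learn programming. There is plenty of free material online, books are cheap, and if you follow the examples and play around on your own, you pick it up naturally as you write. Goal Ruby: achieved. Next goal: Rails.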

The Terror of Code in the Wrong Hands

Here is a new term: the software terrorist, someone who brings negative productivity to the team. I can attest that catching bugs in poorly written code wastes a lot more time than rewriting the code myself from scratch.

By Allen Holub, May 2005, SD Times

The 20-to-1 productivity rule says that 5 percent of programmers are 20 times more productive than the remaining 95 percent, but what about the 5 percent at the other end of the bell curve? Consider the software terrorist: the guy who stays up all night, unwittingly but systematically destroying the entire team’s last month’s work while “improving” the code. He doesn’t tell anybody what he’s done, and he never tests. He’s created a ticking time bomb that won’t be discovered for six months.

When the bomb goes off, you can’t roll back six months of work by the whole team, and it takes three weeks of your best programmer’s effort to undo the damage. Meanwhile, our terrorist gets a raise because he stays late so often, working so hard. The brilliant guy who cleans up the debris gets a bad performance review because his schedule has slipped, so he quits.

Valuable tools in the hands of experts become dangerous weapons in the hands of terrorists. The terrorist doesn’t understand how to use generics, templates and casts, and so with a single click on the “refactor” button he destroys the program’s carefully crafted typing system. That single-click refactor is a real time saver for the expert. Scripting languages, which in the right hands save time, become a means for creating write-only code that has to be scrapped after you’ve spent two months trying to figure out why it doesn’t work.

Terrorist scripts can be so central to the app, and so hard to understand, that they sometimes remain in the program, doubling the time required for all maintenance efforts. Terrorist documentation is a font of misinformation. Terrorist tests systematically destroy the database every time they’re run.

Terrorist work isn’t just nonproductive, it’s anti-productive. A terrorist reduces your team’s productivity by at least an order of magnitude. It takes a lot longer to find a bug than to create one. None of the terrorist code ends up in the final program because it all has to be rewritten. You pay the terrorists, and you also pay 10 times more to the people who have to track down and fix their bugs.

Given the difficulty that most organizations have in firing (or even identifying) incompetent people, the only way to solve this problem is not to hire terrorists at all; but the terrorists are masters of disguise, particularly in job interviews. They talk a good game, they have lots of experience, and they have great references because they work so hard.

Since the bottom 5 percent is indistinguishable from the rest of the bottom 95 percent, the only way to avoid hiring terrorists is to avoid hiring from the remaining 95 percent altogether.

The compelling reason for this strategy is that the 20-to-1 rule applies only when elite programmers work exclusively with other elite programmers. Single elite programmers who interact with 10 average programmers waste most of their time explaining and helping rather than working. Two elite programmers raise the productivity of a 20-programmer group by 10 percent. It’s like getting two programmers for free. Two elite programmers working only with each other do the work of at least 20 average programmers. It’s like getting 18 programmers for free. If you pay them twice the going salary (and you should if you want to keep them), you’re still saving vast amounts of money.

Unfortunately, it’s possible for a software terrorist to masquerade as an elite programmer, but this disguise is easier to detect. Programmers who insist on working in isolation (especially the ones who come to work at 4:00 p.m. and stay all night), the prima donnas who have fits when they don’t get their way, the programmers who never explain what they’re doing in a way that anyone else can understand and don’t document their code, the ones that reject new technologies or methodologies out of hand rather than showing genuine curiosity—these are the terrorists.

Avoid them no matter how many years of experience they have.

Software terrorism is on the upswing. I used to quote the standard rule that the top 10 percent were 10 times more productive. The hiring practices prevalent since the dot-com explosion—which seem to reject the elite programmers by design—have lowered the general skill level of the profession, however.

As the number of elite programmers gets smaller, their relative productivity gets higher. The only long-term solution to this problem is to change our hiring practices and our attitudes toward training. The cynic in me has a hard time believing that either will happen, but we can always hope for the best.

Chief Programmer Team

In software development, there are many different models for organizing the team structure. I read about the Chief Programmer Team model in Rapid Software Development and The Mythical Man-Month, and I have always wanted to try it. The model is based on the fact that the best programmer is often much more productive than an average programmer. The idea is to amplify the productivity of the superstar by organizing the development team around him. The chief programmer is the brain of the team: he architects the code and writes the most complex parts, leaving all the supporting, secondary or mundane tasks to the other team members.

I found this development model works exceptionally well with Indian contractors. I had been auditing their code, and it did not meet our quality standard. I tried explaining by email and over the phone how to fix it, but somehow they just could not get it exactly right. It reached the point where it would simply have been faster for me to fix their code myself. However, I only have two hands and could not do all the work alone, so I decided to try the chief programmer team model. I cleaned up the structure of their code and made sure all the pieces worked together coherently. Then I wrote instructions in the code and had my team clean up any syntax errors and careless logic errors and, most importantly, finish the mundane wiring work.

I found I am super productive using this work model. I can focus my mind on solving the big problems and let my guys take care of the boring details. When I write and qualify my own code, I can work on only one file at a time. Now I can work on 3-4 files at the same time. It is like having an AI code generator, or a few extra pairs of invisible hands helping me type the code. I just specify the flow and structure of the code and jot down some high-level instructions, and the code is generated by the next morning. I put enough information in the code that my guys only need to fill in the blanks, as in a high school programming assignment. I estimate I am 3-4 times more productive with 3-4 contractors serving as my remote fingers and low-level brain. If I left those contractors to figure out the code on their own, they would not even be half as productive as I am.

The only problem with the chief programmer team model is that it is hard to implement in a typical North American work environment, where everyone is more or less equal in the hierarchy. No one wants to be the supporting programmer who carries out all the boring grunt work while the chief programmer gets the fun of creativity and all the glory. Moreover, the role of supporting programmer looks like a dead-end job with no career prospects, so naturally no one wants to stick around doing it. The supporting programmer has to reach at least a basic level of competency or he won't be of any use, but at the same time he must not be too competent or he will seek greener pastures. The biggest challenge of the chief programmer team model is finding stable, decent supporting programmers.

Big endian vs little endian

Endianness is one of the most confusing concepts in computer design. I remember it took me a long time to learn the difference in my first-year computer course, and shortly afterwards I forgot which one was which. Endianness answers the question of what the proper byte order is inside a computer or, on a serial link, which bit should travel first. Big endian puts the most significant byte (or bit) first, while little endian puts the least significant one first.
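
A quick way to see the two orders side by side is to pack the same 32-bit number both ways. This is a minimal sketch using Python's standard struct module; `byte_orders` and `misread_as_little` are hypothetical helper names made up for the illustration.

```python
import struct


def byte_orders(value):
    """Return the 4-byte big-endian and little-endian encodings of value."""
    big = struct.pack(">I", value)     # most significant byte first
    little = struct.pack("<I", value)  # least significant byte first
    return big, little


def misread_as_little(big_endian_bytes):
    """Decode big-endian bytes with the wrong (little-endian) convention."""
    return struct.unpack("<I", big_endian_bytes)[0]
```

For the value 0x12345678, the big-endian bytes are 12 34 56 78 and the little-endian bytes are 78 56 34 12. If one side sends big endian and the other reads little endian, the receiver sees 0x78563412, which is why two machines have to agree on the order before they can talk.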

The terms big endian and little endian were introduced to computing by Danny Cohen in his legendary paper On Holy Wars and a Plea for Peace. They come from Swift's Gulliver's Travels. In the tale, two countries, Lilliput and Blefuscu, go to war over the right way to break an egg: at the big end or at the little end. Lilliput decrees the little end, while the Big-Endians take refuge in Blefuscu. Swift is using the egg war to satirize the holy wars between religions. Which end to break is such a silly question that people should eat their eggs any way they like. However, if two computers want to communicate, they have to share the same endianness. So it is a sort of holy war in the computer world, and neither side is going to give up easily.

The little endians take their idea from everyday language, such as English. We write characters from left to right, starting with the first character, then the second, and so on. Thus, they think it is natural to send the least significant bit first. The big endians are inspired by the mathematicians. In maths, we write numbers from left to right beginning with the most significant digit. Each camp has its merits and has been rallying troops for the endian war. The paper was published in 1981, almost a quarter century ago. Today, the computer world is still split along the endian line, with different protocols and architectures in each camp. This gives us lots of headaches when we are building chips to bridge different protocols.

It seems the endian war will never end. As Cohen said, agreement upon an order is more important than the order agreed upon. Shall we toss a coin?