Monday, December 21, 2015

My best guess as to one major aspect of Supreme Leader Snoke's back story: He is Yoda's (twin?) brother, father, or son

I am guessing that Supreme Leader Snoke (whose image and voice we hear a few times in the new Star Wars flick) is quite likely a close relative of Yoda's, either his (twin?) brother, his father, or his son -- but even if not a relative, surely of the same species. Of course he is not a giant -- all we see of him in this first movie of the new trilogy is a huge holographic projection which gives no sense of his true scale.

The more I consider this possibility, the more I am convinced that this MUST be what the creators of the new trilogy have in mind (or if it isn't what they have in mind, it SHOULD have been what they had in mind).

Further along this line of thought: Is Maz Kanata also of the same species (perhaps also related to Yoda and his evil relative), and will she play an active counterweight to Snoke in one of the future episodes?

All shall surely be revealed...

Edit (24 hours after original post): Just created a theory posting (linked to this blog posting) on the "Star Wars speculation" subreddit. We'll see whether it generates any comment and further speculation...

Thursday, August 20, 2015

Regarding the demotion of Saint Atticus

A couple of weeks ago I finished up a Harper Lee double-header, starting with a reread of her classic, "To Kill a Mockingbird", followed immediately with a reading of its newly-published sequel, "Go Set a Watchman". Before I began reading the sequel, I set out to studiously avoid exposing myself to any of the myriad of opinion pieces that sprang up just after its publication.

It was obvious from the headlines, though, that some folks were aghast at the prospect of Atticus Finch being presented as a "racist" in the newly published book. "Aghast" may be too weak a word: these people seemed REALLY pissed off! Despite my intentions to avoid them, I confess that some of the click-bait headlines lured me in to skim over the text of a few of the angrier jeremiads. (I figured that the biggest plot twist had already been spoiled for me, and I wondered what everybody was so worked up over.) Those that I skimmed did not seem to be expressing anger at Harper Lee herself; instead their ire was directed at Lee's relatives and publishers, as if they had somehow betrayed her (and all the rest of us) by conspiring to make this work public. One op-ed piece that I read, in the New York Times, seemed to go so far as to suggest that "Go Set a Watchman" was actually written BEFORE "To Kill a Mockingbird" -- that "Watchman" was some kind of shoddy first draft that a sagacious editor encouraged Lee to rework into the modern-day classic that we all know and love.

Well, now that I've read the work, I can unequivocally say that "Go Set a Watchman" could NOT have been written before "To Kill a Mockingbird". Major parts of "Watchman" simply make no sense unless they are PRECEDED by the character development (and the crucial first-person impressions of the young narrator, Scout) in "Mockingbird". But what is much more important to me than debunking conspiracy theories is to get to the question of why there would be such a vitriolic reaction to "Go Set a Watchman", which does indeed present, as its central dilemma, how Scout (now mostly called by her grown-up name of "Jean Louise") deals with the realization that her father is NOT the paragon of absolute virtue she had always taken him for. So as I was reading "Go Set a Watchman", I had two overriding questions at the forefront of my mind: (1) Why did Harper Lee take the story of Scout and Atticus in this direction, and (2) why would people be SOOOO upset about this?

Here's my guess: at some point following the publication and brilliant success of her debut novel, "To Kill a Mockingbird", it occurred to Harper Lee (either via a slow dawning of awareness, or perhaps in some thunderclap of sudden, distraught realization) that her book, and in particular her central character of Atticus Finch (very effectively drilled into all of our noggins by Gregory Peck's iconic portrayal), rather than inspiring people to carry on and fight the great fight to broaden and deepen civil rights for all, instead gave a completely unintended solace to the wrong people, inadvertently inspiring a sense of complacency in the very people she intended to spur to action. Horror of horrors, rather than sounding the bugle and heading into battle, some of Lee's target audience of folks-who-wanna-do-right-by-other-folks instead identified so thoroughly with Saint Atticus, that icon of ultimate understanding and benevolence, that their internal monologues might have gone something like this: "I am just like Atticus Finch. I AM Atticus Finch. I have evolved beyond racism. Within me, the American Civil War (which carried on for far too long after Appomattox) has finally arrived at a brilliant and triumphant denouement. The Great Work is now done."

Is it possible that Harper Lee came to just such a realization, and almost simultaneously came to the realization that somehow "Saint Atticus" must die, and be replaced with just plain old Atticus Finch, a real and imperfect human being who's still got some very hard work to do to truly set things right?

I wonder whether much of the ruckus and consternation regarding the publication of "Go Set a Watchman" is actually due to how closely a lot of us (particularly those of the white, privileged persuasion) identify with Atticus. To me, the upsetting of the Atticus apple cart makes perfect literary (and Jungian) sense. It HAD to be that way, and Lee does quite a good job of leading Jean Louise (and us) to the realization of her father's true character, in a completely sensible way!! But others may see the dethroning of Atticus as an unacceptable last straw, a fiction-based confirmation that the brutal, complacency-shattering true events of our times (from Ferguson, to the tragedy of Sandra Bland, and beyond) offer proof-too-positive that the Civil War that we had hoped was over still rages on. And that's a very difficult thing to have to wake up to.

Atticus Finch, in "To Kill a Mockingbird", despises individual white racists, referring to them explicitly as "trash"; but that same Atticus Finch staunchly supports institutional racism in "Go Set a Watchman". This SO TOTALLY MAKES SENSE -- it is the only direction that Harper Lee could have reasonably taken the narrative, and it makes this pair of books, with their fuller realization of the story of Scout and Atticus, into a powerful allegory for the challenges of our time.

Saturday, June 13, 2015

A "Hello World" for HBase (a brief tutorial Java program)

As I continue my research and development efforts in the general vicinity of HBase, it's become clear to me how difficult it is to find good introductory resources -- even something as simple as a "Hello World" program -- to serve as the launchpad for a newbie's explorations into the workings of the HBase client API. Not finding any such thing (at least nothing that was usable), I decided to cook up my own.

Without further ado, here it is...

(Use the horizontal scroll bar down at the bottom of the frame, or you might find it easier to view on its original page on GitHub.)

Friday, May 29, 2015

Chaos Wrangler project, Phase 1: Coming up with a simplified picture of the HBase "Data Model"

Public domain image - Wikipedia, Jason Hise
It's only been a few days since I launched my "Chaos Wrangler" project -- to see what I can do to help bring HBase metadata into the realm of controllability (or even just plain knowability) for enterprises who are beginning to use HBase as the back-end repository for their "Big Data" projects!

While my mind is already racing far down the road, envisioning what would be required to create both passive and active metadata discovery mechanisms and repositories to plug into HBase, it is vital that I take some time in these early days to be sure that I thoroughly understand the structures of HBase (both physical and conceptual). While I did find a useful resource or two, I found nothing that boiled things down into terms that I felt I could fully internalize, so I decided to trudge through the main reference guide on the Apache HBase site and derive for myself a simplified picture of what the HBase data model consists of and how it works.

As part of this exploratory process, I've begun writing a Java-based "analogy" of the HBase data model. I'm taking the concepts that are presented and discussed in the reference guide (e.g., "table", "column prefix", "cell", etc.) and building them into a small set of interrelated Java classes. This is proving useful for my understanding, and when I publish the code soon, my hope is that it might help to explain HBase to anybody who has prior understanding of basic Java and RDBMS concepts.

EDIT 2015-06-04: The code snippet mentioned in the preceding sentence is now available on GitHub

In the comments above my Java code, I began to write a few explanatory paragraphs, which soon included a character-based diagram of my conception of the HBase data model. Here are those comments, including the character-based diagram. The comments stand alone, apart from the Java code, as my first offering to the HBase community from the Chaos Wrangler project. My explanation is provided as a potential complement to more rigorous and comprehensive expositions of the HBase data model.

The following is a Java-oriented analogy to the structures that comprise the HBase data model. It is presented under the assumption that the reader has a basic mastery of both Java and RDBMS concepts.

The task of coming to an understanding of HBase data structures is unfortunately made much more difficult than it otherwise might be, due to the fact that HBase uses the terms TABLE, COLUMN, and ROW, but the structures that are referred to by these names bear little resemblance to their RDBMS namesakes.

In fact, an HBase ROW is completely misnamed to the extent that the common term "row" is usually associated with a one-dimensional container of singular instances of data, whether it be a row in a spreadsheet or a row in an RDBMS table. In stark contrast, an HBase ROW could be conceived of as a container of an array of arrays of arrays!! (No wonder people run into difficulty understanding HBase data structures!)

Here is a conceptual representation of the hierarchy of structures in an HBase table. (Note that all relationships shown below are one-to-many -- e.g., one TABLE contains multiple COLUMN_FAMILY_PREFIX instances, one CELL contains multiple CELL_ENTRY instances, etc.):

A TABLE and its component COLUMN_FAMILY_PREFIXes are immutably defined when the TABLE is created. (While a TABLE could theoretically have an unlimited number of COLUMN_FAMILY_PREFIXes, the official reference says that the physical realities of the HBase architecture enforce a practical maximum of no more than two or three per TABLE!)  All lower-level constructs (from ROW on down) are created and maintained by an application via the HBase "put" method. Thus, crucially, all of these lower-level constructs (including so-called COLUMN_QUALIFIERs) are treated as application-managed DATA, and not as database-managed METADATA!!
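To make the "column qualifiers are data, not metadata" point concrete, here is a minimal in-memory sketch in plain Java. It is not the Java analogy published on GitHub and it does not use the HBase client API: the class name TableSketch, its methods, the nesting order of the maps, and the use of String values (real HBase stores byte arrays) are all assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory analogy of a single HBase table (NOT the HBase client API).
final class TableSketch {
    // Column families are fixed when the table is created: the only "metadata" here.
    private final Set<String> columnFamilies;

    // Assumed nesting: row key -> column family -> column qualifier -> timestamp -> value
    private final NavigableMap<String,
            NavigableMap<String,
            NavigableMap<String,
            NavigableMap<Long, String>>>> rows = new TreeMap<>();

    TableSketch(String... families) {
        this.columnFamilies =
                Collections.unmodifiableSet(new TreeSet<>(Arrays.asList(families)));
    }

    // Analogy of a "put": rows, qualifiers, and versions come into existence as plain data.
    void put(String rowKey, String family, String qualifier, long timestamp, String value) {
        if (!columnFamilies.contains(family)) {
            throw new IllegalArgumentException("Unknown column family: " + family);
        }
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .computeIfAbsent(qualifier, k -> new TreeMap<>())
            .put(timestamp, value);
    }

    // Newest version of one cell, or null if the cell does not exist.
    String latest(String rowKey, String family, String qualifier) {
        NavigableMap<Long, String> versions = rows
                .getOrDefault(rowKey, new TreeMap<>())
                .getOrDefault(family, new TreeMap<>())
                .getOrDefault(qualifier, new TreeMap<>());
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    // The qualifiers that happen to exist in one row; nothing outside the data records them.
    Set<String> qualifiers(String rowKey, String family) {
        return rows.getOrDefault(rowKey, new TreeMap<>())
                   .getOrDefault(family, new TreeMap<>())
                   .keySet();
    }

    public static void main(String[] args) {
        TableSketch table = new TableSketch("cf1");
        table.put("row1", "cf1", "name", 1L, "Ann");
        table.put("row1", "cf1", "name", 2L, "Anne"); // second version of the same cell
        table.put("row2", "cf1", "email", 1L, "bob@example.com");

        System.out.println(table.latest("row1", "cf1", "name"));
        System.out.println(table.qualifiers("row1", "cf1"));
        System.out.println(table.qualifiers("row2", "cf1"));
    }
}
```

In this sketch the two rows end up with different sets of qualifiers ("name" versus "email"), and the only way to learn that is to look inside the rows themselves, since only the column family was declared up front.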

There are a number of immediately apparent ramifications of this "shifting" of column-name-maintenance (not to mention column-datatype-maintenance) from the metadata-realm to the data-realm: not only does this potentially upend certain basic assumptions about application development and metadata management, but it could also call for a redefining/realignment of classical roles within IT organizations, which may otherwise be accustomed to an explicit separation of duties between DBAs (who are the traditional custodians of metadata) and developers (who are traditionally responsible for building applications that manipulate only data - not metadata!).

Saturday, May 23, 2015

Yes Mr. Bloor, there IS a Hadoop "metadata mess", but it ain't "waiting in the wings" -- IT'S HERE (and here's what I intend to do about it)

I've spent the last week getting a great introduction to Hadoop technologies by studying the excellent book, HADOOP: THE DEFINITIVE GUIDE, by Tom White. At this point on my trip up the learning curve, I'm both impressed and distressed:

  • impressed by the straightforward architecture Hadoop (and the technology stack built upon it) uses to manage both distributed data storage and data processing, but...
  • distressed by the apparent lack of any means of documenting or even passively discovering fundamental information about the data being stored. 

It boils down to an apparent absence of any technology to serve as either an active or passive metadata repository (data dictionary, to use the old parlance).

Unfortunately, it may well be the case that a majority of the individuals who have bothered to read this far don't even know what I'm talking about, perhaps not even being sure what "metadata" means in this context and why anyone would be worried about the lack of a means to manage it. If that's where you find yourself, please take a couple of minutes and read Robin Bloor's concise and clear exposition of the problem in his October 2013 analysis piece "Hadoop: Is There a Metadata Mess Waiting in the Wings?"; then please come on back here and read my remaining few paragraphs -- it's okay, I'll wait.

Given that the majority of my IT career up until 2008 was spent in the creation of tools for metadata extraction, transformation, and mapping between disparate architectures, I am particularly horrified at the state of metadata management in HBase (a distributed database that rides atop Hadoop and HDFS structures, and is somewhat reminiscent of ADABAS in its internal structures). Not only is it apparent that no tools exist to discover and maintain a directory to the fundamental structures of any HBase table (i.e., its columns), but there appears to be no awareness that such a thing is even needed (much less that it is, in my opinion, supremely vital). The only mention of such things that I've been able to find on the Web so far is in a posting to a Hortonworks Q&A page, in which someone very sensibly asks how one might obtain a list of the columns in an HBase table. They are very succinctly informed by one of Hortonworks' experts that it simply CANNOT BE DONE!!

Yes, at this point in the evolution of HBase, apparently you can put anything you want to into it (in huge, exceedingly diverse quantities), but when the next person comes along and wants to utilize the data you've put there, you'd better be sure to hand them all those cocktail napkins upon which you jotted down the names and structures of the columns in your HBase tables. Otherwise you may be looking at a future where all those exabytes of data may just have to sit there as folks idly stare at it, scratch their heads, and wonder "what the hell is all this?".

I say all this by way of explanation of my soon-to-commence R&D project: to build a passive data dictionary for HBase which will discover the names (and where feasible, the structures) of the columns in any given HBase table, storing the results in (of course!) an HBase table. My working name for this open-source project is "Chaos Wrangler".
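As a rough illustration of what "passive discovery" could mean, here is a minimal sketch in plain Java. It is not code from the project and it makes no HBase calls: the class ColumnDictionarySketch, its observe method, and the two statistics it keeps are hypothetical, and the sketch simply assumes that something else hands it the family, qualifier, and value of each cell it gets to see.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a passive column dictionary; no name here comes from HBase.
final class ColumnDictionarySketch {

    // What the dictionary remembers about one column (family + qualifier).
    static final class ColumnStats {
        long cellCount;      // how many cells were seen for this column
        int maxValueLength;  // longest value seen, in bytes (a crude hint at structure)
    }

    // column family -> column qualifier -> stats, kept sorted for readable output
    private final Map<String, Map<String, ColumnStats>> dictionary = new TreeMap<>();

    // Record one observed cell. In a real system these arguments would come from the cluster.
    void observe(String family, String qualifier, byte[] value) {
        ColumnStats stats = dictionary
                .computeIfAbsent(family, f -> new TreeMap<>())
                .computeIfAbsent(qualifier, q -> new ColumnStats());
        stats.cellCount++;
        stats.maxValueLength = Math.max(stats.maxValueLength, value.length);
    }

    Map<String, Map<String, ColumnStats>> dictionary() {
        return dictionary;
    }

    public static void main(String[] args) {
        ColumnDictionarySketch sketch = new ColumnDictionarySketch();
        sketch.observe("cf1", "name", "Ann".getBytes(StandardCharsets.UTF_8));
        sketch.observe("cf1", "name", "Robert".getBytes(StandardCharsets.UTF_8));
        sketch.observe("cf1", "email", "bob@example.com".getBytes(StandardCharsets.UTF_8));

        // Print one line per discovered column.
        sketch.dictionary().forEach((family, columns) ->
                columns.forEach((qualifier, stats) ->
                        System.out.println(family + ":" + qualifier
                                + " cells=" + stats.cellCount
                                + " maxLength=" + stats.maxValueLength)));
    }
}
```

The design choice worth noting is that the dictionary only accumulates what it observes; it never declares in advance which columns may exist, which is what makes it "passive".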

EDIT 2016-07-17: The open-source project, ColumnManager for HBase, is now in beta-02, with binaries and documentation available both on GitHub (where the project is housed) and via the Maven Central Repository!

Wednesday, May 20, 2015


Here is the entry I just submitted to the Khan Academy Talent Search...

What subject do your videos teach, and what level do they target?

Long division, 3-12 math

Video #1

Video #2

Why are you interested in sharing your videos with the world through Khan Academy?

Usually I measure success in teaching on a fairly personal level, when I'm communicating concepts to an individual or a small group, and I can see the glint of recognition in human eyes (sometimes prefaced by the crinkled brow of confusion) right before me. But YouTube and Khan Academy give another way to find success in educating people. While I could just measure it in the raw "number of views" that a tutorial video gets, with YouTube I can also get direct feedback from people, sometimes with students asking for clarification, or requesting that I do another video on another topic that they're learning about. (See the comments section on my "Long Division With a Two-Digit Divisor" video to see what I'm talking about.) My presumption is that getting my content out on Khan Academy as well as YouTube would heighten the possibilities for this kind of interaction.

But here's what I really LOVE about this medium of learning: People come to get it when THEY need it, at the precise moment that they WANT to. Hungry, self-directed minds like these are the most open and receptive to a concise and intriguing explanation that can quickly lead to that "a-ha, now I get it" moment!

Is there anything else you'd like us to know?

The first video that I'm submitting, "Long Division With a Two Digit Divisor," was literally made for a single classroom-full of Grade 4 students that I was then working with. I posted it publicly on YouTube because I figured it might be of benefit to somebody else besides my students. After it started getting lots of views I decided to make two "prequels" explaining the basics of long division (one of which is my second submitted video, "Why Long Division Works").

In further-flung-yet-curiously-related matters:
Edit 2015-05-26:
Here's a cocktail napkin sketch of some ideas for projects I'd like to work on in the future (with folks like those at Khan Academy, or on my own) --
  • Numeracy project = learning resources for self-tutoring in numeracy skills -- geared toward adults who are at the very lower end of the numeracy spectrum. Would need to be built with an eye to overcoming (or steering around) the fears that drive the innumerate to remain innumerate. (We dare not underestimate the overwhelming, self-defeating power of the "I'm no good at math" mantra.) Goal: elimination of innumeracy in the adult population.
  • Literacy project = same as numeracy project, but with focus on literacy.
  • Object Oriented version of MIT’s Scratch = for learners of all ages, a game-building mechanism like MIT’s Scratch, but one that is built upon valid Object-Oriented (OO) principles and which has a direct path into working with genuine, mature programming languages in a real IDE (integrated development environment). ScratchOO must get junior programmers working with valid OO concepts from day one of playing in the environment. Given that the path leads toward mature programming in a real IDE, perhaps the learning environment should be available as a plug-in to an IDE (like Eclipse or NetBeans). If, for example, ScratchOO were built on top of Java, then all the ScratchOO tools and widgets would be built in Java. The beginner would work with a simple set of these tools to build games, but as they advance, they can get into the actual Java code that lies "behind" the simple tools they've been working with, to customize and extend them, and become fully competent Java programmers.

Thursday, May 14, 2015

Two new open source Java projects launched on GitHub & Sourceforge

In the world of Software Engineering, I've heard it said that "code is the new resume". 

If that's so, then I've just burnished up two fresh software engineering "CVs" and posted them to GitHub and SourceForge.

IndexedCollection: A Java class library which extends the standard Java collections framework to provide the IndexedCollection class, which offers simple, automatic, in-memory NoSQL composite-indexing of a standard Collection of objects. In many situations it could be a nice alternative (or a simpler complement) to complex ORM implementations, making for higher efficiency and lower TCO in the development and maintenance of Java applications. 
Gist (simple usage example code): 
Side note -- I've been trying to benchmark my IndexedCollection against another package called CQEngine, but unfortunately that other package's documentation is so sparse, I can't even figure out how to use it for the simplest of queries. The sample code that is published for it won't run against the current release, and both the explanatory text on Google Code and its Javadocs documentation are also out-of-sync with the actual code! So, from a usability perspective, the project seems to have been abandoned, even though the package itself was updated as recently as early 2014. Oh well...
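For readers unfamiliar with the general idea of in-memory composite indexing, here is a minimal sketch in plain Java. To be clear, this is NOT the IndexedCollection API and says nothing about how that library is implemented: the class CompositeIndexSketch, its constructor, and its find method are hypothetical names invented for this illustration, which simply keys a hash map on a list of extracted field values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of composite indexing; NOT the IndexedCollection API.
final class CompositeIndexSketch<T> {
    private final List<Function<T, ?>> keyExtractors;
    // Composite key (one entry per extractor, in order) -> items sharing that key
    private final Map<List<Object>, List<T>> index = new HashMap<>();

    @SafeVarargs
    CompositeIndexSketch(Collection<T> items, Function<T, ?>... keyExtractors) {
        this.keyExtractors = Arrays.asList(keyExtractors);
        for (T item : items) {
            add(item);
        }
    }

    // Build the composite key for one item and file the item under it.
    void add(T item) {
        List<Object> key = new ArrayList<>();
        for (Function<T, ?> extractor : keyExtractors) {
            key.add(extractor.apply(item));
        }
        index.computeIfAbsent(key, k -> new ArrayList<>()).add(item);
    }

    // Look up by the full composite key; returns an empty list when nothing matches.
    List<T> find(Object... keyParts) {
        return index.getOrDefault(Arrays.asList(keyParts), Collections.emptyList());
    }

    // Small example type used only by main().
    static final class Book {
        final String author;
        final int year;
        final String title;

        Book(String author, int year, String title) {
            this.author = author;
            this.year = year;
            this.title = title;
        }

        String getAuthor() { return author; }
        int getYear() { return year; }
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(
                new Book("Harper Lee", 1960, "To Kill a Mockingbird"),
                new Book("Harper Lee", 2015, "Go Set a Watchman"));

        // Index on the composite key (author, year).
        CompositeIndexSketch<Book> byAuthorAndYear =
                new CompositeIndexSketch<Book>(books, Book::getAuthor, Book::getYear);

        for (Book book : byAuthorAndYear.find("Harper Lee", 2015)) {
            System.out.println(book.title);
        }
    }
}
```

A lookup here is a single hash probe on the composite key rather than a scan of the whole collection, which is the basic trade of memory for query speed that any index of this kind makes.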

LibriVox Explorer: From a technical point of view, this Windows/Mac/Linux desktop application makes extensive use of the above-mentioned IndexedCollection Java class library, providing the best currently-available example of how IndexedCollections are intended to work. From the end-user viewpoint, LibriVox Explorer is a new way to experience the LibriVox collection of Public Domain audiobooks, and it takes full advantage of the "eye candy" potential of that collection's vast body of intriguing cover-art.
Download it now, and take it for a spin on your desktop: