Monday, September 14, 2009

Homework 4

1. My original question was "When will a standardized markup language be implemented for data?". This was very ambiguous and was interpreted very differently than what I had envisioned. One way it was read: "So all data is being marked up and a standard set of rules is in place for this data. It seems highly rigid and inflexible, something difficult to implement and too cumbersome to be efficient or useful in forwarding human knowledge." I would reword my question to specify that not every single string of data being produced would be marked up. Also, there would be a set of rules to follow, but these would allow the marker to work within them and create his own tags and nesting if he so chooses. Basically, the standard would encompass pertinent information, the kind that would be more useful to end users if it could be easily accessed and searched by means of an intuitive index. It seems very interesting to me that a document spanning hundreds of pages, or maybe a hundred documents of 1-2 pages each, could be programmatically filtered to output only the desired information, rather than a user having to wade through familiar, irrelevant, or "fluff" material (a small sketch of what I mean follows this answer).

I can't reword my question, though, without assuming that the receiver of the question has some common sense. E.g., one commenter wrote "but stuff written on napkins won't be marked up"; that's just silly, of course things written on napkins won't be marked up. Only relevant information, that which is important enough to warrant examination or review, should be marked up, or else the process would be counterproductive. By counterproductive, I mean that marking up trifles would take more time than it would save in the whole scheme, in my opinion at least. Only when computer processing power is much greater and the need arises for much greater data accumulation and processing would marking up all information be practical.

I hold that explicit is always better than implicit. If everything can be analyzed logically, then informational ties and unity can be drawn more easily and efficiently. If there is room for error, some backtracking may occur, or no amount of backtracking may solve the problem, leading to human intervention to resolve ambiguity: ambiguity resulting only from implicit human thought. However, I think that by the time computers are able to mark up and systematically analyze trifles, they should also be able to think rationally, as a human does. If a computer has perfect unity and flow of information, it would only accumulate more and continue to grow and store empirical data.
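To make the filtering idea concrete, here is a minimal sketch in Python under my own assumptions: the `report`/`finding` tags and the `topic` attribute are invented for illustration, not part of any real standard.

```python
import xml.etree.ElementTree as ET

# A hypothetical marked-up document: only the pertinent facts carry tags,
# while the surrounding "fluff" prose is left as plain text.
DOCUMENT = """
<report date="2009-09-14">
  <finding topic="budget">Expenses rose 12% over the last quarter.</finding>
  <finding topic="staffing">Two positions remain unfilled.</finding>
  ...pages of surrounding narrative a reader would otherwise wade through...
</report>
"""

def findings_on(topic, xml_text):
    """Return the text of every <finding> whose topic attribute matches."""
    root = ET.fromstring(xml_text)
    return [f.text for f in root.iter("finding") if f.get("topic") == topic]

# Pull out only the budget-related facts, skipping everything else.
print(findings_on("budget", DOCUMENT))
# ['Expenses rose 12% over the last quarter.']
```

The point is that the reader never scans the narrative at all; the tags act as the "intuitive index" described above.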

2. For my project I would obtain a series of unmarked documents, mark them up in XML, and then parse out the information I desired. I could compare the time it takes to search the documents by hand against the time it takes to parse and query them, leaving out the one-time cost of programming and marking up the information in my stand-alone case. From there I will try to determine whether this actually makes it more efficient to get the information one needs from the documents. My theory is that it would definitely not be more efficient to program this if I had only a series spanning a few pages and I was the only one viewing the documents. However, I think that as more people need information from the documents, and as the size of the information load grows, the savings would compound, reducing the time spent by each participant, because the one-time markup cost is amortized over every reader and every search. The next step would be to search for documents that suit my parsing plan. A rough sketch of the timing comparison I have in mind follows.
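Here is that sketch, in Python, under assumed conditions: the synthetic document, the tag names, and the repeat counts are all my own placeholders rather than anything measured.

```python
import time
import xml.etree.ElementTree as ET

# Build a synthetic marked-up document large enough to time: many
# <finding> entries surrounded by filler prose (all names hypothetical).
entries = "\n".join(
    f'<finding topic="topic{i % 50}">Fact number {i}.</finding> filler text'
    for i in range(5000)
)
doc = f"<report>\n{entries}\n</report>"
root = ET.fromstring(doc)  # parse once, then query many times

def timed(fn, *args, repeats=100):
    """Average wall-clock seconds for fn(*args) over several runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats

def manual_search(text, keyword):
    # Stand-in for a reader scanning the raw document line by line.
    return [line for line in text.splitlines() if keyword in line]

def parsed_search(tree, topic):
    # Query against the already-parsed markup instead of raw text.
    return [f.text for f in tree.iter("finding") if f.get("topic") == topic]

t_manual = timed(manual_search, doc, '"topic7"')
t_parsed = timed(parsed_search, root, "topic7")
print(f"raw scan:     {t_manual * 1e3:.3f} ms per search")
print(f"parsed query: {t_parsed * 1e3:.3f} ms per search")
# A one-time markup cost of C seconds pays for itself after roughly
# C / (t_manual - t_parsed) searches, so more readers and larger
# document loads make the up-front markup increasingly worthwhile.
```

The break-even arithmetic in the final comment is the heart of my theory: per-search savings may be small, but multiplied across many participants and many searches they should eventually outweigh the markup effort.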
