Wiktionary talk:Criteria for inclusion
From Talk:Youve
Do we really want articles on spelling mistakes? This exists under the correct spelling, you've, so I say it's time to delete. — Hippietrail 01:56, 27 Jun 2004 (UTC)
- It's not always a spelling mistake; sometimes it's intentional. Oscar Wilde apparently insisted on spelling it this way... in school we studied The Importance of Being Earnest written this way, complete with the altogether interesting form havnt—I am almost sure there was a note by him before the text explaining this very usage, but I can't find it online and my copy of the book is not in this town.
- Anyway, I think they should stay, but marked as "nonstandard"—they may be POVially regarded as mistakes, but they have been let through by editors into published works, so I think these words (youve and youre) satisfy Criteria for inclusion. —Muke Tever 02:37, 27 Jun 2004 (UTC)
- I've deleted it, along with "youre", "youll" and "youd". We can't be listing every misspelling or we won't be able to see the wood for the trees. Restore them if you wish, with discussion and disclaimers along the lines of what you have said. — Paul G 17:32, 23 Jul 2004 (UTC)
- PS - it's interesting that these were in but the correct spellings were not.
Invented language inclusion
Clearly all natural languages belong here, but I wonder about invented languages. The "criteria for inclusion" words seem to be focused on the idea of people submitting bogus words... Some invented languages are common, mature, and well-recognized enough to belong here, like Esperanto or Interlingua or Klingon, but a line should probably be drawn somewhere: to take an extreme case, a hoaxlang like E, aka E-Prime clearly doesn't belong. What about Tolkien's Nevbosh (famous conlanger, minor language), or Toki Pona (small community, has a Wikipedia), Brithenig (well-known conlang, spawned many imitators), or Atlantic? (That last one's mine. I don't intend to add it.)
Possibly the language as a whole should conform to one of the criteria here, although the last one ("three independently recorded instances") would be way too lax. Limit to conlangs appearing in published works? or what? Would I be justified in adding Nalian words, from The Edifice, for example?
Would the line differ between artlangs and auxlangs? —Muke Tever 04:34, 29 May 2004 (UTC)
- Well, we could just apply the criteria for inclusion to the un-natural languages like anything else (two published sources within a year, not counting dictionaries and similar). That would cut out a lot of the languages. But I'm not really sure. --Eean 07:28, 11 Dec 2004 (UTC)
Attributive sense?
Can someone clarify this?
- "Proper names may be included if... the name is used in an attributive sense."
I assume this means something like Lou Gehrig's disease (even though that should really redirect to amyotrophic lateral sclerosis), but in that case the entry isn't just a name. Under what conditions is an entry that's just a name (e.g., Homer) acceptable? - dcljr 05:54, 12 Apr 2005 (UTC)
- The name Shakespeare appears in Wiktionary all over the place as the author of a quotation. Those quotations are attributed to Shakespeare. Lou Gehrig's disease is probably OK, (as a proper noun of a specific disorder) while Lou Gehrig and Gehrig would probably get {{rfd}}'ed, and articles that link to them would be redirected to Wikipedia. --Connel MacKenzie 07:30, 12 Apr 2005 (UTC)
- That's not what's meant here. A noun in general is used attributively if it's used to modify another noun. For example, coffee is attributive in coffee cup. I believe the idea here is that a proper noun appearing attributively implies that it's so well known that it's become part of the general lexicon. For example, if someone says That was a David Beckham hairstyle. we know that ... well, actually we don't know precisely what that might be, but we know it's something eye-catching. This is really just a guideline. I can just as well say That was a Boog Highberger hairstyle, but that doesn't necessarily mean that Boog merits his own entry (but he might :-). -dmh 18:34, 12 Apr 2005 (UTC)
- That's an issue as well, but I think the idea intended is yet another interpretation of "attributive", namely things like using "Einstein" to refer to a smart person (or with irony, a stupid one) in general. (Or maybe this is just a subclass of dmh's example, except he means proper names used as modifiers, while I mean them used as substantives . . . )
- However I would ALSO submit (er, if I havnt already) that proper names be addable if they are subject to translation. For example the Greco-Roman hero Aeneas is the same in English and Latin, but in Spanish it's Eneas and in French, Énée. —Muke Tever 15:49, 13 Apr 2005 (UTC)
- Sounds good to me (on both counts). I'm not sure why proper names should be treated any differently from the rest of the lexicon, anyway. -dmh 04:37, 14 Apr 2005 (UTC)
- They aren't. Take the infamous refresher course in Main Page (☺) and you'll note that we have two appendices, one for given names (Wiktionary Appendix:First names) and one for family names (Wiktionary Appendix:Surnames). The difference between us and the encyclopaedia isn't that we don't take names. It is that our article on a name isn't about a person with that name, but is about the name itself. Our article on Beckham won't tell us about footballers, as w:Beckham would, and our article on David won't tell us about statues, as w:David would; but our articles will tell us the etymologies of the names, their pronunciations, their translations, and their actual meanings (if they have any). The "attributive sense" description is, as a matter of fact, too narrow for what we actually do, and it is a point that I've long thought of bringing up. We aren't a genealogy database any more than Wikipedia is, but we are about words, and names are words. Uncle G 02:13, 21 Apr 2005 (UTC)
- Oddly enough, I was already aware of the lists of names. It seems to me like one of many examples of Wiktionary, given its nature, approaching a given topic from multiple angles. I wouldn't really infer much about inclusion criteria from the existence of an index, except that people would like to include the terms listed in the index. Note that given names and surnames are not inherently proper nouns, though they may be so used (e.g., Thatcher and Madonna — and there's a pair to draw to). Noting that David is derived from Hebrew דוד and that Beckham means (I'm guessing here) "village by the brook" is interesting and useful. So is noting that David Beckham is the name of an internationally famous footballer, with a link to Wikipedia for more detail.
- Personally, I'm not convinced that names (in either sense) need to be subject to any special criteria at all. -dmh 16:31, 21 Apr 2005 (UTC)
- Re-reading, I think I both mistook Uncle G's meaning and responded unclearly. We're probably in closer agreement than it may seem. I completely agree that the entries for David and Beckham should just talk about those words per se and link to Wikipedia for further info. In particular, our entries should not contain lists of famous Davids and such. This is in keeping with our treatments of words in general. We include them if they're properly attested and limit the definition to the word per se. E.g., there isn't and shouldn't be a list of famous oak trees under oak.
- As for proper names like David Beckham, we should include them under the same criteria as other words: They need to be sufficiently attested and the meaning should be non-obvious. The implications of these basic general principles are a bit different for proper names. I would argue that endless examples of usages like "David Beckham scored a brilliant goal in the 90th minute." aren't sufficient reason for a wiktionary entry (while they are reason enough for a Wikipedia entry). The key question is whether the name is being used in a sense other than the obvious one of "the person named ...". A good example would be the early 20th-century usage of "Mae West" for a life jacket.
- A borderline case would be the convention found in sports writing (and elsewhere) of using a person's name to refer to that person's well-known attributes. For example, "This team needs a Michael Jordan, not a Shaquille O'Neal," meaning (roughly) "This team needs someone with MJ's skills and abilities, not Shaq's." On the one hand, one could use this construction with absolutely anyone, but on the other hand, it only tends to be used with well-known names. On balance, I'd tend not to count such usages.
- I would, however, count any reference to David Beckham as a usage of both David and Beckham, for purposes of attestation.
- I can't quite articulate the general principles that account for everything I just said, but I think they're there, they're pretty clear cut, and they're not too far from the more or less de facto standard of "independent uses in running text with non-obvious meaning." -dmh 17:36, 22 Apr 2005 (UTC)
- Actually, it may be necessary to differentiate names by person, at least in some cases. One reason is that translating names used to be much more common than it is now: many historical figures have names in many different languages, but modern figures generally don't undergo anything more drastic than transliteration, if even that. For a concrete example, Homer the poet is Homère in French, but Homer the Simpson is still Homer. (For kings and popes the tradition still seems to be to translate; see, e.g., the list at it:Papa Benedetto XVI). —Muke Tever 02:26, 22 Apr 2005 (UTC)
- The more I think about this, the more I think there's not really any need for rules like "used attributively". Further, I don't think that that particular rule is useful even if we did need such rules in general.
- For example, "New York" is used in idioms like New York minute, New York pizza, New York bagel and so forth. Each of these is its own idiom. I can know quite a bit about New York without knowing what makes a New York bagel a New York bagel. The unit "New York" itself doesn't convey anything more than something like "associated with New York". To make an analogy, linguists don't consider "cran" to be a proper English morpheme, even though "cranberry" is clearly a compound with "berry" (as it happens, cranberries are crane berries just as gooseberries are goose berries). Similarly, the existence of "New York minute" doesn't argue for (or against) "New York" on its own.
- What does argue for New York is that some English speaker unfamiliar with the United States might run across "New York" and want to know what it was the name of. Interestingly, this argues more strongly for Hoboken, Healdsburg and Hovenweep than it does for "New York" since we might convince ourselves (most likely incorrectly) that everyone knows what "New York" means.
- The interesting thing about names is that it is generally clear from context that they are just arbitrary names. If I say "This is my good friend Chris." It's clear (even in speech) that Chris is a name, and one does not need to look in a dictionary to know what "Chris" means.
- In short, there's at least an argument to be made that Chris need not be in Wiktionary, but Chris Noth probably should.
- On the other hand, while one might not need to look up the meaning of Chris, one might very well want to look up its etymology, usual translation into other languages and (in the case of more unusual names) pronunciation. This is clearly dictionary material, and CFI as it stands essentially says as much. This is also a good rationale for including the (relatively small) class of "phrasebook" entries that would not be included for other reasons. So perhaps we should expand the Prime Directive a bit to encompass more than just meaning. -dmh 21:18, 6 October 2005 (UTC)
- The actual state of the dictionary is at odds with the "attributive sense" entry for place names that is currently in the CFI. A search for "place names" http://en.wiktionary.org/wiki/Special:Search?search=place+names yields 1811 results. A look into place names in Florida shows 9 entries, only two of which, Miami and Naples, have attributive value. Similarly, Appendix:Place names in Ohio has 13 entries, only two of which, Lima and maybe Minerva (although that is also a name), have attributive value. It seems like we cannot have it both ways. Either 1) we allow non-attributive place names and change the CFI or 2) we delete the current thousands of non-attributive place name entries that are there. I would argue for 1). Brholden 21:15, 29 June 2006 (UTC)
- In my view, the most important reasons for listing place names and personal names are the often fascinating etymology and the way the pronunciation has changed over the years. To take three examples:
- Rotherhithe [from OE hryther hyth cattle wharf] was a village on the Thames, now subsumed as a part of south-east London. Leading to the centre of Rotherhithe is Redriff Road. Redriff is merely a phonetic spelling of Rotherhithe according to the pronunciation in the 17th c, when the road was built.
- Also in the 17th c, Merton, [from OE mere tun farmstead by the pool] now a London borough, was sometimes called Marten. In the 20th c, an actor, Paul Martin, unable to register his name because it had already been used by another actor, decided on a stage name of Paul Merton because he had been brought up there, presumably unaware of its earlier spelling.
- Igornay is a French village from where a large number of Huguenots emigrated to escape persecution by the Catholic church. The Huguenots were noted as weavers, and many Huguenots finding refuge abroad [they are said to be the original refugées] lived by that trade. Sigornay was a surname indicating an origin in Igornay, but amongst those families who settled in New Holland [roughly now New York] it became spelt Sigourney and also became a Christian name. One of F Scott Fitzgerald's teachers was (ironically) a Jesuit priest, Monsignor Sigourney Fay. FSF appears to have named a female character in The Great Gatsby after him (Mrs Sigourney Howard). A girl, Susan Alexandra Weaver, read the book, and liked the name so much that at the age of 14 she started using it herself. Whether she was aware how appropriate it was to her surname, I do not know.
- I believe we should include such names even where they are not used attributively, because they are interesting words and people like me want to look them up to find their etymology and what they originally meant. I would be happy with an etymology, a pronunciation, and a one line definition with a link to Wikipedia where appropriate. Attribution may be one thing that can make a name interesting, but it certainly isn't the only one, or even the most common. --Enginear 13:26, 24 August 2006 (UTC)
- Here's an example. A recent SI online article on the Sox/Sox series reads "I can't say we're happy about the situation," said center fielder Johnny Damon, who seems to have been in this same sorry boat more times than Phil Connors ran into Ned Ryerson on the streets of Punxsutawney. "But we'll be all right. I think we have a good-enough team to win."
- This doesn't support adding Johnny Damon, since he's clearly (from this sentence and the article as a whole) the Boston center fielder. But who are Phil Connors and Ned Ryerson, and where is Punxsutawney? The article itself explains this obliquely in the next paragraph: The Red Sox, of course, have been in these sad straits before, way too many times. More often, in fact, than you see lame references to Groundhog Day on the sports pages. But even then it's not crystal clear that Phil and Ned are characters in the film. To know that there's a connection, you have to know that Punxsutawney is associated with the American minor holiday Groundhog Day.
- What of this to include? Punxsutawney? Punxsutawney Phil? Most likely. Phil Connors and Ned Ryerson, probably not, at least not based on this. If they're ever mentioned outside the context of the film, then yes, but they're not so mentioned here. By contrast, Yoda and Clark Kent are definitely part of the lexicon, even if they are entirely fictional, as are Winston Churchill or Mickey Mantle. Note that even though Estee Lauder may be part of the lexicon, Josephine Esther Mentzer isn't. Both names should still be in Wikipedia, of course. -dmh 22:27, 6 October 2005 (UTC)
urbandictionary
Just outta curiosity, is there a page on here with some kinda reference to Urbandictionary? Cos that's got loads more protologisms and slang terms on, and could be a rival of some sort for wiktionary. Something like a wiktionary:how wiktionary is different to urbandictionary, with a welcome message for any urbandictionarians we could entice to contribute here instead of there. If it doesn't exist, I'll try to rustle up a semi-decent sort of welcome page on a subpage of my user page. --Wonderfool 09:10, 20 Apr 2005 (UTC)
- This is contentious. Much of the content of Urbandictionary consists of invented vanity terms that sound amusing but have little currency. It is not the aim of Wiktionary to include such terms, as far as I am aware. — Paul G 09:25, 21 Apr 2005 (UTC)
- As Paul points out, many of the terms on Urbandictionary fail the Wiktionary:Criteria for inclusion because they're not attested. An appearance in Urbandictionary is sometimes a good clue, but it's not enough on its own. Note that an appearance in Urbandictionary that just defines a term without using it isn't even a valid citation for purposes of the criteria for inclusion, not because it appears in Urbandictionary but because it's not used to convey meaning.
- As to competition, I think Wiktionary and Urbandictionary are playing in different spaces. Wiktionary aims to be comprehensive, Urbandictionary aims to be up-to-the-minute (and sometimes even ahead :-) in the field of slang. The two processes are not even that similar. E.g., there is no voting procedure in Wiktionary. -dmh June 29, 2005 16:55 (UTC)
- Half the point of Urbandictionary is its humorous content. --Cammoore 09:44, 15 August 2005 (UTC)
"running text"
This new notion of "running text" is about as well-defined as Neurocam is, at the moment. I predict that you'll have a hard time pinning it down, too. I think that it would be better to have criteria that actually say what it is desired to say directly, rather than use a new piece of jargon which then has to be defined. Perhaps this should be expressed in terms of context. Uncle G 03:47, 21 Apr 2005 (UTC)
- The term may be new, but the notion can't possibly be. The distinction here is between text like:
- I realized I needed to glork my transmission to get the car to run.
- and non-uses like
- No one knows what "glork" means.
- ... glamor glass gleam glimmer glork glory gloss ...
- A more subtle example is borrowings from other languages or even from English-based argots like the infamous leet. We choose to exclude leet as a whole (a choice I strongly support), but we do include a few leet-isms like w00t and pr0n exactly because they have been used in plain English text with no other leet-isms around. Further, we include only the few spellings (out of the many possible) that actually turn up in English contexts.
- Suggestions for better terminology are always welcome, but the distinction needs to be made in any case. -dmh 16:31, 21 Apr 2005 (UTC)
- "Running text" is a new notion? Google doesnt seem to think so. I've used the same criterion for marking neologisms in the Latin wiktionary (which in practice doesn't mean protologisms, but mostly Latinizations of proper names.) IME "running text" means exactly what dmh shows: use of the word in ... well, running text, as opposed to "the word glork"-type things (in addition, I would exclude words only appearing as foreign words in bilingual glossaries: if glorque only appeared in English-French glossaries written by English-speakers as the French for glork it wouldn't make it valid, unless it's actually French; for example, the whole of la Francophonie may be writing glorc...) —Muke Tever 02:44, 22 Apr 2005 (UTC)
I've recast the text here a bit. The phrase "properly formed and grammatical" is redundant with "ordinary" and liable to be interpreted narrowly, excluding perfectly good examples that happen to use a "non-standard" dialect or use some construction that a critic finds objectionable. I had added a pointed example to push at this, but I've removed it in the interest of decreasing the heat/light ratio.
The phrase "in a context that exemplifies its meaning" is unclear. There is a potential open issue here, namely whether it should be possible to discern at least a rough meaning from the context of the word. Since we're trying to establish that the term is used, I don't see how this is necessary. Suppose I find these three independent uses of a term:
- I was very sad that I lost my fleargle.
- This fleargle was unlike any I'd ever seen before.
- I had just acquired a fleargle, a small rodent with shiny fur.
Each of these is valid evidence that the word fleargle is used and expected to be understood, but only one gives any real clue as to the meaning. The new text clearly includes all three, while the old text might have excluded the first two. On the other hand, the first two might be using the word in a different sense, in which case the attestation of the rodent sense is called into question. I would say in such cases that there should be an entry for the word, but the rodent sense should be noted as possibly not well-attested. We know something's going on, and should record that, but we don't know precisely what's going on (and should note that, too). Fortunately, such borderline cases don't seem to come up much. -dmh 04:11, 30 May 2005 (UTC)
- I've completely removed the section, preferring instead to maintain a link to the ordinary meaning of the term. Although the term is well-known I was surprised to find only one other source that defined it. I'll keep my eye open for others.
- The removed material, "In the above, in running text means in ordinary sentences in which the meaning of the term must be known in order to understand the overall meaning." was serving to make the matter unclear. Dmh's "fleargle" rationalization is singularly unhelpful. The citations, if they exist at all, are obviously evidences that the word is used, but the first two alone are not strong enough evidence to support inclusion in a dictionary. In the absence of the third example, what does the word mean? Without any other evidence I'm drawn to conclude that a "fleargle" (I can at least say it's a noun) is a nonce word invented by the author because he didn't have any other word to put in that context. Are we really going to include every whimsical construction that comes along? Eclecticology 19:17, 31 May 2005 (UTC)
- There is indeed some ambiguity as to whether we're supporting terms or senses. But if we required individual senses to be attested, we'd throw out even more entries than if we uniformly required terms to be attested as a whole. Since all we're really trying to do is to build a prima facie case that a term is worth introducing into the Wiki process in the hopes of eventually developing a complete entry, three independent attestations, even if we're not sure what senses are in use, seems reasonable.
- Naturally, we would like to exclude one-time flights of fancy, but those are generally easy to discern from context. Assuming one actually goes so far as to look at the surrounding context. -dmh 20:54, 6 October 2005 (UTC)
Inflections
I don't see where WT:ELE says that we should include entries for regular inflections. It does say that we should include spellings for "inflections if any, particularly if these are irregular, or prone to other uncertainties such as whether consonants should be doubled.". All this is saying is that words like target (verb form) should note the spelling of targeting and targeted (which is about 50 times as common as targetted), and that an irregular like goose should definitely note the spelling of geese.
This is all goodness. The text quoted above is a bit vague about spelling out completely regular forms. I'd prefer to see it sharpened, but I don't think there's a consensus in practice on whether to include them or not.
What the text does not say, as far as I can tell, is that there should be entries for regular forms. As a rule, I see no particular point in adding entries for walks, walked etc. that have only the obvious meaning (as opposed to walker, for instance, which should be defined). They might help the occasional user in finding a term, but IMHO we have better things to do with our collective time right now. I'd rather define related terms like widow's walk, perp walk or walk the talk than make sure every single regular inflection has its own entry redirecting back to the root.
Particularly useless is wikifying the spellings of regular inflections when they just redirect back to the root. -dmh 16:31, 21 Apr 2005 (UTC)
- Hello! This is the first well-stated objection to the practice of wikifying links that I've noticed. I strongly disagree with the notion that redirects are useless. External links, internal wikified links and searches all benefit from large numbers of redirects. (Disclaimer: this was not my idea, I'm just one of the more obvious people entering lots of redirects recently.) As others have said (elsewhere) the practice of replacing content with a redirect should be avoided fiercely (except perhaps in the case of vandalism.) In general, things should go the other way: replacing redirects with stub articles (when needed.) The premise of Wiktionary is "all words..." after all. Even without considering the benefits I mentioned before, that premise should be adequate justification for a whole lot more redirect entries than currently exist.
- I would also like to note that I did ask for comments before blasting redirects all over the place. I heard no objections at the time (the comments I got were mostly ambivalent: others wouldn't bother entering them, but had no objection to their presence.)
- I feel I should also point out the genesis of the redirection practice. This came about from discussions about including all senses of a word under a single headword article (like other dictionaries {shudder} do.) (That was not my idea either.) As that was shot down, adding redirects was suggested as a partial alternate approach.
- As a side note though, I had the intent of practicing writing a 'bot for the task of entering either the redirects or the stub articles. I immediately found that percentage-wise, very few articles (about five months ago) had the other senses indicated (wikified or bolded.) One beneficial side effect of the "redirect syndrome" is that many words now are getting these entered, for some future 'bot to deal with, perhaps. Also, the format of the other senses is becoming standardized, particularly with the recent addition of the inflection templates.
- --Connel MacKenzie 02:24, 20 May 2005 (UTC)
- Let me be a bit clearer about my objections, which are actually fairly narrow:
- I do object to a wiki link to something that just points back to where you linked from. It violates the "principle of least surprise" in that a link implies new information behind it.
- I don't object to adding entries for regular inflections. I'm not going to spend any time doing it by hand, or writing a bot to do it for me. I'd also prefer to see effort invested in making the search function smarter, but that's a different topic.
- I don't have a strong opinion on what form an entry for a regular inflection should take, whether a redirect (essentially making the search function smarter one special case at a time) or as an entry detailing what the derivation is. If the latter, we should use a template both for consistency and to ensure that, e.g., singing is listed as both progressive and gerund.
- I hope that clarifies the original comment. -dmh 04:26, 26 May 2005 (UTC)
'unverified'
Connel MacKenzie (in an edit comment) asks: "How is a /. comment page peer reviewed or subjected to editorial verification?"
- This is a dictionary, not an encyclopedia. Our job is to describe language as people use it. People do not require their words to be peer reviewed to engage in conversation. "Editorial verification" can't mean anything other than the imposition of an editor's POV, which is against the spirit of Wikimedia projects. We can't call any human's spelling, grammar, or usage right or wrong (which would be POV), though we can label it standard or nonstandard. —Muke Tever 23:54, 19 May 2005 (UTC)
This seems as good a place as any for you to question me on that comment, Muke. So I shall explain. The context of what I was doing pertains to a very specific change. My comment about editorial review is my comment, my view on what the old meaning used to imply. Oddly, you didn't quote your own verbose edit line comment. Perhaps that would help you to grok the context I was speaking in.
The old wording basically covered materials that one could check out from a local public library. I somewhat agree with the assessment that that is a decent place in the sand to draw the line.
While Wiki* sites strive to be completely NPOV (Wikipedia much more so than here, based on many past discussions that I've read here) there has to be some point at which one cuts the cruft away. Just as we don't allow submissions of random keyboard pounding, we also have some obligation to not go overboard with it. Everyone contributing here is an editor. :-) And everyone keeping an eye on recent changes is contributing to the overall editorial review process. That is the very heart of Wiki, not the antithesis!
BTW, people do expect what they say to be editorially reviewed - if they wish to be understood. If they are talking merely to exercise their jaw muscles, then perhaps, they might not desire much review of their output. But that would not be "engaging in conversation." :-)
Describing what a word means, or is accepted to mean by many people is hard to determine. Using previous attempts at just that (i.e. the entire body of published material available in the world today) seems like a fine place to draw the line. The Internet taken as a whole, has proven itself unreliable on many occasions. Technicalities cause weird terms (e.g. grok) to appear. Gradually, such terms enter the "mainstream" and are accepted as "valid" neologisms. But that is a very slow process, and most of those transient terms fade away.
What all that means to me, is that we should not accept absurd made-up words in Wiktionary. That seems to be the consensus around here (both before I got here, and now.) I'm sorry that you do not agree. --Connel MacKenzie 01:25, 20 May 2005 (UTC)
- I entirely agree. I threw out the old version of forno, for example. And I haven't defended any word that I don't personally recognize as a word. It just seems that my threshold is rather lower (or my netslang vocabulary rather more extensive) than yours. I don't agree that the "library line" is a good one, but if necessary, note smiley, I shall enter into talks with Google to produce The Big Book of the Internet, Volumes 1–∞, coming soon to a library near you. ;)
- As for NPOV, it is a wiktionary policy to hold it, however it's just that for most words, it doesn't really come naturally to apply a POV. There are a few words with controversial definitions out there (check the history of marriage, say); there are perhaps some imported etymologies from old sources that are a little free with terms like 'corruption of'; but that's about it.
- As for words entering the "mainstream" and becoming valid neologisms, I don't agree with that, and I think for a very important reason: many, many words, even in meatspace never achieve mainstream status. Around the rise of the English language many words were imported from Latin; these inkhorn terms were opaque to people who were not well versed in the classical language. The first English dictionaries were invented for words like these: their main focus was on the hard words of English, the unusual ones that people were not likely to know, not the mainstream, well-known ones—which is a focus that came later with the broadening of the audience of dictionaries to include those who are learning English. —Muke Tever 05:02, 20 May 2005 (UTC)
- Leet/netslang has been horrifically beaten down here at Wiktionary, repeatedly. I am sorry if I implied that that is my threshold; it is not. I have always had the impression that the Wiktionary goal was to have a respectable reference, not a free-for-all. --Connel MacKenzie 06:35, 20 May 2005 (UTC)
- Leet is not the same as netslang. Nobody has been, nor should have been, rejecting things such as IMHO, meatspace, LOL, interweb, BSOD, teh, fap, asshat... And it's entirely possible to have a respectable reference about things that aren't very respectable in themselves. It's not a free-for-all, as we do have lines drawn. —Muke Tever 14:25, 20 May 2005 (UTC)
- Yes, I agree; thanks for the correction/clarification. Your last sentence summarizes the core idea I was trying to convey; one person's arbitrary "weakening" of the established guidelines would make it a free-for-all. --Connel MacKenzie 14:44, 20 May 2005 (UTC)
Verifiability is certainly an important criterion. A lot of what we now have for definitions seems to be completely invented. We need to put more emphasis on citing sources as a means of developing our credibility. Leet and a lot of the other barbaric internet jargon that has been appearing should be more severely controlled. Perhaps it should all be kept on a Wiktionary:Internet jargon page in a manner similar to what was done with protologisms. Eclecticology 08:52, 23 May 2005 (UTC)
- Perhaps. I'm not sure I'd point to Wiktionary:List of protologisms as any kind of an exemplary model though. It is a pickle.
- It's a whole barrel of pickles. :-) -Ec
- The {{protologism}} and {{neolog}} templates (not sure of the spelling of the latter) might be a better approach/model. The What links here feature can be used to group such words (even if a category is not added to the templates) and the load on sysops for the maintenance of the deletion log might be significantly reduced.
- The problem with letting a lot of these things their own articles is that it gives them more credibility than they probably deserve. That just encourages more of them. Eclecticology 21:10, 25 May 2005 (UTC)
- Hmm. I've never been one for the "Barbarians at the Gate" theory of netslang. If a (presumably small) online community like alt.squirrelporn uses a bunch of specialized terms in its proceedings, we don't really need to include any of it, any more than we need to include specialized terms used informally within Bill Goddard's research group at Caltech.
- My repeated experience with terms people like to object to is that they're all too often quite easy to track down and assign at least rough definitions. This would include a couple (teh and asshat) on the list above that "no one has or should have been objecting to", but which have in fact produced objections. The problem is that they are often contributed in garbled and even ungrammatical form by anonymous parties who come and go in the night, and this lends a certain air of illegitimacy to what are in fact perfectly valid terms.
- Editorial review seems like a non-issue to me. I try to approach Wiktionary from a linguistic, almost anthropological view, and I'm as much interested in colloquial speech as in the written word. Colloquial speech, by definition, is not formally reviewed. On the other hand, formal review by an editor is a good (but not perfect) indicator that a term is widely understood in the sense in which the author is using it. It's certainly not hard to turn up various flavors of tripe in editorially reviewed text. To me, the question of whether an editor liked a particular usage is less interesting and significant than whether people use a given term consistently in a given way.
- For that matter, editorial tastes change, capriciously. The recent famous example is tidal wave vs. tsunami. Up until the recent tragedy, either term was acceptable, and indeed we see the BBC using tidal wave to describe the Boxing Day event. However, tidal wave is now fairly widely considered "incorrect" by editors, evidently for the completely spurious reason that tidal waves in the usual sense are not caused by the gravitational influence of the moon. All this happened in a matter of weeks, giving the lie to the notion that linguistic change is necessarily a slow and gradual process.
- While I'm on the topic of the speed of change, I'd like to take issue with the idea that adoption of a narrowly used term into the mainstream is a slow and gradual process. While a term can indeed linger for years in narrow usage, the transition from narrow to broad usage can be amazingly quick, thanks at least in part to the mass media. Again, tsunami would be something of a case in point, though not the most dramatic. I would expect that usage of the term increased by orders of magnitude in a matter of days as the story was picked up worldwide. This is not to say that tsunami was too narrowly used beforehand to merit inclusion, only that the process of adoption can move very quickly and most likely this has little to do with how long a term has been in narrow use.
- Meanwhile, back at Criteria for Inclusion, I don't have any problem relying solely on the internet for attestations, as long as it's clear that the term is used widely and consistently enough that a speaker might expect it to be understood by a complete stranger. This is what the independence criterion is all about. By the way, I see that this criterion has been considerably expanded and improved since I last saw it. Thanks! -dmh 05:08, 26 May 2005 (UTC)
- Ah. Looking through the change log I now see what the original controversy was about. I'm not sure "(un)verified" is a good distinction to make. If someone posts a random comment on slashdot, it's quite verifiable that they did so, and if someone challenged me, I would consider a link to slashdot's archives sufficient verification.
- But this is a red herring. What we're trying to verify is not that someone used the term, but that the term was in sufficiently wide use that someone could use it in a widely-read forum like slashdot and reasonably expect to be understood. And even this is not quite enough. I could make up a word like slashdotifiability and use it in a random post and expect to be understood, even if no one else had ever used the word.
- Which brings us back to independence. We're really trying to establish something more like a reasonable certainty that separate communities of speakers (including purely written usage as "speech" here) use a term consistently and without knowledge of each other. This is obviously a hard notion to pin down, which is why this page is so long, but as far as I can tell it's not particularly relevant whether usage was in a published work, in someone's living room, on the internet, or someplace else. The internet is just easier to access online.
- I'm reasonably happy with the current formulation, which discourages but does not outright prohibit relying on chat rooms, blogs and email. I'm not so sure I'd include blogs, though. The main problem with chat rooms and email is that they're often limited to a closed community, and so are not strong indicators of independent usage. Blogs tend to be intended for public consumption. If someone uses a term on a blog without ever defining it, that strongly suggests that the term is in wider use. OTOH, if a given term turns up, used in the same sense, in both a scrapbooking chatroom and an ice hockey chatroom and is clearly understood in both, that seems pretty indicative to me.
- The one thing I do get worked up about is the notion that speech in a chatroom is somehow less deserving of study, per se. Granted, the subject matter of many such venues is less than edifying, but we're doing lexicography here, not literary criticism. -dmh 05:33, 26 May 2005 (UTC)
- [Your score has gone up by ten points.] —Muke Tever 00:49, 27 May 2005 (UTC)
Wikifying terms we define specially here
I think I understand the idea behind wikifying attest and idiomatic in the "general guideline", but I don't think it helps. The current definition of "attest", in particular is fairly general (and maybe a bit musty), and while the 4th sense agrees with what we say here, it doesn't really add anything.
Given that we go to great pains to explain just what we mean in the article itself, linking to more general definitions doesn't seem particularly useful. Conversely, I don't see any need to try to pull the material on the page into the definitions, since it just amplifies the usual definitions (i.e., we're addressing "attested in what way?" and "how do you tell if a sense is idiomatic?"). -dmh 19:12, 27 May 2005 (UTC)
- This is a good point. I do think that some of the definitions at attest are not accurate and that that article is in need of cleanup. (For example an attestation to a will is not done by the person whose will is under consideration, but by the person who witnesses his signature on the will.) I think that any definition of a word like that which we use on some other page may expand on the word, and how it can be applied to a particular environment, but it must not contradict the normal usage of the word. Eclecticology 07:35, 28 May 2005 (UTC)
Protologisms
In a tasty bit of irony, the meaning of the term "protologism" seems to be mutating in real time.
The original definition, itself protologistic, was aimed at cases where a particular person perceives a gap in the lexicon and invents a word to fill it. However, a second sense is evolving, namely a narrow usage that particular parties are trying to convince others to use more widely.
Both senses are noted in the entry for protologism, and both are in active use within Wiktionary. Notably, Wiktionary:List of Protologisms uses the first sense, while several entries on RFD use the second sense. To whatever extent they are trying to promote usage of this sense, the second sense is an example of itself, but I digress.
We should try to be clear what sense we use, or, preferably, use separate words for the two senses. Personally, I would prefer to see protologism restricted to the narrower first sense, where it is clear that one person has created a word and a definition together, the existing term neologism used for terms which are clearly new and not yet in wide use, and perhaps "specialized term" for terms which are long-standing, but only within a narrow community.
It's a separate discussion under what circumstances we admit any of these. My personal opinion is that
- Protologisms belong on the protologism page until such time as they can be shown to have made it into wider use.
- Neologisms be admitted fairly liberally, and objectionable ones be marked plain rfd and not rfdProto.
- Specialized terms be admitted unless there is very clear reason not to. For example, protologism only appears to be used within Wiktionary, but it's been used for quite a while now.
-dmh 04:32, 30 May 2005 (UTC)
All your instruction are belong to this page
Sorry to be critical, but wow... talk about instruction-creep! This article has grown by 600% since I doubled its size back in April. I think it's actually too long to be useful anymore. I can well imagine most newbies seeing the table of contents and just giving up... (I haven't even read the whole thing yet myself.) - dcljr 4 July 2005 08:46 (UTC)
- Point taken, but perhaps it would help if we made it clearer that the page consists of two parts:
- "As a general guideline, a term should be included if it's likely that someone would run across it and want to know what it meant."
- A detailed gloss on that.
- I think that's really all the page is, and I don't think the general guideline is particularly intimidating. To the extent that the rest is useful in sorting out particular cases (and maybe it isn't, based on some of the RFD traffic), I think it's worthwhile. -dmh 7 July 2005 03:10 (UTC)
Spellings
Now that we've had the requisite edit skirmish (which I'll take the blame for starting), it's time to talk about how to handle variant spellings.
I have two objections to the current text:
- The notion that a misspelling can be more common than a correct spelling.
- The removal of the numeric rules of thumb, which were clearly marked as such and are based on actual observation (though not a detailed and rigorous study).
I'm also keen to avoid the usual prescriptivist quagmire of endless squabbles over whose notion of "correct" is correct. I chose dette/debt as a somewhat extreme example, but one that seems perfectly defensible in a world where prevalent spellings can be wrong and history and etymology are to be given weight over usage. Consider that:
- The word was originally borrowed from French dette, which is correctly spelled by French rules, which rules are still generally followed to this day for French borrowings (e.g., laundrette, baguette etc.). One could argue that the French dropped the ball here by removing the Latin "b", but c'est la vie.
- English has never pronounced a "b" in the word, and there is no general English rule for silent "b". This is in contrast with cases like initial "kn", and "wr", which reflect the Old English (and I think even Middle English) pronunciation.
- The spelling change can be traced to a particular source and date, before which the natural spelling "dette" was accepted evidently without controversy.
- This change was based on an arbitrary decision to bring the English spelling closer to the Latin root and not on any practical concern. By this reasoning, one might as well insist on spelling the word "debitum" while continuing to pronounce it as "dette".
In short, there is a reasonable case to be made that "dette" is the etymologically and historically correct spelling while "debt" is the aberration. The only real reason for choosing "debt" as the correct spelling is that it is the one overwhelmingly used in all but perhaps the earliest Modern English texts. This is good enough for me, but evidently this is a dangerously descriptivist attitude, liable to lead to "correct" spellings mistakenly being labeled "incorrect" and vice versa based solely on a hundredfold or so difference in prevalence.
This strikes me as yet another non-problem to be solved by adopting a nebulous and subjective notion of "correctness" over easily measurable empirical guidelines. -dmh 7 July 2005 03:58 (UTC)
- I've moved your latest argumentative addition here to the talk page:
- An interesting case is souped-up. The etymologically correct spelling is clearly suped-up, but souped-up is overwhelmingly common (about 10:1). Common sense suggests that suped-up cannot be a misspelling. On the other hand, calling souped-up a misspelling is wishful thinking. This is very similar to the debate over the meanings of hacker, and the solution is analogous: list the two spellings as alternates, possibly with a note on the relative prevalences and a note that some will take offense to the spelling souped-up.
- I don't see who's arguing with you about "souped-up". Your analysis about that is essentially correct. The "hacker" debate had nothing to do with the spelling. In any event the "policy" is about setting soft guidelines, not about arguing over questionable specifics.
Did I say that the hacker debate had anything to do with spelling? The point is that in both cases, a vocal minority considers the great majority of usage to be "incorrect". Endorsing either view would be POV. Instead, we note the state of affairs.
- Measurable guidelines don't exist without data. What would be the source of your data for making such measurements? I remain open to the possibility that a misspelling is more common than a correct form, but I'm not expecting that it will be a fruitful criterion for adding things to Wiktionary.
In cases where there are enough usages even to talk about prevalent spellings — and I'm thinking tens of thousands, at least — googling is enough for our purposes. Unlike the case for attestation and deriving meaning, we're just looking for orders of magnitude or at best factors of two or three. This doesn't capture regionality, which I leave as an easy exercise for the reader.
- There is no value to an obsession over dette or debt. Your obsolete form, "dette", is mentioned in the 1913 Webster with a reference to Chaucer. "Laundrette" and "baguette" have no relevance because the "-ette" there is a diminutive suffix. I can't see the point about the lack of an English rule. It's all very simple; all silent letters are not pronounced. This point is not particularly subtle, why doubt it? Let's not get stuck in bdellium over it. Eclecticology July 8, 2005 07:05 (UTC)
Here's what I (Aleph 1.0) say:
If it's in the most recent version of Merriam-Webster's Unabridged Dictionary, 3rd New International Version, it's correct. 72.197.201.129 04:37, 16 May 2006 (UTC)
- Well, outside of that we're not Merriam-Webster's Unabridged 3rd NIV... "correct" spellings are easy to recognize. Whether a spelling is incorrect is hard to substantiate, given that multiple spellings can have acceptance (ax/axe, color/colour, griffin/gryphon) and dictionaries, especially those written for a particular region's POV, don't always list them all. Besides, the dictionary you mention only has, what, 450K words? and only English ones too, I hear—"international" indeed. —Muke Tever 00:46, 17 May 2006 (UTC)
Misspelled words.
Is there any Wiktionary policy on misspelled words? I just added the misspelling "seperate" to separate. Many dictionaries have a chart of frequently misspelled words; I was unable to find anything like that here. Should users just add misspelled words to the definition pages for their correct spellings? Put in redirects? 66.114.70.80
- I think there is some merit in your idea, but personally I'd rather have those misspellings which are not words in their own right show up as redlinks when they are entered as links in other entries, to make it more likely that the person making the entry or someone checking up on it will notice the misspelling. I'd rather see a list of commonly misspelled words, something that would fit well in the appendix. Gene Nygaard 5 July 2005 05:40 (UTC)
- There is a semi-official policy in Wiktionary:Criteria for inclusion. The policy is that common misspellings should be included and labeled as such. See torroid for example. Having them present as entries removes any doubt as to what's going on. If I look up torroid and it says "Common misspelling of toroid" I know exactly what's happening. I don't have to double-check on a separate page, and I don't have to wonder if it's just a word no one has entered yet. It would be nice to have a category of common misspellings, to provide a list as well. -dmh 6 July 2005 18:50 (UTC)
- I really like the notion of a category for misspellings, (common, rare or disputed.) --Connel MacKenzie 6 July 2005 19:16 (UTC) Even better, perhaps, would be a category of their corrected spellings: Category:Commonly misspelt words. --Connel MacKenzie 6 July 2005 19:43 (UTC)
- Agreement with dmh. If someone types in "torroid," and the page "toroid" comes up with no notice of the misspelling, there is a pretty good chance the user won't notice his miskate, and will continue using "torroid." Then again, just having nothing come up could lead to pages where a misspelled word is given the definition of the proper spelling (furthering confusion). Zachol
The Formatting section suggests the following for the definition of a misspelled word:
- # misspelling of [[...]]
That seems like a confusion of use vs. mention. The following seems better:
- # misspelling of ''[[...]]''
Is that more correct or am I unaware of some dictionary convention? Rodasmith 21:19, 23 January 2006 (UTC)
My critique
I really have other things to do, but since I've taken the time to read it, I'll opine for a minute or three:
- Under "Attestation" - change a.k.a. to aka
- Under "Vandalism" - remove "(generally within minutes)" Consider including at the end of this paragraph something along the lines of: If you think you have found an article that has been vandalised, please note that with {rfd}; that should bring it to the attention of the administrators faster.
- Under "Misspellings, . . ." - in the second to last sentence, change "English" to "British" for clarity. Rationale: In the previous sections we've been using "English" to refer to the language, then all of a sudden you use "English" to refer to "British"-type spellings; I believe that can cause unnecessary confusion.
- Under "Inflections" - I thought the current practice was to add common inflections. Kindly correct me if I'm wrong.
- Under "Names of actual people . . ." - I have to disagree with two examples given:
- Empire State Building - I think this should be included as a dictionary entry because I don't think it's immediately obvious that New York's motto is "The Empire State" from which this building derives its name.
- Thomas Jefferson - I think this should be included because this is the foundation of "Jeffersonian".
I hope this has been beneficial.
Cheers,
--Stranger 02:22, 12 September 2005 (UTC)
- I can't agree about "a.k.a." It is normally pronounced as an initialism; removing the periods would give a contrary impression.
- Nothing in the criteria forbids inflections; we only discourage them as useless. If you want to add them, it's your time.
- It is clear that Thomas Jefferson refers to a specific historical individual. In the etymology for Jeffersonian it should be sufficient to have something like "derived from w:Thomas Jefferson (17??-1826)". The years are useful for giving a time frame to the word.
- But be sure to hide the wikimachinery: "derived from Thomas Jefferson...." - dcljr 23:26, 6 October 2005 (UTC)
Comment removed from article
This is an HTML comment I'm moving from the Attestation section of the article. Don't ask me what it means... - dcljr 19:04, 30 November 2005 (UTC)
- <!-- We might want to recommend adding an entry on .../Citations with as much information as is known, on the assumption that Wiktionary will be around as long as Wiktionary is around -->
Widespreadness
Is there any place where appropriateness for inclusion is discussed beyond widespread, which is quoted here? It seems pretty obvious to me in the extreme cases: a word used only in British English would obviously be included, and the usage of knock up as 'have sex with' in South Ajax, rather than the standard "make pregnant" used elsewhere, would be excluded. But where jam buster is used by ~1 million English speakers, and maybe recognised by a few million more, is that genuinely 'widespread'? I can't seem to find anywhere here where people even attempt to address this issue. Wilyd 16:59, 12 January 2006 (UTC)
- Where in the world did you get the ~1 million figure for jam buster? - dcljr 02:21, 16 February 2006 (UTC)
- Um, that entire section is about defining what we mean by it. Jam buster would not meet our criteria because you won't find A) three independent citations that B) use the term in running text C) spanning a year. In general, we use books.google.com/print.google.com as a front-line check against protologisms. The search results for jam buster indicate proper names and strange word combinations, but I didn't see one running text use of the two words together. --Connel MacKenzie T C 00:27, 21 February 2006 (UTC)
- Besides books.google.com I can highly recommend Amazon. Use the SIPs feature from one book, then modify the URL to find whatever word or phrase you want in many books. — Hippietrail 17:19, 22 February 2006 (UTC)
- Absolutely. In fact, I think there may be some risk from relying on books.google.com too much. These (and others) are wonderful resources, but dependence on one or the other should be avoided. --Connel MacKenzie T C 17:25, 22 February 2006 (UTC)
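For illustration only, the three-part test described above (three independent citations, in running text, spanning a year) can be sketched in Python. This is a minimal sketch, not an actual Wiktionary tool: the `Citation` record, the `meets_attestation` function, and the idea of judging independence by a distinct `source` label are all assumptions made up for the example, and deciding whether a use really counts as running text still needs human judgment.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Citation:
    """Hypothetical record of one observed use of a term."""
    source: str         # label for the independent author or work (assumption)
    when: date          # date the usage appeared
    running_text: bool  # True if used in a sentence, not a bare mention or proper name


def meets_attestation(citations, min_sources=3, min_span_days=365):
    """Return True if the citations satisfy the A/B/C rule of thumb."""
    # B) only uses in running text count
    usable = [c for c in citations if c.running_text]
    if not usable:
        return False
    # A) at least `min_sources` distinct sources (a crude stand-in for independence)
    if len({c.source for c in usable}) < min_sources:
        return False
    # C) earliest and latest usable citations are at least a year apart
    dates = [c.when for c in usable]
    return (max(dates) - min(dates)).days >= min_span_days


example = [
    Citation("book A", date(2004, 1, 1), True),
    Citation("newspaper B", date(2004, 6, 1), True),
    Citation("book C", date(2005, 3, 1), True),
]
print(meets_attestation(example))
```

Treating "independent" as "different source label" is the weakest part of the sketch; two books quoting the same original would wrongly count as two sources.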
Proper Names that are subject to Translation
There is a category of proper names (place names, given names etc) that are subject to translation. I believe these should be included in Wiktionary.
For example, I think we need the words London and Londres, Munich and München, even if neither is used in an attributive way. Also Peking and Beijing and any other forms of the Chinese name.
Also, names such as John, Jacques, Giovanni etc, which are all different language forms of the same name (I think). Probably each would qualify anyway.
Also names of stars named differently in different languages.
In fact, any proper name which has a commonly used translation or other language form should be eligible for inclusion, to fulfil the Wiktionary role in translation.
This would allow more proper names than just the attributive use criteria.--Richardb 07:56, 26 February 2006 (UTC)
CFI not universally applicable - Protologisms, WikiSaurus, concordances etc
There are many lists in Wiktionary, of varying purpose. Many of these contain words which do not necessarily meet the CFI as currently proposed. I strongly suggest that it would be wrong to apply the CFI to words in lists such as
- the list of Protologisms
- concordances
- WikiSaurus word lists.
By their very nature, these lists carry neither verification nor definition. The lists would mostly be wiped out if the CFI were applied to words in them.--Richardb 10:59, 26 February 2006 (UTC)
- Seems reasonable but I've thought a few times about putting a reference to the CFI in the Wikisaurus entries to give guidance on people considering turning the red links blue. (might have been done I've not looked for a couple of months). MGSpiller 01:34, 28 February 2006 (UTC)
- I've started a policy page Wiktionary:WikiSaurus criteria which addresses the criteria (and action) for words in WikiSaurus. It could probably do with some expansion to cover whether words should be linked or not.--Richardb 01:47, 11 May 2006 (UTC)
ISO code criterion
On the project page:
- If the language lacks an ISO 639 language code, it is almost surely not acceptable.
Problem with this:
Hundreds of extinct languages do not have ISO 639 codes and probably will never have them. This criterion, as presently stated, does not seem consistent with the following statement:
- As an international dictionary, Wiktionary is intended to include "all words in all languages".
Perhaps this is good for restricting constructed languages, but it doesn't seem good for natural languages.
- The importance for having this is to avoid treating local dialects as a separate language when the proponents insist that it is a separate language. The word "almost" is also there when someone can make a good case. Eclecticology 00:54, 2 March 2006 (UTC)
Formatting?
Why is formatting (incorrectly) described for a misspelling redirect? Doesn't that belong in WT:ELE or somewhere else? --Connel MacKenzie T C 18:00, 4 March 2006 (UTC)
Pawley list
- from User:Muke's comments in Wiktionary talk:Idioms...
Anyway: “The object is to describe what it takes to use a language properly as a member of society. Part of this is knowing what things to say, when to say them and how to say them in conventional ways. [...] Instead of striving to keep the lexicon small we need to enrich it. In fact we apply the terms ‘lexicon’, ‘lexeme’ (or ‘lexical item’) and ‘lexicalized’ in ways quite different from the grammarian. Now these terms are defined with respect to cultural facts as well as with respect to purely structural criteria. Complex words and compounds, and perhaps phrases, are considered part of the speaker's cultural lexicon if we can show that they have entered the social tradition, that they have attained the status of social institutions, being recognized as conventional ‘names of things’, as ‘terms’ in a set or terminology, as ‘set phrases’, and perhaps as ‘appropriate things to say’. All grammatical strings are not socially equal. We award special status to those strings that are culturally significant, even though they may also be perfectly grammatical. The upshot is an enormous increase in the number of lexemes compared to the ideal grammarian’s dictionary.” Andrew Pawley, as quoted in Making Dictionaries
In the same source is quoted his list of criteria for lexeme/headworthiness, which I have beforehand shared with the IRC channel:
1. The naming test: Can the candidate for a lexeme be referred to in questions or statements such as the following: ‘What is it called?’ ‘It is called X.’ ‘We call it X, but they call it Y.’
2. Membership in a terminological system: [...] Does X encompass other terms; can one say ‘it (dog) is a kind of X (animal)’ (=generic)? Is it a member of a set of similar things; can one say ‘X (a chair) is a kind of Y (furniture)’ (=specific)? Can it be used to show contrast; ‘is it a kind of X (fruit), but not a Y (vegetable)’? Does it have synonyms or antonyms?
3. Customary status: Does the use of the phrase imply certain behavior patterns, values, or sequences of activities that are known by society at large? They represent conventionalized knowledge. For example, expected behavior at the front door is different from at the back door (besides their participation in idioms), indicating that these function as cultural units (lexemes) that are more significant than the sum of the parts. Consider go to the mosque, get off work, take a vacation.
4. Legal status: Some phrases have such status that they are codified in legal usage: driving under the influence, breaking and entering, assault and battery, justifiable homicide. Even so-called ‘primitive’ societies with unwritten languages have categories of this sort for dealing with things like marriage negotiations and litigations over land, property, and adultery.
5. Speech act formulas: Every language has some formulas “which carry out conversational moves” (Pawley 1986:106). For example, excuse me, how are you, y'all have a nice day, etc.
6. Use of acronyms: This is often proof that a multi-word phrase represents concepts that have attained conventionalized or institutionalized status. Consider: VIP, DWI/DUI, IQ, RBI, SAT, ASAP, PTO, PTL, AWOL, BS, RSVP, R and R; in Indonesia: KB, DKI, KK, ABRI, DPRD, GBHN, etc.
7. Single-word synonyms: the only one of its kind ↔ unique.
8. Belonging to a terminological set: This is similar to (2), but focuses more on a pair of antonyms. Consider: tell the truth ↔ tell a lie, take care of ↔ neglect.
9. Base for inflected or derived forms: short temper → short-tempered; ooh and ah → oohing and ahing, Indonesian ke mana → dikemanakannya (‘to where’ → ‘wind up where’).
10. Internal pause unacceptable: The unacceptability of inserting a pause in the middle of clichés, idioms, and compounds is partial indication of their functioning as a unit. Consider the functional differences between bunch of baloney vs. bunch of bananas. One can say two bunches of bananas, but cannot do the same with the figurative sense of bunch of baloney.
11. Inseparability of constituents: Insertion of other material changes the unity or naturalness of a phrasal lexeme. Consider: lead up the garden path. Saying lead up the beautiful garden path shifts it from a figurative to a literal interpretation. This is similar to (10) above.
12. Ambiguity as to whether it should be written as a single word: whatchamacallit, thingamajig, man-in-the-street, oneupmanship.
13. Conventionally reduced pronunciation: bosun (boatswain), won't, can't, o'clock, Newfoundland, Christmas, Worcestershire, thruppence (threepence) etc.
14. Conventionally truncated forms: Widespread occurrence of shortened forms often indicate their role as a lexeme in the language: exam(ination), rad(ical), ex-con(vict), con(vict), con(fidence man), con(fidence trick), ex(-husband/-wife), pro and con, etc.
15. Omission of headword: The modifier stands metonymically for the whole: She had an oral (examination), He had a physical (examination), A short (circuit) cut off the (electrical) power.
16. Omission of final constituents: This often implies conventionalized knowledge: If you can’t beat ’em..., A stitch in time..., I haven’t the faintest (idea). These elided forms are often marked by peculiar intonation.
17. Stress and intonation patterns: Different languages give different phonological clues for what is seen to function as a unit. English often uses stress and intonation. Government jargon is often coined through these means. Consider political matters memorandum.
18. Invariable constituents or grammatical frame: The demanding and rhetorical Who do you think you are? does not have the same impact in the future. Kick the bucket does not mean the same when put in the passive. The thought had crossed my mind, and he took the law into his own hands are unnatural in the passive. Compare also stripped down formulaic sentences easier said than done, spoken like a man! There are also syntactically irregular or archaic idioms like easy does it, no go, no way, be that as it may, (she) wants in, once upon a time.
19. Use of definite article on first mention: In English this can indicate the conventionalized nature of the ‘object’, showing the speaker assumes the identity is understood by the addressee: the fire department, the foreign legion, the eight ball.
20. Writing conventions: Where there is a written tradition these may provide clues to perceived status as a unit. Capitals may indicate lexemes that are not typical proper nouns: Third World, Big Bang, Inner City. Beware that where a society has the luxury of supporting a literary community, some writers manipulate the use of capitals for unconventional purposes. Quotation marks may also indicate unitary status: he was considered a ‘bad boy’. Orally, some speakers use so-called or a preceding pause to mark an equivalent to quote marks.
21. Unpredictability of form-meaning relation in semantic idioms: kick the bucket, chew the fat, shoot the breeze.
22. Arbitrary selection of one meaning: Notice that button hole is a hole FOR putting buttons THROUGH, whereas bullet hole is a hole MADE BY bullets, post hole is a hole FOR setting posts IN, etc.
23. Use in ritual language of parallelism: This is a special case of (2) and (8). Ritual language in parallelisms is widespread. It is found, for example, in Biblical Hebrew and many Austronesian languages, particularly in eastern Indonesia (Fox 1988). Existence as a paired entity in this context is sufficient for justifying its status as a conventionalized unit, and hence a lexeme.
Comments
I believe we should be honoring each of these as CFI. At some time in the past, I had reservations about a few of these, but not anymore. --Connel MacKenzie T C 01:40, 26 March 2006 (UTC)
- Perhaps I'm misinterpreting the CFI, but I don't see a difference between that, or at least the de facto CFI interpretation, and the list above, save #7. [18:18, 19 April 2006 (UTC)]
And #3. Davilla 19:43, 19 April 2006 (UTC)
- I split this into a separate section for editing ease. --Connel MacKenzie T C 18:35, 19 April 2006 (UTC)
- I think most of these, with the exception of idioms, come under regular attack, especially #1, #2, #3, #6, #8, #10, #15, #17, #19, #20 and #22. In the past, there was considerably more resistance to keeping terms containing a space in the headword. --Connel MacKenzie T C 18:35, 19 April 2006 (UTC)
- I don't believe #15 has come under attack because the list only requires the definition of e.g. oral as a shortened form of "oral examination". By my understanding it does not require the latter, full form as an entry. If I am misreading, and this should be investigated, then you are correct that the CFI's idiomaticity requirement does not match.
- Reconsidering my own thoughts (again), Pawley must have meant oral examination even though it isn't clearly worded as such. Then you are right, these have not generally been considered idiomatic, and their inclusion would require a change in policy or maybe just thinking. It doesn't seem like it would be too difficult to get folks to support at this stage. Davilla 16:25, 24 April 2006 (UTC)
- The problem with #3 is that the phrases can be altered, so this sort of information would more likely end up at a shorter phrase as an example, or usage note, or definition of its own (as in get off for "get off work"). I don't agree that take a vacation is the proper place because of all the different words that could be and often are inserted: "take a long vacation", "take many vacations", "take a flight cross country on business and before returning a convenient pleasure vacation". In most cases "take a vacation" is not going to be the search term unless someone already knew that take a vacation was the correct idiomatic phrase to look any of these up. My objections aren't strong, but that's the way I see it. That vacations can be "taken" is the information which needs to somehow be conveyed.
- Thinking about this a bit more I'm starting to agree with you that phrases like take a vacation are legitimately idiomatic and should be included. This reasoning comes about from considering other phrases that even include "one" or "someone" as placeholders, chosen as the best titles for entries. Thus I would not be surprised if they have come under attack. I am quite curious to know how well the list above matches current practice. Davilla 17:27, 20 April 2006 (UTC)
- Back to the point, the majority of these I still think do match the CFI. That the others you've mentioned often come under attack is just a result of their nature in being somewhat borderline. For as many of these that fall under no.'s 1, 8, 10, 17, 19 and 20 specifically, there are a good number of similar phrases that clearly would not. Most multiple-word phrases brought to RfD, aside from the clearly vandalous, are ruled as keeps I think. You said some of the legitimate ones were turned down in the past. I'd like to know if any recent deletions have fallen under the above criteria. As far as I've looked, only War in Iraq seems to counter my claim, although I'm hoping perhaps it won't be deleted in the end. Does that match your analysis? Davilla 19:43, 19 April 2006 (UTC)
Would argue that this Criteria is very variably applied.
For example, an appearance in someone's online dictionary is suggestive, but it does not show the word actually used to convey meaning.
This criterion is applied at times to exclude words that someone does not like. Yet other words, such as medusetl, are accepted, though the only online reference that can be found is an entry in some dictionary.
My contention is that existence in some recognised dictionary is sufficient for a word to be accepted into Wiktionary. The only discussion really is what is a "recognised dictionary". There are some very obvious candidates, such as the OED.
I believe the Criteria should be amended to reflect this, as this is in reality current practice amongst most of us.--Richardb 01:55, 11 May 2006 (UTC)
- Actually when words get sent to RFV, anyone who attempts to defend it by citing a dictionary is quite shouted down. As for medusetl, it does currently have a cite from a well-known work, so... —Muke Tever 21:49, 11 May 2006 (UTC)
- I've suggested before that inclusion in a dictionary, in fact any dictionary so long as it's in print and not one of these "urban" collections online, should at least qualify it for additional time in the RfV process. I would also be willing to define a "recognised" dictionary, one which would automatically permit a term here, as one for which the criteria for inclusion are strictly stronger than our own. Davilla 16:45, 12 May 2006 (UTC)
- I'm positive this was one of our original criteria, back before we had a criteria page. I've also seen it shouted down - an attitude I dislike. I know at least the OED has this criterion, even if they can't find any other sources, in which case their citation will list the dictionary it's in, or even something like "in various dictionaries" on occasion. — Hippietrail 18:52, 12 May 2006 (UTC)
- Davilla, I would love to hear your definition of a "recognised" dictionary, replete with an initial list of dictionaries. I took a couple stabs at doing that and was shouted down as they say. --Connel MacKenzie T C 22:18, 12 May 2006 (UTC)
- I started to compile one some time ago along with contact details. I'm sure I had another one somewhere without the contacts but including a lot more dictionaries, specifically the Gage Canadian, and a good few non-English dictionaries too. — Hippietrail 22:25, 12 May 2006 (UTC)
Encyclopedic entries, Names that have words derived from them
These are two types of entries which people have been arguing in favour of keeping recently even though they do not currently meet the CFI.
- Should we add something to say "prominent people like Abraham Lincoln qualify for an entry"?
- Should we add "because there is a hairstyle named after the Beatles, 'the Beatles' also qualifies for an entry"?
I think #1 depends on whether we have decided to become an encyclopedic dictionary. If we have, we have to decide on which encyclopedic entries to include: people, places, more? It is my firm opinion that a) this must be voted on before adding, b) encyclopedic articles must be marked as such, probably a category is sufficient. Personally I'm not in favour but I'll accept the popular vote.
As for #2, I see no basis whatsoever. Where does this line of reasoning come from? It's certainly not the practice of a single dictionary I can think of and I don't think "Wiktionary is not paper" is the explanation for that either. Why are the etymology sections not enough? In all the cases I can imagine these are covered by #1 as encyclopedic entries anyway, so although I also recommend a vote on #2, a vote to accept #1 would go a long way toward #2 also.
Thoughts? — Hippietrail 19:33, 12 May 2006 (UTC)
- With respect to your first point, I contend that Abraham Lincoln should have an entry because he is a reference point - a symbol by which you can identify someone else's characteristics. Same goes for Einstein, Hitler, Mother Teresa, Casanova, Lothario, etc. It is, however, a small list, since the usage must be attested. BD2412 T 20:22, 12 May 2006 (UTC)
- But on what basis does "being a reference point" bring a term into a dictionary rather than another kind of reference work such as an encyclopedia? What is to be gained from it? Are there any dictionaries you can think of that practice this? If not, why should we pioneer it (besides not being paper)? Also more importantly which tests would you apply to show that a proposed attestation shows use as a reference point? — Hippietrail 20:50, 12 May 2006 (UTC)
- Imagine a foreigner to our tongue reading a newspaper that refers to so-and-so as being an Abraham Lincoln (or, more likely an Abe Lincoln) - he will need to look up the term to see what that means - is it a good thing or bad? Looking up an encyclopedia article on Lincoln might help, but such an extensive coverage will reveal many characteristics - physically fit, honest, witty, brooding, conflicted, lionized, assassinated - any of which could be the one the term denotes. But we can explain that to call someone an Abraham Lincoln is to say that they are honest, indeed completely forthright. Similarly, to say that someone is a Mother Teresa is to say that they are saintly, not that they are old or strict in their views, or even a woman or a Catholic. BD2412 T 21:52, 12 May 2006 (UTC)
- The examples above are funny because there have been many people named Einstein, Hitler, etc. If we're going to include encyclopedic terms, then shouldn't they be under Albert Einstein and Adolf Hitler, just like Abraham Lincoln? And would it be more appropriate to list "John D. Rockefeller", as he's known, or his full name "John Davison Rockefeller, Sr."? And why not redirect the other, since it's the same person, and also Abe Lincoln, etc.? And what if several people have the same name, like John Thompson? Why not have a disambiguation page, for surnames especially, in case someone is looking for a different person by the same name? In other words, why not just leave encyclopedic entries up to the frickin' encyclopedia in the first place!?! Davilla 05:44, 13 May 2006 (UTC)
- Um, so if someone were to say to you, "Great idea, Einstein", or "you pick up the tab, Rockefeller", you'd need a disambiguation page to figure out which Einstein/Rockefeller they were talking about? The dictionary should only list the meaning attested to be associated with the name, and should only identify the person used as a reference in the etymology. And last I heard, there is no attested phrase along the lines of "that guy's a regular John Thompson!" BD2412 T 15:56, 13 May 2006 (UTC)
- That's a TOTALLY different case!! The entry for Einstein as Albert Einstein is NOT encyclopedic because it refers to a specific Einstein. The word is a monicker that can easily be cited out of context, as in the example you gave. I wrote the entry for Rockefeller, by the way. Davilla 18:27, 13 May 2006 (UTC)
- In other words, terms that convey idiomatic meanings, like Einstein, should have entries like Einstein currently has, and not like Rockefeller. The Beatles, on the other hand, have nothing idiomatic, and thereby do not really meet any criterion, right? —Vildricianus 20:15, 20 May 2006 (UTC)
- That's not my opinion. There are ways to justify all three that do not also permit encyclopedic titles like Albert Einstein. Edit: Certainly idiomatic is one good idea. Davilla 17:50, 26 May 2006 (UTC)
Tests for multiple-word entries
- See: Wiktionary:List of idioms that survived RFD DAVilla 09:29, 31 December 2006 (UTC)
After reviewing the Pawley list I've realized that the items aren't criteria so much as clues that an expression is a lexeme. Of course it's multiple-word entries that are of the most interest to us. Essentially, a term is acceptable if it is considered to be a logical unit, especially if it fails sum-of-parts, cannot be altered, or is used differently than the norms of the language would otherwise dictate. I've reconstructed a few tests from this list, but except as a starting point I'm not sure if relying on Pawley is the right way to proceed. Which tests we use should result from the tests developed during debates, as approved by the community. Since the list is disjunctive rather than conjunctive, a legitimate test cannot include any terms that should clearly be excluded. I think that's the best gauge we have so far for evaluating these. But then the enumerated list could never be considered complete, in the sense that new tests could be added when terms generally supported by the community are found not to fall under any accepted rule. For instance, I'm not sure if any of these allow for empty space.
From the Pawley list
As this is meant to be a starting point, I've only selected tests that I'm certain will pass consensus. In fact, that's exactly where I'd hope this is headed, which is why I've omitted some ideas that have potential. Please do not fault me on incompleteness. As stated above, the list will always be incomplete, but can be extended when the need arises.
1. The fancy dress test.
- Terms that are not understood in a different dialect although all constituents are understood.
3. The fried egg test.
- Terms that imply certain social knowledge that could not be derived from any of the constituents, nor from their combination.
4. The prior knowledge test.
- Terms that have a specific technical meaning in a certain field.
5. The never mind test.
- Terms that are used to structure conversation.
10,12. The in between test.
- Terms that are tightly bound, in which a pause cannot be inserted, or for which concatenation seems natural, if not standard.
12. The all right test.
- Terms for which there is even the question as to the legitimacy of concatenation.
16. The easier said test.
- Terms whose final constituents are omitted, implying conventional knowledge.
17. The rocky chair (or pet name?) test.
- Terms signified as logical units by unusual patterns of stress or intonation.
18. The "mind was crossed" test.
- Terms that cannot be rewritten in certain grammatical frames.
18. The once upon a time test.
- Terms that are irregular or archaic syntactically.
22. The Egyptian pyramid test.
- Terms which do not have the most general meaning attributable, for which specific meanings are assigned to the constituents.
Some of these, e.g. 10 and 12, may be duplicative, but there's no harm done. The question is if any are too broadly written. We also need to develop tests for phrasebook entries. Davilla 22:28, 13 May 2006 (UTC)
Names of characters in books and films
Are we allowed to add imaginary people (Donald Duck, Gandalf, etc.) but not real people (Geoffrey Chaucer, Rudyard Kipling, etc.)? If that is the case, it seems silly to me. Παρατηρητής
- Well, your complaint is legitimate, if somewhat overstated, but the point I think is to be a reference of language rather than factual information. Going by my own opinions about the inclusion of terms:
- I would guess Donald Duck could probably be cited out of context. Gandalf would get the axe if the fictional character were part of the definition. But as only a (dubious) external link, that hasn't been the case.
- It might be difficult, but Geoffrey Chaucer and Rudyard Kipling could be cited in places as simply Chaucer and Kipling, unintroduced. However, I'd doubt you could find either full name out of context.
- But then, what would you expect? Few people talk about Chaucer or Kipling. Anyways, they're included on Wikipedia, so is there a need for complaint? The point is to avoid duplication. Davilla 17:38, 17 May 2006 (UTC)
- Except, of course for the old joke. Do you like Kipling. No I'm afraid not. I have never kippled before. Andrew massyn 22:35, 20 May 2006 (UTC) :)
- I haven't a clue about how Chaucer or Donald Duck could meet the CFI. Chaucer should say something about the surname, not the person, and Donald Duck, well, there's a heap of translations there... Should we adapt the CFI regarding translatable names? It seems that they receive more endorsement than non-translatables. —Vildricianus 22:21, 20 May 2006 (UTC)
Encyclopedic
This subject is currently under debate. This outline establishes my opinion on the idea. Davilla
Encyclopedic means that the sense refers to a specific person, work, or other historic topic. The following pages are specific examples of encyclopedic names that do not merit inclusion in Wiktionary under these titles:
- Popular films or series such as Harry Potter or I Love Lucy.
- Culturally significant stories such as the Little Red Hen or the Four Dragons.
- Titles of novels such as Little Women.
- Dictionaries such as the Shorter Oxford English Dictionary.
- Celebrated authors such as William Shakespeare.
- Important historical figures such as Dwight Eisenhower.
- Companies and trademarks such as Xerox.
- Newsworthy locations such as Waco, the city in Texas.
In any of the above examples, a person who ran across the term wondering what it meant would be more likely to look in an encyclopedia than a dictionary. The rationale for not including encyclopedic entries in general reflects the desire to avoid duplicating the content of Wikipedia. However, the rules are inclusive rather than exclusive. That means that encyclopedic terms can be included in many cases, provided they have entered use linguistically rather than just socially or culturally. An entry may be included if it satisfies any of the following criteria:
- The term is used attributively.
- Places: New York as used in "New York delicatessen".
- There are other attested words that derive from the name, not counting original trademarks that have been genericized.
- Places: The family name Salisbury from the city of Salisbury. Bostonian from Boston, the city in Massachusetts. However, Jeffersonian can refer to any city named "Jefferson".
- People: Jeffersonian from Jefferson, a Founding Father and U.S. President. Washington the state from Washington the President.
- Other: Beatlesque from the Beatles, the rock music group; Micro$oft from Microsoft, the software company.
- Notes: This doesn't work the other way around. Just because there's an entry for Sleeping Beauty doesn't mean Walt Disney's film deserves mention.
- The term is used figuratively.
- Places: The capital of a country is used as synecdoche for the government of the entire political state.
- People: An intelligent person can be labeled an Einstein; there is a great deal of notoriety associated with the name Kennedy.
- Other: The American Heritage Dictionary defines Cinderella as a person who "unexpectedly achieves recognition or success after a period of obscurity and neglect". This usage derives from Cinderella, the character in the fairy tale.
- A single name stands for a specific person or place in the general context, although there may be many other people or places with that name. This is generally a first step to figurative use.
- Places: Athens is a common place name, but without any other contextual information it refers to a specific city in Greece. Thus the Greek city gets an entry while the American city in Georgia, of many places, does not.
- People: Eisenhower in the general context means the U.S. President, not his grandson David Eisenhower, after whom Camp David is named.
- Other: For its fans, Rocky is the shortened form of The Rocky Horror Picture Show.
- The term has standard, or common and non-trivial, translations into many other languages, especially on the other side of the world.
- Places: Although Taipei is a transliteration from the Chinese 台北, it is a standard name that has survived newer romanizations.
- People: There are many common mappings for names, and the translations for most celebrities follow these. In some cases there are secondary mappings, such as Martial for Martialis, but even Marcus Valerius Martialis does not deserve an entry at his full name.
- Other: In Vietnamese, the Vietnam War could just as easily be referred to as Nội chiến, the "Civil War".
Note that proper nouns can be generic terms as well, and these rules for encyclopedic meanings do not apply to given or family names such as John or Smith, or to generic place names such as Jackson, which may be defined simply as common names. The generics of trademarks such as xerox, used as a common word, can also be included if they can be attested under the normal criteria.
Comments
A culturally significant story is bound to have translations, isn't it? Edit: Maybe not. Changed reference above. Davilla 21:54, 26 May 2006 (UTC)
Edit: Trademarks becoming generic do not warrant an entry for the original trademark. Davilla 14:45, 28 May 2006 (UTC)
Edit: OED deserves mention because it could be used out of context and people would be expected to know what it means. Changed to SOED. Davilla 08:27, 21 June 2006 (UTC)
People from Boston, Lincolnshire, England (a very old town) were probably called Bostonians long before Boston, Mass was founded. But that doesn't invalidate your argument. I toyed with suggesting a category (also applying to Boston) where a foreign (to UK in this case) place is so well known that people nearby still assume that the foreign place, rather than the local eponymous one, is being referred to unless specifically differentiated, eg in the UK, Boston, Pennsylvania, Cyprus, Paris, etc or in the US (guessing) Plymouth. However, on reflection, this is probably an encyclopaedic issue, whereas your proposal is linguistic, so more valid.
- I'm sure there's plenty of room for improvement. Davilla 14:54, 21 June 2006 (UTC)
An interesting problem for the future is Westward Ho! a British town named after a book, and the only official British place name containing a !. Does this make it a valid entry for the town (since the ! makes it a linguistic anomaly) with the book being mentioned in the etymology, or what? Enginear 11:04, 21 June 2006 (UTC)
Stock symbols
I recall it coming up before, but the specific exclusion of stock ticker symbols seems to have been removed? Was it never here, only in conversations on BP? Any objections to indicating that we don't want them here, here on CFI? --Connel MacKenzie T C 07:35, 28 May 2006 (UTC)
- Vildricianus has suggested keeping this page updated. I don't or barely recall the debate on this topic, and certainly not the outcome, but you make it sound like this was the consensus, in which case that would be the correct action to take. It was a question of no English context, right? Then this might annotate the running-line-of-text concept well. Davilla 14:43, 28 May 2006 (UTC)
Lost patience with CFI
Certain users decided to develop these CFI. Which is not a bad thing, even if the Criteria are somewhat biased and arbitrary. But then those very same users selectively apply the CFI to in effect bowdlerise Wiktionary. The vast majority of words in Wiktionary do not meet CFI, but it's only the "offensive" ones that get targeted.
I really can't be bothered to attempt to modify the "policy", as regardless of the policy the bowdlerisers will still do their best/worst to remove "unsavoury" words.
The CFI are discredited by the very biased way they are used/applied. I can't be bothered to give examples. Just use the Random Page function a few times, and for each page see if the page meets the criteria. Particularly the one about 3 citations. Want to take any bets on what percentage of pages actually meet the criteria ?--Richardb 12:29, 29 May 2006 (UTC)
- What's a dictionary without a set of criteria? The number of users and contributors is only going to increase over time, and we're de facto defenseless without decent CFI. They're not just there to make deleting content justified; the opposite is equally important. Without CFI, you can't make a point against people who want to remove things unjustly. And BTW, what's with this "vast majority doesn't meet CFI"? It seems to me that you have a different impression of Wiktionary in its current shape. —Vildricianus 14:37, 29 May 2006 (UTC)
Trademarks
- Being a trademark or a company name does not guarantee inclusion.
While the companies mentioned in the text may claim their trademarks are nouns and not verbs, should widespread use of the words like "photoshopped" and "googled" trump those claims? They can claim all they want but if millions of people use the word, it exists, at least in my book. - 131.211.210.11 09:10, 30 August 2006 (UTC)
- That quotation pertains to the brand-specific (generally capitalized) usage. The words you're talking about are not trademarks by our definition because they are generic (so usually lower-case), although it's better to mention the status of the word for the information of readers. What the company says about the word couldn't matter less. They are not the authority on linguistic use, and what they claim does not carry legal weight in this context. The best example is xerox which has been listed in the OED since the fifties. Push comes to shove, it takes three quotations as per the CFI. DAVilla 04:28, 31 August 2006 (UTC)
"Verified through"
“Attested” means verified through
- Clearly widespread use,
- Usage in a well-known work,
- Appearance in a refereed academic journal, or
- Usage in permanently-recorded media, conveying meaning, in at least three independent instances spanning at least a year.
I don't quite catch it. Is that supposed to imply "verified through any of these" or "verified only through all of these"? Dart evader 20:54, 6 October 2006 (UTC)
- Through any of them (note commas after 1 & 2, and or after 3). --Enginear 21:05, 6 October 2006 (UTC)
- I'm surprised the 1st one hasn't been questioned yet. I've always considered 1,000,000 google hits to be a decent indication for that. Perhaps that is too low a number? --Connel MacKenzie 21:35, 6 October 2006 (UTC)
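For what it's worth, the "through any of them" reading can be sketched in a few lines of Python. This is only an illustration of the logic, not anything the CFI page prescribes: the names `Citation` and `is_attested` are made up here, "independent" is crudely modelled as "comes from a differently labelled source", and "spanning at least a year" is read as "the latest instance falls on or after the first anniversary of the earliest".

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Citation:
    """One recorded use of a term (hypothetical structure for illustration)."""
    source: str                   # label of the work; distinct labels stand in for "independent"
    when: date                    # date of the recorded use
    conveys_meaning: bool = True  # False for bare mentions, e.g. a word-list entry


def spans_a_year(first: date, last: date) -> bool:
    """True if `last` falls on or after the first anniversary of `first`."""
    try:
        anniversary = first.replace(year=first.year + 1)
    except ValueError:
        # `first` is 29 February and the following year is not a leap year.
        anniversary = date(first.year + 1, 3, 1)
    return last >= anniversary


def is_attested(widespread_use=False, well_known_work=False,
                refereed_journal=False, citations=()):
    """Disjunctive check: any single route is enough."""
    if widespread_use or well_known_work or refereed_journal:
        return True
    usable = [c for c in citations if c.conveys_meaning]
    # Fourth route: some trio of uses from three different sources
    # whose earliest and latest dates are at least a year apart.
    for trio in combinations(usable, 3):
        if len({c.source for c in trio}) < 3:
            continue
        dates = sorted(c.when for c in trio)
        if spans_a_year(dates[0], dates[-1]):
            return True
    return False


cites = [
    Citation("novel", date(2004, 1, 1)),
    Citation("newspaper", date(2004, 6, 1)),
    Citation("magazine", date(2005, 1, 1)),
]
print(is_attested(citations=cites))
```

Under these assumptions the three example citations pass, while dropping any one of them, or pulling the last one back to 31 December 2004, would fail the fourth route. What counts as "clearly widespread use" is left as a plain flag here, because the page gives no threshold for it.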
Typo
There is a typo on the page, but since it is blocked, thought I’d report it here: the word ‘hypocoristics’ should be capitalised. Remove this remark at will. 134.2.147.26 13:39, 18 October 2006 (UTC)
- Indeed it should, being at the beginning of a sentence. Done. Robert Ullmann 14:04, 18 October 2006 (UTC)
About
The archive for this talk page can be found at Wiktionary talk:Criteria for inclusion/Archive.
There is the page Wiktionary:Editable CFI where proposed changes to CFI are made, and discussed.
Subpages of Wiktionary:Criteria for inclusion: Template:subpages
Multi-word entries, sums of their parts and translations
I've been thinking about a possible guideline regarding the multi-word terms. In particular, I'd like to neglect Davilla's Pawley test topic for this post, although the best solution would probably be a combination of both. See this as a possible additional test.
Some words that have been RFD'd lately I feel do merit some kind of inclusion here, whilst others don't, and the easiest way for me to determine that is to look at their translations. Example is WT:RFD#indoor baseball.
Thinking in English-only, I don't see the merits of including this particular term, or any of the other indoor terms, as their meaning is defined by indoor. However, as my argument there described, such terms are translated in one word in at least two languages that I know of, German and Dutch, and possibly as well in more languages that I don't know of.
This rule may have a large impact, which those who know some German or Dutch will know, for terms like vintage car may be (I don't know) translated into one word there.
There may be two benefits:
- Non-English entries of such terms, for instance the Dutch zaalvoetbal, can link properly to indoor football, instead of to indoor and football separately.
- indoor football will list the correct translations for at least German and Dutch, so that users don't have to go through the process of looking up indoor, which would have the Dutch translation zaal- (a combining form), then looking up football, and then guessing how to link them, keeping in mind the various very complex rules for morphological word building in Dutch and German.
Opinions? — Vildricianus 13:24, 4 June 2006 (UTC)
- I think there's a very close relationship between single words in other languages and what we would consider to be a single concept in English. However, other languages clearly also have concepts that do not exist in English as set phrases, such as father's older brother. To a person who speaks Chinese, mention of the word would immediately conjure images of what an older uncle might be to a younger, and the people in his own family who are associated with the word, as well as his father's friends as it turns out. To an English speaker, the phrase would have to be mentally summed, and the implications are not immediately obvious. So I don't think this could be used as an inclusive rule. I would wonder if it could be used as an exclusive rule, for instance if no other languages had a single word for skateboard wheel, or at least a term that passed the inclusive rules; essentially, if skateboard wheel isn't demonstrably a single concept in any other language, then it isn't one in English either. The inability to apply an exclusive rule like this to, say, vintage car because of some translation, would add credibility to the idea that it very well could be a single concept in English. Davilla 15:42, 5 June 2006 (UTC)
- There have been a few debates on RFD where this might apply, particularly active volcano. Despite what I wrote above, I've been thinking that this is a pretty good criterion to fall back on, even if it isn't ever included specifically. We should definitely include last night because you can't say yesterday night in most contexts without sounding a little funny. We should probably include last year because of the translations, and it's a pretty common expression anyways. There isn't any reason I can see for keeping last financial year, but maybe I'm just trying to stir trouble. DAVilla 21:33, 15 December 2006 (UTC)
- I think that the question of the inclusion or exclusion of an expression as a derived term should not simply be a question of whether the meaning can be derived from a composition of its constituent words, but rather if it also includes a significant degree of markedness such that the use of some other combination of words to express the same meaning would be considered unnatural to a native speaker. This conventionality can be measured by looking at the distribution of the collocation relative to the distribution of other collocations with a similar derived meaning. As I understand it we're not just building another Webster's here, but rather trying to declare a much larger, more detailed description of the human lexicon. We don't just want a laundry list of how you might express a particular meaning, but also how one would express a particular meaning. If Wiktionary is going to function well as a cross-linguistic resource, which I think it should, it needs to include the conventional. We can make another formal argument in favor of the inclusion of conventional expressions in the lexicon by considering the performance of Natural Language Processing systems. Systems that include statistically mined conventional multiword expressions in the lexicon perform significantly better at selecting a correct syntactic parse from amongst the many thousands of well-formed possibilities. This makes a lot of sense when you realize the agent, computer or human, that is listening or reading must first tokenize the input stream before interpreting it. If a collocation doesn't exist in the lexicon, then it can't be treated as a token, thus greatly (and unnaturally) increasing the combinatorial complexity of the language stream. Johnfbremerjr 10:54, 10 February, 2008
- Please see Wiktionary:Idioms that survived RFD, which is an attempt to import the Pawley guidelines and rationalize why the community supports some phrases and not others. DAVilla 07:34, 11 February 2008 (UTC)
Blogs
When were "blogs" added as durably archived? They are not. All of the citations of blog sources I've seen so far have not been google archive links (therefore, not to the durably archived source.)
This seems to be quite fallacious, as google doesn't seem to archive them.
The discussions that do mention blogs above give clear reasons for not using them (as CFI used to state) oddly, from the most inclusionist contributor Wiktionary has seen so far!
What gives? Who added "blogs" and why?
--Connel MacKenzie 10:27, 20 August 2006 (UTC)
- Of course being durably archived is the most important aspect. If the CFI is incorrect then this needs to be changed.
- I vaguely remember a time of revision of this article when this change might have been made, anyways it's somewhere in the history. (Wouldn't it be nice to be able to select a portion of the article and find when that text was most recently changed?) I don't think the intent was malicious but it would be nice to ask for the reasoning. One of the arguments at the time was about feeds being archived. I'm not literate enough in the technology to know if that applies to blogs. DAVilla 21:07, 21 August 2006 (UTC)
- I'm still puzzled by this. Can we make it go away? -- Visviva 14:37, 22 June 2007 (UTC)
- I have now removed this text. -- Visviva 02:20, 28 September 2007 (UTC)
Reconstructed languages
I object in the strongest possible terms to the unilateral imposition of 'policy' on the part of User:Robert Ullmann. Refusing repeated invitations to constructively state his position on Wiktionary_talk:Reconstructed terms, and following a failed deletion request, he just made unilateral changes to Wiktionary:Reconstructed terms to suit his whim, without bothering to give any explanation beyond '1/2 rewrite', knowing perfectly well his changes would be controversial.
I am not seeking to impose any fixed opinion of mine, I am looking for intelligent debate among people aware of the issues involved. Robert Ullmann's suggestion has some merit, but it also has flaws, and as long as he just keeps imposing it without debate, there is no way of ironing them out. Robert Ullmann does important work on wiktionary. But he has very idiosyncratic views on etymology and language reconstruction, and no interest in, and consequently no knowledge on the matter. It is bad enough that he abuses his admin privileges to chastise me over alleged violation of CFI (which has still 'Semi-Official status'), but to insert such a "policy" into CFI after the fact, and after realizing that it had not in fact been there at the time he chose to chastise over it is simply wikityranny (making up your laws as you go along), indefensible under wikiquette, and unacceptable on any Wikimedia project. Let him either discuss the issue amicably, or step down from policing about it.
I do invite anyone interested in the topic to seek for a solution acceptable to everybody, but I will not put up with such bullying tactics. Dbachmann 10:48, 26 January 2007 (UTC)
- User blocked for one week, knowingly removing CFI clarification made as result of policy vote, change reverted. Robert Ullmann 12:09, 26 January 2007 (UTC)
Oxford English Dictionary
I noticed that materteral is listed for deletion yet it appears in the OED. It is my thought that if a word is in the OED, it merits inclusion in wiktionary. What do others think? WilliamKF 19:43, 8 February 2007 (UTC)
- The tenuous decision has been generally to allow such references as some kind of refereed academic work, even though it is no such thing. For RFV, see {{nosecondary}}, which explains some of the reasons why we don't/won't/can't take everything the OED has. The concession was made, I might add, during a dispute with Wiktionary's most infamous copyvio vandal, long before his actions were exposed as being 100% copyright violations. If it comes to a vote, I'd vote strongly against such folly; the minute a vote passed, someone would start a bot stubbing in the OED entries, exposing WMF to certain copyright concerns. On the other hand, if a word appears in the OED and here, but no other major dictionaries, we probably should delete it, even if it isn't a word-for-word copy. --Connel MacKenzie 20:02, 8 February 2007 (UTC)
- I'm not convinced by the copyvio reasoning implied in your last sentence (though I very much agree with the rest). AFAIK, OED only includes words for which it has either found prior use or prior mention (eg in earlier dictionaries). In the former case, we can make up our own minds as to the meaning of a prior use (though it would be dodgy if we cannot find the same or different cites via a separate search). In the latter case, copyright is less likely to be a problem, particularly since the dictionaries cited are usually >>120 yrs old.
- In the case of materteral I see that there are at least three good b.g.c. cites, so having found its meaning, and feeling somewhat avuncular about it, I suppose I might weigh in behind it. --Enginear 20:28, 8 February 2007 (UTC)
- Yes, the Oxford English Dictionary (OED) is strictly based upon giving examples quoted from literature. In terms of copyright violations, the OED in its first edition (1928) and supplements dates back to the beginning of the twentieth century and therefore, would not be subject to copyright similar to how the Encyclopedia Britannica 11th edition is used on wikipedia. WilliamKF 20:48, 8 February 2007 (UTC)
- Note: I'm blocking this latest Template:vandal sockpuppet Template:vandal. --Connel MacKenzie 15:02, 13 February 2007 (UTC)
Sign languages?
Does Wiktionary in principle allow inclusion of words in sign languages? Showing the gesture shouldn't be too difficult -- stationary gestures can be shown with an image, mobile ones with a video -- but getting a gesture to be the name of an entry page might be more difficult. Angr 23:00, 10 February 2007 (UTC)
- This might need a separate namespace because of the very different format of presentation. How do you list synonyms and antonyms, for instance? How do you put a sign language entry into a translations table? Besides which, there are many different sign languages, including American, British, Hungarian, and that of the American Plains Indians. There are some websites linked from the Wikipedia article on w:sign language that might provide ideas. Can you imagine what the American sign language Wiktionary would look like? --EncycloPetey 23:06, 10 February 2007 (UTC)
- Listing synonyms and antonyms is not hard: include a picture or the like. Having an entry is what's hard: how do we include the term as the PAGENAME? See also my comment below, in this section.—msh210℠ 17:44, 11 February 2008 (UTC)
- There has been a bit of discussion on this question at various times. Wiktionary:About sign languages has some of the results of that discussion, and Wiktionary:Information desk#American_Sign_Language currently has my reply to someone else who recently asked the same question you just did, Angr. Any ideas you have would be great; probably the Wiktionary:Beer parlour or Wiktionary talk:About sign languages would be the best place to mention them, the former especially if they are ideas on how to include SL entries.—msh210℠ 17:44, 11 February 2008 (UTC)
Inflected forms
I think that inflected forms should be included if they belong to two different words in the same language. I once looked at a play script in Spanish and had to pause at the word viste to figure out which verb was meant. Other examples in Spanish are fue etc., ve, and the regular siento, sienta, and siente. Russian has дне and хоре. PierreAbbat 02:25, 11 June 2007 (UTC)
- Because spellings so easily overlap in different languages, here on en.wiktionary, we aim to include all inflected forms, not just ones that might have obvious problems. --Connel MacKenzie 07:51, 11 June 2007 (UTC)
use in a refereed journal: mathematics
I'm not sure about other fields, but in mathematics a refereed article will often have what we call "ad hoc definitions". That is, for example, the author, call him Smith, will say "let a foo subgroup be a subgroup that is finite and central". Smith then uses the word "foo" a hundred times over the course of his paper, but is never heard of again in the literature. Words like these should I think not have entries. On the other hand, sometimes Smith does the same thing, and then another author will say "Let a foo subgroup be, after Smith, a finite central subgroup" and use the term in his paper, and a third author will say "If the subgroup is foo (in the sense of Smith 2007), then..." and a fourth will say "If the subgroup is foo, then...". (This process does not occur over the span of four papers. But the progression is approximately correct.) At what point in this process does the word become acceptable in en.wikt?—msh210 18:45, 16 August 2007 (UTC)
- (Note incidentally that the word may have been in use by Smith and his colleagues in various universities well before it was ever published. But I'm assuming for the sake of argument that we cannot attest that.)—msh210 18:45, 16 August 2007 (UTC)
- If a word, e.g. “foo”, becomes strongly-enough associated with a particular definition that many authors begin to use the word without defining it, “foo” in that sense will naturally meet the existing CFI. Rod (A. Smith) 18:52, 16 August 2007 (UTC)
- Well, yes, that's the "fourth" stage above. But does a "in a refereed academic journal" rule apply to any of the earlier stages, was more my question.—msh210 19:55, 16 August 2007 (UTC)
- Well, the attestation section (actually, bulk of the CFI) was written to clarify the general rule: “A term should be included if it's likely that someone would run across it and want to know what it means.” Following the spirit of that general rule, I'd say that readers of academic journals are only likely to want to know what a given term means if the journal uses the term without defining it. That is, I assume that the “Appearance in a refereed academic journal” part of the attestation section is present to refine the clause “someone would run across it”, not to override the “[someone would] want to know what it means” clause. Does that make sense? Rod (A. Smith) 20:23, 16 August 2007 (UTC)
- Actually, that was Dmh's phrase ("if it's likely that someone would...want to know what it means,") if I recall correctly. Frankly, I don't know how that phrase escaped notice. --Connel MacKenzie 20:39, 16 August 2007 (UTC)
- The problem is that a word could have just one "appearance in a refereed academic journal" and be admitted, even if it were just the definition—a mere mention of existence, or even inexistence! Likewise, "usage in a well-known work" allows for literary nonces, which have been received with skepticism. We should alter CFI to say that all terms must convey meaning in three independent instances over a year, and that if disputed they must be so cited, with the exception of clearly widespread use. I cannot imagine that change eliminating anything of substance. DAVilla 13:09, 17 August 2007 (UTC)
- I disagree. I think there is a problem here, and I think you've mostly identified it correctly, but I don't think the solution is to remove these exceptions; the exceptions serve an important purpose. For example, there are plenty of languages that simply don't have the written corpus needed for their words to meet the normal CFI; but the academic-journal exception means that if linguists (or anthropologists) publish papers about these languages (or people) and define some words, we can include those. Also, while it might be obvious to us, after looking for independent cites, that a word in Romeo and Juliet is a nonce-word, the casual reader might not find that so obvious; and while obviously it's not worthwhile to include every nonce-word in every work, some works (the King James Version of the Bible, several of Shakespeare's plays, the U.S. Constitution and Declaration of Independence, etc.) are sufficiently well-known and widely read that it does make sense to include even their nonces. —RuakhTALK 16:37, 17 August 2007 (UTC)
- Anthropologists and ethnolinguists are wonderful people, but they are no more reliable than lexicographers when it comes to defining words. If there are no authentic durable records of a language whatsoever -- no recordings, no transcripts -- there is simply no material for us to work with. In this respect I think the use-mention distinction trumps any value in peer-reviewed scholarship ... so I'm inclined to agree with DAvilla that this clause no longer serves any useful purpose, and in fact contradicts our current practice. -- Visviva 02:39, 28 September 2007 (UTC)
Personal names from languages with a non-Latin script
Having perused this I cannot fathom why the Russian name Дмитрий exists only in Latin form and Владимир exists in both scripts??? I recommend strongly for names with no (original!) Cyrillic articles yet written to be moved to articles with the appropriate Cyrillic titles, if nobody minds. Is the article in Latin letters appropriate at all, since transliteration is always provided? Bogorm 08:56, 16 August 2008 (UTC)
- I think the issue here is simply the incompleteness of Wiktionary. Дмитрий certainly should exist (and Dmitry was in some serious need of cleanup). However, much as the two are related, the existence or lack thereof of Дмитрий should not in any way affect whether we keep Dmitry. If it can be attested, it should be kept. -Atelaes λάλει ἐμοί 09:08, 16 August 2008 (UTC)
- Deleting of Dmitry is not my concern, since I am not administrator and until someone proposes it for deletion. If you do not mind, I am going to move Dmitry to Дмитрий, so that the Latin title be used as redirection. Bogorm 09:12, 16 August 2008 (UTC)
- That would be inappropriate, since we do not use redirects that way. See Wiktionary:Redirects. --EncycloPetey 16:48, 16 August 2008 (UTC)
- In the future, please bear in mind that an entry requires some reformatting if you move it to a different language. -Atelaes λάλει ἐμοί 17:58, 16 August 2008 (UTC)
- Why have you deprived the article of the Transliterations section? What do you mean under work? It is not moved to a different language - Дмитрий is the only admissible form of the Russian name and Dmitry is an independent article about the English case, though I strongly doubt that any British would opt for a purely Russian name, unless he is a fervent adherent of the current Russian president. Bogorm 18:17, 16 August 2008 (UTC)
- I have removed the transliterations because we have a specific transliteration format for Russian words, and I don't know Cyrillic script. I put a marker on it to garner the attention of someone who knows Russian to come and add it. It is moved to another language because you took the content of Dmitry (an English word), and moved it to Дмитрий (a Russian word), without properly reformatting it. It was still classified as an English proper noun, as well as being in the English category for Latin and grc derivations. -Atelaes λάλει ἐμοί 18:25, 16 August 2008 (UTC)
- Well, I speak fluently Russian, but am not knowledgeable about formatting of proper names according to your wishes - so I shall essay to be helpful by elucidating here the transliteration but without adding it in templates, since I have not yet got accustomed to using templates (besides the quoted on my talk page): the official scientifical transliteration is "Dmitrij", but the popular for the English-speaking world is Dmitry (corroboration for my words is to be found in the article about President Medvedev, whose first name is rendered as transliteration in brackets and in the popular rendering in the title of the article). I do not know however, old Russian (Eastern Old Church Slavonic) and the question about the ancient spelling can hopefully be resolved by EncycloPetey (see below). Bogorm 19:15, 16 August 2008 (UTC)
- I'm not sure what you mean by "official scientifical transliteration" of Russian, unless you mean the International Scholarly System, since there are several standard schemes in use just for Russian transliteration, including the system used by the Library of Congress (which would transcribe Дмитрий as "Dmitrii"). There are also different Latinizing transcription systems used in Germany and Poland that I have seen, and presumably there are many more besides. See Romanization of Russian on Wikipedia for a little more, including a table comparing 7 of the systems. The system preferred by the Russian government and the Russian Commonwealth is GOST 7.79. --EncycloPetey 20:16, 16 August 2008 (UTC)
- Yes, I meant the first one, because it is international and present in all articles about Russia in Wikipedia as the quoted one. And regional regulations have not international jurisdiction. Bogorm 22:11, 16 August 2008 (UTC)
- While there may be only one Cyrillic form in common current use, that is not the only spelling possible in previous centuries. Unfortunately, I have discovered that I am missing the relevant page from my copy of Nikolaj Michailovič Tupikov's Wörterbuch der Altrussischen Personennamen (Köln & Wien: Böhlau Verlag, 1989). However, Wickenden's Dictionary of Russian Names includes a large number of spellings. (Wickenden's are transliterated, but use a consistent transliteration system.) --EncycloPetey 18:26, 16 August 2008 (UTC)
Somewhat related discussion is at my talk page.—msh210℠ 21:09, 20 August 2008 (UTC)
Scientific nomenclature
In general, how far should we be diving into the scientific/technical names of things? I have two main questions in mind:
- Should every (established) genus-species of organisms have an entry? ("established" meaning there's good citations, but that's pretty easy if you consider all the scientific journals out there who use the terms without defining them)
- Likewise, what makes chemical names worthy of inclusion? There's so many variations.. see w:8-Azaguanine where I listed all the synonyms I could find appearing in at least two independent sources. If you Google most of them, the chemical uses are buried deep in the results, however, if you use Google Scholar, they're usually all you get.
Now, I understand people don't normally use dictionaries for looking up this kind of information, but for the sake of completeness (and the limitless potential of Wiktionary), I guess my main question is where do you draw the line? Voxii 23:47, 3 March 2009 (UTC)
- Here's my sense of the current state of things:
- The sense of the community lately seems to be "no", that we should have genus names and species epithets but leave actual binomials and trinomials to Wikispecies. (But I would defer to EncycloPetey, DCDuring, and any others who have actually worked on this area lately.) This has varied over time, and we do have a number of binomial names. The not-yet-closed RFD for B. splendens may be pertinent.
- Sum of parts hyphenated chemical names (e.g. full IUPAC names and variations thereof) are out, with a possible exception if they happen to be used outside of the chemical field. So I don't think we would want anything above "8AG" in your list. Trade/commercial/brand names are out unless they happen to satisfy WT:CFI#Brand names. I expect that identifiers like "NSC-749" are also out, though I don't think that's ever been put to the test. Terms like "triazologuanine" should be fine, I think, provided that they are verifiable. -- Visviva 03:11, 4 March 2009 (UTC)
- Ok, that sounds reasonable. I figured things like "5-Amino-1,6-dihydro-7H-v-triazolo(4,5-d)pyrimidin-7-one" were out. Most of the ones that are a couple letters and numbers are usually only used by certain organizations or are something like a brand name, so I guess we don't want those either. As for the species names, I'll take a look around. I think some of the more popular ones might be worthy of inclusion. Thank you very much for your answers. Voxii 10:29, 4 March 2009 (UTC)
- I agree that "some of the more popular ones" should be included, such as Homo sapiens, Tyrannosaurus rex, and E. coli. Angr 15:06, 6 March 2009 (UTC)
- I think Visviva has fairly summarised current unofficial preferences.
- I think we do a good service when we take attestable, includable vernacular (even brand/product) names and translate them into current scientific terms and find the best current WP article (and/or outside source) to link to. Sometimes dated scientific terms can be sussed out, though that is harder. WikiSpecies is already much more complete than we are ever likely to be about the structure of the taxonomic tree.
- Getting all the one-part taxonomic names is plenty challenging. I suppose the same would be true for all the combining forms of chemical terms as well. DCDuring TALK 16:02, 6 March 2009 (UTC)
The rules for inclusion of proper names are outdated; they are not followed and are unhelpful. Maintenance complexity should not be a factor if name spelling can be checked against a variety of dictionaries.
The changes I propose are:
- Allow all country names, their capitals in the English language and in the original language and script, etymologies, alternative spellings, meanings, pronunciation and translations.
- Allow regional centres - capitals of states, provinces, counties, shires, regions, prefectures, oblasts, etc. regardless of their size.
- The inclusion of other place names to be discussed. Population, historical or economical importance? Provide reference to a dictionary (to discuss, which dictionaries are considered valid)
In any case, I suggest not to restrict but encourage the inclusion of proper names. Anatoli 03:14, 11 March 2009 (UTC)
- What do you suggest we use for evidence? Should all administrative units be included or just primary ones? Should the regions have a governmental administrative structure or could they be statistical areas or popularly used names for areas? We have obsolete meanings of words; should we have obsolete meanings of place names? Should we have official names or popular names or both? What about mythical places (Valhalla) or historical "places" with uncertain boundaries (Scythia)? Obviously not all of these questions need be answered at once. DCDuring TALK 03:45, 11 March 2009 (UTC)
- Judging from your questions, I can see that your main concern is maintenance and who is going to verify the accuracy? Like in all Wiki projects, there is always a risk and someone who knows the correct information can change.
- My main concern is that a gazetteer adds no value. The "maintenance" includes conceptual concerns.
- By the regions I mean governmental administrative structure, like Urumchi/Ürümqi - capital of Xinjiang. Include unrecognised/partially recognised and disputed territories - in a neutral informative tone. (Western Sahara, Kosovo, etc). The status can have a leading entry explaining the status.
- Can't see any problem with obsolete names, if they are a redirect entry, "alternate or obsolete spelling" entry.
- I meant obsolete definitions of a word. Should every major border change be reflected?
- Official names for place names (cities/towns), countries - popular names (as is the case already)
- Some possible variations in the name in the name can be described in the entry, as was the case with Rostov. The Russian city of Veliky Novgorod was officially called Novgorod till 1999. I I were to create the entry (it's an administrative centre of a region) now, I would call it "Veliky Novgorod" with a link to the alternative or older spelling - "Novgorod". Popular names may be useful but if they are appropriate for this language. In the Russian entry I would make Великий Новгород the main entry with a popular name Новгород.
- Mystical and uncertain historical names are not my concern. Not sure if they need to be included but I don't see why not, if they can be useful for users. The leading sentence should specify what the entry is. Perhaps, we can exclude them for now but it's up to other Wictionarians. Anatoli 04:11, 11 March 2009 (UTC)
- Evidence? Happy to discuss this but in most cases, the names are obvious, well-known and easily verifiable by a simple search. The evidence may only be required in cases of a dispute. Then one needs to provide something solid. But isn't this the case already? If I write a name and you agree with the spelling, then there is no need for any evidence. This could be happening with the spelling of well-known names, such as San‘a’ (capital of Yemen), El Aaiún (Western Sahara), Urumchi, etc., which can have more than one spelling. In this case, we need to discuss the correct spelling for the entry. Anatoli 04:21, 11 March 2009 (UTC)
- You seem focused in this discussion on big entities, but you have spoken of all places, whatever their population. There are many of them. Should governmentally designated places be automatically included, whether or not there is a government associated with the place? See w:Place (United States Census Bureau), especially w:Census-designated place.
Furthermore, there is likely to be a great deal of interest in natural features: bodies of water; mountains, hills, mountain ranges; valleys, plains, plateaus; below-surface features; and man-made structures (buildings, kurgans, jetties); public places (parks, squares, plazas); public transportation; and roads. Are these items beyond your present concern? Should they be? Are the entities for which you would propose amending WT:CFI more or less meritorious?
Lastly, but most importantly, how are users better off because Wiktionary includes such place names and/or names of political/governmental entities? DCDuring TALK 14:49, 11 March 2009 (UTC)
- I think the proposed changes (1 and 2) would be fine, inasmuch as they mostly reflect current practice and keep the set of permitted entries small (countries, primary subdivisions of countries, and capitals thereof -- I suppose this would be a bit over 10,000 words, many of which we already have). Even though I'm not thrilled with the idea, and there are very serious unresolved problems in our treatment of proper nouns, I would support the change, just because it would reduce the current gap between policy and reality. Then we can go back to arguing about exactly how to handle these, and what to do with all the other place names. :-) Question: there are some very large cities that aren't the capital of anything, notably Los Angeles and Chicago in the States; should we have a clause permitting any city of more than 1 million people? -- Visviva 15:29, 11 March 2009 (UTC)
- I think we should at least have a clause for including the most heavily populated city in each state. It would be absurd to include Tallahassee but exclude Miami. I'd go farther down than a million, at any rate - 250,000 is a reasonable baseline. For place names with multiple uses (e.g. Jacksonville, Springfield), if one comes in, all should come in to avoid confusion. It's also fine to handle this, as we have, by defining the term as the name of multiple places and referring the reader to the Wikipedia disambiguation page. bd2412 T 03:47, 16 March 2009 (UTC)
- I think the proposal looks fine, for the same reasons as Visviva (having the same reservations as well). -Atelaes λάλει ἐμοί 18:48, 11 March 2009 (UTC)
- Natural features are beyond my present concern (although I don't see any reason to object to their existence, if they are correct), and so are smaller town districts. If the governmentally designated places are too many, I am happy to reduce and limit them to about the next level below the national capital (states, provinces, prefectures, autonomous regions, territories or oblasts). The entries are not forced to be created and are not created automatically, they are created by editors manually, so I don't see any reason for concern about having too many to handle. I've been checking the appendixes, they seem to be mainly linked to the main body of Wiktionary, anyway, and if I tried to create an entry from a red link, it would create an entry in Wiktionary, not in the appendix. Am I missing something? The benefit for the users? - I have already explained, like any dictionary, it's for the information, besides, here, it's multilingual, allows to discuss/inform about etymology, pronunciation, transliteration, grammar and other linguistic issues. In reality yes, we have quite a number of proper names already, which is behind the policy. As I said before, my attitude is more is better than less, as long as it is accurate. Los Angeles is a big place and having it here is only beneficial. All 1 mln. (if not less) cities must be included, IMO. Sorry, mixed all answers in one paragraph, hopefully, it's readable. :) Just in case, I prefer "New York" to "New York city", and names of regions coinciding with their capitals/centres can go into one entry. Anatoli 00:45, 12 March 2009 (UTC)
- Wiktionary is an online dictionary (among other things). Wikipedia has large articles with volumes of information irrelevant for finding translations for proper names. The linked multilingual articles (if you mean using this method for finding out what it is called in another language) are not necessarily linked to an identical article in another language, e.g. "USA" may be linked to "United States of America". The translations are not grouped in one place and the etymology and basic pronunciation are not available. I understand what you are referring to but this method is not for everyone and is not user-friendly. Besides, Wiktionary provides a concise meaning of a proper name (at least, country and what it is, e.g. a city). That's all you need from a basic dictionary. Etymology and related terms would be a bonus but if you don't have a stub, there won't be anything to improve on. Anatoli 03:41, 12 March 2009 (UTC)
Toponyms are a special case, and I think a few extra rules need to be stated explicitly. As a general principle, we should handle them strictly lexicographically, and leave the encyclopedic documentation to Wikipedia. They should qualify for inclusion the same way as any other term: three attestations in durable works.
Contrary to common sense, most geographic references should not be used. Only references which examine place names from a linguistic (onomastic, toponymic, or etymological) angle should be used.
- Exhaustive official lists of place names should not be used, because they are prescriptive. If no one has ever written about Lower Slobovia in English, then it shouldn't have an English entry in a descriptive dictionary like ours.
- Place-name entries in general dictionaries should not be used as references or examples:
- General-reference dictionaries add toponyms to increase their quick-reference value for users. We add a Wikipedia link for the same purpose.
- General-reference dictionaries don't treat toponyms lexicographically, providing etymologies, documenting attested use, etc., rather they give encyclopedic or gazetteer information, like population, etc.
- Atlases should also be prohibited as references:
- Modern atlases are prescriptive, relying on official lists of approved geographic names rather than actual native-language usage.
- Modern atlases transcribe native place names for all but the most well-known places, and don't necessarily present names as used in English.
The “definitions” or descriptions, like those of other terms, should be the minimum necessary to define the place. Encyclopedic information like population, etc, should be prohibited. —Michael Z. 2009-03-16 03:12 z
Proverbs
The proverbs section states that if the phrase is a complete sentence, it should start with a capital letter. The linked example redirects to an uncapitalised version, and all entries in Category:English proverbs that do not begin with a proper noun are uncapitalised. I assume this document is dated, as opposed to many entries in that category being wrong, and thus needs revision. Mindmatrix 20:52, 24 March 2009 (UTC)
misspelling
The formatting says # {{misspelling of|[[...]]}} but I thought practice was # {{misspelling of|...}} with no linking (provided by the template instead). RJFJR 20:21, 19 May 2009 (UTC)
- See discussion and news (s.v. December).—msh210℠ 20:41, 19 May 2009 (UTC)
- No, that's different. RJFJR is right: we don't even want misspelling-only entries to count in the statistics. (Of course, it's hard to prevent it, because some of the other Wiktionaries will automatically create entries in response to ours, which means that we get interwiki links, which contain [[. But that's the idea.) —RuakhTALK 21:43, 19 May 2009 (UTC)
Translation target
The criteria for inclusion could be extended to include sum-of-parts terms if they serve as a translation target. Specific criteria for how to recognize a translation target are not yet clear.
Examples of possible translation targets:
- high school student – French: collégien or lycéen; added later: but: "highschooler"
- indoor football – Dutch: zaalvoetbal; added later: is this actually a non-SoP name of a sport?
- problem solving – German: Problemlösen; added later: but: "problemsolving"; but-but: "problemsolving" is much less common than "problem solving"
- small boat – Czech: loďka, lodička; diminutives in general; added later: but: "boatlet"; but-but: "boatlet" is rare.
- two-wheeled – Finnish: kaksipyöräinen
- email message – Finnish: sähköpostiviesti
- rice noodles – German: Reisnudeln
Discussions:
- Beer Parlour, WT:BP#Inclusion_of_SOPs_for_translations_.E2.80.94_proposal, 22 August 2009
See also:
Feel free to add further examples and bullet items identifying discussions to this post.
--Dan Polansky 17:01, 22 August 2009 (UTC)
- I believe that indoor football is a set phrase to be included anyway, and isn't SOP: indoor football is the name of a sport, with its own rules; it's not only football played indoors. high school student might be considered a set phrase too; it would not be absurd. But adding small boat, small ... for the purpose of translations to languages such as Dutch, with a heavily used diminutive suffix, seems neither appropriate nor useful. So, yes, but only if a set phrase. Lmaltier 08:32, 23 August 2009 (UTC)
- These are good points. Yet, "high school student" is a sum-of-parts, and set phrases are not included per current WT:CFI, so "high school student" would be a newly included term if translation targets are added to WT:CFI. --Dan Polansky 09:26, 23 August 2009 (UTC)
- Why mention two-wheeled? It is already includable with current CFI. Lmaltier 13:21, 23 August 2009 (UTC)
- It is not all that clear that "two-wheeled" is includable per current CFI, given the current request for deletion of "two-wheeled". To me, "two-wheeled" seems rather SoPish.
- Feel free to add to the list above good examples of terms that would be added because of their translation-targetness. --Dan Polansky 14:33, 23 August 2009 (UTC)
None of the examples of SoP terms needed as translation targets are necessary; a high-school student is also a Template:term (or, specifically, a Template:term or Template:term, if you like), Template:term is idiomatic, Template:term (written as a single word) is common, a small boat is a Template:term (or, more predictably, a Template:term), and something that has two wheels can be called Template:term. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 15:51, 23 August 2009 (UTC)
- Accepting this proposal would multiply the potential inventory of acceptable words by an order of magnitude. Every inflected verb or noun would suddenly need a dozen or two new English entries created exclusively for it. Agglutinative languages might require English entries like for your (plural) repeated pretending to be undesecratable (Hu. megszentségteleníthetetlenségeskedéseitekért).*
- This is taking glosses which belong in quotation marks and setting them in italics. It is also inviting editors to create entries for 100,000 S-o-P terms, phrases, and whole sentences. This is to increase the load on RFD a dozen-fold.
- There will always be terms which have no synonym in a foreign language. Heck, every language has many regionalisms which have no general equivalent.
- I'd be in favour of some new criterion for accepting “set phrases” or common expressions, but not for pretending that English has direct translations for every term in every language. —Michael Z. 2009-08-23 18:11 z
- I agree, -ish. We should never include a term that no one would ever look up. We can include terms that only a professional translator would expect to be able to look up; and we can include terms that most people would come across via internal links rather than by looking them up directly; but we should not include every series of English words that would be used to translate any foreign word. So, why do I say I only agree "-ish"? Because your comment purports to be objecting to Dan's proposal, but I don't think that is what Dan is proposing. He gives examples of series of English words that would be used to translate certain foreign words, and then labels them very explicitly as possible translation targets. Meaning that his proposal doesn't mandate all such entries. So, where you actually say what you don't think we should do, I agree; but where you object to "this proposal", "this", etc., I don't agree, or I don't know if I do, because I don't know if you're even talking about what you seem to be. —RuakhTALK 19:27, 23 August 2009 (UTC)
- Okay, maybe I misinterpreted it some. But our RFV and RFD pages are already swamped with totally s-o-p phrases. Allowing more entries by criteria that require subjective judgment might be asking for trouble. I'd rather include English phrases by their intrinsic English qualities than because they are handy for reasons involving every language but English. —Michael Z. 2009-08-23 22:23 z
- Am I correct in assuming that the whole point of this proposal is to allow translation tables to exist housing single terms in foreign languages whose English-language æquivalents are SoP phrases? † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 20:23, 23 August 2009 (UTC)
- (indent) An explanation: I am not proposing anything yet. I am trying to execute a descriptive undertaking: to understand the specific and concrete, meaning example-based, impact of the proposal. I am sorry that I have redirected the discussion here; it could have stayed in Beer Parlour. I have created this section so that the topic has its home location, from which it should be possible to link to the discussions in Beer Parlour. The discussions should be easier to find months or years later.
- In any case, examples of the impact are desperately needed; the above discussion shows that people do not as yet agree on what the impact of the proposal would be. And it is the impact or consequences of the proposal that make the proposal good or bad. --Dan Polansky 07:30, 24 August 2009 (UTC)
Voting on clarification at Wiktionary:Votes/pl-2009-08/Clarify names of specific entities
I started a vote, after BeeP discussion, to clarify the wording without changing the meaning of this section. —Michael Z. 2009-08-27 04:54 z
The OED cites Usenet, too.
I find it interesting to note that the OED’s sub-entry for “ˈfelching n.” cites a Usenet newsgroup as its earliest quotation in support of the term; I reproduce it literatim hereat:
- 1989 Re: How can you eat Unwashed Pussy? in alt.sex (Usenet newsgroup) 17 Nov., The story also talks about sucking on the clitoris… But‥I want to read about *felching!
It seems like we’re not the only ones who allow Usenet groups as evidence of attestation… † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 18:08, 18 September 2009 (UTC)
- They also cite plain old websites occasionally, tagging it as something like "OED archive." I understand their thinking, and obviously they have the resources to create their own "durable archives", but it seems kind of lame. -- Visviva 08:26, 21 September 2009 (UTC)
- Why? For better or for worse, we’re past the lexicographical age of restricting our quotations to those from literary magna opera. For a term in frequent current use, whether or not its use is in durably-archived media has nothing to do with whether a person will “run across it and want to know what it means”. Durable archiving is necessary solely for lexicographical verification. If a particular website coined or popularised a term, or represents the earliest recorded instance of its use, then it seems entirely appropriate to quote it as such; and if it isn’t durably archived, then it also seems entirely appropriate to durably archive it oneself, be that in the form of a printed screen-capture or whatever. In the continuum of descriptivist ethe, I could scarcely be described as a rabid inclusionist/inclusivist (Which is the better term there? Exclusionist, exclusivist, inclusionist, and inclusivist are all in the OED.), but I don’t see what value durable archiving has other than to facilitate lexicographical verification. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 13:15, 21 September 2009 (UTC)
- Agreed. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 14:16, 21 September 2009 (UTC)
- My view was probably soured by the fact that I first came upon this when researching some dictionary word or other (not sure if it's one I've added to the list yet or not). Their lone bona fide citation for this word, which had been coined in the mid-17th-century and passed from one dictionary to another since, was from a 21st-century German website. It seemed painfully obvious that this was simply the infelicitous choice of a hapless website translator who made the mistake of relying on a German-English dictionary that had copied the word in turn from some earlier dictionary. I would have liked to think that the OED might feel just the slightest twinge of shame for their own role in perpetuating this misinformation.
- But yes, there is certainly a valid use for this. -- Visviva 15:13, 21 September 2009 (UTC)
- Perhaps they felt “just the slightest twinge of shame” for harbouring a “zombie word” based on an argumentum ad verecundiam and wanted to bolster their descriptive credentials by showing the word to be attestable. (That would also explain why so many of these dictionary-only words get tagged {{obsolete}} in post–second-edition draft revisions.) It seems to be their policy that once a word is added it never gets thrown out. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 17:14, 25 September 2009 (UTC)
- It's also kind of lame in that their Web-site says, for example, “At the moment, because Internet addresses and references can change, texts that exist solely online cannot be used as a source for quotations.”[1] They don't say anything about an OED archive that renders a text non–solely online. —RuakhTALK 12:05, 25 September 2009 (UTC)
- Yeah, they should probably clarify that… † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 17:14, 25 September 2009 (UTC)
What Wiktionary is NOT
I believe we also need some statement of what Wiktionary is not.
Wiktionary is NOT an arbiter of what is suitable English, good English, correct English, or grammatical English. Like any English dictionary, Wiktionary is merely documenting and explaining what is in use in English. It should be sufficient to show that a word or idiom is (or has been) in use, be it in common usage or within a specific group (such as the medical fraternity).
It seems to me that every time I come to Wiktionary and check through some of the words or idioms proposed for deletion, there are purists using arguments that essentially set them up as arbiters of what is good, acceptable English.
To quote from WT:RFD#US_American
- "US America" is not a term that I have heard or read and is not plausibly an etymon of "US American". It seems not to matter to this self-appointed arbiter that several citations of use are given.
- Acceptability as English is one thing. Suitability for any specific purpose is another.
And neither has anything to do with whether it should be in an English dictionary. No one here is/should be setting themselves up as some authority to decide what is acceptable, what is suitable. Wiktionary should only be concerned with what is and is not used. If you want to decide what is acceptable or suitable use, or "Linguistically Correct" you should go join the French Academy (or similar). The role of Wiktionary is not to decide any such thing. Is it used? Is there reasonable evidence of its use? There is. End of argument.
see also WT:TR#chillaxin --Richardb 11:09, 25 September 2009 (UTC)
- You seem to be looking for WT:NOT. A line about "Wiktionary is not prescriptive" would be a useful addition there. But this line seems especially pertinent to the current situation: "Wiktionary is not a battlefield. Every user is expected to interact with others civilly, calmly and in a spirit of cooperation."
- You may also wish to reacquaint yourself with the distinction between idiomaticity and attestation, both of which are discussed at length on the present page. -- Visviva 11:41, 25 September 2009 (UTC)
I agree (anyway, there is no other possible practical option on a wiki if you want to avoid edit wars, this is the NPOV principle). I just want to add that this is true for all languages, not only English. Lmaltier 17:24, 25 September 2009 (UTC)
Clarification Required
The CFI need clarification on one point:-
Attestation. “Attested” means verified through *Clearly widespread use, *Usage in a well-known work, *Appearance in a refereed academic journal, or *Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year.
Are those 4 attestation criteria joined by OR, or by AND?
My personal view is that they should be joined by an OR, so that a term qualifies if it meets ANY of the criteria and does not need to meet ALL of them.
I would suggest a change of the paragraph to
“Attested” means verified through meeting ANY of the following conditions *Clearly widespread use, *Usage in a well-known work, *Appearance in a refereed academic journal, or *Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year.
I cannot be bothered to mount a campaign or vote on my own. Does anyone agree enough to take it on? --Richardb 14:31, 1 October 2009 (UTC)
- It may be that the wording could be better, but the "or" reading is how it is applied, without any controversy (about the "or", anyway) in my experience. DCDuring TALK 16:42, 1 October 2009 (UTC)
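For anyone who finds the "or" reading easier to follow as logic, here is a minimal sketch in Python. It is only an illustration and not part of CFI: the `Evidence` record, its field names, and the reading of "spanning at least a year" as 365 days between the earliest and latest citation are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Evidence:
    """Hypothetical record of what has been found for a term."""
    widespread_use: bool = False        # "clearly widespread use"
    in_well_known_work: bool = False    # "usage in a well-known work"
    in_refereed_journal: bool = False   # "appearance in a refereed academic journal"
    # Dates of independent uses in permanently recorded media, one per citation.
    citation_dates: List[date] = field(default_factory=list)


def three_citations_spanning_a_year(dates: List[date]) -> bool:
    """At least three citations, with the earliest and latest at least
    365 days apart (an assumed reading of "spanning at least a year")."""
    if len(dates) < 3:
        return False
    return (max(dates) - min(dates)).days >= 365


def is_attested(ev: Evidence) -> bool:
    """The four criteria are joined by OR: meeting any one of them suffices."""
    return any([
        ev.widespread_use,
        ev.in_well_known_work,
        ev.in_refereed_journal,
        three_citations_spanning_a_year(ev.citation_dates),
    ])
```

Under this sketch a term with three dated citations more than a year apart passes even if none of the other three conditions holds, which matches how the rule is said to be applied in practice.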
Blunder needs to be corrected in CFI definition
Someone, at some time, has made a blunder, that has apparently been subsequently accepted by a vote.
Under ==General rule== we find the line-
A term should be included if it's likely that someone would run across it and want to know what it means. This in turn leads to the somewhat more formal guideline of including a term if it is attested and idiomatic.
I hate to point out the absurdity, but, if obeyed, this would mean we would have ONLY idioms in Wiktionary !
I propose that the General Rule should be changed to:-
A word should be included if it meets any of the following criteria *Clearly in widespread use, *Used in a well-known work, *Appears in a refereed academic journal, or *Used in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year. (See below under Attestation for clarification of these criteria) A term other than a single word needs to meet the above criteria, and additionally be idiomatic. (See below for Criteria for Idiomaticity)
This change would also remove the disparity between the very loose, almost colloquial general rule (if it's likely that someone would run across it and want to know what it means) and the more formal attestation requirements.
Again, it needs to be changed, but I personally can't make the effort to mount a vote and a campaign. Anyone want to take it on ?--Richardb 14:58, 1 October 2009 (UTC)
- So, you want to get logical about this, eh? To avoid a premature vote on all the wording changes, Wiktionary:Editable CFI has been begun. How that will interact with "official" CFI remains to be seen, but it is likely to be constructive. And it's an easier place to make such suggestions. DCDuring TALK 16:48, 1 October 2009 (UTC)
- Not necessary in this case. It says "if", not "only if". The effect only applies if the condition is true. If the condition is false, then take no action one way or the other. --EncycloPetey 00:02, 3 October 2009 (UTC)
- I think some of the confusion may be due to the different meanings of idiomatic, two of them listed here:
1) Pertaining or conforming to the mode of expression characteristic of a language. 2) Resembling or characteristic of an idiom.
I think CFI is using the first meaning here, not the second. (Please correct me if I'm wrong) Facts707 09:53, 3 February 2010 (UTC)
medical terms policy?
I'm wondering if there is any set policy on medical terms. Many of them are of Latin or Greek origin and the same term is used in many languages (e.g. aorta). But bruit (from the French) is commonly used in English-speaking medicine to describe a certain heart sound, and that is not mentioned in bruit. Also, foramen ovale is defined but not foramen magnum. Facts707 10:08, 3 February 2010 (UTC)
- MW3 includes "bruit" in medical sense, but without a medical context, suggesting that it is likely worth inclusion. (Citations would be conclusive.) One of the best features of Webster's Third New International Dictionary ("MW3") is its coverage of scientific vocabulary. My print edition has Addenda with mostly technical terms dated as late as 1993. In the main (1961) portion they have five different singular compounds of "foramen" and one of "foramina". They add nothing further in the Addenda.
- Are you asking about Translingual status? I don't think we would make something Translingual until we had evidence that the term was used in a "significant" range of languages. Thus for "bruit", attested usage in running English text should just lead to an English entry. If it is attestably used in German, French, Swedish, Italian, Russian, I would think that there is a case for Translingual. I haven't seen any particular shortcut to Translingual status for medical terms. Perhaps if multiple medical dictionaries declared it something like "International Scientific Vocabulary" ("ISV") as MW3 does with many entries, but not "bruit". DCDuring TALK 11:52, 3 February 2010 (UTC)
"Not a sum of parts" - proposed entry
We need an entry in "General rule" (after “Terms” to be broadly interpreted) to say that an entry shouldn't be included if it is just a sum of parts. For example, party leader should not be included (means "leader of a political party", "leader of an expedition", "leader of a celebration" depending on context), but post office should be, as it means "a place to send or receive mail" and not "a place that manages posts (such as football posts or signposts)" or "a place that manages jobs or positions (I'll put a new guard in that post)". Facts707 19:52, 4 March 2010 (UTC)
- Uh, we do. See WT:CFI#Idiomaticity, "An expression is “idiomatic” if its full meaning cannot be easily derived from the meaning of its separate components.". --Yair rand 20:14, 4 March 2010 (UTC)
Why should we have "Given and family names"? - they are handled better and more completely by Wikipedia
I don't see why we need "Given and family names". Wikipedia has all the same information, but with greater coverage of names included and usually better etymologies and translations. If someone searches for a term that is not found in Wiktionary, the user will see the same term at Wiktionary, plus other related searches.
Compare Smith with w:Smith and w:Smith (surname) for example.
I don't think we should spend our limited resources trying to do something our sister project is already doing, and better.
Likewise, why do we have Seattle, but not Tacoma or Redmond? Or why do we need Tower of London when we have w:Tower of London?
These are not words in the English language, they're historical names and thus belong in Wikipedia unless they have entered the language for some other reason such as an idiom, e.g. Waterloo. Facts707 14:24, 13 May 2010 (UTC)
- The toponym Churchill, for example, is an English word (more precisely, a lexical unit). It has an etymology, an eponymous literal meaning (“church hill”), and it is applied to certain kinds of referents (places and people). We systematically compile such lexicographical information, and a person who just wants to “look it up in the dictionary” needn't read a whole encyclopedia article for it. Wikipedia could (but currently doesn't) have an article about w: Churchill (name), including encyclopedic information which doesn't belong in the dictionary. There is also an open question of whether we should include onomastic information, more specific to names than to other words.
- Of course, the person of w: Winston Churchill and the town of w: Churchill, Manitoba, are not words or names, so we shouldn't be duplicating Wikipedia's efforts by “defining” them here.
- As you may have noticed, place names are not accounted for by our guidelines, and there have been many discussions and proposals regarding them over recent months, but none has yet achieved consensus. —Michael Z. 2010-05-13 19:14 z
- For an academic justification for proper names in dictionaries, see Mufwene (1988), “Dictionaries and Proper Names,” in International Journal of Lexicography, v 1, n 3, p 268. —Michael Z. 2010-05-13 19:16 z
- By the way, there's no deadline, so our resources are effectively unlimited. As long as we define concrete limits on the scope of the project, by our wt: CFI, then it will remain doable. —Michael Z. 2010-05-13 19:19 z
- Wikipedia's articles on names have none of the same goals as Wiktionary's. Wiktionary includes pronunciation, etymology (from a linguistic standpoint), translations, inflections (for non-English names), and other information, basically the same kind of things as for words. Names fit perfectly into the mission, I can't see any reason not to have them. Wiktionary includes information about words, Wikipedia covers concepts. Thus, information such as that Seattle in American Sign Language is S@NearSide-PalmForward Sidetoside and audio pronunciations of the word exist in Wiktionary, and information about the things themselves belong in Wikipedia. There is little (if any) overlap. --Yair rand 19:26, 13 May 2010 (UTC)
Permanently recorded media
For a discussion of "permanently recorded media", see also Wiktionary talk:Searchable external archives, and #The OED cites Usenet, too. --Dan Polansky 16:05, 19 May 2010 (UTC)
Discussions:
- Beer parlour: What is Usenet?, September 2010
--Dan Polansky 12:04, 27 September 2010 (UTC)
Durably archived source
See #Permanently recorded media. --Dan Polansky 16:03, 19 May 2010 (UTC)
compare to Wikipedia
I added this at Template:w, but Wiktionary's CFI is locked:
"It is similar in basic concept, but has vastly different criteria from, the criteria for inclusion (CFI) on the Wiktionary project." (unsigned comment)
- The basic concept is different: Wikipedia has articles on various topics, things, ideas, people, places, etc, while Wiktionary's entries are only about terms, names, proverbs. —Michael Z. 2010-05-25 21:45 z
WT:Phrasebook not mentioned
While this page does mention the term "phrasebook", it does not explain what to do with these types of phrases, nor does it mention WT:Phrasebook. Facts707 21:36, 28 May 2010 (UTC)
- Nobody yet has any inspiring vision for the Phrasebook and accordingly we don't have criteria either. Some think a phrasebook is sufficiently distinct from a dictionary that it should be a separate project. Some think it must be part of Wiktionary. Some think we should have a limited experiment. Some think it should have a separate namespace within Wiktionary. Some think that we should have a sex-tourism phrasebook as it is a neglected area in print phrasebooks. Others are offended or think it risks making us a laughing stock or placing us on blocked-site lists. In the meantime, I would not hesitate to add any phrase that is actually in a contemporary phrasebook. DCDuring TALK 15:50, 10 July 2010 (UTC)
Láadan
Shouldn't Láadan be moved to "languages whose origin and use are restricted to one or more related literary works and its fans"? The complete use of the language is pretty much restricted to w:Native Tongue (Suzette Haden Elgin novel). And while I'm at it, shouldn't we remove Orcish from that list? w:Orcish doesn't even mention a language, and while I'm sure there have been many unnotable proto-languages named Orcish, it's not a real language of any note. And even further, why don't we delete Delason, Glos, Jakelimotu, Kyerepon, Latejami, Linga, Sasxsek, Suoczil, and Tceqli from the list? None of them have Wiktionary entries, and I'm somewhat familiar with the field and don't recognize any of them. We can't exhaustively list constructed languages, so why mention a bunch of unnotable ones?--Prosfilaes 15:01, 10 July 2010 (UTC)
- Those seem like good changes. I haven't heard of anyone arguing for the inclusion of any of those languages. And none of the languages listed at the end even have Wikipedia pages. Bring it up in the WT:Beer parlour so that hopefully others will agree. --Bequw → τ 23:55, 10 July 2010 (UTC)
Generic use
The attributive rule got voted out, but I didn't object to its idea, just its wording. It was as badly worded as can be imagined. It would be nice to add it back in a new form, one that fully explains what it means. A few points:
- Most important IMO and most potentially controversial, specific entities should not 'require' generic use, but generic use should be one way for an entry to pass. Therefore if Late Latin isn't used generically, it won't be deleted.
- Uncontroversially, the wording should be precise and leave as little room for doubt as possible. For example, attributive use could mean grammatical attributive use. So David Beckham haircut would be attributive use of David Beckham to modify haircut. Generic use, IMO, should be a meaning other than the primary one. So Billy Elliot would pass because of three citations of 'a Billy Elliot' referring to a young male dancer. All three citations would have to back up the same meaning, not just any meaning. Mglovesfun (talk) 08:38, 4 August 2010 (UTC)
Can you give some context, please? What part of the CFI are you proposing to modify? What do you want to see included/excluded that isn't currently? Why? You appear to be talking about both generic use and attributive use (although it's not clear what the uses are of), yet the section title is just generic use. Can you link to the vote in question so we can see what it was about and what the wording was? Have you got any specific wording in mind or is this just a statement of desire for someone to do something about something? Thryduulf (talk) 09:27, 4 August 2010 (UTC)
- Wiktionary:Votes/pl-2010-05/Names of specific entities. Now that we don't have an attributive use rule, I'd like a generic use rule. I'll try and work out some wording when I have time. Mglovesfun (talk) 10:22, 5 August 2010 (UTC)