Clean Up Web Jargon

By JoeSimmons Last update May 27, 2009 — Installed 401 times.

Nice script




zornn User

This is a really cool script, but there are a few bugs:

It capitalises the letter after a period, exclamation point or question mark only once per line (I set caps_regex = /[.!?]). For example:
"This. is. some! useless? text" would only capitalise the first "is"; everything after it would be ignored.

If text is replaced, the entire sentence will be highlighted instead of only the replaced word.

The first word after an inline closing tag gets capitalised. For example, in "<b>bold</b> text" the word "text" is capitalised (in the screenshot it is "and", because of the code tag). Can this be fixed?

Here's what I mean: http://i39.tinypic.com/15eivb9.png
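
The once-per-line capitalisation looks like what happens when replace() is called with a regex that lacks the g flag. That is a guess, since the script's source is not quoted in this thread. A minimal sketch of capitalising every sentence start, where capitalizeSentences and its pattern are hypothetical and not the script's actual code:

```javascript
// Hypothetical helper, not taken from the script.
// Capitalises the first letter of the text and every lowercase letter that
// follows ".", "!" or "?" plus whitespace. The "g" flag makes replace()
// process every match instead of only the first one.
function capitalizeSentences(text) {
  return text.replace(/(^\s*|[.!?]\s+)([a-z])/g, function (match, lead, letter) {
    return lead + letter.toUpperCase();
  });
}

console.log(capitalizeSentences('This. is. some! useless? text'));
// → "This. Is. Some! Useless? Text"
```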

I have also added:

'form','select','style','span','noscript','s','big','strong','small','strike','acronym','abbr','cite','em','del','ins','dfn','var','tt','kbd','samp','q','sub','sup','label','fieldset','legend','optgroup','option','h1','h2','h3','h4','h5','h6','dt','dd','td','th','li'
to the badTags variable and removed 'blockquote'.

Could text in 'a' and 's' tags also not get capitalised?
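
Skipping text inside particular tags could be done by checking a text node's ancestors against the badTags list. A rough sketch, assuming badTags is an array of lowercase tag names (which the list above suggests); isInsideBadTag is a made-up name, not something from the script:

```javascript
// Hypothetical helper: returns true when a node sits inside any tag listed
// in badTags. Works on DOM nodes, or on any object that exposes nodeName
// and parentNode.
function isInsideBadTag(node, badTags) {
  for (var el = node.parentNode; el; el = el.parentNode) {
    if (el.nodeName && badTags.indexOf(el.nodeName.toLowerCase()) !== -1) {
      return true;
    }
  }
  return false;
}

// Usage with plain objects standing in for DOM nodes:
var link = { nodeName: 'A', parentNode: { nodeName: 'P', parentNode: null } };
var textNode = { nodeName: '#text', parentNode: link };
console.log(isInsideBadTag(textNode, ['a', 's'])); // true
```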

 
zornn User

Also, how about adding a feature to remove all the excessive exclamation points, question marks, ellipses etc.? "!!!!!!=!", "???????=?", "!?!?!?!?=?!", "!!!11one!1=!", "........=...", "loooooooool=lol". I find these to be very annoying.
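
Something along these lines might cover those cases. It is only a sketch: collapseRepeats and every pattern in it are assumptions, and the stretched-letter rule is crude (it cannot tell "coool" from "lool", and it would also shorten "www").

```javascript
// Hypothetical helper covering the examples above.
function collapseRepeats(text) {
  return text
    // Any run of "!" and "?" that contains both becomes "?!"
    .replace(/[!?]*(?:!\?|\?!)[!?]*/g, '?!')
    // "!" followed by more "!", "1" or "one" becomes a single "!"
    .replace(/!(?:[!1]|one)+/gi, '!')
    // Two or more "?" become one
    .replace(/\?{2,}/g, '?')
    // Four or more dots become a three-dot ellipsis
    .replace(/\.{4,}/g, '...')
    // A letter repeated three or more times is reduced to one
    .replace(/([a-z])\1{2,}/gi, '$1');
}

console.log(collapseRepeats('loooooooool'));  // → "lol"
console.log(collapseRepeats('!!!11one!1'));   // → "!"
console.log(collapseRepeats('!?!?!?!?'));     // → "?!"
```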

How about also adding an array of words after which capitalization should not occur, such as Latin abbreviations (i.e., i.c., ...) and other custom ones?
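
A sketch of how such an exception list could work. The names noCapsAfter and capitalizeUnlessAbbrev and the sample entries are all made up for illustration; entries are lowercase and written without their final period.

```javascript
// Hypothetical exception list, not part of the script.
var noCapsAfter = ['i.e', 'e.g', 'etc'];

// Capitalises the letter after ".", "!" or "?" unless the word just before
// the punctuation is listed in noCapsAfter.
function capitalizeUnlessAbbrev(text) {
  return text.replace(/([.!?]\s+)([a-z])/g, function (match, sep, letter, offset, whole) {
    // Look back at the word that ends right before the punctuation mark.
    var before = whole.slice(0, offset);
    var word = (before.match(/\S+$/) || [''])[0].toLowerCase();
    word = word.replace(/^[^a-z]+/, ''); // drop a leading "(" or quote
    if (noCapsAfter.indexOf(word) !== -1) {
      return match; // leave the letter alone after a listed abbreviation
    }
    return sep + letter.toUpperCase();
  });
}

console.log(capitalizeUnlessAbbrev('use a list, i.e. an array. it works.'));
// → "use a list, i.e. an array. It works."
```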

 
JoeSimmons Script's Author

Good idea. Will add that in.

 
zornn User

Cool, thanks.

Here are some more words I added:

// Capitalization and Slang
'i' : 'I',
'(u|ya)' : 'you',
'y' : 'why',
'k' : 'okay',
'(y?ur|yu?r|yer)' : 'your',
'(y?ur[sz]|yu?r[sz]|yer[sz])' : 'yours',
'pl[ea]{0,2}(s|z)' : 'please',
'(thank?[xz]|t[nh]x)' : 'thanks',
'srr?y' : 'sorry',
'lemme' : 'let me',
'(yea?h|yep|uh-?huh|m{1,2}-?hm)' : 'yes',
'nope' : 'no',
'sec' : 'second',
'kinda' : 'kind of',
'gimme' : 'give me',
'whaddya' : 'what are you',
'wanna' : 'want to',
'b4' : 'before',
'im' : 'I\'m',
'dunno' : 'don\'t know',
'gotta' : 'have got to',
'gonna' : 'going to',
'dey' : 'they',
'i ain\'?t' : 'I am not',
'you ain\'?t' : 'you are not',
'he ain\'?t' : 'he is not',
'she ain\'?t' : 'she is not',
'it ain\'?t' : 'it is not',
'that ain\'?t' : 'that is not',
'those ain\'?t' : 'those are not',
'these ain\'?t' : 'these are not',
'[\'d]?em' : 'them',
'dat' : 'that',
'dis' : 'this',
'(w[au]t|whut|wha)' : 'what',
'srs' : 'serious',
'srsly' : 'seriously',
'bsns' : 'business',
'hai' : 'hi',
'bai' : 'bye',
'ppl' : 'people',
'pwn' : 'own',
'(pwned|[op]wn[dt])' : 'owned',
'fix\'?[dt]' : 'fixed',
'shoop' : 'photoshopped image',
'aussie' : 'Australian',
'azn' : 'Asian',
'suk' : 'suck',
'fuk' : 'fuck',
'mah' : 'my',
'jus' : 'just',
'usa' : 'USA',
'nao' : 'now',
'uk' : 'UK',
'chillax' : 'relax',
'orly' : 'Oh really',
'rly' : 'really',
'(liek|lyke?)' : 'like',
'm[ao]mma' : 'mother',
'(sup|wh?a[sz]{1,20}up)' : 'what\'s up',
'w8' : 'wait',
'h8' : 'hate',
'm8' : 'mate',
'sk8' : 'skate',
'(l8rz?|laterz)' : 'later',
'idd' : 'indeed',
'tho' : 'though',
'tard' : 'retard',
'tards' : 'retards',
'(gu[iy]ze?|gui[sz]|guyse)' : 'guys',
'(bro(tha)?|brah)' : 'brother',
'bros' : 'brothers',
'sis' : 'sister',
'(b?cuz|b\/c(uz)?)' : 'because',
'l[auo]lz' : 'laughs',
'(teh|tha|da)' : 'the',
'(cu|cya)' : 'see you',
'pic' : 'picture',
'pics' : 'pictures',
'pic\'s' : 'picture\'s',
'me thinks' : 'I think',
'me goes' : 'I\'m going',

// *ing & apostrophe
'nothin\'?' : 'nothing',
'comin\'?' : 'coming',
'havin\'?' : 'having',
'goin\'?' : 'going',
'doin\'?' : 'doing',
'feelin\'?' : 'feeling',
'touchin\'?' : 'touching',
'eatin\'?' : 'eating',
'drinkin\'?' : 'drinking',
'sleepin\'?' : 'sleeping',
'restin\'?' : 'resting',
'fuckin\'?' : 'fucking',
'kickin\'?' : 'kicking',
'hittin\'?' : 'hitting',
'playin\'?' : 'playing',
'killin\'?' : 'killing',
'shittin\'?' : 'shitting',
'bitchin\'?' : 'bitching',
'freakin\'?' : 'freaking',
'shakin\'?' : 'shaking',
'bakin\'?' : 'baking',
'takin\'?' : 'taking',
'makin\'?' : 'making',
'fakin\'?' : 'faking',
'smokin\'?' : 'smoking',
'pissin\'?' : 'pissing',
'kissin\'?' : 'kissing',
'fishin\'?' : 'fishing',
'bitin\'?' : 'biting',
'dancin\'?' : 'dancing',
'shootin\'?' : 'shooting',
'buyin\'?' : 'buying',
'buildin\'?' : 'building',
'seein\'?' : 'seeing',
'watchin\'?' : 'watching',
'hearin\'?' : 'hearing',
'sayin\'?' : 'saying',
'singin\'?' : 'singing',
'smellin\'?' : 'smelling',
'drivin\'?' : 'driving',
'walkin\'?' : 'walking',
'runnin\'?' : 'running',
'lovin\'?' : 'loving',
'hatin\'?' : 'hating',
// Bug: with a trailing apostrophe, the entries above become *ing' (with an extra ')
'an\'' : 'and',
'\'n' : 'and',

// General Misspellings
'defin[ae]te?ly' : 'definitely',
'grammer' : 'grammar',
'awsome' : 'awesome',
'would of' : 'would have',
'should of' : 'should have',
'could of' : 'could have',
'youre' : 'you\'re',
'youll' : 'you\'ll',
'whats' : 'what\'s',
'thats' : 'that\'s',
'theyre' : 'they\'re',
'hes' : 'he\'s',
'shes' : 'she\'s',
'dont' : 'don\'t',
'isnt' : 'isn\'t',
'arent' : 'aren\'t',
'wont' : 'won\'t',
'mustnt' : 'mustn\'t',
'cant' : 'can\'t',
'didnt' : 'didn\'t',
'doesnt' : 'doesn\'t',
'shouldnt' : 'shouldn\'t',
'couldnt' : 'couldn\'t',
'havent' : 'haven\'t',
'hasnt' : 'hasn\'t',
'hadnt' : 'hadn\'t',
'wouldnt' : 'wouldn\'t',
'wasnt' : 'wasn\'t',
'werent' : 'weren\'t',
'neednt' : 'needn\'t',


// Acronyms
'brb' : 'be right back',
'w\/ ' : 'with ',
'w\/o' : 'without',
'w\/e' : 'whatever',
'j\/k' : 'just kidding',
'irl' : 'in real life',
'hf' : 'have fun',
'gl' : 'good luck',
'idk' : 'I don\'t know',
'ty' : 'thank you',
'tyt' : 'take your time',
'bbl' : 'be back later',
'btw' : 'by the way',
'fyi' : 'for your information',
'imo' : 'in my opinion',
'imho' : 'in my humble opinion',
'ttyl' : 'talk to you later',
'afk' : 'away from keyboard',
'iirc' : 'if I recall correctly',
'afaik' : 'as far as I know',
'afaic' : 'as far as I\'m concerned',

// Leet speak
'r[o0]{2}l' : 'rule',
'(k[o0]{2}l|kewl)' : 'cool',
'l33t' : 'elite',
'(n[o0]{2}b|newb)' : 'newbie',
'(n[o0]{2}b[sz]|newb[sz])' : 'newbies',
'h[a4]x[o0]r' : 'hacker',
'h[a4]x[o0]r[sz]' : 'hackers',

 
JoeSimmons Script's Author

Thanks for the list. Gonna modify it a bit and fix some other things soon.

 
zornn User

No problem. One more bug I think; it doesn't replace text if it's capitalised: http://i40.tinypic.com/14w3wj6.png

 
JoeSimmons Script's Author

Hmm, it should; I made the regexes case-insensitive.
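
For reference, a sketch of a case-insensitive replacement that also keeps a leading capital, so "Brb" becomes "Be right back". The helper name replaceKeepingCapital and the \b wrapping are assumptions, not the script's code:

```javascript
// Hypothetical helper: replaces every case-insensitive match of `source`
// and re-applies a leading capital when the matched text started with one.
function replaceKeepingCapital(text, source, replacement) {
  var regex = new RegExp('\\b(?:' + source + ')\\b', 'gi');
  return text.replace(regex, function (match) {
    var first = match.charAt(0);
    var startsUpper = first !== first.toLowerCase();
    return startsUpper
      ? replacement.charAt(0).toUpperCase() + replacement.slice(1)
      : replacement;
  });
}

console.log(replaceKeepingCapital('Capitalized text. Brb. brb.', 'brb', 'be right back'));
// → "Capitalized text. Be right back. be right back."
```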

 
zornn User

This is weird.

Here are some examples:

1.
Capitalized text. brb.
becomes:
Capitalized text. Be right back.

2.
Capitalized text. brb. brb.
becomes:
Capitalized text. Be right back. be right back.

3.
Capitalized text. Brb.
remains the same.

4.
Capitalized text. Brb. Capitalized text.
remains the same.

5.
Capitalized text. Brb. uncapitalized text.
becomes:
Capitalized text. Be right back. uncapitalized text.

6.
Capitalized text. Brb tnx.
becomes:
Capitalized text. Be right back thanks.

7.
Capitalized text. Brb tnx.
remains the same.

8.
uncapitalized text. Brb.
becomes:
Uncapitalized text. Be right back.

9.
uncapitalized text. brb. uncapitalized text.
becomes:
Uncapitalized text. Be right back. uncapitalized text.

10.
Capitalized text. Brb tnx. uncapitalized text.
becomes:
Capitalized text. Be right back thanks. uncapitalized text.

11.
Capitalized text. uncapitalized text.
becomes:
Capitalized text. Uncapitalized text.

12.
Capitalized text. uncapitalized text. uncapitalized text.
becomes:
Capitalized text. Uncapitalized text. uncapitalized text.

Whether a capitalized word that should be changed (here "Brb") actually gets changed seems to depend on whether the script also capitalizes a letter or changes another lowercase word (here "tnx") before or after it. It also capitalizes only one word per line, as you can see in examples 2, 5, 10 and 12 (or two if the first letter of the line is lowercase, as in examples 8 and 9).
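
Results that depend on what was matched earlier are typical of a JavaScript regex with the g flag being reused with test() or exec(), because such a regex remembers its lastIndex between calls. Whether the script actually does this is an assumption, since its source is not shown in this thread. A small demonstration, with safeTest as a hypothetical workaround:

```javascript
var re = /brb/gi;

// A global regex keeps state: after a successful test(), lastIndex points
// just past the match, and the next test() starts searching from there.
var first = re.test('Brb.');   // true, lastIndex is now 3
var second = re.test('Brb.');  // false, search started at index 3; lastIndex resets to 0
var third = re.test('Brb.');   // true again

console.log(first, second, third); // true false true

// Hypothetical workaround: reset lastIndex before each test().
function safeTest(regex, text) {
  regex.lastIndex = 0;
  return regex.test(text);
}

console.log(safeTest(re, 'Brb.'), safeTest(re, 'Brb.')); // true true
```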

I've also added and changed some words in the list above.
