<div class='toolbar' role='navigation' macro='toolbar [[ToolbarCommands::ViewToolbar]]'></div>
<div class='title' macro='view title'></div>
<div class='subtitle'><span macro='view modifier link'></span>, <span macro='view modified date'></span> (<span macro='message views.wikified.createdPrompt'></span> <span macro='view created date'></span>)</div>
<div class='tagging' macro='tagging'></div>
<div class='tagged' macro='tags'></div>
<div class='viewer' macro='view text wikified'></div>
<div class='tagClear'></div>

> The Alpage project team develops and maintains a full linguistic processing chain for French (try our online demo). This chain is based on [[DyALog|]], [[FRMG|]], [[Lefff|]], [[SxPipe|]].

*  ''~DyALog'' is an environment to compile and run logic programs and natural language tabular parsers for various grammatical formalisms (DCGs, TAGs, TIGs, RCGs).
*   ''frmg'' is a French grammar generated from a MetaGrammar and compiled with DyALog.
*   ''Lefff'' (Lexique des Formes Fléchies du Français / Lexicon of French inflected forms) is a large-scale morphological and syntactic lexicon for French, distributed under the LGPL-LR free software license (Lesser General Public License For Linguistic Resources).
*   ''sxpipe'' is a pre-parsing processing chain that handles segmentation and tokenization, spelling error correction, and named entity recognition. It is designed to robustly transform a raw corpus into a DAG of lexical entries.
!Parsing French using the MaltParser

Combination of [[MElt|MElt (ALPAGE Linguistic Workbench)]] and [[MaltParser]] for parsing raw French text.
;Input :
: raw text, ''utf-8''
;Output :
:the 10 usual [[CoNLL|CoNLL data format]] columns, plus an extra column for word cluster ids (between the 6th and 7th usual CoNLL columns)

*    [[MaltParser|]] version __1.3.1__, developed by Johan Hall, Jens Nilsson and Joakim Nivre at Växjö University and Uppsala University, Sweden. (Note it won't work with later malt versions)
*    the [[MElt|]] tagger (downloadable [[here|]]), developed by Pascal Denis & Benoît Sagot, ([[Coupling an annotated corpus and a morphosyntactic lexicon for state-of-the-art POS tagging with less human effort|]]. In Proceedings of PACLIC 2009, Hong Kong, China).
*    download and unzip the [[BONSAI|]] v3.2 archive to get the preprocessing code, the Malt model and its settings (the best Malt model according to the benchmark: it uses predicted POS, predicted lemmas, predicted morphological features, and unsupervised word clusters). Note that the preprocessing code requires:
**        perl and python >2.5
**        [[python-cjson|]], to install with : python install 
*    Set the MALT_DIR variable to your local path to Malt 1.3.1
*    Set the BONSAI variable to your local path to BONSAI v3.2 

Parsing command
The following command will preprocess and parse a raw UTF-8 text file INFILE into INFILE.outmalt :
{{{$BONSAI/bin/ [-n] INFILE}}}

Use the -n option if your text is already tokenized.

Note : The output format is (almost) CoNLL : the 10 usual CoNLL columns, plus an extra column for word cluster ids (between the 6th and 7th usual CoNLL columns).
Note : newlines in the input text are always interpreted as sentence boundaries.
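As a minimal sketch of how this almost-CoNLL output could be post-processed, the helper below removes the extra word cluster column so that each line has the 10 usual CoNLL columns again. The name {{{toStandardConll}}} is hypothetical and not part of BONSAI; the sketch assumes the cluster id is the 7th tab-separated field, as described in the note above.

```javascript
// Hypothetical helper, not part of BONSAI: converts the 11-column output
// back to the 10 usual CoNLL columns by removing the word cluster id column.
// Assumption: the cluster id is the 7th field (index 6), between FEATS and HEAD.
function toStandardConll(text) {
	var lines = text.split("\n");
	var out = [];
	for (var i = 0; i < lines.length; i++) {
		var line = lines[i];
		if (line === "") { out.push(line); continue; } // blank line = sentence separator
		var cols = line.split("\t");
		if (cols.length === 11) cols.splice(6, 1); // drop the cluster id column
		out.push(cols.join("\t"));
	}
	return out.join("\n");
}
```

Lines that do not have exactly 11 fields are passed through unchanged, so the helper can be applied to a file that is already in standard CoNLL format.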
Tagset used by : <<list filter [tag[tags:CC]]>>

The current tagset used by [[MElt|MElt (ALPAGE Linguistic Workbench)]] is as follows (Crabbé & Candito, 2008)
Crabbé, Candito 08 : [[Expériences d'analyse syntaxique du français|]], in Actes de TALN 2008 (Traitement automatique des langues naturelles), Avignon.

The 34 subcategories of the [[FTB|French TreeBank POS Tags]], reduced to 29 tags optimized for parsing.

|ADJ  |   adjective |
|ADJWH |   interrogative adjective |
|ADV |   adverb |
|ADVWH |   interrogative adverb |
|CC |   coordination conjunction |
|CLO |   object clitic pronoun |
|CLR |   reflexive clitic pronoun |
|CLS |   subject clitic pronoun |
|CS |   subordination conjunction |
|DET |   determiner |
|DETWH |   interrogative determiner |
|ET |   foreign word |
|I |   interjection |
|NC |   common noun |
|NPP |   proper noun |
|P |   preposition |
|P+D |   preposition+determiner amalgam |
|P+PRO |   preposition+pronoun amalgam |
|PONCT |   punctuation mark |
|PREF |   prefix |
|PRO |   full pronoun |
|PROREL |   relative pronoun |
|PROWH |   interrogative pronoun |
|V |   indicative or conditional verb form |
|VIMP |   imperative verb form |
|VINF |   infinitive verb form |
|VPP |   past participle |
|VPR |   present participle |
|VS |   subjunctive verb form |
shallow parsing
Data adheres to the following rules:
* Data files contain sentences separated by a blank line.
* A sentence consists of one or more tokens, each one starting on a new line.
* A token consists of ten fields described in the table below. Fields are separated by a single tab character. Space/blank characters are not allowed within fields.
* All data files contain these ten fields, although only the ID, FORM, CPOSTAG, POSTAG, HEAD and DEPREL columns are guaranteed to contain non-dummy (i.e. non-underscore) values for all languages.
* Data files are UTF-8 encoded (Unicode). If you think this will be a problem, have a look [[here|]].
|Field number: |Field name: |Description: |h
|1 |ID |Token counter, starting at 1 for each new sentence. |
|2 |FORM |Word form or punctuation symbol. |
|3 |LEMMA |Lemma or stem (depending on particular data set) of word form, or an underscore if not available. |
|4 |CPOSTAG |Coarse-grained part-of-speech tag, where tagset depends on the language. |
|5 |POSTAG |Fine-grained part-of-speech tag, where the tagset depends on the language, or identical to the coarse-grained part-of-speech tag if not available. |
|6 |FEATS |Unordered set of syntactic and/or morphological features (depending on the particular language), separated by a vertical bar (&#124;), or an underscore if not available. |
|7 |HEAD |Head of the current token, which is either a value of ID or zero ('0'). Note that depending on the original treebank annotation, there may be multiple tokens with an ID of zero. |
|8 |DEPREL |Dependency relation to the HEAD. The set of dependency relations depends on the particular language. Note that depending on the original treebank annotation, the dependency relation may be meaningful or simply 'ROOT'. |
|9 |PHEAD |Projective head of current token, which is either a value of ID or zero ('0'), or an underscore if not available. Note that depending on the original treebank annotation, there may be multiple tokens with an ID of zero. The dependency structure resulting from the PHEAD column is guaranteed to be projective (but is not available for all languages), whereas the structures resulting from the HEAD column will be non-projective for some sentences of some languages (but are always available). |
|10 |PDEPREL |Dependency relation to the PHEAD, or an underscore if not available. The set of dependency relations depends on the particular language. Note that depending on the original treebank annotation, the dependency relation may be meaningful or simply 'ROOT'.  |
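The rules and the field table above can be sketched as a small reader. This is only an illustration under the stated rules (blank line between sentences, ten tab-separated fields per token); {{{parseConll}}} and {{{CONLL_FIELDS}}} are hypothetical names, not part of any CoNLL tool.

```javascript
// Field names in column order, as listed in the table above.
var CONLL_FIELDS = ["ID","FORM","LEMMA","CPOSTAG","POSTAG","FEATS","HEAD","DEPREL","PHEAD","PDEPREL"];

// Hypothetical reader: splits CoNLL text into sentences (arrays of token objects).
function parseConll(text) {
	var sentences = [];
	var current = [];
	var lines = text.split("\n");
	for (var i = 0; i < lines.length; i++) {
		var line = lines[i].replace(/\r$/, ""); // tolerate Windows line endings
		if (line === "") { // blank line closes the current sentence
			if (current.length > 0) { sentences.push(current); current = []; }
			continue;
		}
		var cols = line.split("\t");
		if (cols.length !== CONLL_FIELDS.length)
			throw new Error("line " + (i + 1) + ": expected 10 tab-separated fields, got " + cols.length);
		var token = {};
		for (var f = 0; f < CONLL_FIELDS.length; f++) token[CONLL_FIELDS[f]] = cols[f];
		current.push(token);
	}
	if (current.length > 0) sentences.push(current); // last sentence without trailing blank line
	return sentences;
}
```

All values are kept as strings, so an unavailable field remains the underscore it was in the file and the caller decides how to interpret it.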

[[Main Page]]
[[POS tagger links]]
<div class='toolbar' macro='toolbar [[ToolbarCommands::EditToolbar]]'></div>
<div class='title' macro='view title'></div>
<div class='editor' macro='edit title'></div>
<div macro='annotations'></div>
<div class='editor' macro='edit text'></div>
<div class='editor' macro='edit tags'></div><div class='editorFooter'><span macro='message views.editor.tagPrompt'></span><span macro='tagChooser excludeLists'></span></div>
|''Description:''|//create//, //edit//, //view// and //delete// commands in toolbar <<toolbar fields>>.|
|''Date:''|Dec 21,2007|
|''Author:''|Pascal Collin|
|''License:''|[[BSD open source license|License]]|
|''Browser:''|Firefox 2.0; InternetExplorer 6.0, others|
On [[homepage|]], see [[FieldEditor example]]
*import this tiddler from [[homepage|]] (tagged as systemConfig)
*save and reload
*optionally : add the following css text in your StyleSheet : {{{#popup tr.fieldTableRow td {padding:1px 3px 1px 3px;}}}}


config.commands.fields.handlePopup = function(popup,title) {
	var tiddler = store.fetchTiddler(title);
	var fields = {};
	store.forEachField(tiddler,function(tiddler,fieldName,value) {fields[fieldName] = value;},true);
	var items = [];
	for(var t in fields) {
		var editCommand = "<<untiddledCall editFieldDialog "+escape(title)+" "+escape(t)+">>";
		var deleteCommand = "<<untiddledCall deleteField "+escape(title)+" "+escape(t)+">>";
		var renameCommand = "<<untiddledCall renameField "+escape(title)+" "+escape(t)+">>";
		items.push({field: t,value: fields[t], actions: editCommand+renameCommand+deleteCommand});
	items.sort(function(a,b) {return a.field < b.field ? -1 : (a.field == b.field ? 0 : +1);});
	var createNewCommand = "<<untiddledCall createField "+escape(title)+">>";
	items.push({field : "", value : "", actions:createNewCommand });
	if(items.length > 0)

config.commands.fields.listViewTemplate = {
	columns: [
		{name: 'Field', field: 'field', title: "Field", type: 'String'},
		{name: 'Actions', field: 'actions', title: "Actions", type: 'WikiText'},
		{name: 'Value', field: 'value', title: "Value", type: 'WikiText'}
	rowClasses: [
			{className: 'fieldTableRow', field: 'actions'}
	buttons: [	//can't use button for selected then delete, because click on checkbox will hide the popup

config.macros.untiddledCall = {  // when called from listview, tiddler is unset, so we need to pass tiddler as parameter
	handler : function(place,macroName,params,wikifier,paramString) {
		var macroName = params.shift();
		if (macroName) var macro = config.macros[macroName];
		var title = params.shift();
		if (title) var tiddler = store.getTiddler(unescape(title));
		if (macro) macro.handler(place,macroName,params,wikifier,paramString,tiddler);		

config.macros.deleteField = {
	handler : function(place,macroName,params,wikifier,paramString,tiddler) {
		if(!readOnly && params[0]) {
			var fieldName = unescape(params[0]); // declared locally to avoid leaking a global
			var btn = createTiddlyButton(place,"delete", "delete "+fieldName,this.onClickDeleteField);
			btn.setAttribute("fieldName", fieldName);
	onClickDeleteField : function() {
		var title=this.getAttribute("title");
		var fieldName=this.getAttribute("fieldName");
		var tiddler = store.getTiddler(title);
		if (tiddler && fieldName && confirm("delete field " + fieldName+" from " + title +" tiddler ?")) {
			delete tiddler.fields[fieldName];
		return false;

config.macros.createField = {
	handler : function(place,macroName,params,wikifier,paramString,tiddler) {
		if(!readOnly) {
			var btn = createTiddlyButton(place,"create new", "create a new field",this.onClickCreateField);
	onClickCreateField : function() {
		var title=this.getAttribute("title");
		var tiddler = store.getTiddler(title);
		if (tiddler) {
			var fieldName = prompt("Field name","");
			if (store.getValue(tiddler,fieldName)) {
				window.alert("This field already exists.");
			else if (fieldName) {
				var v = prompt("Field value","");
		return false;

config.macros.editFieldDialog = {
	handler : function(place,macroName,params,wikifier,paramString,tiddler) {
		if(!readOnly && params[0]) {
			var fieldName = unescape(params[0]); // declared locally to avoid leaking a global
			var btn = createTiddlyButton(place,"edit", "edit this field",this.onClickEditFieldDialog);
			btn.setAttribute("fieldName", fieldName);
	onClickEditFieldDialog : function() {
		var title=this.getAttribute("title");
		var tiddler = store.getTiddler(title);
		var fieldName=this.getAttribute("fieldName");
		if (tiddler && fieldName) {
			var value = tiddler.fields[fieldName];
			value = value ? value : "";
			var lines = value.match(/\n/mg);
			lines = lines ? true : false;
			if (!lines || confirm("This field contains more than one line. Only the first line will be kept if you edit it here. Proceed ?")) {
				var v = prompt("Field value",value);
		return false;

config.macros.renameField = {
	handler : function(place,macroName,params,wikifier,paramString,tiddler) {
		if(!readOnly && params[0]) {
			var fieldName = unescape(params[0]); // declared locally to avoid leaking a global
			var btn = createTiddlyButton(place,"rename", "rename "+fieldName,this.onClickRenameField);
			btn.setAttribute("fieldName", fieldName);
	onClickRenameField : function() {
		var title=this.getAttribute("title");
		var fieldName=this.getAttribute("fieldName");
		var tiddler = store.getTiddler(title);
		if (tiddler && fieldName) {
			var newName = prompt("Rename " + fieldName + " as ?", fieldName);
			if (newName) {
				delete tiddler.fields[fieldName];
		return false;

config.shadowTiddlers.StyleSheetFieldsEditor = "/*{{{*/\n";
config.shadowTiddlers.StyleSheetFieldsEditor += ".fieldTableRow td {padding : 1px 3px}\n";
config.shadowTiddlers.StyleSheetFieldsEditor += ".fieldTableRow .button {border:0; padding : 0 0.2em}\n";
config.shadowTiddlers.StyleSheetFieldsEditor +="/*}}}*/";
store.addNotification("StyleSheetFieldsEditor", refreshStyles);


The FreeLing package consists of a library providing language analysis services.

FreeLing is designed to be used as an external library from any application requiring this kind of services. Nevertheless, a simple main program is also provided as a basic interface to the library, which enables the user to analyze text files from the command line.

Main services offered by FreeLing library:
*    Text tokenization
*    Sentence splitting
*    Morphological analysis
*    Suffix treatment, retokenization of clitic pronouns
*    Flexible multiword recognition
*    Contraction splitting
*    Probabilistic prediction of unknown word categories
*    Named entity detection
*    Recognition of dates, numbers, ratios, currency, and physical magnitudes (speed, weight, temperature, density, etc.)
*    PoS tagging
*    Chart-based shallow parsing
*    Named entity classification
*    WordNet based sense annotation and disambiguation
*    Rule-based dependency parsing
*    Nominal coreference resolution

Currently supported languages are Spanish, Catalan, Galician, Italian, English, Russian, Portuguese, Welsh and Asturian. See the [[user manual|]] for more information about which services are available for each language.
Tagset used by : <<list filter [tag[tags:FTB]]>>

14 categories, 34 subcategories (optimized to 29 tags: [[CC Tagset]])

|>|ABR |abbreviation |
|>|ADJ |adjective |
|>|ADV |adverb |
|>|DET ||
| |DET:ART |article |
| |DET:POS |possessive pronoun (ma, ta, ...) |
|>|INT |interjection |
|>|KON |conjunction |
|>|NAM |proper name |
|>|NOM |noun |
|>|NUM |numeral |
|>|PRO |pronoun |
| |PRO:DEM |demonstrative pronoun |
| |PRO:IND |indefinite pronoun |
| |PRO:PER |personal pronoun |
| |PRO:POS |possessive pronoun (mien, tien, ...) |
| |PRO:REL |relative pronoun |
|>|PRP |preposition |
| |PRP:det |preposition plus article (au,du,aux,des) |
|>|PUN |punctuation |
| |PUN:cit |punctuation citation |
|>|SENT |sentence tag |
|>|SYM |symbol |
|>|VER ||
| |VER:cond |verb conditional |
| |VER:futu |verb future |
| |VER:impe |verb imperative |
| |VER:impf |verb imperfect |
| |VER:infi |verb infinitive |
| |VER:pper |verb past participle |
| |VER:ppre |verb present participle |
| |VER:pres |verb present |
| |VER:simp |verb simple past |
| |VER:subi |verb subjunctive imperfect |
| |VER:subp |verb subjunctive present |
<<search>><<permaview>><<toggleSideBarTB right hide>>

;Input :
:tokenized text : one token per line, empty lines are sentence separators.
;Output/Training data :
:adding the predicted part-of-speech tag to the end of each line, separated by a TAB
:{{{FORM POS}}}
;Tag sets :
:EN: [[Wall Street Journal (WSJ) Tagset]]
:HU: the Szeged Corpus (
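
The input and output formats described above can be sketched with two small helpers. They are hypothetical (not part of hunpos) and assume exactly what is stated: one token per line, an empty line between sentences, and the tag appended after a TAB.

```javascript
// Hypothetical helpers, not part of hunpos.
// Builds tagger input: one token per line, empty line after each sentence.
function toTaggerInput(sentences) {
	var lines = [];
	for (var i = 0; i < sentences.length; i++) {
		for (var j = 0; j < sentences[i].length; j++) lines.push(sentences[i][j]);
		lines.push("");
	}
	return lines.join("\n");
}

// Reads tagged output: "FORM<TAB>POS" per line, empty line between sentences.
function parseTaggerOutput(text) {
	var sentences = [], current = [];
	var lines = text.split("\n");
	for (var i = 0; i < lines.length; i++) {
		if (lines[i] === "") {
			if (current.length > 0) { sentences.push(current); current = []; }
			continue;
		}
		var tab = lines[i].lastIndexOf("\t"); // the tag is the last TAB-separated field
		if (tab < 0) { current.push({form: lines[i], pos: null}); continue; } // untagged line
		current.push({form: lines[i].substring(0, tab), pos: lines[i].substring(tab + 1)});
	}
	if (current.length > 0) sentences.push(current);
	return sentences;
}
```

The same {{{FORM POS}}} layout serves as training data, so {{{parseTaggerOutput}}} can also read a training file.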

> Hunpos is an open source reimplementation of TnT, the well known part-of-speech tagger by Thorsten Brants.

* Free and open source, even for commercial use. 
* For languages with more complex morphologies, HMM tagging could be quite competitive with the current generation of learning algorithms applying e.g. SVM and CRF methods. A major advantage is that the training/tagging cycle is orders of magnitude faster than in more complex models. 
* Precision of tagging on unknown and unseen words was a major priority for us during the development of hunpos. 
* Works smoothly with large tag sets. For example in Hungarian, as in other highly inflecting languages, it is important to preserve detailed morphological information in the POS tags in order to provide useful clues for higher level processing tasks. This leads to a significantly larger tagset than is common in English (744 tags here as opposed to the 36 standardly used in Treebank work). Such a tagset does not degrade training and tagging performance in hunpos, although it would make the training process of non-generative models computationally expensive. 
* Effortless integration of knowledge from morphological analyzers/dictionaries into best path calculation. 
* Contextualized lexical probabilities with a context window of any size. Unlike traditional HMM models, HunPos estimates emission (lexical) probabilities based on the current tag and previous tags as well. 
* Hunpos has been implemented in OCaml, a high-level language which supports a succinct, well-maintainable coding style. OCaml has a high-performance compiler that produces native code with speed comparable to C/C++ implementations.
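
To illustrate the contextualized lexical probabilities mentioned in the list above, here is a minimal sketch that estimates the probability of a word given the current tag and the previous tag by relative frequency. It is not the HunPos implementation: the function names, the {{{<s>}}} sentence-start pseudo tag, the one-tag context window and the absence of smoothing are all simplifying assumptions.

```javascript
// Illustrative only, not the HunPos implementation.
// Collects counts for P(word | previous tag, current tag) from tagged sentences.
function trainContextualEmissions(sentences) {
	var pairCounts = {};   // "prevTag<TAB>tag" -> count
	var tripleCounts = {}; // "prevTag<TAB>tag<TAB>word" -> count
	for (var i = 0; i < sentences.length; i++) {
		var prev = "<s>"; // assumed sentence-start pseudo tag
		for (var j = 0; j < sentences[i].length; j++) {
			var tok = sentences[i][j];
			var pair = prev + "\t" + tok.pos;
			var triple = pair + "\t" + tok.form;
			pairCounts[pair] = (pairCounts[pair] || 0) + 1;
			tripleCounts[triple] = (tripleCounts[triple] || 0) + 1;
			prev = tok.pos;
		}
	}
	return {pairCounts: pairCounts, tripleCounts: tripleCounts};
}

// Relative-frequency estimate of P(word | prevTag, tag).
function emissionProbability(model, prevTag, tag, word) {
	var pair = prevTag + "\t" + tag;
	var total = model.pairCounts[pair] || 0;
	if (total === 0) return 0; // unseen context; a real tagger would back off or smooth
	return (model.tripleCounts[pair + "\t" + word] || 0) / total;
}
```

A traditional HMM would condition the word on the current tag only; here the previous tag is part of the conditioning context, which is the idea described in the list.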
|Author|Eric Shulman|
|Description|adds support for resizing images|
This plugin adds optional syntax to scale an image to a specified width and height and/or interactively resize the image with the mouse.
The extended image syntax is:
where ''(w,h)'' indicates the desired width and height (in CSS units, e.g., px, em, cm, in, or %). Use ''auto'' (or a blank value) for either dimension to scale that dimension proportionally (i.e., maintain the aspect ratio). You can also calculate a CSS value 'on-the-fly' by using a //javascript expression// enclosed between """{{""" and """}}""". Appending a plus sign (+) to a dimension enables interactive resizing in that dimension (by dragging the mouse inside the image). Use ~SHIFT-click to show the full-sized (un-scaled) image. Use ~CTRL-click to restore the starting size (either scaled or full-sized).
[<img(21% ,+)[images/meow.gif]]
[<img(13%+, )[images/meow.gif]]
[<img( 8%+, )[images/meow.gif]]
[<img( 5% , )[images/meow.gif]]
[<img( 3% , )[images/meow.gif]]
[<img( 2% , )[images/meow.gif]]
[img(  1%+,+)[images/meow.gif]]
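
As a minimal sketch of how the {{{(w,h)}}} part of the syntax can be read, the helper below splits the two dimensions, detects the trailing plus sign and treats a blank value as ''auto''. {{{parseImageSize}}} is a hypothetical name; it is not the plugin's own parser and it deliberately does not evaluate javascript expressions.

```javascript
// Hypothetical helper mirroring the "(w,h)" rules described above; it is not
// the plugin's own parser and deliberately ignores {{javascript}} values.
function parseImageSize(spec) {
	var m = /^\(([^,)]*),([^)]*)\)$/.exec(spec);
	if (!m) return null; // not a "(w,h)" specification
	function dimension(raw) {
		var value = raw.replace(/^\s+|\s+$/g, ""); // trim surrounding spaces
		var stretch = value.charAt(value.length - 1) === "+";
		if (stretch) value = value.substring(0, value.length - 1);
		if (value === "") value = "auto"; // blank means: scale proportionally
		return {value: value, stretch: stretch};
	}
	var w = dimension(m[1]), h = dimension(m[2]);
	return {width: w.value, height: h.value, stretchW: w.stretch, stretchH: h.stretch};
}
```

Only the parenthesized part of an image link is passed in, for example {{{parseImageSize("(21% ,+)")}}} for the first example above.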
2011.09.03 [1.2.3] bypass addStretchHandlers() if no '+' suffix is used (i.e., not resizable)
2010.07.24 [1.2.2] moved tip/dragtip text to config.formatterHelpers.imageSize object to enable customization
2009.02.24 [1.2.1] cleanup width/height regexp, use '+' suffix for resizing
2009.02.22 [1.2.0] added stretchable images
2008.01.19 [1.1.0] added evaluated width/height values
2008.01.18 [1.0.1] regexp for "(width,height)" now passes all CSS values to browser for validation
2008.01.17 [1.0.0] initial release
version.extensions.ImageSizePlugin= {major: 1, minor: 2, revision: 3, date: new Date(2011,8,3)}; // JavaScript months are 0-based: 8 = September
var f=config.formatters[config.formatters.findByField("name","image")];
f.handler=function(w) {
	this.lookaheadRegExp.lastIndex = w.matchStart;
	var lookaheadMatch = this.lookaheadRegExp.exec(w.source)
	if(lookaheadMatch && lookaheadMatch.index == w.matchStart) {
		var floatLeft=lookaheadMatch[1];
		var floatRight=lookaheadMatch[2];
		var width=lookaheadMatch[3];
		var height=lookaheadMatch[4];
		var tooltip=lookaheadMatch[5];
		var src=lookaheadMatch[6];
		var link=lookaheadMatch[7];

		// Simple bracketted link
		var e = w.output;
		if(link) { // LINKED IMAGE
			if (config.formatterHelpers.isExternalLink(link)) {
				if (config.macros.attach && config.macros.attach.isAttachment(link)) {
					// see [[AttachFilePluginFormatters]]
					e = createExternalLink(w.output,link);
					e.title = config.macros.attach.linkTooltip + link;
				} else
					e = createExternalLink(w.output,link);
			} else 
				e = createTiddlyLink(w.output,link,false,null,w.isStatic);

		var img = createTiddlyElement(e,"img");
		if(floatLeft) img.align="left"; else if(floatRight) img.align="right";
		if(width||height) {
			var x=width.trim(); var y=height.trim();
			var stretchW=(x.substr(x.length-1,1)=='+'); if (stretchW) x=x.substr(0,x.length-1);
			var stretchH=(y.substr(y.length-1,1)=='+'); if (stretchH) y=y.substr(0,y.length-1);
			if (x.substr(0,2)=="{{")
				{ try{x=eval(x.substr(2,x.length-4))} catch(e){displayMessage(e.description||e.toString())} }
			if (y.substr(0,2)=="{{")
				{ try{y=eval(y.substr(2,y.length-4))} catch(e){displayMessage(e.description||e.toString())} }
			if (stretchW||stretchH) config.formatterHelpers.addStretchHandlers(img,stretchW,stretchH);
		if(tooltip) img.title = tooltip;

		if (config.macros.attach && config.macros.attach.isAttachment(src))
			src=config.macros.attach.getAttachment(src); // see [[AttachFilePluginFormatters]]
		else if (config.formatterHelpers.resolvePath) { // see [[ImagePathPlugin]]
			if (config.browser.isIE || config.browser.isSafari) {
					return false;
			} else
		w.nextMatch = this.lookaheadRegExp.lastIndex;

	tip: 'SHIFT-CLICK=show full size, CTRL-CLICK=restore initial size',
	dragtip: 'DRAG=stretch/shrink, '

config.formatterHelpers.addStretchHandlers=function(e,stretchW,stretchH) {
	e.statusMsg='width=%0, height=%1';'move';;;
	e.onmousedown=function(ev) { var ev=ev||window.event;
		return false;
	e.onmousemove=function(ev) { var ev=ev||window.event;
		if (this.sizing) {
			var currX=!config.browser.isIE?ev.pageX:(ev.clientX+findScrollX());
			var currY=!config.browser.isIE?ev.pageY:(ev.clientY+findScrollY());
			var newW=(currX-this.offsetLeft)/(this.startX-this.offsetLeft)*this.startW;
			var newH=(currY-this.offsetTop )/(this.startY-this.offsetTop )*this.startH;
			if (this.stretchW) s.width =Math.floor(Math.max(newW,this.minW))+'px';
			if (this.stretchH) s.height=Math.floor(Math.max(newH,this.minH))+'px';
			clearMessage(); displayMessage(this.statusMsg.format([s.width,s.height]));
		return false;
	e.onmouseup=function(ev) { var ev=ev||window.event;
		if (ev.shiftKey) {''; }
		if (ev.ctrlKey)  {;; }
		return false;
	e.onmouseout=function(ev) { var ev=ev||window.event;
		return false;
|Author|Eric Shulman|
|Description|interactive controls for import/export with filtering.|
Combine tiddlers from any two TiddlyWiki documents.  Interactively select and copy tiddlers from another TiddlyWiki source document.  Includes prompting for skip, rename, merge or replace actions when importing tiddlers that match existing titles.  When done, a list of all imported tiddlers is written into [[ImportedTiddlers]].
see [[ImportTiddlersPluginInfo]] for details
!!!!!interactive control panel
<<importTiddlers inline>>
^^(see also: [[ImportTiddlers]] shadow tiddler)^^
2011.02.14 4.6.2 fix OSX error: use picker.file.path
2009.10.10 4.6.1 in createImportPanel, Use {{{window.Components}}} instead of {{{config.browser.isGecko}}} to avoid applying FF3 'file browse' fixup in Chrome.
2009.10.06 4.6.0 added createTiddlerFromFile (import text files)
|please see [[ImportTiddlersPluginInfo]] for additional revision details|
2005.07.20 1.0.0 Initial Release
version.extensions.ImportTiddlersPlugin= {major: 4, minor: 6, revision: 2, date: new Date(2011,1,14)}; // JavaScript months are 0-based: 1 = February

// IE needs explicit global scoping for functions/vars called from browser events

// default cookie/option values
if (config.options.chkImportReport===undefined) config.options.chkImportReport=true; // only set the default when no value is stored

// default shadow definition
config.shadowTiddlers.ImportTiddlers='<<importTiddlers inline>>';

// use shadow tiddler content in backstage panel
if (config.tasks) config.tasks.importTask.content='<<tiddler ImportTiddlers>>' // TW2.2 or above
// backward-compatibility for TW2.0.x and TW1.2.x
if (config.macros.importTiddlers==undefined) config.macros.importTiddlers={};
if (typeof merge=='undefined') {
	function merge(dst,src,preserveExisting) {
		for(var i in src) { if(!preserveExisting || dst[i] === undefined) dst[i] = src[i]; }
		return dst;
if (config.browser.isGecko===undefined)
	$: function(id) { return document.getElementById(id); }, // abbreviation
	label: 'import tiddlers',
	prompt: 'Copy tiddlers from another document',
	openMsg: 'Opening %0',
	openErrMsg: 'Could not open %0 - error=%1',
	readMsg: 'Read %0 bytes from %1',
	convertUTF8: 'This TW version is %0 (<2.52) => call convertUTF8ToUnicode',
	noConvertUTF8: 'This TW version is %0 (>=2.52) => skip convertUTF8ToUnicode()',
	foundMsg: 'Found %0 tiddlers in %1',
	filterMsg: "Filtered %0 tiddlers matching '%1'",
	summaryMsg: '%0 tiddler%1 in the list',
	summaryFilteredMsg: '%0 of %1 tiddler%2 in the list',
	plural: 's are',
	single: ' is',
	countMsg: '%0 tiddlers selected for import',
	processedMsg: 'Processed %0 tiddlers',
	importedMsg: 'Imported %0 of %1 tiddlers from %2',
	loadText: 'please load a document...',
	closeText: 'close',
	doneText: 'done',
	startText: 'import',
	stopText: 'stop',
	local: true,		// default to import from local file
	src: '',		// path/filename or URL of document to import (retrieved from SiteUrl)
	proxy: '',		// URL for remote proxy script (retrieved from SiteProxy)
	useProxy: false,	// use specific proxy script in front of remote URL
	inbound: null,		// hash-indexed array of tiddlers from other document
	newTags: '',		// text of tags added to imported tiddlers
	addTags: true,		// add new tags to imported tiddlers
	listsize: 10,		// # of lines to show in imported tiddler list
	importTags: true,	// include tags from remote source document when importing a tiddler
	keepTags: true,		// retain existing tags when replacing a tiddler
	sync: false,		// add 'server' fields to imported tiddlers (for sync function)
	lastFilter: '',		// most recent filter (URL hash) applied
	lastAction: null,	// most recent collision button performed
	index: 0,		// current processing index in import list
	sort: ''		// sort order for imported tiddler listbox
// hijack core macro handler
if (config.macros.importTiddlers.coreHandler==undefined)

config.macros.importTiddlers.handler = function(place,macroName,params,wikifier,paramString,tiddler) {
	if (!params[0] || params[0].toLowerCase()=='core') { // default to built in
		if (config.macros.importTiddlers.coreHandler)
	} else if (params[0]=='link') { // show link to floating panel
	} else if (params[0]=='inline') {// show panel as INLINE tiddler content
	} else if (config.macros.loadTiddlers)
		config.macros.loadTiddlers.handler(place,macroName,params); // any other params: loadtiddlers
// Handle link click to create/show/hide control panel
function onClickImportMenu(e) { var e=e||window.event;
	var parent=resolveTarget(e).parentNode;
	var panel=document.getElementById('importPanel');
	if (panel==undefined || panel.parentNode!=parent) panel=createImportPanel(parent);
		anim.startAnimating(new Slider(panel,!isOpen,false,'none'));
	e.cancelBubble = true; if (e.stopPropagation) e.stopPropagation(); return(false);
// Create control panel: HTML, CSS
function createImportPanel(place) {
	var cmi=config.macros.importTiddlers; // abbrev
	var panel=cmi.$('importPanel');
	if (panel) { panel.parentNode.removeChild(panel); }
	if (!cmi.src.length) cmi.src=store.getTiddlerText('SiteUrl')||'';
	if (!cmi.proxy.length) cmi.proxy=store.getTiddlerText('SiteProxy')||'SiteProxy';
	if (window.Components) { // FF3 FIXUP
	return panel;
// process control interactions
function onClickImportButton(which,event) {
	var cmi=config.macros.importTiddlers; // abbreviation
	var list=cmi.$('importList'); if (!list) return false;
	var thePanel=cmi.$('importPanel');
	var theCollisionPanel=cmi.$('importCollisionPanel');
	var theNewTitle=cmi.$('importNewTitle');
	var count=0;
	switch (
		case 'importFromFile':	// show local panel
		case 'importFromWeb':	// show HTTP panel
		case 'importOptions':	// show/hide options panel
		case 'fileImportSource':
		case 'importLoad':		// load import source into hidden frame
			importReport();		// if an import was in progress, generate a report
			cmi.inbound=null;	// clear the imported tiddler buffer
			refreshImportList();	// reset/resize the listbox
			if (cmi.src=='') break;
			// Load document, read its DOM and fill the list
		case 'importSelectFeed':	// select a pre-defined systemServer feed URL
			var p=Popup.create(which); if (!p) return false;
			var tids=store.getTaggedTiddlers('systemServer');
			if (!tids.length)
				createTiddlyText(createTiddlyElement(p,'li'),'no pre-defined server feeds');
			for (var t=0; t<tids.length; t++) {
				var u=store.getTiddlerSlice(tids[t].title,'URL');
				var d=store.getTiddlerSlice(tids[t].title,'Description');
				if (!d||!d.length) d=store.getTiddlerSlice(tids[t].title,'description');
				if (!d||!d.length) d=u;
						var u=this.getAttribute('url');
			event.cancelBubble = true;
			if (event.stopPropagation) event.stopPropagation();
			return false;
			// create popup with feed list
			// onselect, insert feed URL into input field.
		case 'importSelectAll':		// select all tiddler list items (i.e., not headings)
			importReport();		// if an import was in progress, generate a report
			for (var t=0,count=0; t < list.options.length; t++) {
				if (list.options[t].value=='') continue;
			clearMessage(); displayMessage(cmi.countMsg.format([count]));
		case 'importSelectNew':		// select tiddlers not in current document
			importReport();		// if an import was in progress, generate a report
			for (var t=0,count=0; t < list.options.length; t++) {
				if (list.options[t].value=='') continue;
			clearMessage(); displayMessage(cmi.countMsg.format([count]));
		case 'importSelectChanges':		// select tiddlers that are updated from existing tiddlers
			importReport();		// if an import was in progress, generate a report
			for (var t=0,count=0; t < list.options.length; t++) {
				if (list.options[t].value==''||!store.tiddlerExists(list.options[t].value)) continue;
				for (var i=0; i<cmi.inbound.length; i++) // find matching inbound tiddler
					{ var inbound=cmi.inbound[i]; if (inbound.title==list.options[t].value) break; }
				list.options[t].selected=(inbound.modified-store.getTiddler(list.options[t].value).modified>0); // updated tiddler
			clearMessage(); displayMessage(cmi.countMsg.format([count]));
		case 'importSelectDifferences':		// select tiddlers that are new or different from existing tiddlers
			importReport();		// if an import was in progress, generate a report
			for (var t=0,count=0; t < list.options.length; t++) {
				if (list.options[t].value=='') continue;
				if (!store.tiddlerExists(list.options[t].value)) { list.options[t].selected=true; count++; continue; }
				for (var i=0; i<cmi.inbound.length; i++) // find matching inbound tiddler
					{ var inbound=cmi.inbound[i]; if (inbound.title==list.options[t].value) break; }
				list.options[t].selected=(inbound.modified-store.getTiddler(list.options[t].value).modified!=0); // changed tiddler
			clearMessage(); displayMessage(cmi.countMsg.format([count]));
		case 'importApplyFilter':	// filter list to include only matching tiddlers
			importReport();		// if an import was in progress, generate a report
			if (!cmi.all) // no tiddlers loaded = '0 selected'
				{ displayMessage(cmi.countMsg.format([0])); return false; }
			var hash=cmi.$('importLastFilter').value;
			refreshImportList();	// reset/resize the listbox
		case 'importStart':		// initiate the import processing
			importReport();		// if an import was in progress, generate a report
			if (cmi.index>0) cmi.index=-1; // stop processing
			else cmi.index=importTiddlers(0); // or begin processing
		case 'importClose':		// unload imported tiddlers or hide the import control panel
			// if imported tiddlers not loaded, close the import control panel
			if (!cmi.inbound) {'none'; break; }
			importReport();		// if an import was in progress, generate a report
			cmi.inbound=null;	// clear the imported tiddler buffer
			refreshImportList();	// reset/resize the listbox
		case 'importSkip':	// don't import the tiddler
			var theItem	= list.options[cmi.index];
			for (var j=0;j<cmi.inbound.length;j++)
			if (cmi.inbound[j].title==theItem.value) break;
			var theImported = cmi.inbound[j];
			theImported.status='skipped after asking';			// mark item as skipped
			cmi.index=importTiddlers(cmi.index+1);	// resume with NEXT item
		case 'importRename':		// change name of imported tiddler
			var theItem		= list.options[cmi.index];
			for (var j=0;j<cmi.inbound.length;j++)
			if (cmi.inbound[j].title==theItem.value) break;
			var theImported		= cmi.inbound[j];
			theImported.status	= 'renamed from '+theImported.title;	// mark item as renamed
			theImported.set(theNewTitle.value,null,null,null,null);		// change the tiddler title
			theItem.value		= theNewTitle.value;			// change the listbox item text
			theItem.text		= theNewTitle.value;			// change the listbox item text
			cmi.index=importTiddlers(cmi.index);	// resume with THIS item
		case 'importMerge':	// join existing and imported tiddler content
			var theItem	= list.options[cmi.index];
			for (var j=0;j<cmi.inbound.length;j++)
			if (cmi.inbound[j].title==theItem.value) break;
			var theImported	= cmi.inbound[j];
			var theExisting	= store.getTiddler(theItem.value);
			var theText	= theExisting.text+'\n----\n^^merged from: ';
			theText		+='[['+cmi.src+'#'+theItem.value+'|'+cmi.src+'#'+theItem.value+']]^^\n';
			theText		+='^^'+theImported.modified.toLocaleString()+' by '+theImported.modifier+'^^\n'+theImported.text;
			var theDate	= new Date();
			var theTags	= theExisting.getTags()+' '+theImported.getTags();
			theImported.status   = 'merged with '+theExisting.title;	// mark item as merged
			theImported.status  += ' - '+theExisting.modified.formatString('MM/DD/YYYY 0hh:0mm:0ss');
			theImported.status  += ' by '+theExisting.modifier;'none';
			cmi.index=importTiddlers(cmi.index);	// resume with this item
		case 'importReplace':		// substitute imported tiddler for existing tiddler
			var theItem		  = list.options[cmi.index];
			for (var j=0;j<cmi.inbound.length;j++)
			if (cmi.inbound[j].title==theItem.value) break;
			var theImported     = cmi.inbound[j];
			var theExisting	  = store.getTiddler(theItem.value);
			theImported.status  = 'replaces '+theExisting.title;		// mark item for replace
			theImported.status += ' - '+theExisting.modified.formatString('MM/DD/YYYY 0hh:0mm:0ss');
			theImported.status += ' by '+theExisting.modifier;'none';
			cmi.index=importTiddlers(cmi.index);	// resume with THIS item
		case 'importListSmaller':		// decrease current listbox size, minimum=5
			if (list.options.length==1) break;
		case 'importListLarger':		// increase current listbox size, maximum=number of items in list
			if (list.options.length==1) break;
		case 'importListMaximize':	// toggle listbox size between current and maximum
			if (list.options.length==1) break;
config.macros.importTiddlers.showPanel=function(place,show,skipAnim) {
	if (typeof place=='string') var place=document.getElementById(place);
	if (!place||! return;
	if(!skipAnim && anim && config.options.chkAnimate) anim.startAnimating(new Slider(place,show,false,'none'));
function refreshImportList(selectedIndex) {
	var cmi=config.macros.importTiddlers; // abbrev
	var list=cmi.$('importList'); if (!list) return;
	// if nothing to show, reset list content and size
	if (!cmi.inbound) {
		while (list.length > 0) { list.options[0] = null; }
		list.options[0]=new Option(cmi.loadText,'',false,false);
	// there are inbound tiddlers loaded...
	if (cmi.$('importSelectPanel').style.display=='none')

	// get the sort order
	if (!selectedIndex)   selectedIndex=0;
	if (selectedIndex==0) cmi.sort='title';		// heading
	if (selectedIndex==1) cmi.sort='title';
	if (selectedIndex==2) cmi.sort='modified';
	if (selectedIndex==3) cmi.sort='tags';
	if (selectedIndex>3) {
		// display selected tiddler count
		for (var t=0,count=0; t < list.options.length; t++) {
			if (!list.options[t].selected) continue;
			if (list.options[t].value!='')
			else { // if heading is selected, deselect it, and then select and count all in section
				for ( t++; t<list.options.length && list.options[t].value!=''; t++) {
		clearMessage(); displayMessage(cmi.countMsg.format([count]));
	if (selectedIndex>3) return; // no refresh needed

	// get the alphasorted list of tiddlers
	var tiddlers=cmi.inbound;
	tiddlers.sort(function (a,b) {if(a['title'] == b['title']) return(0); else return (a['title'] < b['title']) ? -1 : +1; });
	// clear current list contents
	while (list.length > 0) { list.options[0] = null; }
	// add heading and control items to list
	var i=0;
	var indent=String.fromCharCode(160)+String.fromCharCode(160);
	if (cmi.all.length==tiddlers.length)
		var summary=cmi.summaryMsg.format([tiddlers.length,(tiddlers.length!=1)?cmi.plural:cmi.single]);
		var summary=cmi.summaryFilteredMsg.format([tiddlers.length,cmi.all.length,(cmi.all.length!=1)?cmi.plural:cmi.single]);
	list.options[i++]=new Option(summary,'',false,false);
	list.options[i++]=new Option(((cmi.sort=='title'   )?'>':indent)+' [by title]','',false,false);
	list.options[i++]=new Option(((cmi.sort=='modified')?'>':indent)+' [by date]','',false,false);
	list.options[i++]=new Option(((cmi.sort=='tags')?'>':indent)+' [by tags]','',false,false);
	// output the tiddler list
	switch(cmi.sort) {
		case 'title':
			for(var t = 0; t < tiddlers.length; t++)
				list.options[i++] = new Option(tiddlers[t].title,tiddlers[t].title,false,false);
		case 'modified':
			// sort descending for newest date first
			tiddlers.sort(function (a,b) {if(a['modified'] == b['modified']) return(0); else return (a['modified'] > b['modified']) ? -1 : +1; });
			var lastSection = '';
			for(var t = 0; t < tiddlers.length; t++) {
				var tiddler = tiddlers[t];
				var theSection = tiddler.modified.toLocaleDateString();
				if (theSection != lastSection) {
					list.options[i++] = new Option(theSection,'',false,false);
					lastSection = theSection;
				list.options[i++] = new Option(indent+indent+tiddler.title,tiddler.title,false,false);
		case 'tags':
			var theTitles = {}; // all tiddler titles, hash indexed by tag value
			var theTags = new Array();
			for(var t=0; t<tiddlers.length; t++) {
				var title=tiddlers[t].title;
				var tags=tiddlers[t].tags;
				if (!tags || !tags.length) {
					if (theTitles['untagged']==undefined) { theTags.push('untagged'); theTitles['untagged']=new Array(); }
				else for(var s=0; s<tags.length; s++) {
					if (theTitles[tags[s]]==undefined) { theTags.push(tags[s]); theTitles[tags[s]]=new Array(); }
			for(var tagindex=0; tagindex<theTags.length; tagindex++) {
				var theTag=theTags[tagindex];
				list.options[i++]=new Option(theTag,'',false,false);
				for(var t=0; t<theTitles[theTag].length; t++)
					list.options[i++]=new Option(indent+indent+theTitles[theTag][t],theTitles[theTag][t],false,false);
	list.selectedIndex=selectedIndex;		  // select current control item
	if (list.size<cmi.listsize) list.size=cmi.listsize;
	if (list.size>list.options.length) list.size=list.options.length;
// re-entrant processing for handling import with interactive collision prompting
function importTiddlers(startIndex) {
	var cmi=config.macros.importTiddlers; // abbrev
	if (!cmi.inbound) return -1;
	var list=cmi.$('importList'); if (!list) return;
	var t;
	// if starting new import, reset import status flags
	if (startIndex==0)
		for (var t=0;t<cmi.inbound.length;t++)
	for (var i=startIndex; i<list.options.length; i++) {
		// if list item is not selected or is a heading (i.e., has no value), skip it
		if ((!list.options[i].selected) || ((t=list.options[i].value)==''))
		for (var j=0;j<cmi.inbound.length;j++)
			if (cmi.inbound[j].title==t) break;
		var inbound = cmi.inbound[j];
		var theExisting = store.getTiddler(inbound.title);
		// avoid redundant import for tiddlers that are listed multiple times (when 'by tags')
		if (inbound.status=='added')
		// don't import the 'ImportedTiddlers' history from the other document...
		if (inbound.title=='ImportedTiddlers')
		// if tiddler exists and import not marked for replace or merge, stop importing
		if (theExisting && (inbound.status.substr(0,7)!='replace') && (inbound.status.substr(0,5)!='merge'))
			return i;
		// assemble tags (remote + existing + added)
		var newTags = '';
		if (cmi.importTags)
			newTags+=inbound.getTags();	// import remote tags
		if (cmi.keepTags && theExisting)
			newTags+=' '+theExisting.getTags(); // keep existing tags
		if (cmi.addTags && cmi.newTags.trim().length)
			newTags+=' '+cmi.newTags; // add new tags
		// set the status to 'added' (if not already set by the 'ask the user' UI)
		// set sync fields
		if (cmi.sync) {
			if (!inbound.fields) inbound.fields={}; // for TW2.1.x backward-compatibility
		// do the import!
		store.saveTiddler(inbound.title, inbound.title, inbound.text, inbound.modifier, inbound.modified, inbound.tags, inbound.fields, true, inbound.created);
		store.fetchTiddler(inbound.title).created = inbound.created; // force creation date to imported value (needed for TW2.1.x and earlier)
	return(-1);	// signals that we really finished the entire list
function importStopped() {
	var cmi=config.macros.importTiddlers; // abbrev
	var list=cmi.$('importList'); if (!list) return;
	var theNewTitle=cmi.$('importNewTitle');
	if (cmi.index==-1){ 
		importReport();	// import finished... generate the report
	} else {
		// import collision...
		// show the collision panel and set the title edit field
		if (cmi.$('importApplyToAll').checked && cmi.lastAction &&!='importRename')
function importReport() {
	var cmi=config.macros.importTiddlers; // abbrev
	if (!cmi.inbound) return;
	// if import was not completed, the collision panel will still be open... close it now.
	var panel=cmi.$('importCollisionPanel'); if (panel)'none';
	// get the alphasorted list of tiddlers
	var tiddlers = cmi.inbound;
	// gather the statistics
	var count=0; var total=0;
	for (var t=0; t<tiddlers.length; t++) {
		if (!tiddlers[t].status || !tiddlers[t].status.trim().length) continue;
		if (tiddlers[t].status.substr(0,7)!='skipped') count++;
	// generate a report
	if (total) displayMessage(cmi.processedMsg.format([total]));
	if (count && config.options.chkImportReport) {
		// get/create the report tiddler
		var theReport = store.getTiddler('ImportedTiddlers');
		if (!theReport) { theReport=new Tiddler(); theReport.title='ImportedTiddlers'; theReport.text=''; }
		// format the report content
		var now = new Date();
		var newText = 'On '+now.toLocaleString()+', '+config.options.txtUserName;
		newText +=' imported '+count+' tiddler'+(count==1?'':'s')+' from\n[['+cmi.src+'|'+cmi.src+']]:\n';
		if (cmi.addTags && cmi.newTags.trim().length)
			newText += 'imported tiddlers were tagged with: "'+cmi.newTags+'"\n';
		newText += '<<<\n';
		for (var t=0; t<tiddlers.length; t++) if (tiddlers[t].status)
			newText += '#[['+tiddlers[t].title+']] - '+tiddlers[t].status+'\n';
		newText += '<<<\n';
		// update the ImportedTiddlers content and show the tiddler
		theReport.text	 = newText+((theReport.text!='')?'\n----\n':'')+theReport.text;
		theReport.modifier = config.options.txtUserName;
		theReport.modified = new Date();
		store.saveTiddler(theReport.title, theReport.title, theReport.text, theReport.modifier, theReport.modified, theReport.tags, theReport.fields);
	// reset status flags
	for (var t=0; t<cmi.inbound.length; t++) cmi.inbound[t].status='';
	// mark document as dirty and let display update as needed
	if (count) { store.setDirty(true); store.notifyAll(); }
	// always show final message when tiddlers were actually loaded
	if (count) displayMessage(cmi.importedMsg.format([count,tiddlers.length,cmi.src.replace(/%20/g,' ')]));
// // File and XMLHttpRequest I/O
config.macros.importTiddlers.askForFilename=function(here) {
	var msg=here.title; // use tooltip as dialog box message
	var path=getLocalPath(document.location.href);
	var slashpos=path.lastIndexOf('/'); if (slashpos==-1) slashpos=path.lastIndexOf('\\'); 
	if (slashpos!=-1) path = path.substr(0,slashpos+1); // remove filename from path, leave the trailing slash
	var file='';
	var result='';
	if(window.Components) { // moz
		try {'UniversalXPConnect');

			var nsIFilePicker = window.Components.interfaces.nsIFilePicker;
			var picker = Components.classes[';1'].createInstance(nsIFilePicker);
			picker.init(window, msg, nsIFilePicker.modeOpen);
			var thispath = Components.classes[';1'].createInstance(Components.interfaces.nsILocalFile);
			if (!=nsIFilePicker.returnCancel) var result=picker.file.path;
		catch(e) { alert('error during local file access: '+e.toString()) }
	else { // IE
		try { // XPSP2 IE only
			var s = new ActiveXObject('UserAccounts.CommonDialog');
			s.Filter='All files|*.*|Text files|*.txt|HTML files|*.htm;*.html|';
			s.FilterIndex=3; // default to HTML files;
			if (s.showOpen()) var result=s.FileName;
		catch(e) {  // fallback
			var result=prompt(msg,path+file);
	return result;

config.macros.importTiddlers.loadRemoteFile = function(src,callback) {
	if (src==undefined || !src.length) return null; // filename is required
	var original=src; // URL as specified
	var hashpos=src.indexOf('#'); if (hashpos!=-1) src=src.substr(0,hashpos); // URL with #... suffix removed (needed for IE)
	displayMessage(this.openMsg.format([src.replace(/%20/g,' ')]));
	if (src.substr(0,5)!='http:' && src.substr(0,5)!='file:') { // if not a URL, read from local filesystem
		var txt=loadFile(src);
		if (!txt) { // file didn't load, might be relative path.. try fixup
			var pathPrefix=document.location.href;  // get current document path and trim off filename
			var slashpos=pathPrefix.lastIndexOf('/'); if (slashpos==-1) slashpos=pathPrefix.lastIndexOf('\\'); 
			if (slashpos!=-1 && slashpos!=pathPrefix.length-1) pathPrefix=pathPrefix.substr(0,slashpos+1);
			if (pathPrefix.substr(0,5)!='http:') src=getLocalPath(src);
			var txt=loadFile(src);
		if (!txt) { // file still didn't load, report error
			displayMessage(config.macros.importTiddlers.openErrMsg.format([src.replace(/%20/g,' '),'(filesystem error)']));
		} else {
			displayMessage(config.macros.importTiddlers.readMsg.format([txt.length,src.replace(/%20/g,' ')]));
			var vers = version.major+version.minor*.1+version.revision*.01;
			if (vers < 2.52) {
			} else {
			if (callback) callback(true,original,txt,src,null);
	} else {

	var remoteStore=new TiddlyWiki();
	return remoteStore.getTiddlers('title');	

	var remoteStore=new TiddlyWiki();
	var lines=CSV.replace(/\r/g,'').split('\n');
	var names=lines.shift().replace(/"/g,'').split(',');
	// ENCODE commas and newlines within quoted values
	var comma='!~comma~!'; var commaRE=new RegExp(comma,'g');
	var newline='!~newline~!'; var newlineRE=new RegExp(newline,'g');
		function(x){ return x.replace(/\,/g,comma).replace(/\n/g,newline); });
	// PARSE lines
	var lines=CSV.split('\n');
	for (var i=0; i<lines.length; i++) { if (!lines[i].length) continue;
		var values=lines[i].split(',');
		// DECODE commas, newlines, and doubled-quotes, and remove enclosing quotes (if any)
		for (var v=0; v<values.length; v++)
		// EXTRACT tiddler values
		var title=''; var text=''; var tags=[]; var fields={};
		var created=null; var when=new Date(); var who=config.options.txtUserName;
		for (var v=0; v<values.length; v++) { var val=values[v];
			if (names[v]) switch(names[v].toLowerCase()) {
				case 'title':	title=val.replace(/\[\]\|/g,'_'); break;
				case 'created': created=new Date(val); break;
				case 'modified':when=new Date(val); break;
				case 'modifier':who=val; break;
				case 'text':	text=val; break;
				case 'tags':	tags=val.readBracketedList(); break;
				default:	fields[names[v].toLowerCase()]=val; break;
		// CREATE tiddler in temporary store
		if (title.length)
	return remoteStore.getTiddlers('title');

config.macros.importTiddlers.createTiddlerFromFile=function(src,txt) {
	var t=new Tiddler();
	var pos=src.lastIndexOf("/"); if (pos==-1) pos=src.lastIndexOf("\\");
	t.created=t.modified=new Date();
	if (src.substr(src.length-3,3)=='.js') t.tags=['systemConfig'];
	return [t];

	var cmi=config.macros.importTiddlers; // abbreviation
	var src=src.replace(/%20/g,' ');
	if (!success) { displayMessage(cmi.openErrMsg.format([src,xhr.status])); return; }
	if (!cmi.all||!cmi.all.length) cmi.all=cmi.readTiddlersFromCSV(txt);
	if (!cmi.all||!cmi.all.length) cmi.all=cmi.createTiddlerFromFile(src,txt);
	var count=cmi.all?cmi.all.length:0;
	var querypos=src.lastIndexOf('?'); if (querypos!=-1) src=src.substr(0,querypos);
	cmi.inbound=cmi.filterByHash(params,cmi.all); // use full URL including hash (if any)

	var hashpos=src.lastIndexOf('#'); if (hashpos==-1) return tiddlers;
	var hash=src.substr(hashpos+1); if (!hash.length) return tiddlers;
	var tids=[];
	var params=hash.parseParams('anon',null,true,false,false);
	for (var p=1; p<params.length; p++) {
		switch (params[p].name) {
			case 'anon':
			case 'open':
			case 'tag':
				if (store.getMatchingTiddlers) { // for boolean expressions - see MatchTagsPlugin
					var r=store.getMatchingTiddlers(params[p].value,null,tiddlers);
					for (var t=0; t<r.length; t++) tids.pushUnique(r[t].title);
				} else for (var t=0; t<tiddlers.length; t++)
					if (tiddlers[t].isTagged(params[p].value))
			case 'story':
				for (var t=0; t<tiddlers.length; t++)
					if (tiddlers[t].title==params[p].value) {
						for (var s=0; s<tiddlers[t].links.length; s++)
			case 'search':
				for (var t=0; t<tiddlers.length; t++)
					if (tiddlers[t].text.indexOf(params[p].value)!=-1)
	var matches=[];
	for (var t=0; t<tiddlers.length; t++)
		if (tids.contains(tiddlers[t].title))
	return matches;
!!!Control panel CSS
#importPanel {
	display: none; position:absolute; z-index:11; width:35em; right:105%; top:3em;
	background-color: #eee; color:#000; font-size: 8pt; line-height:110%;
	border:1px solid black; border-bottom-width: 3px; border-right-width: 3px;
	padding: 0.5em; margin:0em; -moz-border-radius:1em;-webkit-border-radius:1em;
}
#importPanel a, #importPanel td a { color:#009; display:inline; margin:0px; padding:1px; }
#importPanel table { width:100%; border:0px; padding:0px; margin:0px; font-size:8pt; line-height:110%; background:transparent; }
#importPanel tr { border:0px;padding:0px;margin:0px; background:transparent; }
#importPanel td { color:#000; border:0px;padding:0px;margin:0px; background:transparent; }
#importPanel select { width:100%;margin:0px;font-size:8pt;line-height:110%;}
#importPanel input  { width:98%;padding:0px;margin:0px;font-size:8pt;line-height:110%}
#importPanel .box { border:1px solid #000; background-color:#eee; padding:3px 5px; margin-bottom:5px; -moz-border-radius:5px;-webkit-border-radius:5px;}
#importPanel .topline { border-top:1px solid #999; padding-top:2px; margin-top:2px; }
#importPanel .rad { width:auto; }
#importPanel .chk { width:auto; margin:1px;border:0; }
#importPanel .btn { width:auto; }
#importPanel .btn1 { width:98%; }
#importPanel .btn2 { width:48%; }
#importPanel .btn3 { width:32%; }
#importPanel .btn4 { width:23%; }
#importPanel .btn5 { width:19%; }
#importPanel .importButton { padding: 0em; margin: 0px; font-size:8pt; }
#importPanel .importListButton { padding:0em 0.25em 0em 0.25em; color: #000000; display:inline }
#backstagePanel #importPanel { left:10%; right:auto; }
!!!Control panel HTML
<!-- source and report -->
<table><tr><td align=left>
	import from
	<input type="radio" class="rad" name="importFrom" id="importFromFile" value="file" CHECKED
		onclick="onClickImportButton(this,event)" title="show file controls"> local file
	<input type="radio" class="rad" name="importFrom" id="importFromWeb"  value="http"
		onclick="onClickImportButton(this,event)" title="show web controls"> web server
</td><td align=right>
	<input type=checkbox class="chk" id="chkImportReport"
		onClick="config.options['chkImportReport']=this.checked;"> create report

<div class="box" id="importSourcePanel" style="margin:.5em">
<div id="importLocalPanel" style="display:block;margin-bottom:2px;"><!-- import from local file  -->
enter or browse for source path/filename<br>
<input type="file" id="fileImportSource" size=57 style="width:100%"
<div id="importLocalPanelFix" style="display:none"><!-- FF3 FIXUP -->
	<input type="text" id="fileImportSourceFix" style="width:90%"
		title="Enter a path/file to import"
	<input type="button" id="fileImportSourceFixButton" style="width:7%" value="..."
		title="Select a path/file to import"
		onClick="var r=config.macros.importTiddlers.askForFilename(this); if (!r||!r.length) return;
</div><!--end FF3 FIXUP-->
</div><!--end local-->
<div id="importHTTPPanel" style="display:none;margin-bottom:2px;"><!-- import from http server -->
<table><tr><td align=left>
	enter a URL or <a href="javascript:;" id="importSelectFeed"
		onclick="return onClickImportButton(this,event)" title="select a pre-defined 'systemServer' URL">
		select a server</a><br>
</td><td align=right>
	<input type="checkbox" class="chk" id="importUsePassword"
	<input type="checkbox" class="chk" id="importUseProxy"
<input type="text" id="importSiteProxy" style="display:none;margin-bottom:1px" onfocus="" value="SiteProxy"
<input type="text" id="importSourceURL" onfocus="" value="SiteUrl"
<div id="importIDPWPanel" style="text-align:center;margin-top:2px;display:none;">
username: <input type=text id="txtImportID" style="width:25%" 
 password: <input type=password id="txtImportPW" style="width:25%" 
</div><!--end idpw-->
</div><!--end http-->
</div><!--end source-->

<div class="box" id="importSelectPanel" style="display:none;margin:.5em;">
<table><tr><td align=left>
<a href="javascript:;" id="importSelectAll"
	onclick="return onClickImportButton(this)" title="SELECT all tiddlers">
&nbsp;<a href="javascript:;" id="importSelectNew"
	onclick="return onClickImportButton(this)" title="SELECT tiddlers not already in destination document">
&nbsp;<a href="javascript:;" id="importSelectChanges"
	onclick="return onClickImportButton(this)" title="SELECT tiddlers that have been updated in source document">
&nbsp;<a href="javascript:;" id="importSelectDifferences"
	onclick="return onClickImportButton(this)" title="SELECT tiddlers that have been added or are different from existing tiddlers">
</td><td align=right>
<a href="javascript:;" id="importListSmaller"
	onclick="return onClickImportButton(this)" title="SHRINK list size">
<a href="javascript:;" id="importListLarger"
	onclick="return onClickImportButton(this)" title="GROW list size">
<a href="javascript:;" id="importListMaximize"
	onclick="return onClickImportButton(this)" title="MAXIMIZE/RESTORE list size">
<select id="importList" size=8 multiple
	<!-- NOTE: delay refresh so list is updated AFTER onchange event is handled -->
<div style="text-align:center">
	<a href="javascript:;"
		title="click for help using filters..."
		onclick="alert('A filter consists of one or more space-separated combinations of: tiddlertitle, tag:[[tagvalue]], tag:[[tag expression]] (requires MatchTagsPlugin), story:[[TiddlerName]], and/or search:[[searchtext]]. Use a blank filter to restore the list of all tiddlers.'); return false;"
	<input type="text" id="importLastFilter" style="margin-bottom:1px; width:65%"
		title="Enter a combination of one or more filters. Use a blank filter for all tiddlers."
		onfocus="" value=""
	<input type="button" id="importApplyFilter" style="width:20%" value="apply"
		title="filter list of tiddlers to include only those that match certain criteria"
		onclick="return onClickImportButton(this)">
</div><!--end select-->

<div class="box" id="importOptionsPanel" style="text-align:center;margin:.5em;display:none;">
	apply tags: <input type=checkbox class="chk" id="chkImportTags" checked
		onClick="config.macros.importTiddlers.importTags=this.checked;">from source&nbsp;
	<input type=checkbox class="chk" id="chkKeepTags" checked
		onClick="config.macros.importTiddlers.keepTags=this.checked;">keep existing&nbsp;
	<input type=checkbox class="chk" id="chkAddTags" 
			if (this.checked) document.getElementById('txtNewTags').focus();">add tags<br>
	<input type=text id="txtNewTags" style="margin-top:4px;display:none;" size=15 onfocus="" 
		title="enter tags to be added to imported tiddlers" 
		document.getElementById('chkAddTags').checked=this.value.length>0;" autocomplete=off>
	<nobr><input type=checkbox class="chk" id="chkSync" 
		link tiddlers to source document (for sync later)</nobr>
</div><!--end options-->

<div id="importButtonPanel" style="text-align:center">
	<input type=button id="importLoad"	class="importButton btn3" value="open"
		title="load listbox with tiddlers from source document"
	<input type=button id="importOptions"	class="importButton btn3" value="options..."
		title="set options for tags, sync, etc."
	<input type=button id="importStart"	class="importButton btn3" value="import"
		title="start/stop import of selected source tiddlers into current document"
	<input type=button id="importClose"	class="importButton btn3" value="done"
		title="clear listbox or hide control panel"

<div class="none" id="importCollisionPanel" style="display:none;margin:.5em 0 .5em .5em;">
	<table><tr><td style="width:65%" align="left">
		<table><tr><td align=left>
			tiddler already exists:
		</td><td align=right>
			<input type=checkbox class="chk" id="importApplyToAll" 
			checked>apply to all
		<input type=text id="importNewTitle" size=15 autocomplete=off>
	</td><td style="width:34%" align="center">
		<input type=button id="importMerge"
			class="importButton" style="width:47%" value="merge"
			title="append the incoming tiddler to the existing tiddler"
		--><input type=button id="importSkip"
			class="importButton" style="width:47%" value="skip"
			title="do not import this tiddler"
		--><br><input type=button id="importRename"
			class="importButton" style="width:47%" value="rename"
			title="rename the incoming tiddler"
		--><input type=button id="importReplace"
			class="importButton" style="width:47%" value="replace"
			title="discard the existing tiddler"
</div><!--end collision-->
|Author|Eric Shulman|
|Description|Documentation for InlineJavascriptPlugin|
''Call directly into TW core utility routines, define new functions, calculate values, add dynamically-generated TiddlyWiki-formatted output'' into tiddler content, or perform any other programmatic actions each time the tiddler is rendered.
This plugin adds wiki syntax for surrounding tiddler content with {{{<script>}}} and {{{</script>}}} markers, so that it can be recognized as embedded javascript code.  When a tiddler is rendered, the plugin automatically invokes any embedded scripts, which can be used to construct and return dynamically-generated output that is inserted into the tiddler content.
<script type="..." src="..." label="..." title="..." key="..." show>
	/* javascript code goes here... */
</script>
All parameters are //optional//.  When the ''show'' keyword is used, the plugin will also include the script source code in the output that it displays in the tiddler.  This is helpful when creating examples for documentation purposes (such as those used in this tiddler!)

__''Deferred execution from an 'onClick' link''__
<script label="click here" title="mouseover tooltip text" key="X" show>
	/* javascript code goes here... */
	alert('you clicked on the link!');
</script>
By including a {{{label="..."}}} parameter in the initial {{{<script>}}} marker, the plugin will create a link to an 'onclick' script that will only be executed when that specific link is clicked, rather than running the script each time the tiddler is rendered.  You may also include a {{{title="..."}}} parameter to specify the 'tooltip' text that will appear whenever the mouse is moved over the onClick link text, and a {{{key="X"}}} parameter to specify an //access key// (which must be a //single// letter or numeric digit only).

__''Loading scripts from external source files''__
<script src="URL" show>
	/* optional javascript code goes here... */
</script>
You can also load javascript directly from an external source URL, by including a {{{src="..."}}} parameter in the initial {{{<script>}}} marker (e.g., {{{<script src="demo.js"></script>}}}).  This is particularly useful when incorporating third-party javascript libraries for use in custom extensions and plugins.  The 'foreign' javascript code remains isolated in a separate file that can be easily replaced whenever an updated library file becomes available.

In addition to loading the javascript from the external file, you can also use this feature to invoke javascript code contained within the {{{<script>...</script>}}} markers.  This code is invoked //after// the external script file has been processed, and can make immediate use of the functions and/or global variables defined by the external script file.
>Note: To ensure that your javascript functions are always available when needed, you should load the libraries from a tiddler that is rendered as soon as your TiddlyWiki document is opened, such as MainMenu.  For example: put your {{{<script src="..."></script>}}} syntax into a separate 'library' tiddler (e.g., LoadScripts), and then add {{{<<tiddler LoadScripts>>}}} to MainMenu so that the library is loaded before any other tiddlers that rely upon the functions it defines. 
>Normally, loading external javascript in this way does not produce any direct output, and should not have any impact on the appearance of your MainMenu.  However, if your LoadScripts tiddler contains notes or other visible content, you can suppress this output by using 'inline CSS' in the MainMenu, like this: {{{@@display:none;<<tiddler LoadScripts>>@@}}}
!!!!!Creating dynamic tiddler content and accessing the ~TiddlyWiki DOM
An important difference between TiddlyWiki inline scripting and conventional embedded javascript techniques for web pages is the method used to produce output that is dynamically inserted into the document: in a typical web document, you use the {{{document.write()}}} (or {{{document.writeln()}}}) function to output text sequences (often containing HTML tags) that are then rendered when the entire document is first loaded into the browser window.

However, in a ~TiddlyWiki document, tiddlers (and other DOM elements) are created, deleted, and rendered "on-the-fly", so writing directly to the global 'document' object does not produce the results you want (i.e., replacing the embedded script within the tiddler content), and instead will //completely replace the entire ~TiddlyWiki document in your browser window (which is clearly not a good thing!)//.  In order to allow scripts to use {{{document.write()}}}, the plugin automatically converts and buffers all HTML output so it can be safely inserted into your tiddler content, immediately following the script.
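
The buffering idea described above can be sketched roughly as follows.  This is only an illustration and not the plugin's actual code: {{{runWithBufferedWrite}}} and the {{{fakeDoc}}} object are hypothetical names, and a plain object stands in for the browser's 'document' so that the sketch can run outside a browser.

```javascript
// Hypothetical sketch: capture document.write() output while a script runs.
// 'doc' stands in for the browser's document object (an assumption for this example).
function runWithBufferedWrite(doc, script) {
	var buffer = [];
	var savedWrite = doc.write;		// remember the real write()
	doc.write = function (html) {		// temporarily collect output instead
		buffer.push(String(html));
	};
	try {
		script();			// run the inline script
	} finally {
		doc.write = savedWrite;		// always restore the original write()
	}
	return buffer.join('');			// HTML to insert following the script
}

// usage with a stand-in document object
var fakeDoc = { write: function () { throw new Error('would replace the page'); } };
var html = runWithBufferedWrite(fakeDoc, function () {
	fakeDoc.write('<b>hello</b>');
	fakeDoc.write(' world');
});
```

Restoring the original function in a {{{finally}}} clause means that a script error does not leave the replacement {{{write()}}} in place.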

''Note that {{{document.write()}}} can only be used to output "pure HTML" syntax.  To produce //wiki-formatted// output, your script should instead return a text value containing the desired wiki-syntax content'', which will then be automatically rendered immediately following the script.  If returning a text value is not sufficient for your needs, the plugin also provides an automatically-defined variable, 'place', that gives the script code ''direct access to the //containing DOM element//'' into which the tiddler output is being rendered.  You can use this variable to ''perform direct DOM manipulations'' that can, for example:
* generate wiki-formatted output using {{{wikify("...content...",place)}}}
* vary the script's actions based upon the DOM element in which it is embedded
* access 'tiddler-relative' DOM information using {{{story.findContainingTiddler(place)}}}
''When using an 'onclick' script, the 'place' element actually refers to the onclick //link text// itself, instead of the containing DOM element.''  This permits you to directly reference or modify the link text to reflect any 'stateful' conditions that might be set by the script.  To refer to the containing DOM element from within an 'onclick' script, you can use "place.parentNode" instead.
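
As a rough illustration of the two output styles described above, the sketch below builds wiki-formatted text that a script could return, and changes the label of an 'onclick' link through 'place'.  The function names ({{{listAsWikiText}}}, {{{toggleLabel}}}) and the plain object standing in for the link element are assumptions made for this example; they are not part of the plugin.

```javascript
// Build wiki-formatted output; inside a tiddler, the script would 'return' this text.
function listAsWikiText(titles) {
	var out = [];
	for (var i = 0; i < titles.length; i++)
		out.push('* [[' + titles[i] + ']]');
	return out.join('\n');
}

// In an 'onclick' script, 'place' is the link itself, so its text can reflect state.
function toggleLabel(place) {
	place.innerHTML = (place.innerHTML == 'show') ? 'hide' : 'show';
}

var wikiText = listAsWikiText(['MainMenu', 'SiteTitle']);
var link = { innerHTML: 'show' };	// stand-in for the onclick link element
toggleLabel(link);			// label is now 'hide'
```

Within a real tiddler, the first function's result would be the script's return value, and the second would be called as {{{toggleLabel(place)}}} from an 'onclick' script.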
!!!!!Instant "bookmarklets"
You can also use an 'onclick' link to define a "bookmarklet": a small piece of javascript that can be ''invoked directly from the browser without having to be defined within the current document.''  This allows you to create 'stand-alone' commands that can be applied to virtually ANY TiddlyWiki document... even remotely-hosted documents that have been written by others!!  To create a bookmarklet, simply define an 'onclick' script and then grab the resulting link text and drag-and-drop it onto your browser's toolbar (or right-click and use the 'bookmark this link' command to add it to the browser's menu).

*When writing scripts intended for use as bookmarklets, due to the ~URI-encoding required by the browser, ''you cannot use ANY double-quotes (") within the bookmarklet script code.''
*All comments embedded in the bookmarklet script must ''use the fully-delimited {{{/* ... */}}} comment syntax,'' rather than the shorter {{{//}}} comment syntax.
*Most importantly, because bookmarklets are invoked directly from the browser interface and are not embedded within the TiddlyWiki document, there is NO containing 'place' DOM element surrounding the script.  As a result, ''you cannot use a bookmarklet to generate dynamic output in your document,''  and using {{{document.write()}}} or returning wiki-syntax text or making reference to the 'place' DOM element will halt the script and report a "Reference Error" when that bookmarklet is invoked.  
Please see [[InstantBookmarklets]] for many examples of 'onclick' scripts that can also be used as bookmarklets.
!!!!!Special reserved function name
The plugin 'wraps' all inline javascript code inside a function, {{{_out()}}}, so that any return value you provide can be correctly handled by the plugin and inserted into the tiddler.  To avoid unpredictable results (and possibly fatal execution errors), this function should never be redefined or called from ''within'' your script code.
!!!!!$(...) 'shorthand' function
As described by Dustin Diaz [[here|]], the plugin defines a 'shorthand' function that allows you to write {{{$(id)}}} in place of the standard javascript syntax {{{document.getElementById(id)}}}.
This function is provided merely as a convenience for javascript coders who may be familiar with this abbreviation, so that they can save a few bytes when writing their own inline script code.
simple dynamic output:
><script show>
	document.write("The current date/time is: "+(new Date())+"<br>");
	return "link to current user: [["+config.options.txtUserName+"]]\n";
</script>
dynamic output using 'place' to get size information for current tiddler:
><script show>
	if (!window.story) window.story=window;
	var title=story.findContainingTiddler(place).getAttribute("tiddler");
	var size=store.getTiddlerText(title).length;
	return title+" is using "+size+" bytes";
</script>
dynamic output from an 'onclick' script, using {{{document.write()}}} and/or {{{return "..."}}}
><script label="click here" show>
	document.write("<br>The current date/time is: "+(new Date())+"<br>");
	return "link to current user: [["+config.options.txtUserName+"]]\n";
</script>
creating an 'onclick' button/link that accesses the link text AND the containing tiddler:
><script label="click here" title="clicking this link will show an 'alert' box" key="H" show>
	if (!window.story) window.story=window;
	var txt=place.firstChild.data;
	var tid=story.findContainingTiddler(place).getAttribute('tiddler');
	alert('Hello World!\nlinktext='+txt+'\ntiddler='+tid);
</script>
dynamically setting onclick link text based on stateful information:
<script label="click here">
	/* toggle "txtSomething" value */
	var on=(config.txtSomething=="ON");
	place.innerHTML=on?"enable":"disable";
	config.txtSomething=on?"OFF":"ON";
	return "\nThe current value is: "+config.txtSomething;
</script><script>
	/* initialize onclick link text based on current "txtSomething" value */
	var on=(config.txtSomething=="ON");
	place.lastChild.previousSibling.innerHTML=on?"disable":"enable";
</script>
loading a script from a source url:
> contains:
>>{{{function inlineJavascriptDemo() { alert('Hello from demo.js!!') } }}}
>>{{{displayMessage('InlineJavascriptPlugin: demo.js has been loaded');}}}
>note: When using this example on your local system, you will need to download the external script file from the above URL and install it into the same directory as your document.
><script src="demo.js" show>
	return "inlineJavascriptDemo() function has been defined"
</script>
><script label="click to invoke inlineJavascriptDemo()" key="D" show>
	inlineJavascriptDemo();
</script>
2010.12.15 1.9.6 allow (but ignore) type="..." syntax
2009.04.11 1.9.5 pass current tiddler object into wrapper code so it can be referenced from within 'onclick' scripts
2009.02.26 1.9.4 in $(), handle leading '#' on ID for compatibility with JQuery syntax
2008.06.11 1.9.3 added $(...) function as 'shorthand' for document.getElementById()
2008.03.03 1.9.2 corrected fallback declaration of wikifyPlainText() (fixes Safari "parse error")
2008.02.23 1.9.1 in onclick function, use string instead of array for 'bufferedHTML' (fixes IE errors)
2008.02.21 1.9.0 output from 'onclick' scripts (return value or document.write() calls) is now buffered and rendered into a span following the script.  Also, added default 'return false' handling if no return value provided (prevents HREF from being triggered -- return TRUE to allow HREF to be processed).  Thanks to Xavier Verges for suggestion and preliminary code.
2008.02.14 1.8.1 added backward-compatibility for use of wikifyPlainText() in TW2.1.3 and earlier
2008.01.08 [*.*.*] plugin size reduction: documentation moved to ...Info tiddler
2007.12.28 1.8.0 added support for key="X" syntax to specify custom access key definitions
2007.12.15 1.7.0 autogenerate URI encoded HREF on links for onclick scripts.  Drag links to browser toolbar to create bookmarklets.  IMPORTANT NOTE: place is NOT defined when scripts are used as bookmarklets.  In addition, double-quotes will cause syntax errors.  Thanks to PaulReiber for debugging and brainstorming.
2007.11.26 1.6.2 when converting "document.write()" function calls in inline code, allow whitespace between "write" and "(" so that "document.write ( foobar )" is properly converted.
2007.11.16 1.6.1 when rendering "onclick scripts", pass label text through wikifyPlainText() to parse any embedded wiki-syntax to enable use of HTML entities or even TW macros to generate dynamic label text.
2007.02.19 1.6.0 added support for title="..." to specify mouseover tooltip when using an onclick (label="...") script
2006.10.16 1.5.2 add newline before closing '}' in 'function out_' wrapper.  Fixes error caused when last line of script is a comment.
2006.06.01 1.5.1 when calling wikify() on script return value, pass hightlightRegExp and tiddler params so macros that rely on these values can render properly
2006.04.19 1.5.0 added 'show' parameter to force display of javascript source code in tiddler output
2006.01.05 1.4.0 added support for 'onclick' scripts.  When label="..." param is present, a button/link is created using the indicated label text, and the script is only executed when the button/link is clicked.  'place' value is set to match the clicked button/link element.
2005.12.13 1.3.1 when catching eval error in IE, e.description contains the error text, instead of e.toString().  Fixed error reporting so IE shows the correct response text.  Based on a suggestion by UdoBorkowski
2005.11.09 1.3.0 for 'inline' scripts (i.e., not scripts loaded with src="..."), automatically replace calls to 'document.write()' with 'place.innerHTML+=' so script output is directed into tiddler content.  Based on a suggestion by BradleyMeck
2005.11.08 1.2.0 handle loading of javascript from an external URL via src="..." syntax
2005.11.08 1.1.0 pass 'place' param into scripts to provide direct DOM access 
2005.11.08 1.0.0 initial release
The input formats encountered are:
* //raw// text: plain text <<tag in:raw>>
* //tokenized// text:
** <<tag in:sent>> 1 //sentence// per line, //token//s (including punctuation) separated by //space//s <<slider show-in-sent-ex [[exemple in:sent]] "(example)" "example of tokenized input, 1 sentence per line.">>
** <<tag in:tok>> 1 //token// per line, with or without an end-of-sentence marker (most often a blank line)
** <<tag in:cols>> 1 //token// per line, multi-column
* //xml//: <<tag in:xml>>
** [[Stanford POS Tagger]]: extracts the raw text contained under a list of nodes => //raw// text.

From an //informational// point of view, there are three levels of input:
* ''raw'' text, i.e. without any additional information (inputs tagged <<tag in:raw>>, but also the XML input of the [[Stanford POS Tagger]])
* ''tokenized'' text, i.e. with token boundaries, and sometimes sentence boundaries as well (inputs tagged <<tag in:sent>> or <<tag in:tok>>)
* ''enriched tokenized'' text, i.e. with token boundaries and additional information (multi-column inputs <<tag in:cols>>)

From a //format// point of view, there is more variability:
* different ''encodings'': at least UTF-8, latin1 (ISO8859-1 and ISO8859-15), plus the Windows/Mac encodings used for French (and text with no accents at all)
* ''end-of-line'' sequence: dos (CRLF) / unix (LF) / mac (CR) -- normally BufferedReader.readLine() handles all 3 cases
* for ''raw text'', the ''end-of-line'' marker can be interpreted as:
** a regular line wrap (//wrap//), in which case the lines must be joined __without necessarily inserting a space__
** an end-of-sentence marker (input with 1 sentence per line)
** an end-of-paragraph/section marker, in which case a line may contain several sentences.
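
The end-of-line cases listed above can be sketched in a few lines of JavaScript. This is only an illustration: the function names, the {{{mode}}} values and the rule "do not insert a space after a trailing hyphen" are assumptions made for the example, not something prescribed by any of the taggers described here.

```javascript
// Split text into lines, accepting DOS (CRLF), Unix (LF) and old Mac (CR) endings.
function splitLines(text) {
  return text.split(/\r\n|\r|\n/);
}

// Interpret end-of-line in raw text according to a caller-chosen mode
// (hypothetical names):
//   "wrap"      : lines are a hard wrap of one running text, so join them
//   "sentence"  : each non-empty line is one sentence
//   "paragraph" : each non-empty line is a paragraph (may hold several sentences)
function readRawText(text, mode) {
  var lines = splitLines(text)
    .map(function (l) { return l.trim(); })
    .filter(function (l) { return l !== ""; });
  if (mode !== "wrap") return lines;
  var joined = "";
  for (var i = 0; i < lines.length; i++) {
    // assumed heuristic: no space after a line that ends with a hyphen
    var needsSpace = joined !== "" && joined.charAt(joined.length - 1) !== "-";
    joined += (needsSpace ? " " : "") + lines[i];
  }
  return joined === "" ? [] : [joined];
}
```

For example, {{{readRawText("un texte multi-\nmots", "wrap")}}} returns a single string, {{{"un texte multi-mots"}}}, whereas the "sentence" and "paragraph" modes return one entry per non-empty line.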
[img[]] jSafran : Free syntax editor/parser
Christophe Cerisara, LORIA

//Installer, code and demo currently unavailable//

* Integrates several tools: TreeTagger, MaltParser, MATE parser
* Graphical user interface

JSafran is an open-source, 100% Java software to annotate a text and audio corpus with syntactic dependency trees, either manually, automatically or semi-automatically. It integrates MaltParser, the MATE parser, TreeTagger, plus experimental unsupervised rule-based Bayesian models. It can be interfaced with JTrans to automatically align the sound file and listen to specific utterances. It integrates JGIT to support collaborative editing and versioning, CoNLL'05-06-08-09 I/O, and many more features...

Matthieu Constant : [img[]] LIGM (Laboratoire d'informatique Gaspard-Monge,  Université Paris-Est Marne-la-Vallée)

LGTagger is an open-source Part-of-speech tagger that also recognizes Multiword units. It is based on Conditional Random Fields (CRF) and large-coverage lexical resources. The lexical resources can be composed of morphosyntactic dictionaries (including simple and compound words) and strongly lexicalized local grammars. It presently works for French.

Reference: Matthieu Constant and Anthony Sigogne. ~MWU-aware ~Part-of-Speech Tagging with a CRF model and lexical resources. ACL Workshop on Multiword Expressions: from Parsing and Generation to the Real World (MWE'11). 2011. [[pdf|]]

Language: Java + libraries/tools in C/C++ (Unitex + Wapiti for the CRFs)

Licence LGPL & resources LGPL-LR

;Input : utf-8
:- tokenized and segmented in sentences (one token per line, blank line to separate sentences) 
:- raw &rArr; (1) sentences segmented with Unitex for French (2) tokens separated by regex (sequences of letters + a few particular cases)
;Tag set :
:[[CC Tagset]] (Crabbé & Candito, 2008)

!!! POS tagging
Uses a CRF tagger.
Builds an acyclic transducer (TFST) whose transitions are the tokens paired with each candidate POS tag, then searches for the optimal path.

Features :
* //Internal unigram features//
** w~~0~~ (form)
** Lowercase form of w~~0~~
** Prefix of w~~0~~ = P with |P| < 5
** Suffix of w~~0~~ = S with |S| < 5
** w~~0~~ contains a hyphen
** w~~0~~ contains a digit
** w~~0~~ is capitalized
** w~~0~~ is all capital
** w~~0~~ is capitalized and BOS (Begin Of Sentence)
** w~~0~~ is multiword
** Lexicon tags AC~~0~~ of w~~0~~ & w~~0~~ is multiword
* //Contextual unigram features//
** w~~i~~ , i ∈ {−2, −1, 1, 2}
** w~~j~~ w~~k~~ , (j, k) ∈ {(−1, 0), (0, 1), (−1, 1)}
** AC~~i~~ & w~~i~~ is multiword, i ∈ {−2, −1, 1, 2}
* //Bigram features//
** t~~−1~~ (previous POS tag)
Note: //Ambiguity Class of the token// (AC): the list of candidate POS tags found in an external lexicon.
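
As a rough illustration of the //internal unigram features// listed above, the sketch below computes them for a single token. It is not LGTagger's code: the function name, the feature strings, the {{{isBOS}}} and {{{lexiconTags}}} parameters and the test used to detect a multiword token (a space or underscore inside the form) are all assumptions made for the example.

```javascript
// Internal unigram features for one token, following the list above.
// isBOS (token starts the sentence) and lexiconTags (ambiguity class AC0)
// are hypothetical inputs supplied by the caller.
function internalUnigramFeatures(form, isBOS, lexiconTags) {
  var feats = ["w0=" + form, "lower=" + form.toLowerCase()];
  // prefixes and suffixes of length 1..4 (|P| < 5, |S| < 5)
  for (var n = 1; n < 5 && n <= form.length; n++) {
    feats.push("prefix=" + form.substring(0, n));
    feats.push("suffix=" + form.substring(form.length - n));
  }
  var first = form.charAt(0);
  var capitalized = first !== first.toLowerCase();
  var allCaps = capitalized && form === form.toUpperCase();
  // assumption: components of a multiword unit are joined by a space or "_"
  var multiword = /[ _]/.test(form);
  if (form.indexOf("-") >= 0) feats.push("hasHyphen");
  if (/\d/.test(form)) feats.push("hasDigit");
  if (capitalized) feats.push("capitalized");
  if (allCaps) feats.push("allCaps");
  if (capitalized && isBOS) feats.push("capitalizedBOS");
  if (multiword) feats.push("multiword");
  if (lexiconTags && lexiconTags.length) {
    // conjunction of the ambiguity class with the multiword flag
    feats.push("AC0=" + lexiconTags.join("|") + "&multiword=" + multiword);
  }
  return feats;
}
```

Contextual features would be built the same way from the neighbouring tokens w~~−2~~ ... w~~2~~; they are left out to keep the sketch short.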

@@(!)@@ the lexical resources serve 2 purposes:
* defining the AC //feature// for the CRF tagger
* reducing the list of candidate POS tags (i.e. for a token found in the lexicon, only the transitions carrying a POS tag from the lexicon are kept before searching for the optimal path).
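
The second role of the lexicon (pruning candidate POS tags before the best-path search) can be sketched as follows. The lexicon is modelled here as a plain object mapping a word form to its ambiguity class; that representation, the function name and the fallback used when the filter would leave nothing are assumptions for illustration only.

```javascript
// candidates: all POS tags the tagger could assign to the token
// lexicon: hypothetical map from word form to its ambiguity class (array of POS tags)
function filterCandidates(form, candidates, lexicon) {
  var known = Object.prototype.hasOwnProperty.call(lexicon, form);
  if (!known) return candidates.slice(); // unknown word: keep every transition
  var ac = lexicon[form];
  var kept = candidates.filter(function (tag) { return ac.indexOf(tag) >= 0; });
  // assumed fallback: never return an empty candidate list
  return kept.length ? kept : candidates.slice();
}
```

With a toy lexicon {{{ {"la": ["DET","CLO","NC"]} }}}, calling {{{filterCandidates("la", ["DET","NC","V","ADJ","CLO"], lexicon)}}} keeps only {{{["DET","NC","CLO"]}}}, while a word absent from the lexicon keeps all of its candidates.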

!!!For MWEs:
* The tagger is trained with POS tags combined with an //"IOB"// label (__B__egin, __I__nside, __O__utside: POS+B, POS+I or POS)
* Various lexical resources are used to add training //features//:
** POS: POS of the multiword unit (e.g. "Banque de Chine" NPP, "pouvoir d'achat" NC)
** STRUCT: (optional) internal structure (e.g. "pouvoir d'achat" NPN = Noun Preposition Noun)
** SEM: (optional) semantics (e.g. "Banque de Chine" ORG = organization)
** POSITION: Begin/Inside/Outside
* Transitions corresponding to the MWEs are added
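
To make the POS+B / POS+I / POS labelling concrete, here is a minimal sketch that derives training labels from a sentence in which the multiword spans are already known. The span representation ({{{start}}}, {{{end}}} exclusive, {{{pos}}}) and the function name are hypothetical, and the tokenization of the example is only illustrative.

```javascript
// tokens: array of word forms
// simplePos: POS tag of each token when it stands outside any MWE
// mwes: array of {start, end, pos} spans, end exclusive (hypothetical format)
function iobLabels(tokens, simplePos, mwes) {
  var labels = simplePos.slice(0, tokens.length);
  mwes.forEach(function (m) {
    for (var i = m.start; i < m.end; i++) {
      // first token of the unit gets +B, the following ones get +I
      labels[i] = m.pos + (i === m.start ? "+B" : "+I");
    }
  });
  return labels;
}
```

For the tokens {{{["le","pouvoir","d'","achat","baisse"]}}} with "pouvoir d'achat" marked as an NC unit over positions 1 to 3, the labels become {{{["DET","NC+B","NC+I","NC+I","V"]}}}.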

!! Lexical resources
* Morphosyntactic dictionaries
** DELA ([[download|]])
** Lefff ([[]])
** Prolex (toponyms) ([[]])
** Organizations (dictionary of organizations)
** First names (dictionary of first names)
* Local grammars
** grammars built by hand from those found on [[GraalWeb|]]
[img(100px+,)[]] LIA_TAGG: a statistical POS tagger + syntactic bracketer
Developed by F. Béchet at LIA:
Also on his personal page at LIF:

No online documentation.
A //README// file ships with the source code.

;Input : (''ISO8859-1'')
- tokenized text, 1 token/line (? sentence tags {{{<s> </s>}}})
- raw text can be prepared with the //lia_clean// tool
;Output : (''ISO8859-1'')
- 1 token/line {{{WORD POS}}} (for French, an output including the {{{LEMMA}}} is also available)

Note: option to __restore accents in the text__

> LanguageTool is an Open Source proofreading software for English, French, German, Polish, Romanian, and [[more than 20 other languages|]]. It finds many errors that a simple spell checker cannot detect like mixing up there/their and it detects some grammar problems.

Tips and Tricks : (
Tagging a corpus using LanguageTool
LanguageTool has a POS tagger, and for some languages a disambiguator or a chunker as well, so you can use it to tag a big corpus. We added a special command-line switch, {{{--taggeronly}}} (or {{{-t}}} for short), to disable rule checking and use only the tagger.
We didn't go beyond 3 GB of pure text but you should experience no problem even with huge corpora.
{{{java -jar LanguageTool.jar -l <language> -c <encoding> -t <corpus_file> > <tagged_corpus_file>}}}

;Input :
:raw text 
;Output :
:append inside brackets lemma/POS proposition__s__ (comma separated) to each token
:+ sentence tags {{{<S> </S>}}}
:+ /!\ 2 initial lines of information in the output : <<slider show-ltpos-out-start [[languagetool output start]] "(show/hide)" "example of the first 2 lines of information in LanguageTool POS Tagger output.">>
:&nbsp;&nbsp;&nbsp;{{{<S> This[this/DT]  is[be/VBZ]  a[a/DT]  sample[sample/JJ,sample/NN,sample/VB,sample/VBP]  sentence[sentence/NN,sentence/VB,sentence/VBP].[./.,</S>]}}}
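
The bracketed output shown above can be read back with a small parser. The sketch below is based only on the sample line in this article (it is not part of LanguageTool): it assumes each token looks like {{{form[lemma/POS,lemma/POS,...]}}}, that readings never contain a comma or a closing bracket, and that entries starting with {{{<}}} (such as {{{</S>}}}) are markup rather than readings. The function name is made up for the example.

```javascript
// Parse one line of LanguageTool tagger output into
// [{form, readings: [{lemma, pos}, ...]}, ...].
function parseTaggedLine(line) {
  var tokens = [];
  var re = /(\S+?)\[([^\]]*)\]/g;
  var m;
  while ((m = re.exec(line)) !== null) {
    var readings = [];
    var parts = m[2].split(",");
    for (var i = 0; i < parts.length; i++) {
      var p = parts[i];
      if (p === "" || p.charAt(0) === "<") continue; // skip markup such as </S>
      var slash = p.lastIndexOf("/");
      if (slash < 0) continue; // not a lemma/POS pair
      readings.push({ lemma: p.substring(0, slash), pos: p.substring(slash + 1) });
    }
    tokens.push({ form: m[1], readings: readings });
  }
  return tokens;
}
```

Applied to the sample line, this yields six tokens; "sample" carries four candidate readings (JJ, NN, VB, VBP), which shows that the tagger output is not disambiguated here.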
;Tag set : (in source code)
:- EN: Derived from [[Penn Treebank English POS tag set]] see [[LanguageTool English Tagset]]
:- FR: see [[LanguageTool French Tagset]]
File {{{languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/tagset.txt}}} in the source code :
These are mostly the tags of the Penn Treebank tagset as used by LanguageTool,
with examples. See "new tag" for tags introduced by LanguageTool.
For more details, also see

CC    Coordinating conjunction: and, or, either, if, as, since, once, neither, less
CD    Cardinal number: one, two, twenty-four
DT    Determiner: an, an, all, many, much, any, some, this
EX    Existential there: there (no other words)
FW    Foreign word: infinitum, ipso
IN    Preposition/subordinate conjunction: except, inside, across, on, through, beyond, with, without
JJ    Adjective: beautiful, large, inspectable
JJR   Adjective, comparative: larger, quicker
JJS   Adjective, superlative: largest, quickest
LS    List item marker: not used by LanguageTool
MD    Modal: should, can, need, must, will, would
NN    Noun, singular or mass: bicycle, earthquake, zipper
NNS   Noun, plural: bicycles, earthquake, zippers
NN:U    Mass noun		#new tag - deviation from Penn: admiration, air, Afrikaans
NN:UN    Noun used as mass	#new tag - deviation from Penn: establishment, wax, afternoon
NNP   Proper noun, singular: Denver, DORAN, Alexandra
NNPS  Proper noun, plural: Buddhists, Englishmen
PDT   Predeterminer: all, sure, such, this, many, half, both, quite
POS   Possessive ending: s (as in: Peter's)
PRP   Personal pronoun: everyone, I, he, it, myself
PRP$  Possessive pronoun: its, our, their, mine, my, her, his, your
RB    Adverb and negation: easily, sunnily, suddenly, specifically, not
RBR   Adverb, comparative: better, faster, quicker
RBS   Adverb, superlative: best, fastest, quickest
RP    Particle: in, into, at, off, over, by, for, under
SYM   Symbol: not used by LanguageTool
TO    to: to (no other words)
UH    Interjection: aargh, ahem, attention, congrats, help
VB    Verb, base form: eat, jump, believe
VBD   Verb, past tense: ate, jumped, believed
VBG   Verb, gerund/present participle: eating, jumping, believing
VBN   Verb, past participle: eaten, jumped, believed
VBP   Verb, non-3rd ps. sing. present: eat, jump, believe
VBZ   Verb, 3rd ps. sing. present: eats, jumps, believes
WDT   wh-determiner: that, whatever, what, whichever, which (no other words)
WP    wh-pronoun: that, whatever, what, whatsoever, whosoever, who, whom, whoever, which (no other words)
WP$   Possessive wh-pronoun: whose (no other words)
WRB   wh-adverb: however, how, whereever, where, when, why
``    Left open double quote
Tagset used by : <<list filter [tag[tags:LT-FR]]>>

(?) Connection with Dicollecte POS : ?
file {{{languagetool-language-modules/fr/src/main/resources/org/languagetool/resource/fr/tagset.LT.txt}}} in the source code :



    Noun:                   N [gender] [number]
    Adjective:              J [gender] [number]

                [gender]    m   = masculine
                            f   = feminine
                            e   = epicene

                [number]    s   = singular
                            p   = plural
                            sp  = singular or plural

        N m s
        N f p
        J e sp

--  MISC  --

    Adverb:                     A
    Interjection:               I
    Onomatopeia:                O
    Cardinal number:            K
    Abbreviation:               S
    Proper name:                Z
    Marker                      M
--  VERBS  --

    Infinitive:             V inf
    Conjugation:            V [mood] [tense] [person] [number]
    Present participle:     V ppr
    Past participle:        V ppa [gender] [number]

                [mood]      ind     = indicative
                            con     = conditional
                            sub     = subjonctive
                            imp     = imperative
                [tense]     pres    = present
                            psim    = “passé simple”  (past: action done once)
                            impa    = “imparfait”     (past: action done regularly)
                            futu    = future
                [person]    1       = first person
                            2       = second person
                            3       = third person
                [number]    s       = singular
                            p       = plural
        avoir               “V avoir”  instead of  “V”
        être                “V etre”   instead of  “V”
        V ind pres 1 p
        V ind psim 3 p
        V sub impa 3 s
        V inf
        V ppa f p


    preposition:                    P

    conjonction:                    C
    subordinating conjunction:      C sub
    coordinating conjunction:       C coor
    determiner:                     D
                                    D [gender] [number]
    pronoun:                        R [gender] [number]
    relative pronoun:               R rel [gender] [number]
    demonstrative pronoun:          R dem [gender] [number]
    reflexive pronoun:              R refl [person] [gender] [number]
    personal pronoun:               R pers [to] [gender] [number]
                                    R pers [to] [number] 
                        [gender]    m   = masculine
                                    f   = feminine
                                    e   = epicene

                        [number]    s   = singular
                                    p   = plural

                        [person]    1   = first person
                                    2   = second person
                                    3   = third person

                        [to]        suj = subject
                                    obj = object
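
As an illustration of how the space-separated tags described above decompose, the sketch below splits a noun, adjective or verb tag into named fields. It follows only the patterns listed in this file (e.g. {{{N m s}}}, {{{V ind pres 1 p}}}, {{{V ppa f p}}}, {{{V avoir ...}}}); the function name and the returned field names are assumptions, and tags actually produced by LanguageTool may contain variants not covered here.

```javascript
var MOODS = ["ind", "con", "sub", "imp"];

// Split a French tag string into fields, based on the patterns documented above.
function parseFrenchTag(tag) {
  var p = tag.trim().split(/\s+/);
  var cat = p[0];
  if (cat === "N" || cat === "J") {
    return { category: cat, gender: p[1], number: p[2] };
  }
  if (cat === "V") {
    var i = 1;
    var out = { category: "V", aux: null };
    // "V avoir" / "V etre" replace plain "V" for the two auxiliaries
    if (p[i] === "avoir" || p[i] === "etre") { out.aux = p[i]; i++; }
    if (p[i] === "inf" || p[i] === "ppr") {
      out.form = p[i];
    } else if (p[i] === "ppa") {
      out.form = "ppa"; out.gender = p[i + 1]; out.number = p[i + 2];
    } else if (MOODS.indexOf(p[i]) >= 0) {
      out.mood = p[i]; out.tense = p[i + 1]; out.person = p[i + 2]; out.number = p[i + 3];
    }
    return out;
  }
  // other categories: keep the remaining fields unparsed
  return { category: cat, rest: p.slice(1) };
}
```

For instance, {{{parseFrenchTag("V ind pres 1 p")}}} gives mood {{{ind}}}, tense {{{pres}}}, person {{{1}}} and number {{{p}}}.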
|Source|derived from|
|Version|0 (2.0.6)|
|Author|GMM (Eric Shulman)|
|Description|list of tags with full boolean expressions (AND, OR, NOT, nested parentheses, and regex)|
> Similar to [[MatchTagsPluginInfo]], but creates a list of tags
> Defines the macro //listTags//
> Only the //inline// mode is implemented
The format takes 3 variables:
* %0 : tag
* %1 : tag macro with the tag (i.e. {{{<<tag [[%0]]>>}}})
* %2 : {{{>>}}} (to use other macros; for example, {{{"<<tag [[%0]]%2"}}} is equivalent to {{{"%1"}}})
;Input : utf-8
:- tokenized and segmented in sentences (__one sentence per line__), following the French Treebank conventions (
:- raw (no tokenization, no segmentation in sentences), you can activate MElt's embedded lightweight tokenizer by using the '-t' option.
;Tag set :
:[[CC Tagset]] (Crabbé & Candito, 2008)

> A high-accuracy MEMM POS-tagger for French trained on French Treebank and Lefff.
MEMM : [[Maximum-Entropy Markov Model|]]

MElt is a Python implementation of the MaxEnt Markov Model part-of-speech tagger
described in:
* P. Denis and B. Sagot. 2009. Coupling an annotated corpus and a
morphosyntactic lexicon for state-of-the-art POS tagging with less
human effort. In Proc. of PACLIC 23, Hong Kong, China.
* P. Denis and B. Sagot. 2010. Exploitation d'une ressource lexicale pour la 
construction d'un Etiqueteur morphosyntaxique Etat-de-l'art du français.  
In Proc. of TALN 2010, Montreal, Canada.

> To successfully run MElt, you need to install Numpy along with a fairly recent version of Python - we've tried 2.5 and 2.6.
Tagging of multiword units/expressions (Multi-Word Expressions)
This wiki gathers some information about various POS taggers that work with French, as well as a few other important POS taggers (working at least with English).

The various tags are used to index the articles:
* features: ''<<tag POS-tagger "POS-tagger" "Part-of-Speech tagger">>'', <<tag tag-set "tag-set" "tag sets">>, <<tag Morpho Morpho "morphological analysis">>, <<tag Chunker>>, <<tag Parser>>, <<tag Tokenization>>, <<tag [[Sentence Segmentation]]>>, <<tag [[Named Entity]]>>, ''<<tag MWE "MWE" "MultiWord Expression/Unit">>'', <<tag Coreference>>, ''<<tag Toolkit>>'', ...
* languages: French: ''<<tag lang:FR lang:FR French>>'', English: <<tag lang:EN lang:EN English>>, other languages: <<listTags inline "%1" ", " lang:.* AND NOT lang:FR AND NOT lang:EN AND NOT lang:FR? >>
* inputs ("in:" prefix): //raw text// <<tag in:raw>>, //tokenized// <<tag in:sent>>, <<tag in:tok>>, <<tag in:cols>>, //xml// <<tag in:xml>> (note: the <<tag format>> tag is used for specific input/output formats) (other inputs: <<listTags inline "%1" ", " in:.* AND NOT ( in:raw OR in:sent OR in:tok OR in:cols OR in:xml)>>)
* algorithms used ("algo:" prefix):  <<listTags inline "%1" ", " algo:.*>>
* programming language ("src:" prefix):  <<listTags inline "%1" ", " src:.*>>
* license ("licence:" prefix):   <<listTags inline "%1" ", " licence:.*>>

Here is the list of articles on POS taggers for French: <<matchTags popup "label:POS-tagger & lang:FR" POS-tagger AND lang:FR>> <<matchTags "#[[%0]]<br>^^%6^^" "\n" POS-tagger AND lang:FR>>
And the list of other POS taggers: <<matchTags popup "label:POS-tagger & !lang:FR" POS-tagger AND NOT lang:FR>>  <<matchTags "#[[%0]]<br>^^%6^^" "\n" POS-tagger AND NOT lang:FR>>

List of tag sets (tag <<tag tag-set>>):
* for French: <<matchTags popup "label:tag-set & lang:FR" tag-set AND lang:FR>> <<matchTags "#[[%0]]" "\n" tag-set AND lang:FR>>
* for English: <<matchTags popup "label:tag-set & lang:EN" tag-set AND lang:EN>> <<matchTags "#[[%0]]" "\n" tag-set AND lang:EN>>

The article [[POS tagger links]] contains a few links to lists or comparisons of POS taggers.
<<tagsTree menu-pages "" 2 4 index prettyname>>
<<tagsTree menu "" 1 4 index prettyname>>

<<matchTags popup "label:POS-taggers for French" POS-tagger AND lang:FR>>
<<matchTags inline "[[%0]]" "\n" POS-tagger AND lang:FR>>

<<matchTags popup "label:Other POS-tagger" POS-tagger AND NOT lang:FR>>
<<matchTags inline "[[%0]]" "\n" POS-tagger AND NOT lang:FR>>

<<matchTags popup "label:Other tools" (Morpho OR Chunker OR Parsing OR Tokenization OR Sentence Segmentation OR Named Entity OR Coreference) AND NOT POS-tagger>>
<<matchTags inline "[[%0]]" "\n" (Morpho OR Chunker OR Parsing OR Tokenization OR Sentence Segmentation OR Named Entity OR Coreference OR Toolkit) AND NOT POS-tagger>>

<<tag lang:FR "All tools for French" "tools/tagsets for French">>
<<tag lang:EN "All tools for English" "tools/tagsets for English">>
[img[]] """MAchine Learning for LanguagE Toolkit"""

Supports //Sequence Tagging// (CRF): [[Quick Start|]] [[Developer's Guide|]]

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

MALLET includes sophisticated tools for document classification: efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics. [[Quick Start|]] [[Developer's Guide|]]

In addition to classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields. These methods are implemented in an extensible system for finite state transducers. [[Quick Start|]] [[Developer's Guide|]]

Topic models are useful for analyzing large collections of unlabeled text. The MALLET topic modeling toolkit contains efficient, sampling-based implementations of Latent Dirichlet Allocation, Pachinko Allocation, and Hierarchical LDA. [[Quick Start|]]

Many of the algorithms in MALLET depend on numerical optimization. MALLET includes an efficient implementation of Limited Memory BFGS, among many other optimization methods. [[Developer's Guide|]]

In addition to sophisticated Machine Learning applications, MALLET includes routines for transforming text documents into numerical representations that can then be processed efficiently. This process is implemented through a flexible system of "pipes", which handle distinct tasks such as tokenizing strings, removing stopwords, and converting sequences into count vectors. [[Quick Start|]] [[Developer's Guide|]]

An add-on package to MALLET, called GRMM, contains support for inference in general graphical models, and training of CRFs with arbitrary graphical structure. [[About GRMM|]] 

Malt-TAB is a text-based representation, which is mainly used by MaltParser. Malt-TAB contains a subset of the features in Malt-XML, and attributes are implicitly defined by their position. Each word is represented on one line, with attribute values being separated by tabs. The required order of attributes is as follows:
{{{form (required) < postag (required) < head (optional) < deprel (optional)}}}

Although ''head'' and ''deprel'' are optional, they must either both be included or both be omitted. (Normally, all four columns are present in the input when training the parser and in the output when parsing, while only ''form'' and ''postag'' are present in the input when parsing.) Please note also that the ''id'' attribute is not represented explicitly at all. Words in a sentence are separated by one newline; sentences are separated by one additional newline. A dependency tree for the Swedish sentence "Genom skattereformen införs individuell beskattning (särbeskattning) av arbetsinkomster." can be represented as follows:

|Genom |pp |3 |ADV |
|skattereformen |nn.utr.sin.def.nom |1 |PR |
|införs |vb.prs.sfo |0 |ROOT |
|individuell |jj.pos.utr.sin.ind.nom |5 |ATT |
|beskattning |nn.utr.sin.ind.nom |3 |SUB |
|( |pad |5 |IP |
|särbeskattning |nn.utr.sin.ind.nom |5 |APP |
|) |pad |5 |IP |
|av |pp |5 |ATT |
|arbetsinkomster |nn.utr.plu.ind.nom |9 |PR |
|. |mad |3 |IP |
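
The format rules stated above (tab-separated columns, {{{form}}} and {{{postag}}} required, {{{head}}} and {{{deprel}}} either both present or both absent, implicit {{{id}}}, blank line between sentences) can be checked with a short reader. This is a sketch written for this article, not MaltParser code; the function name and the shape of the returned objects are assumptions.

```javascript
// Read Malt-TAB text into an array of sentences; each sentence is an array of
// {id, form, postag[, head, deprel]} objects. The id is implicit in the format,
// so it is reconstructed here from the position of the word in its sentence.
function parseMaltTab(text) {
  var sentences = [];
  var current = [];
  text.split(/\r\n|\r|\n/).forEach(function (line) {
    if (line.trim() === "") {
      // a blank line closes the current sentence
      if (current.length) { sentences.push(current); current = []; }
      return;
    }
    var cols = line.split("\t");
    if (cols.length !== 2 && cols.length !== 4) {
      // head and deprel must be both present or both absent
      throw new Error("expected 2 or 4 tab-separated columns: " + line);
    }
    var word = { id: current.length + 1, form: cols[0], postag: cols[1] };
    if (cols.length === 4) {
      word.head = parseInt(cols[2], 10);
      word.deprel = cols[3];
    }
    current.push(word);
  });
  if (current.length) sentences.push(current);
  return sentences;
}
```

In the Swedish example, the line for "införs" has head 0 and relation ROOT, and the reader assigns it the implicit id 3.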

;Output/Training data :
:- [[CoNLL data format]] 1 token/line, sentences separated by a blank line, 10 columns
:- [[Malt-TAB]] 1 token/line, sentences separated by a blank line, 4 columns
;Input :
:- the first six columns of the [[CoNLL data format]]
:- [[Malt-TAB]] 1 token/line, 2 columns

MaltParser can be characterized as a data-driven parser-generator. While a traditional parser-generator constructs a parser given a grammar, a data-driven parser-generator constructs a parser given a treebank. MaltParser is an implementation of inductive dependency parsing, where the syntactic analysis of a sentence amounts to the derivation of a dependency structure, and where inductive machine learning is used to guide the parser at nondeterministic choice points (Nivre, 2006). The parsing methodology is based on three essential components:
#    Deterministic parsing algorithms for building labeled dependency graphs (Kudo and Matsumoto, 2002; Yamada and Matsumoto, 2003; Nivre, 2003)
#    History-based models for predicting the next parser action at nondeterministic choice points (Black et al., 1992; Magerman, 1995; Ratnaparkhi, 1997; Collins, 1999)
#    Discriminative learning to map histories to parser actions (Kudo and Matsumoto, 2002; Yamada and Matsumoto, 2003; Nivre et al., 2004; Hall et al., 2006)
MaltParser implements nine deterministic parsing algorithms:
*    Nivre arc-eager
*    Nivre arc-standard
*    Covington non-projective
*    Covington projective
*    Stack projective
*    Stack swap-eager
*    Stack swap-lazy
*    Planar (implemented by Carlos Gómez-Rodríguez)
*    2-planar (implemented by Carlos Gómez-Rodríguez)

MaltParser allows users to define feature models of arbitrary complexity.

MaltParser currently includes two machine learning packages (thanks to Sofia Cassel for her work on LIBLINEAR):
*    LIBSVM - A Library for Support Vector Machines (Chang, 2001).
*    LIBLINEAR -- A Library for Large Linear Classification (Fan et al., 2008).

MaltParser can also be turned into a phrase structure parser that recovers both continuous and discontinuous phrases with both phrase labels and grammatical functions (Hall and Nivre, 2008a; Hall and Nivre, 2008b).
|Author|Eric Shulman|
|Description|'tag matching' with full boolean expressions (AND, OR, NOT, and nested parentheses)|
> see [[MatchTagsPluginInfo]]
2011.10.28 2.0.6 added .matchTags CSS class to popups to enable custom styling via StyleSheet
2011.01.23 2.0.5 fix core tweak for TW262+: adjust code in config.filters['tag'] instead of filterTiddlers()
2010.08.11 2.0.4 in getMatchingTiddlers(), fixed sorting for descending order (e.g, "-created")
| please see [[MatchTagsPluginInfo]] for additional revision details |
2008.02.28 1.0.0 initial release
version.extensions.MatchTagsPlugin= {major: 2, minor: 0, revision: 6, date: new Date(2011,10,28)};

// store.getMatchingTiddlers() processes boolean expressions for tag matching
//    sortfield (optional) sets sort order for tiddlers - default=title
//    tiddlers (optional) use alternative set of tiddlers (instead of current store)
TiddlyWiki.prototype.getMatchingTiddlers = function(tagexpr,sortfield,tiddlers) {

	var debug=config.options.chkDebug; // abbreviation
	var cmm=config.macros.matchTags; // abbreviation
	var r=[]; // results are an array of tiddlers
	var tids=tiddlers||store.getTiddlers();
	if (tids && sortfield) tids=store.sortTiddlers(tids,sortfield);
	if (debug) displayMessage(cmm.msg1.format([tids.length]));

	// try simple lookup to quickly find single tags or tags that
	// contain boolean operators as literals, e.g. "foo and bar"
	for (var t=0; t<tids.length; t++)
		if (tids[t].isTagged(tagexpr)) r.pushUnique(tids[t]);
	if (r.length) {
		if (debug) displayMessage(cmm.msg4.format([r.length,tagexpr]));
		return r;
	}
	// convert expression into javascript code with regexp tests,
	// so that "tag1 AND ( tag2 OR NOT tag3 )" becomes
	// "/\~tag1\~/.test(...) && ( /\~tag2\~/.test(...) || ! /\~tag3\~/.test(...) )"

	// normalize whitespace, tokenize operators, delimit with "~"
	var c=tagexpr.trim(); // remove leading/trailing spaces
	c = c.replace(/\s+/ig," "); // reduce multiple spaces to single spaces
	c = c.replace(/\(\s?/ig,"~(~"); // open parens
	c = c.replace(/\s?\)/ig,"~)~"); // close parens
	c = c.replace(/(\s|~)?&&(\s|~)?/ig,"~&&~"); // &&
	c = c.replace(/(\s|~)AND(\s|~)/ig,"~&&~"); // AND
	c = c.replace(/(\s|~)?\|\|(\s|~)?/ig,"~||~"); // ||
	c = c.replace(/(\s|~)OR(\s|~)/ig,"~||~"); // OR
	c = c.replace(/(\s|~)?!(\s|~)?/ig,"~!~"); // !
	c = c.replace(/(^|~|\s)NOT(\s|~)/ig,"~!~"); // NOT
	c = c.replace(/(^|~|\s)NOT~\(/ig,"~!~("); // NOT(
	// change tag terms to regexp tests
	var terms=c.split("~"); for (var i=0; i<terms.length; i++) { var t=terms[i];
		if (/(&&)|(\|\|)|[!\(\)]/.test(t) || t=="") continue; // skip operators/parens/spaces
		if (t==config.macros.matchTags.untaggedKeyword)
			terms[i]="tiddlertags=='~~'"; // 'untagged' tiddlers
		else
			terms[i]="/\\~"+t+"\\~/.test(tiddlertags)"; // regexp test, as described in the comment above
	}
	c=terms.join(" ");
	if (debug) { displayMessage(cmm.msg2.format([tagexpr])); displayMessage(cmm.msg3.format([c])); }

	// scan tiddlers for matches
	for (var t=0; t<tids.length; t++) {
	 	// assemble tags from tiddler into string "~tag1~tag2~tag3~"
		var tiddlertags = "~"+tids[t].tags.join("~")+"~";
		try { if(eval(c)) r.push(tids[t]); } // test tags
		catch(e) { // error in test
			break; // skip remaining tiddlers
		}
	}
	if (debug) displayMessage(cmm.msg4.format([r.length,tagexpr]));
	return r;
}
config.macros.matchTags = {
	msg1: "scanning %0 input tiddlers",
	msg2: "looking for '%0'",
	msg3: "using expression: '%0'",
	msg4: "found %0 tiddlers matching '%1'",
	noMatch: "no matching tiddlers",
	untaggedKeyword: "-",
	untaggedLabel: "no tags",
	untaggedPrompt: "show tiddlers with no tags",
	defTiddler: "MatchingTiddlers",
	defTags: "",
	defFormat: "[[%0]]",
	defSeparator: "\n",
	reportHeading: "Found %0 tiddlers tagged with: '{{{%1}}}'\n----\n",
	handler: function(place,macroName,params,wikifier,paramString,tiddler) {
		var mode=params[0]?params[0].toLowerCase():'';
		if (mode=="inline")
			params.shift(); // discard the mode keyword
		if (mode=="report" || mode=="panel") {
			params.shift(); // discard the mode keyword
			var target=params.shift()||this.defTiddler;
		}
		if (mode=="popup") {
			params.shift(); // discard the mode keyword
			if (params[0]&&params[0].substr(0,6)=="label:") var label=params.shift().substr(6);
			if (params[0]&&params[0].substr(0,7)=="prompt:") var prompt=params.shift().substr(7);
		} else {
			var fmt=(params.shift()||this.defFormat).unescapeLineBreaks();
			var sep=(params.shift()||this.defSeparator).unescapeLineBreaks();
		}
		var sortBy="+title";
		if (params[0]&&params[0].substr(0,5)=="sort:") sortBy=params.shift().substr(5);
		var expr = params.join(" ");
		if (mode!="panel" && (!expr||!expr.trim().length)) return;
		if (expr==this.untaggedKeyword)
			{ var label=this.untaggedLabel; var prompt=this.untaggedPrompt };
		switch (mode) {
			case "popup": this.createPopup(place,label,expr,prompt,sortBy); break;
			case "panel": this.createPanel(place,expr,fmt,sep,sortBy,target); break;
			case "report": this.createReport(target,this.defTags,expr,fmt,sep,sortBy); break;
			case "inline": default: this.createInline(place,expr,fmt,sep,sortBy); break;
		}
	},
	formatList: function(tids,fmt,sep) {
		var out=[];
		for (var i=0; i<tids.length; i++) { var t=tids[i];
			var title=t.title;
			var who=t.modifier;
			var when=t.modified.toLocaleString();
			var text=t.text;
			var first=t.text.split("\n")[0];
			var desc=store.getTiddlerSlice(t.title,"description");
			var tags=t.tags.length?'[['+t.tags.join(']] [[')+']]':'';
			out.push(fmt.format([title,who,when,text,first,desc,tags])); // fill %0..%6
		}
		return out.join(sep);
	},
	createInline: function(place,expr,fmt,sep,sortBy) {
	createPopup: function(place,label,expr,prompt,sortBy) {
		var btn=createTiddlyButton(place,
			function(ev){ return config.macros.matchTags.showPopup(this,ev||window.event); });
	showPopup: function(here,ev) {
		var p=Popup.create(here,null,"matchTags popup"); if (!p) return false;
		var tids=store.getMatchingTiddlers(here.getAttribute("expr"));
		var list=[]; for (var t=0; t<tids.length; t++) list.push(tids[t].title);
		if (!list.length) createTiddlyText(p,this.noMatch);
		else {
			var b=createTiddlyButton(createTiddlyElement(p,"li"),
				function() {
					var list=this.getAttribute("list").readBracketedList();
			b.setAttribute("list","[["+list.join("]] [[")+"]]");
		var out=this.formatList(tids," &nbsp;[[%0]]&nbsp; ","\n"); wikify(out,p);;
		if(ev.stopPropagation) ev.stopPropagation();
		return false;
	createReport: function(target,tags,expr,fmt,sep,sortBy) {
		var tids=store.sortTiddlers(store.getMatchingTiddlers(expr),sortBy);
		if (!tids.length) { displayMessage('no matches for: '+expr); return false; }
		var msg=config.messages.overwriteWarning.format([target]);
		if (store.tiddlerExists(target) && !confirm(msg)) return false;
		var out=this.reportHeading.format([tids.length,expr])
		store.saveTiddler(target,target,out,config.options.txtUserName,new Date(),tags,{});
		story.closeTiddler(target); story.displayTiddler(null,target);
	createPanel: function(place,expr,fmt,sep,sortBy,tid) {
		var s=createTiddlyElement(place,"span"); s.innerHTML=store.getTiddlerText("MatchTagsPlugin##html");
		var f=s.getElementsByTagName("form")[0];
		f.expr.value=expr; f.fmt.value=fmt; f.sep.value=sep.escapeLineBreaks();
		f.tid.value=tid; f.tags.value=this.defTags;
<form style='display:inline;white-space:nowrap'>
<input type='text'    name='expr' style='width:50%' title='tag expression'><!--
--><input type='text'    name='fmt'  style='width:10%' title='list item format'><!--
--><input type='text'    name='sep'  style='width:5%'  title='list item separator'><!--
--><input type='text'    name='tid'  style='width:12%' title='target tiddler title'><!--
--><input type='text'    name='tags' style='width:10%' title='target tiddler tags'><!--
--><input type='button'  name='go'   style='width:8%'  value='go' onclick="
	var expr=this.form.expr.value;
	if (!expr.length) { alert('Enter a boolean tag expression'); return false; }
	var fmt=this.form.fmt.value;
	if (!fmt.length) { alert('Enter the list item output format'); return false; }
	var sep=this.form.sep.value.unescapeLineBreaks();
	var tid=this.form.tid.value;
	if (!tid.length) { alert('Enter a target tiddler title'); return false; }
	var tags=this.form.tags.value;
	return false;">
// SHADOW TIDDLER for displaying default panel input form
config.shadowTiddlers.MatchTags="<<matchTags panel>>";
// TWEAK core filterTiddlers() or config.filters['tag'] (in TW262+)
// to use getMatchingTiddlers instead of getTaggedTiddlers
// for enhanced boolean matching in [tag[...]] syntax
var TW262=config.filters && config.filters['tag']; // detect TW262+
var fname=TW262?"config.filters['tag']":"TiddlyWiki.prototype.filterTiddlers";
var code=eval(fname).toString().replace(/getTaggedTiddlers/g,'getMatchingTiddlers');
// REDEFINE core handler for enhanced boolean matching in tag:"..." paramifier
// use filterTiddlers() instead of getTaggedTiddlers() to get list of tiddlers.
config.paramifiers.tag = {
	onstart: function(v) {
		var tagged = store.filterTiddlers("[tag["+v+"]]");
|Author|Eric Shulman|
|Description|documentation for MatchTagsPlugin|
This plugin extends the {{{[tag[tagname]]}}} macro parameter syntax used by the TiddlyWiki core {{{<<list>>}}} macro so that, instead of a simple tagname value, you can specify a complex combination of tagname values using a //boolean expression// containing AND, OR, and NOT operators, enclosed in nested parentheses if needed.
<<list filter "[tag[expression]]">>
In addition, the plugin defines a new macro, {{{<<matchTags ...>>}}} that can be used instead of the core {{{<<list>>}}} macro to output a list of matching tiddlers //using a custom 'item format' and 'separator'//.  You can also use this macro to create a command link that displays the matching tiddlers within a popup list, similar to the standard {{{<<tag tagName>>}}} macro, but matching a combination of tag values rather than a single tag value.
<<matchTags inline "format" "separator" sort:fieldname tag expression>>
<<matchTags popup "label:..." "prompt:..." sort:fieldname tag expression>>
<<matchTags report TiddlerName "format" "separator" sort:fieldname tag expression>>
<<matchTags panel  Tiddlername "format" "separator" sort:fieldname tag expression>>
* ''inline'', ''report'', ''panel'', and ''popup''<br>are keywords that indicate the type of output that the macro should produce:
** ''inline'' //(default)// - displays a list of matching tiddlers embedded directly in tiddler content
** ''popup'' - embeds a command button that, when clicked, lists matching tiddlers in a ~TiddlyWiki popup display
** ''report'' - generates a list of matching tiddlers in a separate [[MatchingTiddlers]] report tiddler
** ''panel'' - displays an interactive form for generating a [[MatchingTiddlers]] report
* ''format''<br>defines the wiki-syntax for rendering list items.  The following //substitution markers// can be used to insert tiddler-specific information for each matched tiddler:
** {{{%0}}} - title
** {{{%1}}} - modifier (author)
** {{{%2}}} - modified (date of last change)
** {{{%3}}} - text (all tiddler content)
** {{{%4}}} - firstline (tiddler content up to the first newline)
** {{{%5}}} - description (tiddler slice or section content named "description" or "Description")
** {{{%6}}} - tags (space-separated, bracketed list)
* ''separator''<br>defines the wiki-syntax to use //between// each matching title (e.g., ", " creates a comma-separated list, while "\n" displays one tiddler per line).
* ''sort:fieldname'' (optional)<br>specifies the sort order for the resulting list of tiddlers.  You can specify any tiddler field name (standard or custom-defined).  Standard tiddler fieldnames include: //title, created, modified, modifier//.  If not specified, tiddlers are sorted by title.  You can prefix the fieldname with "+" or "-" to indicate ascending or descending order, respectively.
* ''tag expression''<br>the remaining parameter(s) are joined together to define the boolean expression to be matched.
When using the ''popup'' option, there are two additional (and optional) parameters you can specify:
* ''"label:..."'' (optional)<br>indicates the text for the popup command link.  The default is to display the specified tag expression itself.
* ''"prompt:..."'' (optional)<br>indicates the mouseover 'tooltip' for the popup command link.
* note: you can apply custom CSS styles (e.g., font size) to the popup by adding a rule for ".matchTags .popup" to your [[StyleSheet]].
When using the ''report'' or ''panel'' option, an additional parameter may be provided:
* ''~TiddlerName''<br>specifies the target tiddler into which the output will be generated (default: [[MatchingTiddlers]])
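
As a rough illustration of how the ''format'' and ''separator'' parameters combine, the sketch below fills the {{{%0}}} to {{{%6}}} markers for plain objects outside of TiddlyWiki. The names {{{formatItem}}} and {{{formatList}}} and the object fields are assumptions for this example only; inside TiddlyWiki the plugin relies on the core string formatting and on real tiddler objects.

```javascript
// Standalone sketch of the %0..%6 substitution described above.
// `tiddler` is a plain object here (hypothetical shape), not a TiddlyWiki tiddler.
function formatItem(fmt, tiddler) {
  const values = [
    tiddler.title,                              // %0 title
    tiddler.modifier,                           // %1 modifier (author)
    tiddler.modified,                           // %2 modified date
    tiddler.text,                               // %3 full text
    tiddler.text.split("\n")[0],                // %4 first line
    tiddler.description || "",                  // %5 description
    tiddler.tags.length                         // %6 bracketed tag list
      ? "[[" + tiddler.tags.join("]] [[") + "]]"
      : ""
  ];
  // Replace each %N marker; markers without a value are left untouched.
  return fmt.replace(/%(\d)/g, (marker, digit) => {
    const i = Number(digit);
    return i < values.length ? String(values[i]) : marker;
  });
}

// Format every tiddler and join the items with the separator.
function formatList(tiddlers, fmt, sep) {
  return tiddlers.map(t => formatItem(fmt, t)).join(sep);
}
```

With the format {{{[[%0]]}}} and the separator {{{", "}}} this yields a comma-separated list of links, which is the behaviour the ''separator'' description refers to.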
*A tag expression can use any combination of text operators: ''AND'', ''OR'', ''NOT'' (or their equivalent javascript operators: ''&&'', ''||'', ''!''), contained in nested parentheses as needed.
*Operators should be delimited by spaces or parentheses.
*Before matching, leading/trailing spaces are automatically trimmed and multiple spaces are reduced to single spaces.
*Tag values containing embedded spaces do //not// have to be enclosed in {{{[[...]]}}}.
*Tag values that contain boolean operators as ''literal text'' (e.g., {{{"foo and bar"}}} or {{{"foo && bar"}}}) cannot be used within a compound boolean expression, but //can// be matched if specified by themselves, without any other tag values or operators.
*To match tiddlers that are untagged, use "-" as a special tag value within the expression.
*You can match "wildcard" tags  by using //regular expression// (i.e., "text pattern") syntax within a tag value, e.g. {{{[Tt]agvalue.*}}}
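
The rules above can be sketched as a small standalone function. This is a simplified illustration modelled on the conversion described in the plugin comments (operators are delimited with "~", each tag value becomes a regular-expression test against a "~tag1~tag2~" string); the names {{{compileTagExpr}}} and {{{matchesTags}}} are hypothetical, and {{{new Function}}} is used here where the plugin itself uses {{{eval}}}. Because the expression is turned into executable code, it should only be fed trusted input.

```javascript
// Compile a boolean tag expression into a test function (illustrative sketch).
function compileTagExpr(tagexpr) {
  let c = tagexpr.trim().replace(/\s+/g, " ");            // normalize whitespace
  c = c.replace(/\(\s?/g, "~(~").replace(/\s?\)/g, "~)~"); // delimit parentheses
  c = c.replace(/(\s|~)?&&(\s|~)?/g, "~&&~");              // &&
  c = c.replace(/(\s|~)AND(\s|~)/gi, "~&&~");              // AND
  c = c.replace(/(\s|~)?\|\|(\s|~)?/g, "~||~");            // ||
  c = c.replace(/(\s|~)OR(\s|~)/gi, "~||~");               // OR
  c = c.replace(/(\s|~)?!(\s|~)?/g, "~!~");                // !
  c = c.replace(/(^|~|\s)NOT(\s|~)/gi, "~!~");             // NOT
  const terms = c.split("~").map(t => {
    if (t === "" || /^(&&|\|\||[!()])$/.test(t)) return t; // operators stay as-is
    if (t === "-") return "tiddlertags=='~~'";             // untagged keyword
    return "/\\~" + t + "\\~/.test(tiddlertags)";          // tag value as regexp test
  });
  return new Function("tiddlertags", "return (" + terms.join(" ") + ");");
}

// Apply a compiled test to an array of tags.
function matchesTags(tags, test) {
  return Boolean(test("~" + tags.join("~") + "~"));
}
```

For instance, compiling {{{sample OR (settings AND systemConfig)}}} gives a test that accepts a tiddler tagged only ''sample'', and one tagged both ''settings'' and ''systemConfig'', but not one tagged only ''settings''. Tag values are inserted into the regular expression unescaped, which is what makes the "wildcard" syntax work.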
display a popup list:
<<matchTags popup sample OR (settings AND systemConfig)>>
><<matchTags popup sample OR (settings AND systemConfig)>>
display a popup list with custom label:
<<matchTags popup "label:samples and settings" sample OR (settings AND systemConfig)>>
><<matchTags popup "label:samples and settings" sample OR (settings AND systemConfig)>>
display a popup list of untagged tiddlers:
<<matchTags popup ->>
><<matchTags popup ->>
generate a report using the interactive form control panel:
<<matchTags panel "MatchingTiddlers" "[[%0]]" "\n" sample OR (settings AND systemConfig)>>
>{{smallform{<<matchTags panel "MatchingTiddlers" "[[%0]]" "\n" sample OR (settings AND systemConfig)>>}}}
comma-separated list:
<<matchTags "[[%0]]" ", " sample OR (settings AND systemConfig)>>
><<matchTags "[[%0]]" ", " sample OR (settings AND systemConfig)>>
numbered list (sorted by modification date, most recent first):
<<matchTags "#[[%0]] (%2)<br>^^%5^^" "\n" sort:-modified sample OR (settings AND systemConfig)>>
><<matchTags "#[[%0]] (%2)<br>^^%5^^" "\n" sort:-modified sample OR (settings AND systemConfig)>>
bullet-item list (using the TiddlyWiki core {{{<<list filter ...>>}}} macro):
//(Note: when using the core {{{<<list>>}}} macro, you should always enclose the entire tag filter parameter within quotes)//
<<list filter "[tag[sample OR (settings AND systemConfig)]]">>
><<list filter "[tag[sample OR (settings AND systemConfig)]]">>
2011.10.28 2.0.6 added .matchTags CSS class to popups to enable custom styling via StyleSheet
2011.01.22 2.0.5 fix core tweak for TW262+: adjust code in config.filters['tag'] instead of filterTiddlers()
2010.08.11 2.0.4 in getMatchingTiddlers(), fixed sorting for descending order (e.g, "-created")
2010.03.02 2.0.3 added %6 format (tags)
2010.03.01 2.0.2 in formatList(), don't automatically put '[[' and ']]' around title (%0) in formatted output
2009.08.29 2.0.1 added support for {{{config.macros.matchTags.defTags}}} to auto-tag [[MatchingTiddlers]] output
2008.09.04 2.0.0 added "report" and "panel" options to generate formatted results and store in a tiddler.  Also, added config.macros.matchTags.formatList(place,fmt,sep) API to return formatted output for use with other plugins/scripts
2008.09.01 1.9.2 fixed return value from popup button handler so IE doesn't attempt to leave the page
2008.08.31 1.9.1 improved expression conversion handling to permit use of regular expressions for "wildcard" matching within tag values
2008.06.12 1.9.0 added support for formatted output of: title, who, when, text, firstline, description (slice or section)
2008.06.05 1.8.0 in getMatchingTiddlers(), added optional sortfield and tiddlers params to support use of alternative set of tiddlers instead of using current store content (provides filtering support for ImportTiddlersPlugin)
2008.06.04 1.7.1 in getMatchingTiddlers(), reworked conversion of expression for more robust parsing of whitespace, parentheses and javascript operators and allow use of "-" (untagged) //within// expressions
2008.05.19 1.7.0 in getMatchingTiddlers(), use reverseLookup() instead of forEachTiddler() to permit access to tiddlers included via [[IncludePlugin|]]
2008.05.17 1.6.0 in getMatchingTiddlers(), rewrote expression conversion to handle tags with spaces tag values that are substrings of other tag values.
2008.05.16 1.5.0 added special case using "-" to find UNTAGGED tiddlers
2008.05.15 1.4.0 added "popup" output option
2008.05.14 1.3.4 instead of hijacking getTaggedTiddlers(), added tweak of filterTiddlers() prototype to replace getTaggedTiddlers() with getMatchingTiddler() so that core use of getTaggedTiddlers() does not perform boolean processing of tiddler titles such as [[To Be or not To Be]].  Also, improved "filter error" messages in getMatchingTiddlers() to report tag expression in addition to actual eval error.
2008.04.25 1.3.3 in getTaggedTiddlers(), fixed handling for "not" embedded within a tag
2008.04.21 1.3.2 in getTaggedTiddlers(), fixed handling for initial "NOT" and "NOT(expr)" syntax
2008.04.20 1.3.1 in getTaggedTiddlers(), corrected check for boolean expression to avoid excess processing of tags containing spaces.  Also, improved handling for non-existing tags that contain text of existing tags
2008.04.19 1.3.0 in filterTiddlers(), use getTaggedTiddlers() instead of matchTags(), and then hijack getTaggedTiddlers() to add matchTags() handling
2008.04.19 [*.*.*] plugin size reduction: moved documentation to [[MatchTagsPluginInfo]]
2008.03.25 1.2.0 added optional "sort:fieldname" parameter
2008.03.20 1.1.2 in handler(), replace 'encodeTiddlyLink' with explicit [[...]] brackets to ensure that one-word tiddler titles are properly rendered as TiddlyLinks
2008.02.29 1.1.1 in matchTags(), added handling to skip remaining tiddlers if expression has an error
2008.02.29 1.1.0 refactored to define store.matchTags() and extend store.filterTiddlers()
2008.02.28 1.0.0 initial release

;Input : (''UTF-8'')
:tokenized text, 1 token/line, blank line separate sentences
;Output/Training data :
:one token per line, with three columns separated by __spaces or tabs__. The columns contain word form, lemma and morphological tag respectively. Sentences are separated by an empty line. Text should be encoded in UTF-8.
;Lexicon format :
:one entry per line, fields in each entry separated by whitespace in the UTF-8 encoding.
;Tag sets:
:ES: Model trained on approx. 168000 tokens from the __Spanish AnCora treebank__ (Marti et al 2007). Lexicon features extracted from the __Spanish Resource Grammar project__, which contains over 556,000 word forms.
:FR: Model trained on approx. 277000 tokens from the __French Treebank__ (Abeillé et al, 2003). Lexicon features extracted from the __LeFFF lexicon__ (Sagot et al, 2006). The version we used contains over 225000 word forms
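
A minimal reader for the three-column format described above could look like the following. It is only a sketch based on that description (one token per line, columns separated by spaces or tabs, blank line between sentences); the function name {{{parseMorfetteText}}} is made up for the example and is not part of Morfette.

```javascript
// Read "form lemma tag" lines into an array of sentences (illustrative sketch).
function parseMorfetteText(text) {
  const sentences = [];
  let current = [];
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") {
      // A blank line closes the current sentence, if any.
      if (current.length > 0) {
        sentences.push(current);
        current = [];
      }
      continue;
    }
    // Columns may be separated by any run of spaces or tabs.
    const [form, lemma, tag] = line.trim().split(/[ \t]+/);
    current.push({ form, lemma, tag });
  }
  if (current.length > 0) sentences.push(current); // last sentence without trailing blank line
  return sentences;
}
```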

>  Morfette is a tool for supervised learning of inflectional morphology. Given a corpus of sentences annotated with lemmas and morphological labels, and optionally a lexicon, morfette learns how to morphologically analyse new sentences.

> In the learning stage Morfette fits two separate logistic regression models: one for morphological tagging and one for lemmatization. The predictions of the models are combined dynamically and produce a globally plausible sequence of morphological-tag - lemma pairs for a sentence.

> In Morfette lemmatization is cast as a classification task where a lemmatization class corresponds to the specification of the edit operations which are needed to transform the inflected word form into the corresponding lemma.

> The basic approach is described in (Chrupala et al 2008 and Chrupala 2008). The current version of Morfette uses an averaged perceptron to fit the models, rather than Maximum Entropy training. The lemmatization classes are Edit-Tree-based as described in (Chrupala 2008).
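
To illustrate what "lemmatization as classification" means, the sketch below derives a class from a (form, lemma) pair and applies it to a new form. This is a deliberate simplification: the class here is just "strip N final characters, then append a string", whereas Morfette's classes are edit-tree based. The names {{{editClass}}} and {{{applyEditClass}}} are hypothetical.

```javascript
// Derive a simple suffix-rewriting class from an inflected form and its lemma.
// Simplified stand-in for Morfette's edit-tree classes.
function editClass(form, lemma) {
  let k = 0; // length of the common prefix
  while (k < form.length && k < lemma.length && form[k] === lemma[k]) k++;
  return { strip: form.length - k, append: lemma.slice(k) };
}

// Apply a class to a (possibly unseen) form to obtain its predicted lemma.
function applyEditClass(form, cls) {
  return form.slice(0, form.length - cls.strip) + cls.append;
}
```

The class learned from "chantons" / "chanter" (strip 3 characters, append "er") also maps "parlons" to "parler", which is why predicting the class rather than the lemma string generalizes to unseen words.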
morphosyntactic tagging (finer-grained than POS, and often not disambiguated)

Morpho-lexical tools
*    [[MtLex|]] -- Multext lexical access tools
*    [[MtTag|]] - Multext POS disambiguator and related utilities
*    [[MtMorph|]] - Multext morphological tools
Text segmentation tools
*    [[MtSeg|]] - Text segmenter

''MtTag'' (POS disambiguator) no longer seems to be available; ISSCO at the University of Geneva offers a successor: [[TATOO]]

*  Association for Computational Linguistics wiki
** Part-of-speech tagging :
** POS Tagging (State of the art) :
* Stanford list of NLP resources :
* TAUS Directory of tools, section //POS and lemma annotators (taggers)// :

deep syntactic parsing
Tagset used by : <<list filter [tag[tags:Penn]]>>

Aoife Cahill's list :

|CC  |Coordinating conjunction <br>e.g. and,but,or...|
|CD  |Cardinal Number |
|DT  |Determiner |
|EX  |Existential there |
|FW  |Foreign Word |
|IN  |Preposition or subordinating conjunction |
|JJ  |Adjective |
|JJR  |Adjective, comparative |
|JJS  |Adjective, superlative |
|LS  |List Item Marker |
|MD  |Modal <br>e.g. can, could, might, may...|
|NN  |Noun, singular or mass |
|NNP  |Proper Noun, singular |
|NNPS  |Proper Noun, plural |
|NNS  |Noun, plural |
|PDT  |Predeterminer <br>e.g. all, both ... when they precede an article|
|POS  |Possessive Ending <br>e.g. Nouns ending in 's|
|PRP  |Personal Pronoun <br>e.g. I, me, you, he...|
|PRP$  |Possessive Pronoun <br>e.g. my, your, mine, yours...|
|RB  |Adverb <br>Most words that end in -ly as well as degree words like quite, too and very|
|RBR  |Adverb, comparative <br>Adverbs with the comparative ending -er, with a strictly comparative meaning.|
|RBS  |Adverb, superlative |
|RP  |Particle |
|SYM  |Symbol <br>Should be used for mathematical, scientific or technical symbols|
|TO  |to |
|UH  |Interjection <br>e.g. uh, well, yes, my...|
|VB  |Verb, base form <br>subsumes imperatives, infinitives and subjunctives|
|VBD  |Verb, past tense <br>includes the conditional form of the verb to be|
|VBG  |Verb, gerund or present participle |
|VBN  |Verb, past participle |
|VBP  |Verb, non-3rd person singular present |
|VBZ  |Verb, 3rd person singular present |
|WDT  |Wh-determiner <br>e.g. which, and that when it is used as a relative pronoun|
|WP  |Wh-pronoun <br>e.g. what, who, whom...|
|WP$  |Possessive wh-pronoun <br>e.g. whose|
|WRB  |Wh-adverb <br>e.g. how, where, why|

|Punctuation Tags|h
|`` |

Other links:
* AMALGAM page :
* [[1993 Computational Linguistics article in PDF|]]
* TreeTagger :
SEM : Segmenteur-Étiqueteur Markovien

;Input :
:- tokenized text : 1 token/line
:- raw text (so-called "maximal" segmentation)
;Output :
:the text is rewritten sentence by sentence, with its POS tags and its chunk bracketing
:&nbsp;&nbsp;&nbsp;{{{(Tout/DET nouvel/ADJ organisme/NC public/ADJ national/ADJ)NP (sera/V implanté/VPP)VN (hors_d'/P Ile-de-France/NPP)PP (./PONCT)O}}}
:[[SEM Tagset]]
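
The bracketed output shown above is easy to post-process. The following sketch turns one output line into a list of chunks with their tokens; it assumes the layout of the example line only (chunks in parentheses followed by their type, tokens written as form/POS, no parentheses inside tokens), and {{{parseSemLine}}} is a made-up name, not part of SEM.

```javascript
// Parse one line of chunk-bracketed output such as
// "(sera/V implanté/VPP)VN" into [{ type, tokens: [{ form, pos }] }].
function parseSemLine(line) {
  const chunks = [];
  const chunkPattern = /\(([^()]*)\)(\S+)/g; // "(tokens)TYPE"
  let m;
  while ((m = chunkPattern.exec(line)) !== null) {
    const tokens = m[1].trim().split(/\s+/).map(tok => {
      const cut = tok.lastIndexOf("/");      // the POS follows the last slash
      return { form: tok.slice(0, cut), pos: tok.slice(cut + 1) };
    });
    chunks.push({ type: m[2], tokens });
  }
  return chunks;
}
```

Splitting at the last slash keeps forms such as "./PONCT" intact, but a token containing a parenthesis would need a more careful tokenizer.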

Uses [[Wapiti]], a simple and fast discriminative sequence labelling toolkit.

Tagset used by : <<list filter [tag[tags:SEM]]>>

''POS'' - The tagger is based on the [[morphosyntactic tagset of (Crabbé et al. 08)|CC Tagset]], extended with one tag obtained when applying it to the French TreeBank: CL, for clitics whose subcategory cannot be determined, that is, expletives that are explicitly treated neither as subjects, nor as objects, nor as reflexives (typically "y" in "il y a" or the demonstrative "c'").

Here is the exhaustive list of the tags used and the syntactic category each one corresponds to:
|ADJ |adjective |
|ADJWH |interrogative adjective |
|ADV |adverb |
|ADVWH |interrogative adverb |
|CC |coordinating conjunction |
|CL |clitic pronoun |
|CLO |object clitic pronoun |
|CLR |reflexive clitic pronoun |
|CLS |subject clitic pronoun |
|CS |subordinating conjunction |
|DET |determiner |
|DETWH |interrogative determiner |
|ET |foreign word |
|I |interjection |
|NC |common noun |
|NPP |proper noun |
|P |preposition |
|P+D |contracted preposition and determiner |
|P+PRO |contracted preposition and pronoun |
|PONCT |punctuation |
|PREF |prefix |
|PRO |pronoun |
|PROREL |relative pronoun |
|PROWH |interrogative pronoun |
|V |verb |
|VIMP |imperative verb form |
|VINF |infinitive verb form |
|VPP |past participle |
|VPR |present participle |
|VS |subjunctive verb form |

''Chunks'' - The list of chunks follows directly from the list of POS tags and therefore comprises 6 main group types (potential heads in parentheses):
|__UNKNOWN__ |chunk of unidentified nature (ET) |
|AP |adjectival chunk (ADJ, ADJWH) |
|AdP |adverbial chunk (ADV, ADVWH, I) |
|CONJ |conjunction chunk (CC, CS) |
|NP |nominal chunk (CLO, CLR, CLS, NC, NPP, PRO, PROREL, PROWH) |
|PP |prepositional chunk (P, P+D, P+PRO) |
|VN |verbal chunk (V, VIMP, VINF, VPP, VPR, VS) |


;Input :
:- tokenized text : 1 token/line, the token is expected to be the first column of the line. Lines beginning with ’## ’ are ignored by the tagger
:- multicolumns : 1 token/line,  the token is expected to be the first column of the line. (The tag to predict takes the second column in the output.) The rest of the line may contain additional information. 
;Output :
:The predicted tag will take the second column in the output. The rest of the line remains unchanged.
;Training data :
:Training data must be in column format, i.e. a token per line corpus in a sentence by sentence fashion. The column separator is the blank space. The token is expected to be the first column of the line. The tag to predict takes the second column in the output. The rest of the line may contain additional information. 

Here you can find information about the SVMTool, an open source generator of sequential taggers. The SVMTool has been developed by the NLP group of the TALP Research Center, at the Universitat Politècnica de Catalunya.

The SVMTool is a simple and effective generator of sequential taggers based on Support Vector Machine. We have applied the SVMTool to a number of NLP problems, such as Part-of-speech Tagging and Base Phrase Chunking, for different languages. The proposed SVM-based tagger is robust and flexible for feature modelling (including lexicalization), trains efficiently with almost no parameters to tune, and is able to tag thousands of words per second, which makes it really practical for real NLP applications. Regarding accuracy, the SVM-based tagger achieves a very competitive accuracy of 97.2% for English on the Wall Street Journal corpus, which is comparable to the best taggers reported up to date.

The SVM^^light^^ software implementation of Vapnik's Support Vector Machine [Vapnik, 1995] by Thorsten Joachims has been used to train the models. For further information on it see [[here|]].

Through this web site you will be able to download the SVMTool software. You can also download several models to tag in different languages and models to deal with noisy and ungrammatical texts as those studied in the FAUST project. 
[img[]] SYGFRAN, morpho-syntactic analysis, developed by Jacques CHAUCHÉ at [[LIRMM|]]

> The SYGFRAN parser is written in SYGMART. It produces a constituency analysis of a French utterance.

The source files are available here (all encoded in UTF-8):
*    [[The variables file|]]
*    [[The formats file|]]
*    [[The segment dictionary file (OPALE)|]]
*    [[The label dictionary file (OPALE)|]]
*    [[The OPALE grammar file|]]
*    [[The analysis dictionary file (TELESI)|]]
*    [[The TELESI grammar file|]]

> [[SYGFRAN online server|]]

<<search>><<closeAll>><<permaview>><<newTiddler>><<newJournal "DD MMM YYYY" "journal">><<saveChanges>><<tiddler TspotSidebar>>[[(on tiddlyspot)|]]<<slider chkSliderOptionsPanel OptionsPanel "options »" "Change TiddlyWiki advanced options">>
a short survey of POS taggers for French
POS Taggers for French


(@@TODO: check french language@@)

;Input :
:- raw text 
:- tokenized text 
:- xml (text content inside specific tag(s))
;Output :
:append POS to each word :
:&nbsp;&nbsp;&nbsp;{{{This/DT is/VBZ a/DT sample/NN sentence/NN}}}
;Tag set :
: EN : [[Penn Treebank English POS tag set]]
: FR : [[French TreeBank POS Tags]]

> This software is a Java implementation of the log-linear part-of-speech taggers described in these papers (if citing just one paper, cite the 2003 one):
>    Kristina Toutanova and Christopher D. Manning. 2000. [[Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger|]]. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000), pp. 63-70. 
>    Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003. [[Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network|]]. In Proceedings of HLT-NAACL 2003, pp. 252-259. 
> The tagger was originally written by Kristina Toutanova. Since that time, Dan Klein, Christopher Manning, William Morgan, Anna Rafferty, Michel Galley, and John Bauer have improved its speed, performance, usability, and support for other languages.

> The system requires Java 1.6+ to be installed. Depending on whether you're running 32 or 64 bit Java and the complexity of the tagger model, you'll need somewhere between 60 and 200 MB of memory to run a trained tagger (i.e., you may need to give java an option like java -mx200m). Plenty of memory is needed to train a tagger. It again depends on the complexity of the model but at least 1GB is usually needed, often more.

> Part-of-speech name abbreviations: The English taggers use the Penn Treebank tag set. Here are some links to documentation of the Penn Treebank English POS tag set: [[1993 Computational Linguistics article in PDF|]], [[AMALGAM page|]], [[Aoife Cahill's list|]]. See the included README-Models.txt in the models directory for more information about the tagsets for the other languages.

> This software provides a GUI demo, a command-line interface, and an API. Simple scripts are included to invoke the tagger.
:- a record/field format as given by the Multext segmenter with sentence boundaries marked and with words annotated by lexical look-up (either mtlexax or mmorph look-up)
:- multi-column
config.shadowTiddlers.TagsTreeStyleSheet +="/*}}}*/";

store.addNotification("TagsTreeStyleSheet", refreshStyles); 


config.shadowTiddlers.PageTemplate = config.shadowTiddlers.PageTemplate.replace(/id='mainMenu' refresh='content' /,"id='mainMenu' refresh='content' force='true' ")


;Input :
:tokenized : one token per line
;Output :
:- basic : the tagger adds a second column to each line, containing the tag for the word.
:&nbsp;&nbsp;&nbsp;{{{FORM POS}}}
:- optionally: the tagger emits alternative tags for each token, together with a probability distribution
:&nbsp;&nbsp;&nbsp;{{{FORM POS POS1 PROB1 POS2 PROB2 ...}}}
;Training data :
:one token per line, the first column is the word, the second column is the tag.
:&nbsp;&nbsp;&nbsp;{{{FORM POS}}}
;Tag set :
:DE: The German model is trained on the Saarbrücker [[German newspaper corpus|]] using the [[Stuttgart-Tübingen-Tagset|]].
:EN: [[Susanne Corpus|]] / [[Penn Treebank|]]
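
The optional output with alternative tags can be read with a few lines of code. The sketch below follows only the column layout given above ({{{FORM POS POS1 PROB1 POS2 PROB2 ...}}}, whitespace-separated); it does not handle any other line types that real TnT output files may contain, and {{{parseTntLine}}} is a hypothetical helper name.

```javascript
// Parse one "FORM POS POS1 PROB1 POS2 PROB2 ..." line (illustrative sketch).
function parseTntLine(line) {
  const [form, pos, ...rest] = line.trim().split(/\s+/);
  const alternatives = [];
  // The remaining columns come in (tag, probability) pairs.
  for (let i = 0; i + 1 < rest.length; i += 2) {
    alternatives.push({ pos: rest[i], prob: parseFloat(rest[i + 1]) });
  }
  return { form, pos, alternatives };
}
```

A basic two-column line simply yields an empty list of alternatives.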

What is TnT?

TnT, the short form of Trigrams'n'Tags, is a very efficient statistical part-of-speech tagger that is trainable on different languages and virtually any tagset. The component for parameter generation trains on tagged corpora. The system incorporates several methods of smoothing and of handling unknown words.

TnT is not optimized for a particular language. Instead, it is optimized for training on a large variety of corpora. Adapting the tagger to a new language, new domain, or new tagset is very easy. Additionally, TnT is optimized for speed.

The tagger is an implementation of the Viterbi algorithm for second order Markov models. The main paradigm used for smoothing is linear interpolation; the respective weights are determined by deleted interpolation. Unknown words are handled by a suffix trie and successive abstraction.
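
The smoothing step can be illustrated as follows: the trigram probability of a tag is estimated as a weighted sum of unigram, bigram and trigram relative frequencies, with weights that sum to 1. The sketch below only shows that combination; the {{{model}}} object layout and the function name {{{interpolatedTrigram}}} are assumptions for the example, and the weights are passed in rather than computed by deleted interpolation.

```javascript
// Linear interpolation of tag n-gram estimates:
// P(t3 | t1, t2) = l1 * P(t3) + l2 * P(t3 | t2) + l3 * P(t3 | t1, t2)
// `model` is a hypothetical container of relative frequencies:
//   model.unigram[t3], model.bigram[t2][t3], model.trigram["t1 t2"][t3]
function interpolatedTrigram(t1, t2, t3, model, lambdas) {
  const [l1, l2, l3] = lambdas;                       // expected to sum to 1
  const uni = model.unigram[t3] || 0;
  const bi = (model.bigram[t2] || {})[t3] || 0;       // 0 if the bigram was never seen
  const tri = (model.trigram[t1 + " " + t2] || {})[t3] || 0;
  return l1 * uni + l2 * bi + l3 * tri;
}
```

Because the lower-order terms are always included, a tag sequence that never occurred as a trigram in the training corpus still receives a non-zero probability.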
|~EditToolbar|+saveTiddler -cancelTiddler deleteTiddler|

;Input :
:raw text 
;Output :
;Tag sets :
:EN: [[Penn Treebank English POS tag set]]
:FR: [[French TreeBank POS Tags]]

> The TreeTagger is a tool for annotating text with part-of-speech and lemma information. It was developed by Helmut Schmid in the TC project at the Institute for Computational Linguistics of the University of Stuttgart. The TreeTagger has been successfully used to tag German, English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek, Portuguese, Chinese, Swahili, Latin, Estonian and old French texts and is adaptable to other languages if a lexicon and a manually tagged training corpus are available.

>  The TreeTagger can also be used as a chunker for English, German, and French. The parameter file for the French chunker was kindly provided by Michel Généreux.

The tagger is described in the following two papers:
*    Helmut Schmid (1995): [[Improvements in Part-of-Speech Tagging with an Application to German|]]. Proceedings of the ACL SIGDAT-Workshop. Dublin, Ireland.
*    Helmut Schmid (1994): [[Probabilistic Part-of-Speech Tagging Using Decision Trees|]]. Proceedings of International Conference on New Methods in Language Processing, Manchester, UK.

''Links'' :
- //TreeTagger for Java// (tt4j) is a Java wrapper around the popular TreeTagger package by Helmut Schmid :
[img[]] Unstructured Information Management Applications

UIMA-FR blog article on building the French models for an HMM POS tagger: [[Construire des modélisations du French Treebank pour le UIMA HMM Tagger|]]
<div class='title' macro='view title'></div>
<div class='subtitle'><span macro='view modifier link'></span>, <span macro='view modified date'></span> (<span macro='message views.wikified.createdPrompt'></span> <span macro='view created date'></span>)</div>
<div class='tagging' macro='tagging'></div>
<div class='tagged' macro='tags'></div>
<div class='viewer' macro='view text wikified'></div>
<div class='tagClear'></div>
Tagset used by : <<list filter [tag[tags:WSJ]]>>

|CC  |Coordinating conjunction  |
|CD  |Cardinal number  |
|DT  |Determiner  |
|EX  |Existential there  |
|FW  |Foreign word  |
|IN  |Preposition/subord. conjunction  |
|JJ  |Adjective  |
|JJR  |Adjective, comparative  |
|JJS  |Adjective, superlative  |
|LS  |List item marker  |
|MD  |Modal  |
|NN  |Noun, singular or mass  |
|NNS  |Noun, plural  |
|NNP  |Proper noun, singular  |
|NNPS  |Proper noun, plural  |
|PDT  |Predeterminer  |
|POS  |Possessive ending  |
|PRP  |Personal pronoun  |
|PRP$  |Possessive pronoun |
|RB  |Adverb |
|RBR  |Adverb, comparative |
|RBS  |Adverb, superlative |
|RP  |Particle |
|SYM  |Symbol (mathematical or scientific) |
|TO  |to |
|UH  |Interjection |
|VB  |Verb, base form |
|VBD  |Verb, past tense |
|VBG  |Verb, gerund/present participle |
|VBN  |Verb, past participle |
|VBP  |Verb, non-3rd ps. sing. present |
|VBZ  |Verb, 3rd ps. sing. present |
|WDT  |wh-determiner |
|WP  |wh-pronoun |
|WP$  |Possessive wh-pronoun |
|WRB  |wh-adverb |

Conditional random field [[(wikipedia)|]]
Decision Tree [[(wikipedia)|]]
Hidden Markov model [[(wikipedia)|]]
Log-linear model [[(wikipedia)|]]
Maximum-entropy Markov model [[(wikipedia)|]]
Maximum entropy classifier [[(wikipedia)|]]

Perceptron [[(wikipedia)|]]
Rules based
Support vector machine [[(wikipedia)|]]
Dictionaries based
;Input :
:tokenized text : one sentence per line, tokens separated by spaces
;Output/Training data :
:a POS tag is appended to each token, separated from it by an underscore
:&nbsp;&nbsp;{{{Pierre_NNP Vinken_NNP ,_, 61_CD years_NNS old_JJ ,_, will_MD join_VB the_DT board_NN as_IN
    a_DT nonexecutive_JJ director_NN Nov._NNP 29_CD ._.
Mr._NNP Vinken_NNP is_VBZ chairman_NN of_IN Elsevier_NNP N.V._NNP ,_, the_DT Dutch_NNP publishing_VBG group_NN}}} 
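
The {{{token_TAG}}} layout shown above can be split back into (token, tag) pairs as follows. This is a minimal sketch that assumes the tag is everything after the last underscore, so that tokens such as {{{,_,}}} or tokens containing underscores are still handled. The function name {{{parse_tagged_sentence}}} is hypothetical and is not part of the OpenNLP API.

```python
def parse_tagged_sentence(line):
    """Split 'Pierre_NNP Vinken_NNP ,_, ...' into (token, tag) pairs.

    Assumption: tokens are separated by whitespace and the tag follows
    the LAST underscore of each item.
    """
    pairs = []
    for item in line.split():
        token, sep, tag = item.rpartition("_")
        if not sep or not token or not tag:
            raise ValueError("not in token_TAG form: %r" % item)
        pairs.append((token, tag))
    return pairs
```

Splitting on the last underscore rather than the first is a deliberate choice: POS tags in the tag sets listed here do not contain underscores, whereas tokens may.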

>The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
>It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.

See (the reference site of the SYGMART© system)
Tag set used by the tagger.

The tag set generally depends on the language:
* for French: <<matchTags popup "label:tag-set & lang:FR" tag-set AND lang:FR>> <<matchTags "#[[%0]]" "\n" tag-set AND lang:FR>>
* for English: <<matchTags popup "label:tag-set & lang:EN" tag-set AND lang:EN>> <<matchTags "#[[%0]]" "\n" tag-set AND lang:EN>>