Language Technology 1
MA in IT and Cognition. Wednesdays/Thursdays 9:00AM-12:00PM in Room 24.0.30/24.1.07. Teachers: Anders Søgaard and Hector Martinez. Exam: written (CD-ROM w. implementation + dependency parser output + 15 page report). Curriculum here.

The course enables the student to implement data-driven natural language processing systems. Our main focus will be dependency parsing.

Lecture plan

1. February 15. (AS) Introduction. Literature: Kübler et al., Ch. 1. Exercise: Evaluate a (supervised) parser on distributed data. [Slides]
2. February 22. (AS) State-of-the-art in dependency parsing (1/3). Literature: Kübler et al., Ch. 2-4. Exercise: Evaluate a parser on a reversed treebank. [Slides] [Slides] [Code]
3. February 29. (AS) State-of-the-art in dependency parsing (2/3). Literature: Surdeanu and Manning (2010), McDonald and Nivre (2011). Exercise: (i) Stack a classifier on a parser and a reversed parser. (ii) Diagnose your parser using the full output of eval07.pl (remove "-q"). [Slides]
4. March 7. (AS) State-of-the-art in dependency parsing (3/3). Literature: Søgaard and Rishøj (2010), Zhou et al. (2011), Søgaard and Haulrich (2011). Exercise: Decorate the Danish treebank with clusters and evaluate a parser on it. [Slides] [Slides]
5. March 14. (HM) Supervised POS tagging. Literature: Bird et al., Ch. 5; Brants (2000); Toutanova and Manning (2000). [Slides]
6. March 21. (HM) Using NLP for treebank augmentation: lemmatization, named entity recognition, chunking. Literature: Bird et al., Ch. 7, Novak and Zabokrtsky (2007), Tongchim et al. (2008), Bengoetxea and Gojenola (2009). Exercise: Evaluate a POS tagger on the Danish treebank. [Slides]
7. March 28. (HM) Semantically informed dependency parsing. Literature: Bird et al., Sect. 2.4-2.5, Agirre et al. (2011), Søgaard and Johannsen (2011), Martinez et al. (2011).
8. April 4. Easter.
9. April 12 (THURSDAYS FROM NOW ON). (HM) Evaluation and error analysis. Literature: Goldberg and Elhadad (2010), McDonald and Nivre (2011), Schwarz et al. (2011).
10. April 19. (AS) Unsupervised and minimally supervised dependency parsing (I). Literature: Klein and Manning (2004), Ponvert et al. (2011). [Slides]
11. April 26. (AS) Unsupervised and minimally supervised dependency parsing (II). Literature: Reichart and Rappoport (2009), Spitkovsky et al. (2010), Reichart and Rappoport (2010), Naseem et al. (2010). Exercise: Evaluate an unsupervised parser on the Danish treebank. [Slides]
12. May 3. (HM/AS) Cross-language adaptation and cross-linguistic differences. Literature: Spreyer et al. (2009), Søgaard (2011), McDonald et al. (2011). [Slides]
13. May 10. (HM) Applications of dependency parsing: machine translation, semantic parsing, and summarization. Literature: Galley and Manning (2009), Xu et al. (2009), Poon and Domingos (2009), Berg-Kirkpatrick et al. (2011).
14. May 17. TBA. TBA. Copenhagen Dependency Parsing Workshop 2012. Presentations
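
The exercises in lectures 2 and 3 refer to a reversed treebank and a reversed parser. The following is a minimal sketch of one way to reverse a treebank, assuming that "reversed" means reversing the word order of every sentence and that the data is in the tab-separated CoNLL-X format (ID in column 1, HEAD in column 7, PHEAD in column 9). The function names are hypothetical and this is not the code distributed with the course.

```python
def read_conll(lines):
    """Yield sentences as lists of column lists; blank lines separate sentences."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            sentence.append(line.split("\t"))
        elif sentence:
            yield sentence
            sentence = []
    if sentence:
        yield sentence


def reverse_sentence(sentence):
    """Reverse token order and remap ID, HEAD and PHEAD accordingly.

    Assumption: CoNLL-X column layout, with "0" marking the artificial root
    and "_" marking an unspecified value.
    """
    n = len(sentence)

    def flip(index):
        # The root (0) and unspecified values (_) keep their value.
        if index in ("0", "_"):
            return index
        return str(n + 1 - int(index))

    reversed_rows = []
    for row in reversed(sentence):
        new_row = list(row)
        new_row[0] = flip(row[0])      # ID
        new_row[6] = flip(row[6])      # HEAD
        if len(row) > 8:
            new_row[8] = flip(row[8])  # PHEAD
        reversed_rows.append(new_row)
    return reversed_rows


def write_conll(sentences):
    """Serialize sentences back to CoNLL text, one blank line after each."""
    blocks = ["\n".join("\t".join(row) for row in s) for s in sentences]
    return "\n\n".join(blocks) + "\n\n" if blocks else ""
```

Reversing a sentence twice should give back the original rows, which is a quick sanity check before training a parser on the reversed data. Parser output on reversed data would have to be reversed back in the same way before it is compared with the original gold standard.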
Check out the CONLL-X and CONLL 2007 shared task websites. There is freely available data for Danish, Dutch, Portuguese, and Swedish here. Mappings of POS tagsets to universal tags can be found here.

Parsers

The MaltParser can be downloaded from maltparser.org. The MaltBlender is available here. You should also check out the MSTParser, the clearparser, the easyfirst, the Stanford Parser, the ISBN Dependency Parser, Mate-Tools, and LRDep. Code for unsupervised dependency parsing can be found here. Current state-of-the-art PCFG parsers are available at aclweb.org. The Java program pennconverter.jar converts Penn Treebank-style CFG trees into CONLL-formatted dependency structures. The eval07.pl script is found here. This script computes oracle scores at dependency level (Usage: python compute_oracle_score.py [GOLD] [PRED1] [PRED2]), whereas this one does it at sentence level.

Suggested literature

Clustering: [Koo et al. (2008)] [Sagae and Gordon (2009)] [Haffari et al. (2011)] [Bansal and Klein (2011)]
Domain adaptation: [Kawahara and Uchimoto (2008)] [Finkel and Manning (2009)] [McClosky et al. (2010)] [Søgaard and Haulrich (2011)] [Plank and van Noord (2011)] [Foster et al. (2011)] [Hall et al. (2011)]
Metonymy/metaphor: [Nissim and Markert (2003)] [Baumer et al. (2010)] [Shutova et al. (2010)] [Roberts and Harabagiu (2011)]
Morphology: [Lee et al. (2011)] [Marton et al. (2011)] (Franz: [Goldberg and Elhadad (2010)])
Semi-supervised learning algorithms: [Wang et al. (2008)] [Kawahara and Uchimoto (2008)] [Suzuki et al. (2009)]
Text classification: [Wu et al. (2009)] [Joshi and Penstein-Rose (2009)] [Solorio et al. (2011)] [Wong and Dras (2011)]
Wrapper methods: [Sagae and Tsujii (2006)] [Chen et al. (2008)] [Chen et al. (2009)]
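
The dependency-level and sentence-level oracle scores mentioned above can be illustrated with a short sketch. It is not the course's compute_oracle_score.py; it assumes tab-separated CoNLL files with HEAD in column 7 and DEPREL in column 8, the same tokenization in all files, and a labeled score over all tokens (no punctuation filtering). All function names are hypothetical.

```python
def read_arcs(path):
    """Return a list of sentences, each a list of (head, deprel) pairs."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.strip():
                cols = line.split("\t")
                current.append((cols[6], cols[7]))
            elif current:
                sentences.append(current)
                current = []
    if current:
        sentences.append(current)
    return sentences


def dependency_oracle(gold, predictions):
    """Fraction of tokens for which at least one parser has the gold arc."""
    correct = total = 0
    for i, gold_sentence in enumerate(gold):
        for j, gold_arc in enumerate(gold_sentence):
            total += 1
            if any(pred[i][j] == gold_arc for pred in predictions):
                correct += 1
    return correct / total if total else 0.0


def sentence_oracle(gold, predictions):
    """Score obtained by picking, per sentence, the best parser's whole tree."""
    correct = total = 0
    for i, gold_sentence in enumerate(gold):
        total += len(gold_sentence)
        correct += max(
            sum(p == g for p, g in zip(pred[i], gold_sentence))
            for pred in predictions
        )
    return correct / total if total else 0.0
```

With gold, pred1 and pred2 loaded through read_arcs, calling dependency_oracle(gold, [pred1, pred2]) gives an upper bound on what an arc-by-arc combination of the two parsers could reach, while sentence_oracle(gold, [pred1, pred2]) gives the bound for choosing one complete parse per sentence. The dependency-level bound is never lower than the sentence-level one.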
Njalsgade 140-142, bygn. 25, DK-2300 KBH S
Tel: +45 35329090 - Fax: +45 35329089