Papers by William W. Cohen

Papers published in 2025

Jixuan Leng, Cassandra A. Cohen, Zhixian Zhang, Chenyan Xiong, William W. Cohen (2025): Semi-structured LLM Reasoners Can Be Rigorously Audited in preparation.
Jixuan Leng, Chengsong Huang, Langlin Huang, Bill Yuchen Lin, William W. Cohen, Haohan Wang, Jiaxin Huang (2025): CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation in preparation.

Papers published in 2024

Cassandra A. Cohen, William W. Cohen (2024): Watch Your Steps: Observable and Modular Chains of Thought in preparation.
Adam Fisch, Joshua Maynez, R. Alex Hofer, Bhuwan Dhingra, Amir Globerson and William W. Cohen (2024): Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation in NeurIPS-2024.
Gabriel Sarch, Lawrence Jang, Michael J. Tarr, William W. Cohen, Kenneth Marino, Katerina Fragkiadaki (2024): ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights in NeurIPS-2024.
R. Alex Hofer, Joshua Maynez, Bhuwan Dhingra, Adam Fisch, Amir Globerson and William W. Cohen (2024): Bayesian Prediction-Powered Inference in progress.
Tal Schuster, Adam D. Lelkes, Haitian Sun, Jai Gupta, Jonathan Berant, William W. Cohen, Donald Metzler (2024): SEMQA: Semi-Extractive Multi-Source Question Answering in NAACL-2024.
Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis, Santiago Ontañón, William W. Cohen, Sumit Sanghai, Joshua Ainslie (2024): MEMORY-VQ: Compression for Tractable Internet-Scale Memory in NAACL-2024.

Papers published in 2023

Hexiang Hu, Kelvin C.K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, William W. Cohen, Ming-Wei Chang, Xuhui Jia (2023): Instruct-Imagen: Image Generation with Multi-modal Instruction in CVPR.
- Accepted as an oral presentation (one of 90 orals out of 11,500 submissions).
Chung-Ching Chang, William W. Cohen, Yun-Hsuan Sung (2023): Characterizing Tradeoffs in Language Model Decoding with Informational Interpretations in progress.
Haitian Sun, William W. Cohen, Ruslan Salakhutdinov (2023): Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions in progress.
- Following up on the 'QA is the new KR' paper, we present a new collection of question-answer pairs automatically generated from Wikipedia, which are more specific and less ambiguous than the generated questions used in prior work, and show that this collection can be used to answer ambiguous questions. On the challenging ASQA benchmark, which requires generating long-form answers that summarize the multiple answers to an ambiguous question, our method improves performance by 10-15%. The new question DB can also be used to improve diverse passage retrieval.
Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Sumit Sanghai, William W. Cohen, Joshua Ainslie (2023): GLIMMER: generalized late-interaction memory reranker in progress.
Wenhu Chen, Hexiang Hu, Yandong Li, Nataniel Ruiz, Xuhui Jia, Ming-Wei Chang, William W. Cohen (2023): Subject-driven Text-to-Image Generation via Apprenticeship Learning in NeurIPS-2023.
Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Joshua Ainslie, Sumit Sanghai, Fei Sha, William W. Cohen (2023): Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute in ICML-2023.
Wenhu Chen, Hexiang Hu, Xi Chen, Pat Verga, William W. Cohen (2023): MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text in EACL-2023.
Michiel de Jong, Yury Zemlyanskiy, Joshua Ainslie, Nicholas FitzGerald, Sumit Sanghai, Fei Sha, William Cohen (2023): FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference in ACL-2023 (Findings).
Wenhu Chen, Hexiang Hu, Chitwan Saharia, William W. Cohen (2023): Re-Imagen: Retrieval-Augmented Text-to-Image Generator in ICLR-2023.
Julian Martin Eisenschlos, Jeremy R. Cole, Fangyu Liu, William W. Cohen (2023): WinoDict: Probing language models for in-context word acquisition in EACL-2023.
- One of two winners of an Outstanding Paper Award at EACL.
John Wieting, Jonathan H. Clark, William W. Cohen, Graham Neubig, Taylor Berg-Kirkpatrick (2023): Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval in ACL-2023.
Wenhu Chen, William W. Cohen, Michiel De Jong, Nitish Gupta, Alessandro Presta, Pat Verga, John Wieting (2023): QA Is the New KR: Question-Answer Pairs as Knowledge Bases in AAAI-2023.
- Proposes that symbolic KBs can be replaced with a collection of question-answer pairs automatically generated from a corpus, augmented with entity-linking annotations. Like a symbolic KB, this representation is well-suited to structured queries involving joins and aggregation, and can support 'multi-hop' reasoning. However, it has the advantage that the information in it is closely aligned to likely user information needs, as modeled by the question generation process.
Haitian Sun, William W. Cohen, Ruslan Salakhutdinov (2023): Scenario-based Question Answering with Interacting Contextual Properties in ICLR-2023.

Papers published in 2022

Haitian Sun, William W. Cohen, Ruslan Salakhutdinov (2022): Reasoning over Logically Interacted Conditions for Question Answering in progress.
Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen (2022): Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks in progress.
Vidhisha Balachandran, Hannaneh Hajishirzi, William Cohen, Yulia Tsvetkov (2022): Correcting Diverse Factual Errors in Abstractive Summarization via Post-Editing and Language Model Infilling in EMNLP-2022.
Bernd Bohnet, Vinh Q. Tran, Pat Verga, Roee Aharoni, Daniel Andor, Livio Baldini Soares, Jacob Eisenstein, Kuzman Ganchev, Jonathan Herzig, Kai Hui, Tom Kwiatkowski, Ji Ma, Jianmo Ni, Tal Schuster, William W. Cohen, Michael Collins, Dipanjan Das, Donald Metzler, Slav Petrov, and Kellie Webster (2022): Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models in progress.
Wenhu Chen, Pat Verga, Michiel de Jong, John Wieting, William W. Cohen (2022): Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering in EACL-2022.
- Extends the techniques of Mention Memory in several important ways. (1) The memory is a memory of generated question-answer pairs, which is more interpretable than neural entity-mention encodings; (2) it is based on pre-trained T5, not a custom Transformer; and (3) it allows use of the token-level encoding of retrieved QA pairs as well as neural encodings of them for reasoning. Using QA pairs instead of passages allows a clever pre-training trick for learning to retrieve, and the model greatly outperforms a prior similar model (i.e., RePAQ) on smaller QA benchmarks.
Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen and Donald Metzler (2022): Transformer Memory as a Differentiable Search Index in NeurIPS 2022.
Siddhant Arora, Danish Pruthi, Norman Sadeh, William W. Cohen, Zachary C. Lipton, Graham Neubig (2022): Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations in AAAI 2022.

Papers published in 2021

Vidhisha Balachandran and Bhuwan Dhingra and Haitian Sun and Michael Collins and William W. Cohen (2021): Investigating the Effect of Background Knowledge on Natural Questions in DeeLIO-2021.
- Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures
Haitian Sun, William W. Cohen, Ruslan Salakhutdinov (2021): ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers in ACL 2022.
- A novel dataset with (1) long context documents containing information that is related in logically complex ways; (2) multi-hop questions that require compositional logical reasoning. Intended as a more realistic version of ShARC, a QA task considered in 'End-to-End Multihop Retrieval for Compositional Question Answering over Long Documents'.
Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Fei Sha, William Cohen (2021): Mention Memory: incorporating textual knowledge into Transformers through entity mention attention in ICLR 2021.
- Similar to the Entities-as-Experts model, but uses a much larger memory of entity mentions, which allows the model to potentially provide meaningful provenance for information. The model, called TOME, outperforms Entities-as-Experts on several tasks, and required some non-trivial technical innovations relating to memory pre-training and efficient retrieval.
Keshav Kolluru, Martin Rezk, Pat Verga, William W. Cohen, Partha Talukdar (2021): Multilingual Fact Linking in AKBC-2021.
Julian Martin Eisenschlos, Maharshi Gor, Thomas Muller, William W. Cohen (2021): MATE: Multi-view Attention for Table Transformer Efficiency in EMNLP-2021.
Bhuwan Dhingra, Jeremy R. Cole, Julian Martin Eisenschlos, Daniel Gillick, Jacob Eisenstein, William W. Cohen (2021): Time-Aware Language Models as Temporal Knowledge Bases in preparation.
Haitian Sun, William W. Cohen, Ruslan Salakhutdinov (2021): End-to-End Multihop Retrieval for Compositional Question Answering over Long Documents in preparation.
- Adapts many of the ideas used for multihop KBQA to a new task - answering multihop questions over a large document. Retrieval steps in this "DocHopper" system retrieve passages of a document, and the retrieved items are combined with a question neurally: i.e., rather than appending text to a question and re-encoding that discrete object, what is retrieved is a vector summary of the document, which is mixed with the previous question encoding. This is fast, fully differentiable, allows retrieval of large document subsections, and gets a new SOTA on three datasets.
Avishai Zagoury, Einat Minkov, Idan Szpektor, William W. Cohen (2021): What's the best place for an AI conference, Vancouver or ______: Why completing comparative questions is difficult in AAAI2021.
Haitian Sun, Pat Verga, Bhuwan Dhingra, Ruslan Salakhutdinov, William W. Cohen (2021): Reasoning Over Virtual Knowledge Bases With Open Predicate Relations in ICML2021.
- Modifies the FILM model by using a virtual KB of small text passages containing pairs of entities. This required adding a Matching-the-Blanks pretraining phase, but got strong results on a number of QA-from-corpora tasks.
Wenhu Chen, Ming-Wei Chang, Eva Schlinger, William Wang, William W. Cohen (2021): Open Question Answering Over Tables and Text in ICLR-2021.
- Answering open QA multi-hop questions over tables and text with a clever "early fusion" idea, which proposes and indexes likely reasoning chains, and uses large-document Transformers to merge these noisy evidence chains.
Pat Verga, Haitian Sun, Livio Baldini Soares, and William W. Cohen (2021): Adaptable and Interpretable Neural Memory Over Symbolic Knowledge in NAACL-2021.
- Most recent paper on the Fact-Injected Language Model (FILM), which includes an Entities-as-Experts style memory of neural entity encodings, plus a second "fact memory" of KG triples. FILM has good results on KBQA tasks, and allows one to use an edited KB without retraining.

Papers published in 2020

Danish Pruthi, Bhuwan Dhingra, Livio Baldini Soares, Michael Collins, Zachary C. Lipton, Graham Neubig, William W. Cohen (2020): Evaluating Explanations: How Much Do Explanations From the Teacher Aid Students? in preparation.
Bill Yuchen Lin, Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Xiang Ren, William W. Cohen (2020): Differentiable Open-Ended Commonsense Reasoning in NAACL-2021.
- Extends DrKIT's virtual KB to a corpus of documents of common-sense statements ("facts"). In DrFact, entities are replaced by noisy and ambiguous concepts, and navigation is between documents with overlapping sets of mentions. Also introduces new "open" tasks for common-sense QA.
Haitian Sun, Andrew O. Arnold, Tania Bedrax-Weiss, Fernando Pereira, William W. Cohen (2020): Faithful Embeddings for Knowledge Base Queries in NeurIPS2020.
- An extension to Neural Query Language (NQL) that makes the query language work with a "centroid-sketch" representation of sets. The centroid encodes a geometric region, and the sketch is a randomized data structure that adds capacity to the representation, allowing faithful differentiable logical reasoning to be combined with good generalization.
Pat Verga, Haitian Sun, Livio Baldini Soares, and William W. Cohen (2020): Facts as Experts: Adaptable and Interpretable Neural Memory over Symbolic Knowledge in arxiv.
- Earlier draft of the NAACL paper on FILM (Fact-Injected LM).
William W. Cohen, Fan Yang, and Kathryn Rivard Mazaitis (2020): TensorLog: A Probabilistic Database Implemented Using Deep-Learning Infrastructure in JAIR.
- Most complete paper on TensorLog, a predecessor of NQL/EmQL that was a Prolog-like logic, not a dataflow query language.
William W. Cohen, Haitian Sun, R. Alex Hofer, Matthew Siegler (2020): Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base in ICLR-2020.
- Paper on Neural Query Language (NQL), a differentiable dataflow query language. NQL is useful for building KBQA systems that can be trained from denotations, but relies heavily on sparse-matrix operations that are not implemented in all accelerators.
Bhuwan Dhingra, Manzil Zaheer, Vidhisha Balachandran, Graham Neubig, Ruslan Salakhutdinov, William W. Cohen (2020): Differentiable Reasoning over a Virtual Knowledge Base in ICLR-2020.
- Describes DrKIT, which allows one to answer multihop chain queries on a "virtual KB", that is, a corpus of entity-linked documents. In DrKIT, entity mentions are indexed for neural retrieval with a rich representation of their context, and reasoning consists of navigating between co-occurring mentions.
Yifeng Tao, Chunhui Cai, William W. Cohen, Xinghua Lu (2020): From genome to phenome: Predicting multiple cancer phenotypes based on somatic genomic alterations via the genomic impact transformer in PSB-2020.

Papers published in 2019

Andrew O. Arnold, William W. Cohen (2019): Instance-based Transfer Learning for Multilingual Deep Retrieval in arxiv.
Qiao Jin, Bhuwan Dhingra, Zhengping Liu, William W Cohen, and Xinghua Lu (2019): PubMedQA: A Dataset for Biomedical Research Question Answering in EMNLP-2019.
Bhuwan Dhingra, Manaal Faruqui, Ankur Parikh, Ming-Wei Chang, Dipanjan Das, William W. Cohen (2019): Handling Divergent Reference Texts when Evaluating Table-to-Text Generation in ACL-2019.
William W. Cohen, Haitian Sun, Alex Hofer, Matthew Siegler (2019): Differentiable Representations For Multihop Inference Rules in arxiv.
- Earlier version of ICLR paper on NQL.
William W. Cohen, Matthew Siegler, Alex Hofer (2019): Neural Query Language: A Knowledge Base Query Language for Tensorflow in arxiv.
- Earlier version of ICLR paper on NQL focusing on the language constructs used.
Haitian Sun, Tania Bedrax-Weiss, William W. Cohen (2019): PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text in EMNLP-2019.
Qiao Jin, Bhuwan Dhingra, William W. Cohen, Xinghua Lu (2019): Probing Biomedical Embeddings from Language Models in NAACL-2019.
Haohan Wang, Xiang Liu, Yifeng Tao, Wenting Ye, Qiao Jin, William W. Cohen and Eric P. Xing (2019): Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning in Biocomputing.

Papers published in 2018

Haitian Sun, William W. Cohen, Lidong Bing (2018): Semi-Supervised Learning with Declaratively Specified Entropy Constraints in NIPS-2018.
Zhilin Yang, Jake (Junbo) Zhao, Bhuwan Dhingra, Kaiming He, William W. Cohen, Ruslan Salakhutdinov, Yann LeCun (2018): GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations in NIPS-2018.
Qiao Jin, Bhuwan Dhingra, William W. Cohen, and Xinghua Lu (2018): AttentionMeSH: Simple, Effective and Interpretable Automatic MeSH Indexer in BioASQ-2018.
Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning (2018): HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering in EMNLP-2018.
Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Kathryn Mazaitis, Ruslan Salakhutdinov, and William W. Cohen (2018): Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text in EMNLP-2018.
Bhuwan Dhingra, Qiao Jin, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov (2018): Neural Models for Reasoning over Multiple Mentions using Coreference in NAACL-2018.
Vidhisha Balachandran and Dheeraj Rajagopal and Rose Catherine Kanjirathinkal and William W. Cohen (2018): Learning to Define Terms in the Software Domain in W-NUT 2018.

Papers published in 2017

T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, B. Yang, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, J. Welling (2017): Never-Ending Learning in CACM.
Fan Yang, Jiazhong Nie, William W. Cohen, Ni Lao (2017): Learning to Organize Knowledge with N-Gram Machines in arxiv.org/abs/1711.06744.
Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, and William W. Cohen (2017): Breaking the Softmax Bottleneck: A High-Rank RNN Language Model in arxiv.org 1711.03953.
Fan Yang, Zhilin Yang, William W. Cohen (2017): Differentiable Learning of Logical Rules for Knowledge Base Reasoning in NIPS-2017.
Zihang Dai, Zhilin Yang, William W. Cohen, and Ruslan Salakhutdinov (2017): Good Semi-supervised Learning that Requires a Bad GAN in NIPS-2017.
William W. Cohen and Fan Yang (2017): TensorLog: Deep Learning Meets Probabilistic Databases in arxiv.org 1707.05390.
Rose Catherine, Kathryn Mazaitis, Maxine Eskenazi, William W. Cohen (2017): Explainable Entity-based Recommendations with Knowledge Graphs (poster paper) in RecSys-2017.
Bhuwan Dhingra, Kathryn Mazaitis, William W. Cohen (2017): Quasar: Datasets for Question Answering by Search and Reading in arxiv 1707.03904.
Bhuwan Dhingra, Zhilin Yang, William W. Cohen, and Ruslan Salakhutdinov (2017): Linguistic Knowledge as Memory for Recurrent Neural Networks in arxiv 1703.02620.
Bhuwan Dhingra, Hanxiao Liu, Ruslan Salakhutdinov, and William W. Cohen (2017): A Comparative Study of Word Embeddings for Reading Comprehension in arxiv 1703.00993.
Rose Catherine, William W. Cohen (2017): TransNets: Learning to Transform for Recommendation in RecSys-2017.
Lidong Bing, William W. Cohen, Bhuwan Dhingra, and Richard C. Wang (2017): Using Graphs of Classifiers to Impose Constraints on Semi-supervised Relation Extraction in IJCAI 2017.
Bhuwan Dhingra, Hanxiao Liu, William W. Cohen, and Ruslan Salakhutdinov (2017): Gated-Attention Readers for Text Comprehension in ACL-2017.
Zhilin Yang, Junjie Hu, Ruslan Salakhutdinov, William W. Cohen (2017): Semi-Supervised QA with Generative Domain-Adaptive Nets in ACL-2017.
Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov (2017): Words or Characters? Fine-grained Gating for Reading Comprehension in ICLR 2017.
Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen (2017): Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks in ICLR 2017.
Lidong Bing, Bhuwan Dhingra, Kathryn Mazaitis, Jong Hyuk Park, William W. Cohen (2017): Bootstrapping Distantly Supervised IE using Joint Learning and Small Well-structured Corpora in AAAI 2017.

Papers published in 2016

William W. Cohen (2016): TensorLog: A Differentiable Deductive Database in arxiv.org 1605.06523.
Abhinav Maurya, Kenton Murray, Yandong Liu, Chris Dyer, William W. Cohen and Daniel B. Neill (2016): Semantic Scan: Detecting Subtle, Spatially Localized Events in Text Streams in arxiv 1602.04393.
- Won the Yelp Dataset Challenge Grand Prize
Douglas R. Pierce, David P. Redlawsk, and William W. Cohen (2016): Social Influences on Online Political Information Search and Evaluation in Political Behavior DOI 10.1007/s11109-016-9374-4.
Zhilin Yang, Ye Yuan, Yuexin Wu, Ruslan Salakhutdinov, William W. Cohen (2016): Encode, Review, and Decode: Reviewer Module for Caption Generation in NIPS-2016.
Rose Catherine and William W. Cohen (2016): Personalized Recommendations using Knowledge Graphs: A Probabilistic Logic Programming Approach in RecSys 2016.
Lidong Bing, William W. Cohen, Bhuwan Dhingra, and Richard C. Wang (2016): Using Graphs of Classifiers to Impose Constraints on Semi-supervised Relation Extraction in WAKBC-2016.
Bhuwan Dhingra, Zhong Zhou, Dylan Fitzpatrick, Michael Muehl and William W. Cohen (2016): Tweet2Vec: Character-Based Distributed Representations for Social Media in ACL-2016 (short paper).
William Yang Wang and William W. Cohen (2016): Learning First-Order Logic Embeddings via Matrix Factorization in IJCAI-2016.
Zhilin Yang, Jie Tang, and William W. Cohen (2016): Multi-Modal Bayesian Embeddings for Learning Social Knowledge Graphs in IJCAI-2016.
Zhilin Yang, Ruslan Salakhutdinov, William Cohen (2016): Revisiting Semi-Supervised Learning with Graph Embeddings in ICML-2016.
Zhilin Yang, Ruslan Salakhutdinov, William Cohen (2016): Multi-Task Cross-Lingual Sequence Tagging from Scratch in arxiv 1603.06270.
Lidong Bing, Mingyang Ling, Richard C. Wang, William W. Cohen (2016): Distant IE by Bootstrapping Using Lists and Document Structure in AAAI-2016.
Bhavana Dalvi and Aditya Mishra and William W. Cohen (2016): Hierarchical Semi-supervised Classification with Incomplete Class Hierarchies in WSDM-2016.

Papers published in 2015

Evgenia Wasserman Pritsker, William Cohen and Einat Minkov (2015): Learning to Identify the Best Contexts for Knowledge-Based WSD in EMNLP-2015.
Jay Pujara, Hui Miao, Lise Getoor, and William W. Cohen (2015): Using semantics and statistics to turn data into knowledge in AI Magazine 2015.
Lidong Bing, Sneha Chaudhari, Richard C. Wang, and William W. Cohen (2015): Improving Distant Supervision for Information Extraction Using Label Propagation Through Lists in EMNLP-2015.
Ni Lao, Einat Minkov, and William W. Cohen (2015): Learning Relational Features with Backward Random Walks in ACL-2015.
William Yang Wang, Kathryn Mazaitis, and William W. Cohen (2015): Joint Information Extraction and Reasoning: A Scalable Statistical Relational Learning Approach in ACL-2015.
Dana Movshovitz-Attias and William W. Cohen (2015): KB-LDA: Jointly Learning a Knowledge Base of Hierarchy, Relations, and Facts in ACL-2015.
William Yang Wang, Kathryn Mazaitis, and William W. Cohen (2015): A Soft Version of Predicate Invention Based on Structured Sparsity in IJCAI-2015.
William Yang Wang, Kathryn Mazaitis, Ni Lao, Tom Mitchell, and William W. Cohen (2015): Efficient Inference and Learning in a Large Knowledge Base: Reasoning with Extracted Information using a Locally Groundable First-Order Probabilistic Logic in Machine Learning, 2015.
Bhavana Dalvi, Einat Minkov, Partha P. Talukdar, and William W. Cohen (2015): Automatic Gloss Finding for a Knowledge Base using Ontological Constraints in WSDM-2015.
T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, J. Welling (2015): Never-Ending Learning in AAAI-2015.
Nan Li, Noboru Matsuda, William W. Cohen, Kenneth R. Koedinger (2015): Integrating representation learning and skill learning in a human-like intelligent agent in Artificial Intelligence, 2015, pp. 67-91.

Papers published in 2014

Einat Minkov and William W. Cohen (2014): Adaptive graph walk-based similarity measures for parsed text in Natural Language Engineering 20(03), pp 361-397.
Ramnath Balasubramanyan and William W. Cohen (2014): Block-LDA: Jointly Modeling Entity-Annotated Text and Entity-Entity Links in Handbook of Mixed Membership Models and Their Applications.
Kathryn Mazaitis, Richard C. Wang, Frank Lin, Bhavana Dalvi, Jakob Bauer, William W. Cohen (2014): A Tale of Two Entity Linking and Discovery Systems in KBP-TAC 2014.
William Yang Wang, Lingpeng Kong, Kathryn Mazaitis, and William W. Cohen (2014): Dependency Parsing for Weibo: An Efficient Probabilistic Logic Programming Approach in EMNLP-2014.
William Yang Wang, Kathryn Mazaitis, and William W. Cohen (2014): Structure Learning via Parameter Learning in CIKM-2014.
Bhavana Dalvi and William W. Cohen (2014): Multi-View Hierarchical Semi-supervised Learning by Optimal Assignment of Sets of Labels to Instances in WSDM-2016.
Jay Pujara, Hui Miao, Lise Getoor and William W. Cohen (2014): Using Semantics & Statistics to Turn Data into Knowledge in AI Magazine 2014.
Tuan-Anh Hoang and William W. Cohen and Ee-Peng Lim (2014): On Modeling Community Behaviors and Sentiments in Microblogging in SDM-2014.
Partha Pratim Talukdar and William W. Cohen (2014): Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch in AI-Stats 2014.
Noboru Matsuda, Gabriel Stylianides, William W. Cohen, & Ken R. Koedinger (2014): Using a Synthetic Peer to Investigate the Effect of Competitive Learning by Teaching in Mathematics Paper in AERA-2014.
N. Matsuda, C. L. Griger, N. Barbalios, G. Stylianides, W. W. Cohen, & K. R. Koedinger (2014): Investigating the Effect of Meta-Cognitive Scaffolding for Learning by Teaching in ITS-2014.

Papers published in 2013

Nan Li, William W. Cohen, and Ken Koedinger (2013): Problem Order Implications for Learning in IJAIE.
Noboru Matsuda, E. Yarzebinski, V. Keiser, R. Raizada, William W. Cohen, G. J. Stylianides, & K. R. Koedinger (2013): Cognitive anatomy of tutor learning: Lessons learned with SimStudent in Journal of Ed Psych, 105(4), 1152-1163. doi: 10.1037/a0031955.
Jay Pujara, Hui Miao, Lise Getoor, and William W. Cohen (2013): Ontology-Aware Partitioning for Knowledge Graph Identification in AKBC-2013.
Bhavana Dalvi, William W. Cohen, and Jamie Callan (2013): Classifying Entities into an Incomplete Ontology in AKBC-2013.
Douglas Pierce, David P. Redlawsk, and William W. Cohen (2013): Social Influences on Political Information Search and Evaluation in APSA-2013.
William Yang Wang, Kathryn Mazaitis, William W. Cohen (2013): Programming with Personalized PageRank: A Locally Groundable First-Order Probabilistic Logic in CIKM-2013.
- Honorable Mention for Best Paper at CIKM-2013
William Yang Wang, Kathryn Mazaitis, William W. Cohen (2013): Programming with Personalized PageRank: A Locally Groundable First-Order Probabilistic Logic in arxiv 1305.2254.
William Yang Wang, Kathryn Mazaitis, William W. Cohen (2013): Programming with Personalized PageRank: A Locally Groundable First-Order Probabilistic Logic in ICML 2013 Workshop on Inferning.
Jay Pujara, Hui Miao, Lise Getoor, and William W. Cohen (2013): Knowledge Graph Identification in ISWC-2013.
- Best Student Paper at ISWC-2013, and winner of SWSA Ten-Year Award.
Ramnath Balasubramanyan, Bhavana Dalvi and William W. Cohen (2013): From Topic Models to Semi-Supervised Learning: Biasing Mixed-membership Models to Exploit Topic-Indicative Features in Entity Clustering in ECML/PKDD-2013.
Bhavana Dalvi and William W. Cohen and Jamie Callan (2013): Exploratory Learning in ECML/PKDD-2013.
William W. Cohen, David P. Redlawsk and Douglas Pierce (2013): The Effect of Biased Communications On Both Trusting and Suspicious Voters in arxiv 1306.2558.
Nan Li, William W. Cohen, and Ken Koedinger (2013): Discovering Student Models With A Clustering Algorithm Using Problem Content in EDM-2013.
Nan Li, Yanduong Tian, William W. Cohen, and Ken Koedinger (2013): Integrating Perceptual Learning With External World Knowledge In A Simulated Student in AIED-2013.
Nan Li, Eliane Stampfer, William W. Cohen, and Ken Koedinger (2013): Efficient Cross-Domain Cognitive Model Discovery Using A Simulated Student in CogSci-2013.
Tuan-Anh Hoang, William W. Cohen, Ee-Peng Lim, Doug Pierce, David Redlawsk (2013): Politics, Sharing and Emotion in Microblogs in ANOMAN-2013.
Dana Movshovitz-Attias and William W. Cohen (2013): Natural Language Models for Predicting Programming Comments in ACL-2013 (short paper).
Ramnath Balasubramanyan and William W. Cohen (2013): Regularization of Latent Variable Models to Obtain Sparsity in SDM-2013.
Bhavana Dalvi and William W. Cohen (2013): Very Fast Similarity Queries on Semi-Structured Data from the Web in SDM-2013.

Papers published in 2012

Frank Lin and William W. Cohen (2012): A General and Scalable Approach to Mixed Membership Clustering in ICDM-2012.
Doug Pierce, David P. Redlawsk, William W. Cohen, Tae Yano, Ramnath Balasubramanyan (2012): Social and Affective Responses to Political Information in APSA-2012.
Freddy Chong Tat Chua, William W. Cohen, Justin Betteridge, and Ee-Peng Lim (2012): Community-Based Classification of Noun Phrases in Twitter in CIKM-2012 (short paper).
Nan Li, William W. Cohen, Kenneth R. Koedinger (2012): Learning to Perceive Two-Dimensional Displays Using Probabilistic Grammars in ECML-2012.
Ramnath Balasubramanyan and William W. Cohen (2012): Entropic Regularization of Mixed-membership Network Models using Pseudo-observations in MLG-2012.
Nan Li, Abraham Schreiber, William W. Cohen, Kenneth R. Koedinger (2012): Creating Features from a Learned Grammar in a Simulated Student in ECAI-2012.
Mahesh Joshi, Mark Dredze, William Cohen and Carolyn Rose (2012): Multi-Domain Learning: When Do Domains Matter? in EMNLP-CoNLL-2012.
Ni Lao, Amar Subramanya, Fernando Pereira and William W. Cohen (2012): Reading The Web with Learned Syntactic-Semantic Inference Rules in EMNLP-CoNLL-2012.
Bhavana Dalvi, William W. Cohen, and Jamie Callan (2012): Collectively Representing Semi-Structured Data from the Web in AKBC-2012.
- Honorable Mention for Best Paper at AKBC-2012
Ramnath Balasubramanyan, Kathryn Rivard, William W. Cohen, Jelena Jakovljevic and John Woolford (2012): Evaluating Joint Modeling of Yeast Biology Literature and Protein-Protein Interaction Networks in BioNLP-2012.
Dana Movshovitz-Attias and William W. Cohen (2012): Alignment-based Extraction of Abbreviations from Biomedical Text in BioNLP-2012.
Dana Movshovitz-Attias and William W. Cohen (2012): Bootstrapping Biomedical Ontologies for Scientific Text using NELL in BioNLP-2012.
Partha Pratim Talukdar and William W. Cohen (2012): Crowdsourced Comprehension: Predicting Prerequisite Structure in Wikipedia in BEA-2012.
Einat Minkov and William W. Cohen (2012): Graph Based Similarity Measures for Synonym Extraction from Parsed Text in TextGraphs-2012.
Noboru Matsuda, Evelyn Yarzebinski, Victoria Keiser, Rohan Raizada, William W. Cohen, Gabriel Stylianides, Kenneth R. Koedinger (2012): Shallow learning as a pathway for successful learning both for tutors and tutees in CogSci-2012.
Nan Li, William W. Cohen, and Ken Koedinger (2012): Problem Order Implications for Learning Transfer in ITS-2012.
Noboru Matsuda, E. Yarzebinski, V. Keiser, R. Raizada, G. Stylianides, W. W. Cohen, et al. (2012): Motivational factors for learning by teaching: The effect of a competitive game show in a virtual peer-learning environment in ITS-2012.
Nan Li, William W. Cohen, and Ken Koedinger (2012): Efficient Cross-Domain Learning of Complex Skills in ITS-2012 (short paper).
Ramnath Balasubramanyan, William W. Cohen, Doug Pierce, and David P. Redlawsk (2012): Modeling Polarizing Topics: When Do Different Political Communities Respond Differently to the Same News? in ICWSM-2012.
Bhavana Dalvi, William W. Cohen, and Jamie Callan (2012): WebSets: Extracting Sets of Entities from the Web Using Unsupervised Information Extraction in WSDM-2012.

Papers published in 2011

Ni Lao, Tom Mitchell, and William W. Cohen (2011): Random Walk Inference and Learning in A Large Scale Knowledge Base in EMNLP-2011.
Jacob Eisenstein, Tae Yano, William W. Cohen, Noah A. Smith, and Eric P. Xing (2011): Structured Databases of Named Entities from Bayesian Nonparametrics in UNSUP-2011.
Frank Lin and William W. Cohen (2011): Adaptation of Graph-Based Semi-Supervised Methods to Large-Scale Text Data in MLG-2011.
Nan Li and William W. Cohen and Kenneth R. Koedinger and Noboru Matsuda (2011): A Machine Learning Approach for Automatic Student Model Discovery in EDM-2011.
Ramnath Balasubramanyan, William W. Cohen, Doug Pierce, and David P. Redlawsk (2011): What pushes their buttons? Predicting comment polarity from the content of political blog posts in LSM-2011.
Noboru Matsuda, Evelyn Yarzebinski, Victoria Keiser, Rohan Raizada, Gabriel J. Stylianides, William W. Cohen, Kenneth R. Koedinger (2011): Learning by Teaching SimStudent - An Initial Classroom Baseline Study comparing with Cognitive Tutor in AIED-2011.
Bhavana Dalvi, Jamie Callan, and William W. Cohen (2011): Entity List Completion Using Set Expansion Techniques in TREC 2011.
Ramnath Balasubramanyan and William W. Cohen (2011): Block-LDA: Jointly modeling entity-annotated text and entity-entity links in SDM-2011.

Papers published in 2010

Ramnath Balasubramanyan, Frank Lin, and William W. Cohen (2010): Node Clustering in Graphs: An Empirical Study in NIPS-2010 Workshop on Networks Across Disciplines.
Philip Stutz, Abraham Bernstein and William W. Cohen (2010): Signal/Collect: Graph Algorithms for the (Semantic) Web in ISWC-2010.
Einat Minkov and William W. Cohen (2010): Improving Graph-Walk Based Similarity with Reranking: Case Studies for Personal Information Management in TOIS-2010.
Ni Lao and William W. Cohen (2010): Relational Retrieval Using a Combination of Path-Constrained Random Walks in ECML-2010 and MLJ-2010 Special Issue.
Frank Lin and William W. Cohen (2010): Semi-Supervised Classification of Network Data Using Very Few Labels in ASONAM-2010.
Ramnath Balasubramanyan and William W. Cohen (2010): Block-LDA: Jointly modeling entity-annotated text and entity-entity links in ICML-2010 Workshop on Topic Modeling.
Frank Lin and William W. Cohen (2010): Power Iteration Clustering in ICML-2010.
Frank Lin and William W. Cohen (2010): A Very Fast Method for Clustering Big Text Datasets in ECAI-2010.
Ni Lao and William W. Cohen (2010): Fast Query Execution for Retrieval Models based on Path Constrained Random Walks in KDD-2010.
L. P. Coelho, A. Ahmed, A. Arnold, J. Kangas, A.-S. Sheikh, E. Xing, W. Cohen, and R. F. Murphy (2010): Structured Literature Image Finder: Extracting Information from Text and Images in Biomedical Literature in Lecture Notes in Bioinformatics.
A. Ahmed, A. Arnold, L. P. Coelho, J. Kangas, A.-S. Sheikh, E. Xing, W. Cohen, and R. F. Murphy (2010): Structured Literature Image Finder: Parsing Text and Figures in Biomedical Literature in Journal of Web Semantics.
Nan Li, William W. Cohen, and Kenneth R. Koedinger (2010): A Computational Model of Accelerated Future Learning through Feature Recognition in ITS-2010 (poster).
Noboru Matsuda, Victoria Keiser, Rohan Raizada, Arthur Tu, Gabriel Stylianides, William W. Cohen, Kenneth R. Koedinger (2010): Learning by Teaching SimStudent: Technical Accomplishments and an Initial Use with Students in ITS-2010.

Papers published in 2009

William Cohen (2009): Graph Walks and Graphical Models in SCS Technical Report Collection.
William W. Cohen, Natalie Glance, Charles Schafer, Roy Tromble, Yuk Wah Wong (2009): Data Integration for Many Data Sources using Context-Sensitive Similarity Metrics in limbo.
Vitor Carvalho, Ramnath Balasubramanyan and William W. Cohen (2009): Information Leaks and Suggestions: A Case Study using Mozilla Thunderbird in CEAS-2009.
Richard Wang and William W. Cohen (2009): Character-level Analysis of Semi-Structured Documents for Set Expansion in EMNLP 2009.
Ramnath Balasubramanyan and William W. Cohen and Matthew Hurst (2009): Modeling corpora of timestamped documents using semisupervised nonparametric topic models in limbo.
Richard Wang and William W. Cohen (2009): Automatic Set Instance Extraction using the Web in ACL-IJCNLP 2009.
Noboru Matsuda, Andrew Lee, William W. Cohen, and Ken Koedinger (2009): A Computational Model of How Learner Errors Arise from Weak Prior Knowledge in CogSci-2009.
Amr Ahmed, Andrew Arnold, Luis Pedro Coelho, Joshua Kangas, Abdul-Saboor Sheikh, Eric P. Xing, William W. Cohen, and Robert F. Murphy (2009): Structured Literature Image Finder in Biolink-2009.
Amr Ahmed, Eric P. Xing, William W. Cohen, and Robert F. Murphy (2009): Structured Correspondence Topic Models for Mining Captioned Figures in Biological Literature in KDD-2009.
Tae Yano, Noah A. Smith, and William W. Cohen (2009): Predicting Response to Political Blog Posts with Topic Models in NAACL-2009.
Ramnath Balasubramanyan, Frank Lin, William W. Cohen, Noah A. Smith, and Matthew Hurst (2009): From Episodes to Sagas: Understanding the News by Identifying Temporally Related Story Sequences in ICWSM-2009 (poster).
Andrew Arnold and William W. Cohen (2009): Information Extraction as Link Prediction: Using Curated Citation Networks to Improve Gene Detection in WASA-2009.
Andrew Arnold and William W. Cohen (2009): Information Extraction as Link Prediction: Using Curated Citation Networks to Improve Gene Detection in ICWSM-2009 (poster).

Papers published in 2008

Richard Wang and William W. Cohen (2008): Iterative Set Expansion of Named Entities Using the Web in ICDM-2008.
Andrew Arnold and William W. Cohen (2008): Intra-document Structural Frequency Features for Semi-Supervised Domain Adaptation in CIKM-2008.
Vitor Carvalho, Jonathan L. Elsas, William W. Cohen, and Jaime G. Carbonell (2008): Suppressing Outliers in Pairwise Preference Ranking in CIKM-2008.
Richard Wang, Nico Schlaefer, William W. Cohen, and Eric Nyberg (2008): Automatic Set Expansion for List Question Answering in EMNLP-2008.
Einat Minkov and William W. Cohen (2008): Learning Graph Walk Based Similarity Measures for Parsed Text in EMNLP-2008.
Andrew Arnold, Ramesh Nallapati and William W. Cohen (2008): Exploiting Feature Hierarchy for Transfer Learning in Named Entity Recognition in ACL-2008.
Ramnath Balasubramanyan, Vitor Carvalho, and William W. Cohen (2008): CutOnce - Recipient Recommendation and Leak Detection in Action in AAAI-2008 Workshop on Enhanced Messaging.
Einat Minkov, Ramnath Balasubramanyan, and William W. Cohen (2008): Activity-centric Search in Email in AAAI-2008 Workshop on Enhanced Messaging.
Noboru Matsuda, William W. Cohen, Jonathan Sewall, Gustavo Lacerda, Kenneth Koedinger (2008): SimStudent: Building an Intelligent Tutoring System by Tutoring a Synthetic Student in limbo.
Einat Minkov and William W. Cohen (2008): Learning to Walk Structured Text Networks in CMU SCS Technical Report Series (CMU-LTI-08-02).
Ramesh Nallapati, Amr Ahmed, Eric Xing, and William W. Cohen (2008): Joint Latent Topic Models for Text and Citations in KDD-2008.
Ramesh Nallapati and William W. Cohen (2008): Link-PLSA-LDA: A New Unsupervised Model for Topics and Influence of Blogs in ICWSM-2008.
Yi-Chia Wang, Mahesh Joshi, William Cohen, and Carolyn Rose (2008): Recovering Implicit Thread Structure in Newsgroup Style Conversations in ICWSM-2008.
Frank Lin and William W. Cohen (2008): The MultiRank Bootstrap Algorithm: SemiSupervised Political Blog Classification and Ranking Using SemiSupervised Link Classification in ICWSM-2008 (poster).
Frank Lin and William W. Cohen (2008): The MultiRank Bootstrap Algorithm: SemiSupervised Political Blog Classification and Ranking Using SemiSupervised Link Classification in CMU SCS Technical Report Series (CMU-LTI-08-03).
Noboru Matsuda, William W. Cohen, Jonathan Sewall, Gustavo Lacerda, and Kenneth R. Koedinger (2008): Why Tutored Problem Solving may be better than Example Study: Theoretical Implications from a Simulated-Student Study in ITS-2008.
Vitor Carvalho and William W. Cohen (2008): Ranking Users for Intelligent Message Addressing in ECIR-2008.

Papers published in 2007

Andrew Arnold, Ramesh Nallapati and William W. Cohen (2007): A Comparative Study of Methods for Transductive Transfer Learning in ICDM Workshop on Mining and Management of Biological Data.
Ramesh Nallapati, William W. Cohen, and John Lafferty (2007): Parallelized Variational EM for Latent Dirichlet Allocation: An Experimental Evaluation of Speed and Scalability in ICDM Workshop on High Performance Data Mining.
Ramesh Nallapati, Amr Ahmed, William Cohen and Eric Xing (2007): Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models in ICDM Workshop on High Performance Data Mining.
Richard Wang and William Cohen (2007): Language-Independent Set Expansion of Named Entities using the Web in ICDM-2007.
Einat Minkov and William Cohen (2007): Learning to Rank Typed Graph Walks: Local and Global Approaches in WebKDD-2007.
Sarah Zelikovitz, William Cohen, and Haym Hirsh (2007): Extending WHIRL with background knowledge for improved text classification in Information Retrieval 10(1) pp 35-67.
Vitor Carvalho, Wen Wu and William Cohen (2007): Discovering Leadership Roles in Email Workgroups in CEAS-2007.
Zhenzhen Kou, Vitor Carvalho and William Cohen (2007): Online Stacked Graphical Learning in NIPS-07 Workshop on Efficient Machine Learning.
Vitor Carvalho and William Cohen (2007): Recommending Recipients in the Enron Corpus in limbo.
Ramesh Nallapati, William Cohen, Susan Ditmore, John Lafferty and Kin Ung (2007): Multiscale Topic Tomography in KDD-2007.
Noboru Matsuda, William Cohen, Jonathan Sewall, Gustavo Lacerda and Ken Koedinger (2007): Predicting students performance with a SimStudent that learns cognitive skills from observation in AIED-2007.
Noboru Matsuda, William Cohen, Jonathan Sewall, Gustavo Lacerda and Ken Koedinger (2007): Evaluating a simulated student using real students data for training and testing in UM-2007.
Juchang Hua, Orhan Ayasli, William Cohen and Robert Murphy (2007): Identifying Fluorescence Microscope Images in Online Journal Publications using Both Image and Text Features in ISBI-2007.
Vitor Carvalho and William W. Cohen (2007): Preventing Information Leaks in Email in SDM-2007.
Zhenzhen Kou and William W. Cohen (2007): Stacked Graphical Models for Efficient Inference in Markov Random Fields in SDM-2007.
Zhenzhen Kou, William W. Cohen, and Robert F. Murphy (2007): A Stacked Graphical Model for Associating Information from Text And Images In Figures in PSB-2007.

Papers published in 2006

Richard C. Wang, Anthony Tomasic, Robert E. Frederking, William W. Cohen (2006): Learning to Extract Gene-Protein Names from Weakly-Labeled Text in CMU SCS Technical Report Series (CMU-LTI-08-04).
Noboru Matsuda, William Cohen & Ken Koedinger (2006): What characterizes a better demonstration for cognitive modeling by demonstration? in CMU SCS Technical Report Series (CMU-ML-06-106).
Noboru Matsuda, William W. Cohen, Jonathan Sewall, Kenneth R. Koedinger (2006): Applying Machine Learning to Cognitive Modeling for Cognitive Tutors in CMU SCS Technical Report Series (CMU-ML-06-105).
Einat Minkov and William W. Cohen (2006): An Email and Meeting Assistant using Graph Walks in CEAS-2006.
Einat Minkov, Andrew Ng and William W. Cohen (2006): Contextual Search and Name Disambiguation in Email using Graphs in SIGIR-2006.
Vitor Carvalho and William W. Cohen (2006): Single-Pass Online Learning: Performance, Voting Schemes and Online Feature Selection in KDD-2006.
Vitor Carvalho and William W. Cohen (2006): Improving Email Speech Act Analysis via N-gram Selection in HLT/NAACL ACTS Workshop 2006.
Einat Minkov, Richard C. Wang, Anthony Tomasic and William W. Cohen (2006): NER Systems that Suit Users Preferences: Adjusting the Recall-Precision Trade-off for Entity Extraction in HLT/NAACL-2006 (short paper).
William W. Cohen (2006): A Graph-Search Framework for GeneId Ranking (Extended Abstract) in BioNLP'06.
William W. Cohen & Einat Minkov (2006): A Graph-Search Framework for Associating Gene Identifiers with Documents in BMC Bioinformatics.

Papers published in 2005

Einat Minkov, Richard C. Wang, and William W. Cohen (2005): Extracting Personal Names from Email: Applying Named Entity Recognition to Informal Text in EMNLP/HLT-2005.
Edoardo M. Airoldi, William W. Cohen, Stephen Fienberg (2005): Bayesian methods for frequent terms in text: Models of contagion and the Delta square statistic in CSNA-2005.
William W. Cohen, Einat Minkov & Anthony Tomasic (2005): Learning to Understand Web Site Update Requests in IJCAI-2005.
William W. Cohen & Vitor Carvalho (2005): Stacked Sequential Learning in IJCAI-2005.
Vitor Carvalho & William W. Cohen (2005): On the Collective Classification of Email Speech Acts in SIGIR 2005.
Zhenzhen Kou, William W. Cohen & Robert F. Murphy (2005): High-Recall Protein Entity Recognition Using a Dictionary in ISMB-2005.
Carolyn Rose, Pinar Donmez, G. Gweon, A. Knight, B. Junker, W. Cohen, K. Koedinger, N. Heffernan (2005): Automatic and Semi-Automatic Skill Coding with a View Towards Supporting On-Line Assessment in AIED-2005.
Noboru Matsuda, William Cohen & Ken Koedinger (2005): An Intelligent Authoring System with Programming by Demonstration in Proceedings of the Japan National Conference on Information and Systems in Education.
Noboru Matsuda, William Cohen & Ken Koedinger (2005): Building Cognitive Tutors with Programming by Demonstration in ILP-2005 (late-breaking paper).
Noboru Matsuda, William Cohen & Ken Koedinger (2005): Applying Programming by Demonstration in an Intelligent Authoring Tool for Cognitive Tutors in AAAI Workshop on Human Comprehensible Machine Learning.

Papers published in 2004

Einat Minkov, Richard Wang & William Cohen (2004): Extracting Personal Names from Emails: Applying Named Entity Recognition to Informal Text in NAACL-2005.
Sunita Sarawagi & William W. Cohen (2004): Semi-Markov Conditional Random Fields for Information Extraction in NIPS 2004.
Robert F. Murphy, Zhenzhen Kou, Juchang Hua, Matthew Joffe, William W. Cohen (2004): Extracting and Structuring Subcellular Location Information from On-line Journal Articles: The Subcellular Location Image Finder in KSCE-2004.
Pradeep Ravikumar, William W. Cohen, Stephen E. Fienberg (2004): A Secure Protocol for Computing String Distance Metrics in PSDM-2004.
Anthony Tomasic, William W. Cohen, Einat Minkov (2004): Learning to Navigate Web Forms in IIWeb 2004.
William W. Cohen, Vitor Carvalho & Tom Mitchell (2004): Learning to Classify Email into "Speech Acts" in EMNLP 2004.
Vitor Carvalho & William W. Cohen (2004): Learning to Extract Signature and Reply Lines from Email in CEAS 2004.
Yifen Huang, Dinesh Govindaraju, Tom Mitchell, Vitor Carvalho & William W. Cohen (2004): Inferring Ongoing Activities of Workstation Users by Clustering Email in CEAS 2004.
Pradeep Ravikumar & William W. Cohen (2004): A Hierarchical Graphical Model for Record Linkage in UAI 2004.
William W. Cohen & Sunita Sarawagi (2004): Exploiting Dictionaries in Named Entity Extraction: Combining Semi-Markov Extraction Processes and Data Integration Methods in KDD 2004: 89-98.

Papers published in 2003

William W. Cohen (2003): Learning and Discovering Structure in Web Pages in IEEE Data Eng. Bull. 26(3): 3-10 (2003).
Mikhail Bilenko, Ray Mooney, William W. Cohen, Pradeep Ravikumar & Steve Fienberg (2003): Adaptive Name-Matching in Information Integration in IEEE Intelligent Systems 18(5): 16-23 (2003).
William W. Cohen, Zhenzhen Kou & Robert F. Murphy (2003): Extracting Information from Text and Images for Location Proteomics in BIOKDD 2003: 2-9.
William W. Cohen, Pradeep Ravikumar & Stephen Fienberg (2003): A Comparison of String Metrics for Matching Names and Records in KDD Workshop on Data Cleaning and Object Consolidation.
William W. Cohen, Pradeep Ravikumar & Stephen Fienberg (2003): A Comparison of String Distance Metrics for Name-Matching Tasks in IIWeb 2003: 73-78.
William W. Cohen, Richard Wang & Robert Murphy (2003): Understanding Captions in Biomedical Publications in KDD 2003: 499-504.
William W. Cohen (2003): Infrastructure Components for Large-Scale Information Extraction Systems in IAAI 2003: 71-78.
Cheng Zhai, William W. Cohen & John Lafferty (2003): Beyond Independent Topical Relevance: Methods and Evaluation Metrics for Subtopic Retrieval in SIGIR 2003: 10-17.
- Won Ten-Year "Test of Time" Award at SIGIR 2014
William W. Cohen, Matthew Hurst & Lee S. Jensen (2003): A Flexible Learning System for Wrapping Tables and Lists in HTML Documents in Web Document Analysis: Challenges and Opportunities, ed. Antonacopoulos & Hu, World Scientific Publishing.

Papers published in 2002

William W. Cohen (2002): Improving A Page Classifier with Anchor Extraction and Link Analysis in NIPS 2002.
William W. Cohen & Jacob Richman (2002): Learning to Match and Cluster Large High-Dimensional Data Sets For Data Integration in KDD 2002: 475-480.
William W. Cohen, Matthew Hurst & Lee S. Jensen (2002): A Flexible Learning System for Wrapping Tables and Lists in HTML Documents in WWW 2002: 232-241.

Papers published in 2001

William W. Cohen & Jacob Richman (2001): Learning to Match and Cluster Entity Names in Proc. of the ACM SIGIR-2001 Workshop on Mathematical/Formal Methods in IR.
Lee S. Jensen & William W. Cohen (2001): Grouping Extracted Fields in Proc. of the IJCAI-2001 Workshop on Adaptive Text Extraction and Mining.
Lee S. Jensen & William W. Cohen (2001): A Structured Wrapper Induction System for Extracting Information from Semi-Structured Documents in Proc. of the IJCAI-2001 Workshop on Adaptive Text Extraction and Mining.
Chumki Basu, Haym Hirsh, William W. Cohen & Craig Nevill-Manning (2001): Technical Paper Recommendation: A Study in Combining Multiple Information Sources in J. Artif. Intell. Res. (JAIR) 14: 231-252 (2001).
William W. Cohen (2001): Issues in Extracting Information from the Web (Extended Abstract) in IWPT 2001.

Papers published in 2000

William W. Cohen (2000): Extracting Information from the Web for Concept Learning and Collaborative Filtering in ALT 2000: 1-12.
William W. Cohen, Andrew McCallum, Dallan Quass (2000): Learning to Understand the Web in IEEE Data Eng. Bull. 23(3): 17-24 (2000).
Jaime G. Carbonell, Yiming Yang, William W. Cohen (2000): Special Issue of Machine Learning on Information Retrieval - Introduction in Machine Learning 39(2/3): 99-101 (2000).
William W. Cohen, David McAllester, and Henry Kautz (2000): Hardening Soft Information Sources in KDD 2000: 255-259.
William W. Cohen (2000): Automatically extracting features for concept learning from the Web in ICML 2000: 159-166.
William W. Cohen and Wei Fan (2000): Web-Collaborative Filtering: Recommending Music by Crawling The Web in Computer Networks 33(1-6): 685-698 (2000).
William W. Cohen and Wei Fan (2000): Web-Collaborative Filtering: Recommending Music by Crawling The Web in WWW 2000.
William W. Cohen (2000): Data Integration using Similarity Joins and a Word-based Information Representation Language in ACM Trans. Inf. Syst. 18(3): 288-321 (2000).
William W. Cohen (2000): WHIRL: A Word-based Information Representation Language in Artif. Intell. 118(1-2): 163-196 (2000).
William W. Cohen & Prem Devanbu (2000): Automatically Exploring Hypotheses about Fault Prediction: a Comparative Study of Inductive Logic Programming Methods in International Journal of Software Engineering and Knowledge Engineering 9(5): 519-546 (1999).

Papers published in 1999

William W. Cohen and Yoram Singer (1999): Simple, Fast, and Effective Rule Learner in AAAI/IAAI 1999: 335-342.
William W. Cohen (1999): What Can We Learn from the Web in ICML 1999.
William W. Cohen (1999): Recognizing Structure in Web Pages using Similarity Queries in AAAI/IAAI 1999: 59-66.
William W. Cohen and Wei Fan (1999): Learning Page-Independent Heuristics for Extracting Data from Web Pages in Computer Networks 31(11-16): 1641-1652 (1999).
William W. Cohen and Wei Fan (1999): Learning Page-Independent Heuristics for Extracting Data from Web Pages in WWW 1999.
William W. Cohen (1999): Reasoning about Textual Similarity in a Web-Based Information Access System in Autonomous Agents and Multi-Agent Systems 2(1): 65-86 (1999).
William W. Cohen (1999): A Demonstration of WHIRL (demonstration abstract) in SIGIR 1999: 327.
William W. Cohen (1999): Some Practical Observations on Integration of Web Information in WebDB (Informal Proc.) 1999: 55-60.
William W. Cohen, Rob Schapire, Yoram Singer (1999): Learning to Order Things in J. Artif. Intell. Res. (JAIR) 10: 243-270 (1999).
William W. Cohen & Yoram Singer (1999): Context-sensitive learning methods for text categorization in ACM Trans. Inf. Syst. 17(2): 141-173 (1999).

Papers published in 1998

William W. Cohen (1998): The WHIRL Approach to Information Integration in IEEE Intelligent Systems, Sept/Oct 1998, pp 20-23.
William W. Cohen & Haym Hirsh (1998): Joins that Generalize: Text Classification Using WHIRL in KDD 1998: 169-173.
Chumki Basu, Haym Hirsh, William W. Cohen (1998): Recommendation as Classification: Using Social and Content-Based Information in Recommendation in AAAI/IAAI 1998: 714-720.
William W. Cohen (1998): Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity in SIGMOD Conference 1998: 201-212.
- Won Ten-Year "Test of Time" Award at SIGMOD 2008
William W. Cohen (1998): Providing Database-like Access to the Web Using Queries Based on Textual Similarity (demonstration abstract) in SIGMOD Conference 1998: 558-560.
William W. Cohen (1998): A Web-based Information System that Reasons with Structured Collections of Text in Agents 1998: 400-407.
William W. Cohen (1998): The WHIRL Approach to Integration: An Overview in IIWeb 1998 (informal proceedings).
William W. Cohen (1998): Hardness Results for Learning First-Order Representations and Programming by Demonstration in Machine Learning 30(1): 57-87 (1998).

Papers published in 1997

William W. Cohen (1997): Knowledge Integration for Structured Information Sources Containing Text (Extended Abstract) in SIGIR Workshop on Networked IR (informal proceedings).
William W. Cohen and Prem Devanbu (1997): A Comparative Study of Inductive Logic Programming Methods for Software Fault Prediction in ICML 1997: 66-74.
William W. Cohen and Daniel Kudenko (1997): Transferring and Retraining Learned Information Filters in AAAI/IAAI 1997: 583-590.
William W. Cohen, Robert E. Schapire, Yoram Singer (1997): Learning to Order Things in NIPS 1997.

Papers published in 1996

William W. Cohen (1996): Learning Trees and Rules with Set-valued Features in AAAI/IAAI, Vol. 1 1996: 709-716.
William W. Cohen (1996): The Dual DFA Learning Problem: Hardness Results for Programming by Demonstration and Learning First-Order Representations (Extended Abstract) in COLT 1996: 29-40.
William W. Cohen (1996): Learning Rules that Classify E-Mail in AAAI Spring Symposium on ML and IR 1996.
William W. Cohen and Yoram Singer (1996): Learning to Query the Web in AAAI Workshop on Internet-Based Information Access Systems 1996.
William W. Cohen and Yoram Singer (1996): Context-sensitive learning methods for text categorization in SIGIR 1996: 307-315.
William W. Cohen (1996): Adaptive mapping and navigation by teams of simple robots in Robotics and Autonomous Systems, 18: 411-434 (1996).

Papers published in 1995

William W. Cohen (1995): Pac-learning non-recursive prolog clauses in Artif. Intell. 79(1): 1-38 (1995).
William W. Cohen (1995): Inductive Specification Recovery: Understanding Software by Learning from Example Behaviors in Autom. Softw. Eng. 2(2): 107-129 (1995).
William W. Cohen (1995): Learning to Classify English Text with ILP Methods in Advances in ILP, ed. L. de Raedt, IOS Press.
William W. Cohen (1995): Fast effective rule induction in ICML 1995: 115-123.
William W. Cohen (1995): Text categorization and relational learning in ICML 1995: 124-132.
William W. Cohen and C. David Page Jr (1995): Polynomial learnability and inductive logic programming: Methods and results in New Generation Comput. 13(3&4): 369-409 (1995).
William W. Cohen (1995): Pac-learning recursive logic programs: Efficient algorithms in J. Artif. Intell. Res. (JAIR) 2: 501-539 (1995).
William W. Cohen (1995): Pac-learning recursive logic programs: Negative results in J. Artif. Intell. Res. (JAIR) 2: 541-573 (1995).

Papers published in 1994

William W. Cohen (1994): Pac-learning nondeterminate Clauses in AAAI 1994: 676-681.
L. Thorn McCarty and William W. Cohen (1994): The case for explicit exceptions in Methods of Logic in Computer Science, 1(1).
Haym Hirsh and William W. Cohen (1994): Learning from data with bounded inconsistency: Theoretical and experimental results in Computational learning theory and natural learning systems (Volume I), MIT Press.
William W. Cohen and Haym Hirsh (1994): Learning the CLASSIC description logic: Theoretical and experimental results in KR 1994: 121-133.
William W. Cohen and Haym Hirsh (1994): Learnability of description logics with equality constraints in Machine Learning 17(2-3): 169-199 (1994).
William W. Cohen (1994): Recovering Software Specifications with Inductive Logic Programming in AAAI 1994: 142-148.
William W. Cohen, Russell Greiner, and Dale Schuurmans (1994): Probabilistic hill-climbing in Computational learning theory and natural learning systems (Volume II), MIT Press.
William W. Cohen (1994): Incremental abductive EBL in Machine Learning 15(1): 5-24 (1994).
William W. Cohen (1994): Grammatically biased learning: learning logic programs using an explicit antecedent description language in Artif. Intell. 68(2): 303-366 (1994).

Papers published in 1993

William W. Cohen (1993): Cryptographic limitations on learning one-clause logic programs in AAAI 1993: 80-85.
P. S. Rosenbloom, H. Hirsh, W. W. Cohen, and B. D. Smith (1993): Two frameworks for integrating knowledge in induction in Proc. of the Seventh Annual Workshop on Space Operations, Applications, and Research (SOAR '93).
William W. Cohen (1993): Rapid prototyping of ILP systems using explicit bias in Proc. of the 1993 IJCAI Workshop on Inductive Logic Programming.
William W. Cohen (1993): Pac-learning a restricted class of recursive logic programs in AAAI 1993: 86-92.
William W. Cohen (1993): Efficient pruning methods for separate-and-conquer rule learning systems in IJCAI 1993: 988-994.
William W. Cohen (1993): Learnability of Restricted Logic Programs in Proc. of the Third International Workshop on Inductive Logic Programming (ILP-93).
William W. Cohen (1993): A Review of `Creating a Memory of Causal Relationships' by Michael Pazzani in Machine Learning (1993).

Papers published in 1992

William W. Cohen and Haym Hirsh (1992): Learnability of Description Logics in COLT 1992: 116-127.
William W. Cohen, Alex Borgida, and Haym Hirsh (1992): Computing least common subsumers in description logics in AAAI 1992: 754-760.
William W. Cohen (1992): Using distribution-free learning theory to analyze solution path caching mechanisms in Computational Intelligence 8: 336-375 (1992).
William W. Cohen (1992): Desiderata for generalization-to-n algorithms in AII 1992: 140-150.
William W. Cohen (1992): Compiling prior knowledge into an explicit bias in ICML 1992: 102-110.
William W. Cohen (1992): Abductive explanation based learning: A solution to the multiple inconsistent explanation problem in Machine Learning 8: 167-219 (1992).

Papers published in 1991

William W. Cohen (1991): The generality of overgenerality in ICML 1991: 490-494.

Papers published in 1990

William W. Cohen (1990): Learning from textbook knowledge: A case study in AAAI 1990: 743-748.
William W. Cohen (1990): Learning approximate control rules of high utility in ICML 1990: 268-276.
William W. Cohen (1990): An analysis of representation shift in concept learning in ICML 1990: 104-112.
William W. Cohen (1990): Learning from Examples and an "Abductive Theory" in Proc. of the 1990 AAAI Spring Symposium on Abduction.

Papers published in 1988

William W. Cohen (1988): Generalizing number and learning from multiple examples in explanation-based learning in ICML 1988: 256-269.
William W. Cohen, Jack Mostow & Alex Borgida (1988): Generalizing number in explanation-based learning in Proc. of the 1988 AAAI Spring Symposium on Explanation-Based Learning.

Papers published in 1987

G. Miller, D. Rosenthal, W. Cohen, and M. Johnston (1987): Expert systems tools for Hubble Space Telescope scheduling in Proc. of the Goddard Conference on Space Applications of Artificial Intelligence and Robotics.
T. Hornick, W. Cohen, and G. Miller (1987): A natural language query system for Hubble Space Telescope proposal selection in Proc. of the Goddard Conference on Space Applications of Artificial Intelligence and Robotics.

Papers published in 1986

K. Bartlett, W. Cohen, Aart De Geus, and G. Hachtel (1986): Synthesis and optimization of multi-level logic under timing constraints in Proc. of the IEEE International Conference on Computer-Aided Design.

Papers published in 1985

A. De Geus and W. Cohen (1985): Optimization of combinational logic using a rule-based expert system in IEEE Design and Test of Computers.
W. Cohen, K. Bartlett, and A. De Geus (1985): Impact of metarules in a rule-based expert system for gate level optimization in Proc. of the IEEE Int'l Symp. on Circuits and Systems.

Papers published in 1984

W. Cohen, K. Bartlett, and A. De Geus (1984): Impact of metarules in a rule-based expert system for gate level optimization in Proc. of the IEEE Int'l Symp. on Circuits and Systems.
Karl Garrison, David Gregory, William W. Cohen & Aart De Geus (1984): Automatic Area and Performance Optimization of Combinatorial Logic in Proc. of the IEEE International Conference on Computer-Aided Design.