10 Highlights from NeurIPS 2023

Since the debut of OpenAI's ChatGPT in November 2022↗, AI has been at the peak of its second wave. The companies behind the large foundation language models are leading the charge with models like Claude by Anthropic↗, Llama 2 by Meta↗, and Gemini by Google↗. Even with a broader slowdown in venture capital funding, numerous early-stage AI startups↗ building on these foundation models flourished in 2023. This enthusiasm extends to the academic world with a significant jump in the number of accepted papers↗ on Generative AI and LLMs. VESSL AI was in New Orleans for NeurIPS 2023, and we could feel this same trend and excitement throughout the conference. Here are the 10 highlights and trends we observed at NeurIPS 2023.

1. Numbers

The conference began with an opening remark that highlighted the consistent increase in the number of paper submissions since 2017. NeurIPS 2023 saw a roughly 20% increase in both submissions and accepted papers, which grew from 10,411 to 12,344 and from 2,617 to 3,218, respectively. Despite this increase, the acceptance rate remains relatively stable, with more than 12,000 reviewers involved in the review process.

A notable trend is the increase in the number of authors per paper and the length of titles and abstracts. Terms like "generative," "transformer," "agent," and "zero-shot"↗ appeared more frequently, with "multimodal," "text-to-image," "captioning," and "visual question answering"↗ among the most common modality-related terms.

2. Hiring

Despite the slowdown in the tech industry, quant trading firms and AI startups dominated↗ the sponsor booths at NeurIPS. Quant trading giants Citadel and Jane Street led the sponsorship alongside tech behemoths like Google DeepMind, Microsoft, Meta, and Apple. AI startups including Cohere, Perplexity, and together.ai also featured, as did quant firms Hudson River Trading, D. E. Shaw & Co., and Two Sigma.

This dynamic occurs amidst a broader context of industry layoffs; according to layoffs.fyi↗, 260,771 employees were laid off from 1,178 tech companies in 2023, predominantly at large tech firms. This has, in turn, shifted hiring power, with machine learning researchers and engineers increasingly moving from big tech companies to quant trading firms and innovative AI startups.

3. Outstanding papers

NeurIPS announced two outstanding papers, two runner-ups, and two outstanding datasets and benchmark track papers↗, celebrating breakthroughs across various domains. The outstanding papers were awarded to “Privacy Auditing with One (1) Training Run”↗ by Google DeepMind and “Are Emergent Abilities of Large Language Models a Mirage?”↗ by Stanford University. These papers offer significant insights into the auditing of differentially private systems and the true nature of emergent abilities in language models.

The runner-up recognitions went to “Scaling Data-Constrained Language Models,”↗ a collaborative work between HuggingFace and Harvard University, which delves into scaling language models with limited data, and “Direct Preference Optimization: Your Language Model is Secretly a Reward Model”↗ by Stanford University, introducing an innovative method for model fine-tuning.

The conference also spotlighted two outstanding datasets and benchmark track papers: “ClimSim”↗ and “DecodingTrust,”↗ which contribute to a large-scale hybrid dataset for climate emulation and a comprehensive assessment of trustworthiness in GPT models, respectively.

4. LLM — HuggingGPT, Tree of Thoughts

Among the LLM papers at NeurIPS, HuggingGPT↗ and Tree of Thoughts↗ are worth highlighting. HuggingGPT is the next leap in AI task management and execution. As an advanced controller, HuggingGPT employs the power of Large Language Models (LLMs) like ChatGPT to orchestrate a symphony of existing AI models, tackling complex AI challenges in the following four stages, sketched in code after the list:

  • Stage #1: Task Planning — LLM parses the user request into a task list and determines the execution order.
  • Stage #2: Model Selection — LLM assigns appropriate models to tasks based on the description of expert models on Hugging Face.
  • Stage #3: Task Execution — Expert models on hybrid endpoints execute the assigned tasks.
  • Stage #4: Response Generation — LLM integrates the inference results of experts and generates a summary of workflow logs to respond to the user.
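
To make the pipeline concrete, here is a minimal, hypothetical sketch of the four stages. The llm() callable and the model_zoo mapping are placeholders we introduce for illustration; they stand in for ChatGPT prompts and Hugging Face inference endpoints, not the authors' actual implementation.

```python
def hugging_gpt(user_request, llm, model_zoo):
    # Stage 1: Task planning -- the LLM parses the request into a task list
    tasks = llm(f"Parse this request into an ordered task list: {user_request}")

    results = {}
    for task in tasks:
        # Stage 2: Model selection -- match each task to an expert model,
        # based on the model descriptions available on the Hugging Face hub
        name = llm(f"Pick the best model for '{task}' from: {list(model_zoo)}")

        # Stage 3: Task execution -- run the chosen expert on its endpoint
        results[task] = model_zoo[name](task)

    # Stage 4: Response generation -- the LLM integrates the expert results
    # and summarizes the workflow logs for the user
    return llm(f"Summarize these results for the user: {results}")
```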

Tree of Thoughts represents an evolutionary stride in language model inference, enhancing the problem-solving capabilities of AI. Building upon the "Chain of Thought" approach, it introduces a more deliberate and generalized method for prompting language models. By enabling exploration over coherent units of text, or "thoughts," it facilitates a structured progression of intermediate steps toward problem-solving. Each "thought" acts as a stepping stone, guiding the AI through complex reasoning paths and fostering a deeper, more nuanced understanding. This approach not only refines the inference process but also amplifies the model's ability to tackle intricate tasks with increased clarity and sophistication.
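
The search procedure itself is simple to sketch. Below is a toy breadth-first variant of the idea, where generate() and evaluate() stand in for LLM prompting calls and are our own placeholders, not the authors' code:

```python
def tree_of_thoughts(problem, generate, evaluate, steps=3, breadth=5, branch=4):
    frontier = [""]                          # partial chains of "thoughts"
    for _ in range(steps):
        candidates = []
        for state in frontier:
            for _ in range(branch):          # propose several next thoughts
                candidates.append(state + "\n" + generate(problem, state))
        # keep only the most promising partial solutions, judged by the LM
        candidates.sort(key=lambda s: evaluate(problem, s), reverse=True)
        frontier = candidates[:breadth]
    return frontier[0]                       # best chain found by the search
```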

5. Optimization & fine-tuning — QLoRA & MeZO

Two significant developments stand out in the optimization and fine-tuning space: QLoRA↗ and MeZO↗. QLoRA, or Quantized Low-Rank Adapters, enables the fine-tuning of 65B-parameter models on just a single 48GB GPU by reducing memory usage, all while maintaining full 16-bit fine-tuning task performance. This is achieved by backpropagating gradients through a frozen, 4-bit quantized pre-trained language model into Low-Rank Adapters. Remarkably, the authors' best model family, Guanaco, sets new standards by outperforming all previously openly released models on the Vicuna benchmark. Achieving 99.3% of the performance level of a model like ChatGPT with just 24 hours of fine-tuning on a single GPU, QLoRA represents a significant leap forward in making state-of-the-art AI more accessible and efficient.
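
In practice this recipe maps onto the Hugging Face stack. The sketch below shows a typical QLoRA-style setup, assuming the transformers, peft, and bitsandbytes libraries; the checkpoint name and hyperparameters are illustrative, not the paper's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # freeze base weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization from the paper
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute flows in 16-bit
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                  # illustrative checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attach adapters to attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the adapter weights train
model.print_trainable_parameters()
```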

MeZO, a Memory-efficient Zeroth-Order optimizer, revolutionizes the fine-tuning of language models. Adapting the classical zeroth-order stochastic gradient descent (ZO-SGD) method to operate in place, MeZO allows for fine-tuning LMs with the same memory footprint as inference, significantly improving resource utilization. For instance, with just a single A100 80GB GPU, MeZO can train a substantial 30B parameter model. This task would typically be limited to a 2.7B parameter model when using traditional backpropagation methods.

MeZO doesn't compromise on performance; through comprehensive experiments across various model types, scales, and downstream tasks, MeZO not only significantly outperforms in-context learning and linear probing but also achieves comparable results to fine-tuning with backpropagation. It offers up to 12 times memory reduction and up to 2 times reduction in GPU hours, marking it as a highly efficient and powerful tool in AI optimization.
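
The core of MeZO is a classic two-point (SPSA-style) gradient estimate, made memory-efficient by regenerating each random perturbation from a stored seed instead of keeping it in memory. Here is a simplified sketch of one optimization step, condensed from the paper's algorithm; the names and hyperparameters are illustrative:

```python
import torch

def zo_perturb(params, seed, eps, scale):
    # Regenerate the same random direction z from the seed on every call,
    # so z never has to be stored alongside the parameters.
    torch.manual_seed(seed)
    for p in params:
        z = torch.randn_like(p)
        p.data.add_(scale * eps * z)

@torch.no_grad()
def mezo_step(params, loss_fn, lr=1e-6, eps=1e-3):
    seed = torch.randint(0, 2**31 - 1, (1,)).item()
    zo_perturb(params, seed, eps, +1)    # theta + eps * z
    loss_plus = loss_fn()
    zo_perturb(params, seed, eps, -2)    # theta - eps * z
    loss_minus = loss_fn()
    zo_perturb(params, seed, eps, +1)    # restore theta
    grad_est = (loss_plus - loss_minus) / (2 * eps)  # scalar projected gradient
    torch.manual_seed(seed)              # regenerate z once more for the update
    for p in params:
        z = torch.randn_like(p)
        p.data.add_(-lr * grad_est * z)  # in-place SGD update, no backprop
```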

6. RL & Alignment — Direct Preference Optimization (DPO)

In the domain of reinforcement learning, revisiting Direct Preference Optimization (DPO)↗ sheds light on an innovative approach to controlling large-scale unsupervised LMs. DPO emerges as a groundbreaking solution by introducing a new parameterization of the reward model in RLHF↗. This novel approach enables the extraction of the corresponding optimal policy directly and in closed form. As a result, it reduces the RLHF problem to one that can be solved with a simple classification loss. By doing so, DPO stands out as a stable, performant, and computationally lightweight alternative.
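
That classification loss is compact enough to write down directly. The sketch below implements the DPO objective on (chosen, rejected) preference pairs, given summed token log-probabilities from the policy and a frozen reference model; the function and tensor names are our own:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit reward of each response: beta * log(pi(y|x) / pi_ref(y|x))
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Binary classification: push the preferred response above the rejected one
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```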

7. Multi-modality — InstructBLIP, LLaVA, AudioCraft

Multi-modality saw significant advancements at NeurIPS, with three papers standing out for their innovative contributions: InstructBLIP↗, LLaVA↗, and AudioCraft↗. InstructBLIP tackled the complex challenge of building general-purpose vision-language models. The paper provides a systematic and comprehensive study of vision-language instruction tuning based on the pre-trained BLIP-2 models, and introduces an instruction-aware Query Transformer designed to extract informative features. InstructBLIP attained state-of-the-art zero-shot performance across various datasets.
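
For readers who want to try it, recent transformers releases ship an InstructBLIP port. A minimal sketch, with an illustrative checkpoint, image URL, and prompt:

```python
import requests
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

name = "Salesforce/instructblip-vicuna-7b"
processor = InstructBlipProcessor.from_pretrained(name)
model = InstructBlipForConditionalGeneration.from_pretrained(name)

# Illustrative image; any RGB image works
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)
inputs = processor(images=image, text="What is unusual about this image?",
                   return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```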

LLaVA, or Large Language and Vision Assistant, merges visual understanding and language processing into a cohesive framework. One of LLaVA's novel approaches is its use of instruction tuning on generated data: it employs a language-only GPT-4 to generate multimodal language-image instruction-following data. This method effectively leverages the strengths of the pre-trained LLM and the visual model. By choosing Vicuna↗ as its LLM, LLaVA enhances its performance across various visual and language tasks.
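
The data-generation trick is easy to sketch: GPT-4 never sees pixels, only symbolic descriptions of the image (captions and bounding boxes), from which it writes a conversation. Below is a hedged sketch using the OpenAI client; the prompts and the function itself are our illustration, not the authors' pipeline:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_instruction_sample(caption, boxes):
    context = f"Caption: {caption}\nBounding boxes: {boxes}"
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Write a Q&A conversation about an image you cannot "
                        "see, using only the text context provided."},
            {"role": "user", "content": context},
        ],
    )
    # The generated conversation is later paired with the real image
    # to train the vision-language assistant.
    return response.choices[0].message.content
```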

AudioCraft, developed by Meta, represents a groundbreaking development in the generative audio landscape. It serves as a comprehensive code base for a wide array of generative audio needs, encompassing music, sound effects, and audio compression. This ambitious project is built on several companion papers.

MusicGen and AudioGen focus on generating music and audio from text-based user inputs. EnCodec lays the foundation for both MusicGen and AudioGen. It functions as a state-of-the-art, real-time audio codec, utilizing neural networks to compress various types of audio and reconstruct the original signal with high fidelity.
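
Getting started is straightforward; here is a brief usage sketch based on the audiocraft repository's 2023 README, with an illustrative model size and prompt:

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)               # 8 seconds of audio
wav = model.generate(["lo-fi beat with warm piano"])  # text-to-music
audio_write("sample", wav[0].cpu(), model.sample_rate, strategy="loudness")
```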

8. Mixing small models — Mixtral 8x7B

The buzz at NeurIPS 2023 wasn’t just about the papers; it was also about the impressive release of Mistral AI’s Mixtral 8x7B.↗ This high-quality sparse mixture-of-experts model (SMoE) set a new standard in efficiency and performance, outshining Llama 2 70B on most benchmarks with 6x faster inference. It handles a 32k-token context and offers multilingual capabilities. Despite its relatively small size, its strong performance extends notably into code generation and its ability to match GPT-3.5 on some benchmarks↗.
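
At the heart of an SMoE layer is a router that sends each token to only a few experts; Mixtral routes each token to 2 of 8 experts per layer. Below is a toy sketch of such a layer with illustrative dimensions and a deliberately naive per-token loop; real implementations batch tokens per expert for efficiency:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)     # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, dim)
        scores = self.gate(x)                      # router logits per expert
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over chosen experts
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                 # only top-k experts run per token
            for k in range(self.top_k):
                expert = self.experts[int(idx[t, k])]
                out[t] += weights[t, k] * expert(x[t])
        return out
```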

9. Social events

Outside the formal sessions and booths at the Convention Center, NeurIPS offered a vibrant array of social events that provided excellent networking opportunities for students and faculty. This year marked a shift from previous NeurIPS conferences: big tech companies scaled back their social engagements, and a surge of startups stepped in to fill the void, possibly reflecting their aggressive hiring strategies. Noteworthy events included Pika @ NeurIPS↗, Perplexity’s Happy Hour↗, AI Grant @ NeurIPS↗, Luma Happy Hour @ NeurIPS↗, Open Source AI Party↗, and VESSL AI’s Networking Event↗. Despite the changing landscape, these gatherings remained essential for anyone looking to mingle with senior faculty, industry leaders, and investors, underlining the enduring value of face-to-face interaction in the AI community.

10. Creative AI & 10-year ToT Award

NeurIPS presented two interesting trials this year. One highlight was the “Creative AI” segment, featuring stunning performances during the opening and closing sessions. “The WHOOPS! Gallery: Intersection of AI, Creativity, and the Unusual↗” showcased a blend of AI creativity and the unusual, captivating viewers with its innovative artistry. Meanwhile, “Kiss/Crash - NeurIPS 2023 Creative Track↗” presented attendees with a continuous stream of thought-provoking artistic AI expression.

The conference commemorated a decade of impactful research with the Ten-year Test-of-Time Award, celebrating the influential paper “Distributed Representations of Words and Phrases and their Compositionality↗.” Published at NeurIPS 2013 and cited over 40,000 times, this groundbreaking study introduced the word embedding technique word2vec↗, revolutionizing the field of natural language processing. The award honored the contributions of speakers Greg Corrado↗ and Jeff Dean↗, acknowledging their pivotal role in advancing our understanding of word representations and their applications in AI. This acknowledgment highlighted the lasting impact of their work and set a precedent for future innovations in the domain.

Floyd, Product Manager
