Opened Mar 21, 2016 by DESHPANDE SRIJAY PARAG@srijayd

Patch for distance.c: minor off-by-one error

Created by: GoogleCodeExporter

This is really nitpicky, but when populating the vocab array, distance.c 
begins skipping characters only after index max_w (having read 51 characters), 
when it should stop after index max_w - 1. Consequently, the string 
terminator for a long string is written into the space reserved for the 
subsequent string, where it is overwritten when the next string is read in, 
causing the two to be mashed together.

For example, when searching for Cash_Flow in the current (as of 2015-06-15) 
GoogleNews-vectors-negative300.bin, two results overflow the printf format 
buffer, which is padded for strings up to length 50. These two strings do 
not appear in the vocabulary; they are constructed when two vocabulary entries 
(a long one followed by a normal one) are mashed together as described 
above. After applying the attached patch, the printf formatting looks fine, as 
only the first 50 characters of the long entries are printed.
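
The mechanism described above can be reproduced in isolation. The following is a minimal sketch, not the attached patch and not the actual distance.c source: `read_word`, `load_vocab`, the `limit` parameter, and the small `MAX_W` value are all hypothetical names chosen for illustration, and the words are read from a string rather than a file. It only assumes what the report states, namely that words are packed into fixed-width slots and that the write cursor is allowed to advance one index too far.

```c
/* Slot width per vocabulary entry. Kept tiny here so the effect is easy to
 * see; this value is an assumption for the demo only. */
#define MAX_W 8

/* Copy one space-terminated word from *src into slot b of the packed vocab
 * array. `limit` is the highest index the write cursor `a` may reach:
 *   limit == MAX_W      reproduces the reported off-by-one,
 *   limit == MAX_W - 1  keeps the terminator inside the word's own slot. */
static void read_word(const char **src, char *vocab, int b, int limit) {
    int a = 0;
    while (**src != '\0') {
        char c = *(*src)++;
        if (c == ' ') break;
        vocab[b * MAX_W + a] = c;
        /* Once a == limit, further characters overwrite the same cell,
         * which is how the excess characters get skipped. */
        if (a < limit) a++;
    }
    /* With limit == MAX_W this can land on the first byte of slot b + 1. */
    vocab[b * MAX_W + a] = '\0';
}

/* Read `count` words into consecutive slots. The caller should allocate one
 * spare slot so the buggy variant never writes outside the array. */
static void load_vocab(const char *text, char *vocab, int count, int limit) {
    for (int b = 0; b < count; b++) {
        read_word(&text, vocab, b, limit);
    }
}
```

Loading the text `"abcdefghijkl xyz"` with `limit` set to `MAX_W` leaves slot 0 reading as `abcdefghxyz`: the terminator of the long word is placed at the start of slot 1 and then overwritten by `x`. With `limit` set to `MAX_W - 1`, slot 0 holds `abcdefg` and slot 1 holds `xyz`. Note that this sketch keeps `MAX_W - 1` characters per long entry; the attached patch may handle truncation differently.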

Original issue reported on code.google.com by daniel.j...@gmail.com on 17 Jun 2015 at 12:24

Attachments:

  • distance.patch
Reference: srijayd/word2vec#32