Imdex (Boston, MA)

  • 31 Commits
  • 24 Pushes
  • 1 Deploy

Imdex

By srobin

Quick Intro

Imdex is an OCR-indexed image search engine that makes the text within images searchable.

Description

Imdex allows you to search for images based on the text found inside them. It runs a crawler that indexes images found online by running them through both image transformations and Tesseract, an open-source OCR engine.
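
The flow described above (crawl, OCR, index, search) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the entry's actual code: it is written in Python for brevity although the entry was built on Node.js, the OCR step is passed in as a hypothetical callable rather than a real Tesseract call, and the names `ImageTextIndex`, `tokenize`, and `crawl_and_index` are invented for this example. The in-memory dictionary stands in for the search backend.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical type: takes an image URL and returns whatever text OCR recovered.
OcrFn = Callable[[str], str]

WORD_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return WORD_RE.findall(text.lower())


class ImageTextIndex:
    """In-memory inverted index: word -> set of image URLs containing it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, url: str, ocr_text: str) -> None:
        """Record every word found in one image's OCR output."""
        for word in tokenize(ocr_text):
            self._postings[word].add(url)

    def search(self, query: str) -> Set[str]:
        """Return URLs whose OCR text contains every word of the query."""
        words = tokenize(query)
        if not words:
            return set()
        results = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            results &= self._postings.get(word, set())
        return results


def crawl_and_index(urls: Iterable[str], ocr: OcrFn, index: ImageTextIndex) -> None:
    """Run OCR on each image URL and add the recognized text to the index."""
    for url in urls:
        index.add(url, ocr(url))
```

With a stub OCR function that returns canned text per URL, `index.search("california")` returns every URL whose recognized text contains that word. This sketch matches words exactly; it does not attempt the image transformations or any tolerance for OCR typos.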

Judging Instructions

Click any of the three links below to see info/stats. Start typing anywhere on the page and press Enter to search for a query. Words of four or more letters work best due to inaccuracies in Tesseract.
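
In replies further down, the author says that Tesseract often makes one-letter typos and that fuzzy search is enabled to compensate, which hurts three-letter queries. One plausible way to see why is to count how many words fall within a single edit of a short query versus a long one. The sketch below uses plain Levenshtein distance as a stand-in; the entry's actual fuzzy-matching settings are not given on this page, and the vocabulary is made up for illustration (only "sun", "california", and "twatter"/"twitter" come from the comments).

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_matches(query: str, vocabulary: Iterable[str], max_edits: int = 1) -> List[str]:
    """Return vocabulary words within max_edits edits of the query."""
    return [w for w in vocabulary if edit_distance(query, w) <= max_edits]


# Illustrative vocabulary (an assumption, not data from the real index).
vocabulary = ["sun", "son", "run", "sum", "gun",
              "california", "californian", "twitter", "twatter"]

print(fuzzy_matches("sun", vocabulary))         # several unrelated short words
print(fuzzy_matches("california", vocabulary))  # only near-identical words
print(fuzzy_matches("twitter", vocabulary))     # recovers the OCR typo
```

A one-edit tolerance changes a third of a three-letter word but only a tenth of a ten-letter one, so short queries pull in many unrelated neighbors while long queries mostly recover genuine OCR typos.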

What they Used

Tesseract, MongoDB, jsfeat, Mongoose, node-canvas, async, Express, jQuery, Elasticsearch, Mustache

Votes

Other Votes

  • (64)
  • judge

    Pivotal Labs

    Awesome idea & execution! Some misses on searching for certain words (seems to be only shorter words that confuse it), otherwise very impressive.

  • judge

    I like the idea. Results were a lot different from what I expected. I guess this is possibly due to what is indexed.

  • contestant
  • judge

    A Medicore Corporation

    Pretty awesome entry for a solo team. Those live stats are pretty cool. I'd love to know more about how you built the crawler, it looks impressive (1000 URLs per second?)

    I'd love to have more context when I click on a search result. Maybe metadata about when the image was crawled and maybe what keywords it found... but for sure I want to be able to visit the original URL where you found the image.

  • judge

    Amazing job at using the right tools/technologies to get this done and working in 48 hrs, especially as a solo contestant. Certainly lots of interesting use cases this could be applied to. Wondering why Google Images doesn't do this already, since they maintain Tesseract? If I had to nitpick, it would be that the UX doesn't always seem to be responsive (a couple of times I searched and it didn't seem to do anything, but refreshing the page fixed it).

  • Impressive use of these tools in 48 hours! This even found images with skewed text, nice!

    Would be nice to show a loading indicator — I didn't know if my search was going through. I refreshed the page and then it worked fine.

  • judge
  • judge

    Keychain Logistics

  • judge

    Joyent

    Excellent idea, and well implemented. Found some words that work very well ("california"), and some that don't ("sun").

  • judge

    SimBin Studios

    Really nice implementation. Design is clean and easy to use. Can see it being really useful when the database becomes big enough.

    My only nitpick would be that trying to view larger versions of .png files triggers a download in Chrome (OSX).

  • contestant

    SAPO

    Didn't see much of the design. This reminds me of Evernote. It seems very useful, esp. if you want your content out of Evernote.

  • contestant

    Uva Wellassa University of Sri Lanka

    Awesome Dude!

  • judge

    Sellside

    Awesome idea. Make a company. Seriously, right now. I can't believe this isn't already ubiquitous.

    One idea is to implement some kind of basic system for correcting results based on user feedback, a la reCAPTCHA, whereby multiple users must suggest the same correction before it's accepted, etc.

    good luck!

  • judge

    Slick.

  • contestant

    Cognifide

    I really like it! Nice idea and implementation. The front page reminds me of a Pokeball for some reason ;) Congratulations and good luck!

  • judge

    Yahoo

  • judge

    Joyent

    Looks great!

  • judge

    Sencha

    Simple to use and yet very powerful.

  • judge

    Very nicely done - this is something that I would definitely consider using in the future. The design is nice and clean and I got results quickly. I found the results to be accurate also - which is nice.

    I did hit the occasional JS error (mustache not being defined in one case) but all in all this is a great entry.

  • judge

    GitHub

    I really like the idea of this, especially since I recently found myself stuck when trying to search for doge images based on what I remembered the text being. However, I couldn't get this to return any useful results for the queries I tried.

  • judge

    Samsung

  • judge

    Adobe

    Great first project!

  • judge

    Pluralsight

    Cool idea and great use of a third-party library. After watching the demo video I was expecting to see image and text side by side after searching. But it makes sense: search for a word, see the images that contain it.

    Marked low for innovation since the app written was primarily a front end to another library that does the hard work.

    • sdrobs
      contestant

      Sorry, what?

      This is completely untrue. I can show you any file data or proof that you need, that I didn't start working on this until NKO began.

    • I misheard the intro. Re-scored.

  • contestant

    Great idea!

  • judge

    nearForm

    Very impressive! Nice design, fast and accurate results. Extremely useful for xkcd alone :)

  • judge

    Awesome work!

  • judge

    GoodFit.co

    Very cool! Would be great if I could upload my images and have your app index them. It could also index the metadata of the image, such as where/when the image was taken, its file size, dimensions, etc.

  • contestant
  • contestant

    Woah cool, can't wait until it crawls more data. Also, you really thought the design through, interesting animation!

  • judge

    uTest

    Great project, and I love all the tools used. I am disappointed in the results produced. Is this the OCR quality or just not having enough images? For example, "Sheldon Cooper" does not find anything with those two words. It finds a Twitter image of "Shelton", and a few others. Weird.

    Also, under the "HOW" tab, copying and pasting a URL seems weird. How about letting me paste any URL to see the processing in action? That would be great.

  • contestant

    iZotope

    Hey, this is cool, well done! How do you plan to scale this up to a full index of Google Images?

    • sdrobs
      contestant

      Well, unlike Google, I unfortunately don't have a warehouse of dedicated crawling servers at hand.

      If Imdex were to actually become something widely used, I'm sure I could collect funds to expand my computing power. 8 million+ images is not bad for just one crawling server though!

    • ZECTBynmo
      contestant

      8 million!? I somehow got the idea that it was only 100k or so. Well done. Vote updated :)

      Also, hello fellow Cambridger

    • sdrobs
      contestant

      Ah, yes. If you click stats on the home page, you can view the current index count (with a lag time of ~10 seconds)

      And hey! Glad to see another Cambridge competitor here. Best of luck to you in NKO.

  • judge

    Brandcast

    Great functionality and novel idea.

  • contestant

    Ancestry.com

    This is awesome!! Very impressive work.

  • judge

    Sequoia Capital

    This is great. The biggest limitation in my playing around with it is just the number of images you've indexed. I've often encountered scanned court documents and yearned for them to be searchable. Seems like this could be a way to do that.

  • judge

    Microsoft

    Great idea. Very nice use of open-source OCR. Would be cool if you could upload your own images to search.

  • contestant
  • contestant

    Strange results

    After reviewing the whole entries directory, I've changed my scores.

    • sdrobs
      contestant

      Do you have any constructive criticism for me? Did something not work?

    • becevka
      contestant

      http://srobin.2013.nodeknockout.com/search/sun http://srobin.2013.nodeknockout.com/search/man

      These show mostly results that don't contain the search word, and I am not sure they are the best matches.

      But in general the idea is quite innovative, so I appreciate that.

    • sdrobs
      contestant

      Ah, ok, that makes sense; I've actually noticed that 3-letter words aren't working too well because fuzzy search is enabled. Longer words seem to be significantly more accurate though.

      Thank you for the feedback!

  • contestant

    Wow this is awesome!

    I love that it is an active crawler, so it will get better with time.

    How are you planning to scale this up?

  • judge

    Groupon

    The demo was great! Awesome project, great use of the technology. Unfortunately, I couldn't get any searches to work (http://srobin.2013.nodeknockout.com/search/theorem, http://srobin.2013.nodeknockout.com/search/basketball, http://srobin.2013.nodeknockout.com/search/latitude).

  • contestant

    Big-Oh Studios, Inc.

    Very useful in the age of all the "memes" that people put up. It's hard to track images like that down, and that frustration is compounded by remembering a word from it and STILL not being able to find it.

    Well done, sir.

  • contestant

    I think it has very useful potential. How about extending it to become a document repository that's automatically OCR-ed? There are organisations with legacy paper-based documents that would love to be able to search those documents.

  • contestant

    copyPastel

    This is the kind of project that surprises you when it works at all.

  • contestant

    Activimetrics LLC

    This is really cool, but it wasn't clear to me what the site was supposed to be doing. I tried typing a different image url into the box and it kept showing the frat boy one. I wanted to see how it would do on a word cloud image. The search terms I typed in produced images that my old eyes had trouble reading without clicking, and then the images were generally off-topic.

  • judge

    Scoutzie

    I typed "cat" and the search didn't work, so I typed "http://srobin.2013.nodeknockout.com/search/cat" to search instead. The results included this image: http://data1.whicdn.com/images/83991406/thumb.jpg I think the OCR engine is not that accurate.

    • sdrobs
      contestant

      Sorry your searches didn't turn up correctly :(

      Tesseract often makes one-letter typos (e.g. "twatter" instead of "twitter"), so I am using a fuzzy search algorithm. Because of this, Imdex seems to work a bit better on searches with longer words.

  • contestant

    Hacker School

    Very cool! This seems like a really useful tool.

  • contestant

    SocialTables.com

    Seinfeld came up!

IMPORTANT DATES

REGISTRATION: SEP 17
COMPETITION: NOV 9-11 UTC
JUDGING: NOV 11-17
WINNERS: NOV 18