Report On Object Detection Using YOLO
New technologies have opened up possibilities that would once have seemed impossible or
miraculous. Computer vision is one such field, and this project is an example of what it
makes possible.
This project aims to help blind people experience the world independently with the help
of a speech-based feedback device. It proposes (1) identifying walkable spaces, (2) text
recognition and text-to-speech, and (3) identifying and locating specific types of
objects. Walking navigation can be incorporated into the project as future scope.
Object detection is implemented with the YOLO algorithm, using the COCO dataset. The
project will help a blind person walk more easily by finding the path and detecting
obstacles in front of them so that they can be avoided. It will also help them read text,
which is done with OCR using Python and an API for text recognition. gTTS is then used to
convert the text to speech, which is the final output for the user.
TABLE OF CONTENTS
1.1 Introduction
2.3 Technologies
2.4 Management
following areas
4. Training program
5. Learning experience
7.3 Solutions
7.3.1 Proposed System
7.3.2 Objectives
7.3.3 Advantages of proposed system
7.4 Methodology
7.4.3 OpenCV
7.4.4 TesseractOCR
7.4.5 Python
8. Snapshots
9. Conclusion
Reference
LIST OF FIGURES
1 Embedded System
2 System Architecture
Object Detection using YOLO
CHAPTER 1
INTRODUCTION
The company is a pioneer in the design and development of single-board computers and
compilers for micro-controllers within India. Talented professionals in the field of embedded
hardware and software design and development work to achieve excellence.
Technofly Solutions & Consulting was founded in 2017 by a team with 14+ years of
experience in the embedded systems domain. Technofly Solutions focuses globally on
automotive embedded technologies, VLSI design, corporate training and consulting.
So far the company has delivered more than 15 corporate trainings for companies working in
embedded automotive technologies in India.
The company is also involved in the development of an OBD2 (On-Board Diagnostics)
product for passenger cars for clients in India.
SUMMARY: The main problem faced by visually impaired people is that they cannot see
objects, cannot read text and, when they move about, cannot get proper information about
obstacles in their path that might hurt them. A special type of camera mounted on a glove
will capture an image of the text the blind person wants to read, following that person's
instructions.
At present the company is developing a GPS training system for two-wheelers with its
associated partners. It is also focusing on corporate trainings in automotive embedded
systems and on providing ASIC solutions that involve design and verification IPs and
functional verification of designs.
Since 2015, the company has also offered design and development services. These cover the
complete spectrum of activities in the product development life cycle, from idea
generation and requirement gathering to prototyping, testing and manufacturing. The
company has so far provided product design services for various sectors, including
industrial automation, instrumentation, automotive, consumer and defense.
2.2 Technical Expertise
Technical skills are the abilities and knowledge needed to perform specific tasks. They are
practical, and often relate to mechanical, information technology, mathematical, or
scientific tasks. Some examples include knowledge of programming languages, design
programs, mechanical equipment, or tools.
2.4 Management:
The management team has a mixture of technical and business development expertise,
with 14+ years of experience in the information technology field.
Initially, the company developed system software tools, including C compilers for
micro-controllers and supporting tools such as an assembler, linker, simulator and
integrated development environment. Later, single-board computers (SBCs) were developed
and are still manufactured. These hardware boards support a broad range of processors,
including 8-bit, 16-bit and 32-bit processors.
2.6 Services of Technofly:
Embedded Software engineering Services:
When you do not have enough time or the right skills on hand, you can supplement your team
with expert embedded engineers from Technofly, who can tackle projects with confidence,
take out the risk and hit the milestones. They take as much ownership as you want them to
and make sure the project is done right, on time and on budget. Their reputation for
on-time, on-budget delivery has been earned time and again. They can help you cut risk on
embedded systems R&D and accelerate time to market.
Technofly is the best choice for designing and developing embedded products from concept
to delivery. The team is well versed in product life cycles. They build complex software
systems for real-time environments and have unique expertise and core competencies in the
following domains: Wireless, Access and IoT/Cloud.
The department is actively involved in acquiring projects related to the latest technologies
in low-power VLSI and the wireless domain. These projects are well thought out, and detailed
implementations are carried out. Projects are mainly done in Verilog and MATLAB (from
MathWorks) and may also use NS2, NetSim and Xilinx platforms, depending on the requirements
of the project in progress.
The current internship involves the study, implementation and analysis of a high-speed,
energy-efficient carry skip adder (CSKA) with a hybrid model for achieving high speed and
reducing power consumption.
Study Requirements: Low-power VLSI design and fundamentals of digital circuits
Implementation Requirements: Verilog code / ModelSim tool
Detection Test Static: Simulation results
Platform: Verilog, simulated with ModelSim 6.4c and synthesized with the Xilinx tool.
Real Time Embedded System and Low power VLSI design Department:
Technofly Solutions provides embedded software, hardware, system development, system
integration, verification and product realization services to customers in the automotive
electronics and consumer electronics segments worldwide. Technofly Solutions has more than
14 years of experience in embedded systems on a variety of platforms such as
microprocessors, programmable logic devices (PLDs) and ASICs. Accord develops applications
based on various commercially available real-time and embedded operating systems.
Design Services:
Technofly Solutions offers services in the areas of:
Hardware design and development
Software design and development
SUMMARY: Technofly is the best choice for designing and developing embedded products
from concept to delivery. The team is well-versed in product life cycles. They have built
complex software systems for real-time environments and have unique expertise and core
competencies in the following domains: Wireless, Access and IOT/Cloud.
CHAPTER 3
SUMMARY:
The training has equipped us with more knowledge in addition to that obtained in class in
relation to the working environment.
CHAPTER 4
TRAINING PROGRAM
The training program is defined as an activity or activities that include undertaking one
or a series of courses to boost performance, productivity, skills and knowledge.
During the next week, we studied Gradle and the components of Android Studio. I
focused on the front end while my teammates worked on the back end. The work then
proceeded to the design phase, which led to the development of object detection and
object labeling by the end of the second week.
During the third week, we focused on the graphical user interface (GUI), where we
primarily worked on performing different tasks on the desktop from a laptop and other
electronic devices.
In the last week, I worked with my teammates to detect various objects in video clips
and to convert images to text.
Finally, I worked with my teammates to prepare flow charts, diagrams and the report
for the work done during the previous weeks.
SUMMARY: The internship was mainly intended to enable us to acquire practical skills and
link theory to practice in the real world so as to meet labor market needs. We acquired
practical skills such as object detection, text reading and image-to-speech conversion, as
well as monitoring and evaluation techniques, among others.
CHAPTER 5
LEARNING EXPERIENCE
Learning experience design (LX design) is the process of creating learning experiences that
enable the learner to achieve the desired learning outcome in a human centered and goal-
oriented way.
Knowledge acquired: The knowledge I gained during the training is mainly about machine
learning, and I also learnt a great deal about computer vision. Working on
technologies related to machine learning enriched my knowledge of Jupyter Notebook
and of programming languages such as Python. It has been a great technical learning
experience: it taught me a lot about various machine learning algorithms and their
appropriate use, and it improved my technical knowledge and skills.
Skills learned: I learned to work well in a team. I also learned never to be afraid to
ask when in doubt; by asking questions I got answers. I learnt to read use cases,
understand the need, analyse the problem and find a solution to it. I worked on
projects in a group, and group discussion helped me a lot in finding solutions because
teamwork generates more ideas. Internships give students the hands-on experience they
need, and I feel that worthwhile internships are essential for developing key skills.
The internship strengthened my sense of responsibility, focus, energy and motivation.
Observed attitudes and gained values: The value I gained is to always work hard, even
if the task is small and seems unimportant. This helped me build a good work ethic,
and the effort could be seen. My co-workers had a lot of experience; I talked to them
and asked for their advice, and in doing so I learned a lot and got more ideas. This
internship is extremely valuable to me. It enhanced my skills and my ability to work in
a team, and it allowed me to gain experience and develop interpersonal skills that make
me a more attractive candidate.
The most challenging task performed: The most challenging task was learning to work
with resilience and patience. While working on the logic of face recognition, I felt
that I might not be able to achieve the results I wanted, but after tireless work it has
all been worth it. Sitting for hours without much of a break and working continuously
on the development of the project was also challenging, but it shaped me for the better.
SUMMARY: Although several aids are available for visually impaired people, they can detect
only a few small obstacles and do not completely ensure the safety of blind people. This
paper presents a smart glove that can easily be carried anywhere and performs multiple
functions. Essentially, it works as an artificial eye for blind people.
CHAPTER 6
WEAKNESS
My weakness during the internship was that I was not comfortable working under pressure,
although I had previously thought I was able to do so. Since I am still new to the field, I
lack skills in planning, decision making and business planning. I am quite weak at taking in
information, so I need to listen carefully to what my guide tells me and immediately note
down what has been assigned to me.
OPPORTUNITIES
The opportunities I gained from this internship were more experience and knowledge and a
good relationship with my teammates. I am not a tough person, but working in the internship
demanded that I brace myself in dealing with everyone. In fact, I think this is among the
best things I have experienced. Moreover, I was able to increase my knowledge.
THREATS
Threats arise when conditions in the external environment affect the reliability and
profitability of the organization's business. They compound the vulnerability when they
relate to the weaknesses. Threats are uncontrollable; examples include unrest among
employees, ever-changing technologies, and increasing competition leading to excess
capacity, price wars and reduced industry profits.
MyVox - Device for the communication between people: blind, deaf, deaf-blind
and unimpaired
In the existing MyVox-based system, the communication device, named MyVox, has proven
to be a useful tool for an Usher syndrome patient, who is now able to communicate with
others without the need of an interpreter. An upgraded system will also be tried by a
larger population of deaf-blind users. The problems with this approach are the lack of
internet accessibility and limited portability. Individuals with disabilities such as
visual, auditory or speech impairments have difficulty communicating because others have
limited or no knowledge of Braille and sign language.
Implementation of Gesture Based Voice and Language Translator For Dumb People.
Dual channel ADC, this paper proposes a system that converts gestures made by the user in
the form of English alphabet letters into the corresponding voice output and translates this
English voice output into any other Microsoft-supported language. The system consists of an
MPU6050 for sensing gesture movement, a PC with OpenCV for processing, a three-button
keypad and a speaker. It uses a trajectory recognition algorithm to recognize letters. The
PC with OpenCV generates voice output for the text in multiple languages using Voice RSS
and Microsoft Translator. When tested, the system recognized the letters A-Z and
generated voice output based on the gestures in multiple languages.
7.3 Solutions
7.3.1 Proposed System
According to the World Health Organization (WHO), over 1.3 billion people across the globe
are visually impaired, of whom more than 36 million are blind. India, which has the second
largest population in the world, accounts for 30% of the overall blind population. Although
many campaigns are conducted to treat these people, it has been difficult to source all the
requirements. This is the era of artificial intelligence, which has gained immense traction
due to the large amount of data available and the ease of computation. Using artificial
intelligence, it is possible to make these people's lives much easier. The goal is to
provide a “secondary sight” until they have the resources required for treatment. People
with untreatable blindness can use it to make their everyday tasks much easier and
simpler.
Sequence diagrams are typically associated with use case realizations in the Logical View of
the system under development. Sequence diagrams are sometimes called event diagrams or
event scenarios.
First, the user sends a file to gui::OCR (the graphical user interface). The Optical
Character Recognition (OCR) component checks whether the file contains an image and, if so,
fetches it. The image is then read using reader:Image_reader and passed to the OCR engine.
The engine identifies each character using the nextChar:Graphic char function; after all
characters have been read, they are segmented and passed to the recognition function. Any
character that is identified is returned to the user. Finally, EOF is reached and control
returns to the user.
Fig 7.1: Sequence diagram
The "actors" are people or entities operating under defined roles within the system.
The “scenario” is a specific sequence of actions and interactions between actors and the
system. “Use case” is a collection of related success and failure scenarios, describing
actors using the system to support a goal.
There are 3 use case diagrams:
Text-to-speech use case
Image-to-speech use case
Object detection
Fig 7.2: Use case Diagram of Image-To-Speech
The user writes words in a file and uploads it. The file is given as input to the system,
which converts the words into speech output. The user can pause, stop or rewind the
speech, or listen to it again, as shown in the figure.
The user captures an image or gesture with the Logitech camera and can zoom in or out of
the image. The image is given as input to the system, which produces speech output. The
user can pause, stop or rewind the speech, or listen again to a specific word.
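The text-to-speech use case above can be sketched in Python as follows. This is a minimal
illustration and not the project's actual code: the helper names split_into_chunks and
text_file_to_speech, the 100-character chunk size and the output file naming are all
assumptions made for this example. The default synthesizer assumes that the gTTS package is
installed and that network access is available; any other callable can be passed instead.

```python
from pathlib import Path


def split_into_chunks(text, max_chars=100):
    """Split text into whitespace-delimited chunks of at most max_chars characters.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def text_file_to_speech(text_path, out_dir, synthesize=None, lang="en"):
    """Write one audio file per chunk of the text file and return the file paths."""
    if synthesize is None:
        # Assumption: gTTS is installed; it needs network access to synthesize speech.
        from gtts import gTTS

        def synthesize(chunk, target):
            gTTS(text=chunk, lang=lang).save(str(target))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    text = Path(text_path).read_text(encoding="utf-8")
    outputs = []
    for index, chunk in enumerate(split_into_chunks(text)):
        target = out_dir / f"chunk_{index:03d}.mp3"
        synthesize(chunk, target)
        outputs.append(target)
    return outputs
```

Producing one audio file per chunk is only one possible design. It is chosen here because it
gives a player natural units for replaying part of the speech, which is what the rewind and
re-listen actions in the use case need.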
7.4.3 OpenCV
OpenCV has bindings for Python, Java and MATLAB. It runs on a variety of platforms:
Windows, Linux, macOS and OpenBSD on the desktop, and Android, iOS and BlackBerry on
mobile. It is used for diverse purposes such as facial recognition, gesture recognition,
object identification, mobile robotics and segmentation. OpenCV-Python combines the OpenCV
C++ API with the Python language.
In our project we use OpenCV version 2. OpenCV is used for gesture control, to open
the camera and capture the image. It is also used in the image-to-text and voice
conversion technique.
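Opening a camera and capturing an image with OpenCV, as described above, might look like
the sketch below. It is an illustration only: the function names, the retry count and the
output path are assumptions for this example, and capture_from_camera assumes that the
opencv-python package is installed and that a camera is attached at the given index.

```python
def capture_frame(capture, max_attempts=5):
    """Return the first frame successfully read from `capture`, or None.

    `capture` can be any object whose read() returns an (ok, frame) pair,
    which is the interface provided by cv2.VideoCapture.
    """
    for _ in range(max_attempts):
        ok, frame = capture.read()
        if ok:
            return frame
    return None


def capture_from_camera(index=0, out_path="capture.png"):
    """Open a camera, grab one frame, save it to out_path and return the path."""
    import cv2  # Assumption: opencv-python is installed.

    capture = cv2.VideoCapture(index)
    try:
        if not capture.isOpened():
            raise RuntimeError(f"camera {index} could not be opened")
        frame = capture_frame(capture)
        if frame is None:
            raise RuntimeError("no frame could be read from the camera")
        cv2.imwrite(out_path, frame)
        return out_path
    finally:
        # Always free the device, even if reading or writing failed.
        capture.release()
```

Retrying the read a few times is a precaution taken in this sketch because some cameras
return empty frames while they are still starting up.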
7.4.4 TesseractOCR
Python Tesseract is an optical character recognition (OCR) engine for various operating
systems. OCR is the process of electronically extracting text from images and reusing it in
a variety of ways, such as document editing and free-text searches. It is a technology
capable of converting documents such as scanned papers, PDF files and captured images into
editable data. Tesseract can be used on Linux, Windows and macOS. Programmers can use it to
extract typed or printed text from images through an API. Third-party GUIs for Tesseract
are also available.
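A minimal sketch of extracting text from an image through the Python API is shown below.
It assumes that the pytesseract and Pillow packages and the Tesseract binary are installed.
The function names and the whitespace clean-up step are assumptions for this example and
not part of the original project.

```python
def clean_ocr_text(raw):
    """Collapse runs of whitespace and drop empty lines from raw OCR output."""
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def image_to_text(image_path, lang="eng"):
    """Run Tesseract on an image file and return the cleaned text."""
    # Assumption: pytesseract, Pillow and the Tesseract binary are installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        raw = pytesseract.image_to_string(image, lang=lang)
    return clean_ocr_text(raw)
```

The clean-up step is included because raw OCR output often contains stray blank lines and
uneven spacing, which would otherwise be passed on to the text-to-speech stage.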
7.4.5 Python
Python is an interpreted, high-level and general-purpose programming language.
Python's design philosophy emphasizes code readability with its notable use of significant
indentation. Its language constructs and object-oriented approach aim to help programmers
write clear, logical code for small and large-scale projects.
The increasing importance of software running on generic platforms has enhanced the
discipline of software engineering. Object-oriented analysis and design methods are becoming
the most widely used methods for computer systems design. The UML has become the
standard language in object-oriented analysis and design. It is widely used for modelling
software systems and is increasingly used for the high-level design of non-software systems
and organizations.
The design will contain the specification of all the modules, their interaction with other
modules and the desired output from each module. The output of the design process is a
description of the software architecture.
System architecture is a conceptual model that defines the structure, behavior, and more
views of a system. An architecture description is a formal description and representation of a
system, organized in a way that supports reasoning about the structures and behaviors of the
system. Figure 7.3 shows a general block diagram describing the activities performed by
this project.
Hardware Requirements
Software Requirements
SUMMARY: Using artificial intelligence, it is possible to make these people’s lives much
easier. The goal is to provide a “secondary sight” until they have the resources required
for treatment. People with untreatable blindness can use it to make their everyday tasks
much easier and simpler.
CHAPTER 8
SNAPSHOTS
Fig 8.1: Before Object Detection
The YOLO algorithm is applied to identify and locate objects within an image or video.
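YOLO-style detectors typically produce many overlapping candidate boxes for the same object,
which are usually filtered by non-maximum suppression (NMS) using intersection over union
(IoU). The sketch below illustrates that post-processing step in plain Python. It is not the
project's implementation: the (x1, y1, x2, y2) box format, the (box, score, label) tuple
layout and the threshold values are assumptions chosen for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, score_threshold=0.5, iou_threshold=0.4):
    """Keep the highest-scoring box among overlapping boxes of the same label.

    `detections` is a list of (box, score, label) tuples. Boxes scoring below
    score_threshold are discarded first. The result is sorted by descending score.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score, label in candidates:
        # Suppress the box if it overlaps an already kept box of the same label.
        overlaps = any(
            k_label == label and iou(box, k_box) > iou_threshold
            for k_box, _, k_label in kept
        )
        if not overlaps:
            kept.append((box, score, label))
    return kept
```

Suppression is applied per label in this sketch so that, for example, a person standing in
front of a chair is not removed just because the two boxes overlap.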
CONCLUSION
Saving time, cost and resources is one of the main benefits of detecting object
movement in video records. It also reduces human effort.
YOLOv3 gives better performance, higher accuracy and lower latency than earlier
versions of YOLO.
YOLOv3 can detect object movement in video records with good accuracy.
The YOLOv3 model is able to detect and classify objects ranging from multiple
instances of a single object to multiple instances of multiple objects.