JULY SOFT .NET BLOG

Hadoop 1 Master & 2 Slaves Setup

01 January 2017 Julysoft Blog, Geysir Enterprise Search, BlogEngine.NET (0)

Why is Hadoop important for handling Big Data?

Hadoop manages big data well: it supports the processing of large data sets in a distributed computing environment. It is designed to scale from a single server to thousands of machines, each providing computation and storage. Its distributed file system supports high data transfer rates between nodes and lets the system keep operating when a node fails, which lowers the risk of catastrophic failure even if a significant number of nodes go out of action. This makes Hadoop especially valuable for large-scale businesses.

Hadoop installation scenario on 3 Ubuntu machines:

ub1 is the master node, and ub2 and ub3 are the slave nodes.

Steps:

We will install Hadoop on the master node, ub1.
Hadoop is built on Java, so we install Java first:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install default-jdk
sudo apt-get install oracle-java8-installer

The last command installs Java at "/usr/lib/jvm/java-8-oracle". To check that the installation succeeded, use the next command:
Create a hadoop group and an "hduser" user as a system user:
Install SSH for secure access from one machine to another (Hadoop uses it to access the slave nodes):
Configure SSH. Log in as hduser:
Generate an SSH key for hduser:
Copy id_rsa.pub to the authorized keys of hduser:
Add "hduser" to sudoers:
Hadoop doesn’t work on IPv6, so IPv6 must be disabled:
- Add the settings below to the above file:

# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

CTRL+X -> yes
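
A quick way to double-check the edit is to parse the file and confirm that all three keys are set to 1. The Python sketch below is only an illustration: the helper names are ours, and it works on the text of the file, so you would pass it the contents of whichever file you edited (on Ubuntu these keys usually live in /etc/sysctl.conf, which is an assumption here).

```python
def parse_sysctl(text):
    """Parse sysctl.conf-style text into a dict, skipping comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# The three keys listed in this post.
REQUIRED_KEYS = (
    "net.ipv6.conf.all.disable_ipv6",
    "net.ipv6.conf.default.disable_ipv6",
    "net.ipv6.conf.lo.disable_ipv6",
)


def missing_ipv6_settings(text):
    """Return the required keys that are absent or not set to 1."""
    settings = parse_sysctl(text)
    return [key for key in REQUIRED_KEYS if settings.get(key) != "1"]


# Sample text with the last key forgotten on purpose.
sample = """
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
"""
print(missing_ipv6_settings(sample))  # ['net.ipv6.conf.lo.disable_ipv6']
```

An empty list means all three settings are present.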

Locate hadoop installation parent directory:
Download Hadoop:
Extract Hadoop sources:
Move hadoop-2.7.3 to hadoop folder:
Assign ownership of this folder to Hadoop user hduser:
Create Hadoop temp dirs for namenode and datanode:
Assign ownership of this Hadoop temp folder to Hadoop user:
Check JAVA_HOME path:
Edit hadoop configuration files. Edit ".bashrc" file:

# -- HADOOP ENVIRONMENT VARIABLES START -- #
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# -- HADOOP ENVIRONMENT VARIABLES END -- #

Edit "hadoop-env.sh":
Edit "core-site.xml":
Edit "hdfs-site.xml":
- cd /usr/local/hadoop/etc/hadoop
- sudoedit hdfs-site.xml
- Add the following to the above file:

        <property>
           <name>dfs.replication</name>
           <value>1</value>
        </property>
        <property>
           <name>dfs.namenode.name.dir</name>
           <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
        </property>
        <property>
           <name>dfs.datanode.data.dir</name>
           <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
        </property>
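
Since a typo in a property name silently falls back to the default value, it can help to read the file back and print what Hadoop will actually see. The following Python sketch is one way to do that; the function name is ours, and the sample text assumes the properties sit inside the usual <configuration> root element of a Hadoop *-site.xml file.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Same three properties as in this post, wrapped in <configuration>.
sample = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
  </property>
</configuration>"""

for name, value in read_site_properties(sample).items():
    print(f"{name} = {value}")
```

To check the real file, read /usr/local/hadoop/etc/hadoop/hdfs-site.xml into a string and pass it to the same function.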

Edit "yarn-site.xml":
- cd /usr/local/hadoop/etc/hadoop
- sudoedit yarn-site.xml
- Add the following to the above file:

        <property>
           <name>yarn.nodemanager.aux-services</name>
           <value>mapreduce_shuffle</value>
        </property>
        <property>
           <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
           <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>

Copy template of mapred-site.xml.template file:
Edit "mapred-site.xml":
- cd /usr/local/hadoop/etc/hadoop
- sudoedit mapred-site.xml
- Add the following to the above file:

        <property>
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
        </property>

Restart the PC and open the terminal again as hduser. Format the namenode:
Start all Hadoop daemons:
Verify the Hadoop daemons:

Test the resource manager: http://localhost:8088

Test the namenode: http://localhost:50070
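
When verifying the daemons on a single node, the usual tool is the JDK's jps command, which prints one "<pid> <name>" line per running Java process. The Python sketch below shows how that output could be checked against the five daemons a single-node Hadoop 2.x setup normally runs. The function name is ours and the process IDs in the sample are made up.

```python
# Daemons normally running on a single-node Hadoop 2.x installation.
EXPECTED_DAEMONS = {
    "NameNode",
    "DataNode",
    "SecondaryNameNode",
    "ResourceManager",
    "NodeManager",
}


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # "<pid> <name>"
            running.add(parts[1])
    return sorted(set(expected) - running)


# Made-up jps output in which the NodeManager failed to start.
sample = """\
2401 NameNode
2553 DataNode
2764 SecondaryNameNode
2920 ResourceManager
3412 Jps
"""
print(missing_daemons(sample))  # ['NodeManager']
```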

Now we will extend the Hadoop setup to the slave nodes.
Add all host names to the /etc/hosts file on all machines (master and slave nodes). You can find each PC's IP address using the ifconfig command.
- On UB1, then on UB2, then on UB3:
  - sudo vim /etc/hosts
  - If vim is not installed, install it using:
  - Add into the above file:

10.0.3.15 UB1
10.0.3.16 UB2
10.0.3.17 UB3
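
Because the same three entries must exist on every machine, a small check can save a debugging session later. The Python sketch below parses hosts-file text and reports which node names are missing. The helper names are ours; on a real machine you would pass in the contents of /etc/hosts.

```python
def parse_hosts(text):
    """Map each host name (lower-cased) to its IP address from /etc/hosts-style text."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name.lower()] = ip
    return mapping


def missing_nodes(text, nodes=("UB1", "UB2", "UB3")):
    """Return the cluster node names that have no entry in the hosts text."""
    mapping = parse_hosts(text)
    return [node for node in nodes if node.lower() not in mapping]


# Sample with the UB3 entry forgotten on purpose.
sample = """
127.0.0.1 localhost
10.0.3.15 UB1
10.0.3.16 UB2
"""
print(missing_nodes(sample))  # ['UB3']
```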

Create hadoop as a group and hduser as a user on all slave PCs
Install rsync for sharing the Hadoop sources with all PCs
Edit core-site.xml on the master PC:
Edit hdfs-site.xml on the master and change the replication factor from 1 to 3
Edit yarn-site.xml on the master:

        <property>
           <name>yarn.resourcemanager.resource-tracker.address</name>
           <value>UB1:8025</value>
        </property>
        <property>
           <name>yarn.resourcemanager.scheduler.address</name>
           <value>UB1:8035</value>
        </property>
        <property>
           <name>yarn.resourcemanager.address</name>
           <value>UB1:8050</value>
        </property>

Edit mapred-site.xml on the master and add a new entry:

        <property>
           <name>mapreduce.job.tracker</name>
           <value>UB1:5431</value>
        </property>

Edit the masters file on the master node:
Update the slaves file on the master:
- cd /usr/local/hadoop/etc/hadoop
- sudo vim slaves

## Add name of slave nodes
UB2
UB3

Use rsync on the master to copy Hadoop to each slave:
- First install SSH on each slave PC
- sudo rsync -avxP /usr/local/hadoop/ hduser@UB2:/usr/local/hadoop/
- sudo rsync -avxP /usr/local/hadoop/ hduser@UB3:/usr/local/hadoop/
On master:
On each slave node:
Execute on master:
Execute on master:
Execute on each slave:
Test:

To configure WebHDFS, edit hdfs-site.xml as follows:

        <property>
           <name>dfs.webhdfs.enabled</name>
           <value>true</value>
        </property>

Copy local folder to hadoop:

hadoop fs -copyFromLocal /home/user/DataFolder /
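
With dfs.webhdfs.enabled set to true, the copied folder can also be listed over HTTP through the WebHDFS REST API, whose URLs have the form http://<host>:<port>/webhdfs/v1/<path>?op=<operation>. The Python sketch below builds such a URL and reads a LISTSTATUS response. The function names are ours, port 50070 is the NameNode web port used elsewhere in this post, and list_directory only works against a reachable cluster.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host, path="/", op="LISTSTATUS", port=50070, user=None):
    """Build a WebHDFS REST URL; `path` must be an absolute HDFS path."""
    params = {"op": op}
    if user:
        params["user.name"] = user  # optional, for clusters without security
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(params)}"


def list_directory(host, path="/", **kwargs):
    """Return the entry names under `path` (requires a running NameNode)."""
    with urlopen(webhdfs_url(host, path, **kwargs), timeout=10) as response:
        statuses = json.load(response)["FileStatuses"]["FileStatus"]
    return [entry["pathSuffix"] for entry in statuses]


print(webhdfs_url("UB1", "/DataFolder"))
# http://UB1:50070/webhdfs/v1/DataFolder?op=LISTSTATUS
```

On the cluster itself, list_directory("UB1", "/DataFolder") should return the names of the files copied in the previous step.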

Check the result in the NameNode web UI at http://UB1:50070

For a Better Search,

July Soft Team - www.julysoft.net

Responsibility. Integrity. Passion.

Geysir Ent. Search version 1.1 – CRM Import Service from VC & Emails Free*

25 October 2016 Julysoft Blog, Geysir Enterprise Search (0)

Geysir is a high-end Enterprise Search solution. An Enterprise Licence also includes a Document Management Solution (DMS), a Customer Relationship Solution (CRM) and an Issue Tracker Support Solution. Considering the price and the seamless integration of those systems (Geysir Search Portal can also search in them), the choice is really a no-brainer.

Let’s say you acquired an Enterprise Licence, which includes a DMS and a CRM solution on top of Geysir Enterprise Search. You now want to use your CRM to make a difference for your customers, so you need to take the time to add every client with their contact data, communication history, etc.

Well, we at JULY SOFT are fully aware of how tedious this operation is, and that you or your employees may not be able to afford the time for it. This is why we offer, for free, a service to import your clients and contacts (initial import) from 2 main sources:

- your Emails

- your Visiting Cards (VC)

Yes, you got it right: we OCR and parse your visiting cards and then import them into your new Geysir CRM, offered free with the Geysir Enterprise Full Licence! Of course we cannot guarantee 100% accuracy, because OCR is not an exact science, but in our past experience we deliver an average of 70-90% accuracy for VC import into the CRM (depending on VC and scanning quality). For phone numbers, accuracy tends towards 100%.

We will give you and your IT staff full guidance to make this process as fast and smooth as possible. We guarantee it is done in 1-2 business days, so you may start using your CRM from day 3 with ALL your contacts and their associated history and data safely stored in your database!

*this service is available for free only to clients based in Bucharest / Romania

As you may expect, this operation requires our staff on your site, which is why at the moment we offer this service for free only to clients from Bucharest / Romania. Nevertheless, there are options to provide the same service to remote clients.

For a Better Search,

July Soft Team

Responsibility. Integrity. Passion.

Geysir Enterprise Search version 1.1 now supports OCR (Optical Character Recognition) for PDF and IMAGE Formats

25 October 2016 Julysoft Blog, Geysir Enterprise Search (0)

As we all know, too often the company's scanner does not offer OCR for scanned documents. As a result, companies often have many documents in PDF (Portable Document Format) or image files (png, jpg, tiff, bmp, etc.) that contain text and business data, but in image/binary format, which makes them hardly searchable. Moreover, those documents are stored on network shares or in SQL databases in the same format (image, non-OCR), with no easy way to search by keywords in their text content.

We at JULY SOFT are fully aware of this fact: companies have huge amounts of business-critical documents (PDFs or images) stored as image scans (non-OCR). This is why, starting with version 1.1, we have introduced the OCR (Optical Character Recognition) feature for PDF and image files in any Enterprise Licence.

Running OCR on all those documents, then updating them along with their associated keywords and tags, takes too long, and your business cannot afford to lose time on non-productive activities. Just by implementing Geysir Enterprise you get, besides a high-end Enterprise Search solution, OCR out of the box, totally transparent to you. Geysir takes care of running OCR on all your scanned PDFs and image files (regardless of exact format), and you use the Geysir Search Web Portal to search within the text of those images, with exactly the same experience as if they were already in text format.

Just keep in mind that the price of a 12-month Geysir Enterprise Licence includes OCR as one feature among dozens of others and, yes, it is way cheaper than buying an OCR library that does only OCR. Geysir is a high-end Enterprise Search solution. One Enterprise Licence also includes a Document Management Solution (DMS), a Customer Relationship Solution (CRM) and an Issue Tracker Support Solution. Considering the price and the seamless integration of those systems (Geysir Search Portal can also search in them), the choice is really a no-brainer.

For a Better Search,

July Soft Team

Responsibility. Integrity. Passion.

Geysir Enterprise Search 1.0.0.8 is out - You Get More Time Every New Version...

11 September 2016 Julysoft (0)

July Soft just rolled out Geysir Enterprise Search version 1.0.0.8.

The main new functionality is the "Get More Like This..." button.

As its name says, this new feature lets the user get all other documents that are "similar" to a given one returned by a previous search. It is available both in the Web UI (for Enterprise, Professional or Basic license owners) and in the Desktop Console UI (for all license owners, including Free Version users).

Let's take a very common example:

A lawyer in your company needs to consult "Contract Client Ben". He logs in to Geysir Search Portal and types "Contract Ben".

Geysir returns a few results, among them "IT Services Contract - Ben.pdf".

The lawyer now needs to see other IT Services contracts to compare them with Ben's contract...

Now he can just click the "More Like This ..." button available on the first result, and he will get all the IT Services contracts in the company's repository.
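
For the technically curious, here is one common way a "more like this" lookup can be built: turn each document into a bag of words and rank the others by cosine similarity to the selected one. The Python sketch below is a generic illustration only, not Geysir's actual algorithm; the document texts, the second contract name and the 0.5 threshold are all made up for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(seed, documents, threshold=0.5):
    """Names of documents similar to `seed`, most similar first."""
    seed_vector = vectorize(documents[seed])
    scored = [
        (cosine(seed_vector, vectorize(text)), name)
        for name, text in documents.items()
        if name != seed
    ]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


# Hypothetical repository: two similar contracts and one unrelated invoice.
documents = {
    "IT Services Contract - Ben.pdf": "it services contract client ben",
    "IT Services Contract - Anna.pdf": "it services contract client anna",
    "Invoice 34.pdf": "invoice 34 client x",
}
print(more_like_this("IT Services Contract - Ben.pdf", documents))
# ['IT Services Contract - Anna.pdf']
```

The two contracts share four of their five words, so they score 0.8, while the invoice shares only one word and falls below the threshold.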

So now any Geysir user, including Free users, can enjoy this time-saving feature that is needed so often in any company...

The Geysir Desktop UI has also been further polished and is starting to look as sleek and elegant as the Web UI.

Curious to see if you can Get More Like This? Just ask us for your Geysir Free download link!

Happy Searching,

July Soft Team

Request Geysir Enterprise Search Free

04 September 2016 Julysoft Geysir Enterprise Search (0)

Thank you for requesting July Soft Geysir Free product from HERE.

Hope you will like Geysir and how it helps your business get more energy from data!

Change your search experience with us and try Geysir,

July Soft Team – www.julysoft.net

Responsibility. Integrity. Passion.

What is the benefit for My Organization to acquire July Soft Geysir?

04 September 2016 Julysoft Geysir Enterprise Search (0)

It's hard to measure the gain brought by an Enterprise Search system. But what would we do without them? Let's try to imagine a day without web search engines.

The first benefit of Geysir is increased productivity: less time spent searching for documents or re-creating ones that cannot be found! New employees get up to speed more easily, costs drop, IT operations get cheaper, and customers are satisfied with the speed of your support! It is a real help in implementing any certification and really useful in any audit scenario!

Don't waste your time; you need to find your intranet data quickly.

It's simple, Geysir is your solution.

Feel free to ask for your GEYSIR FREE kit here.

Enterprise Search systems are more than a software product: they are a way of working to be successful!

Change your search experience with us and try Geysir,

July Soft Team – www.julysoft.net

Responsibility. Integrity. Passion.

July Soft Geysir – Take the Search Stress Out Like a Cat Does ...

03 September 2016 Julysoft Geysir Enterprise Search (0)

Dear computer and data user,

I know you sometimes wonder, especially during extremely busy days, how you can improve your search experience and instantly find the file you need.

I know you sometimes wonder what you can do to avoid bothering colleagues with questions about where the data is, asking the network admin where an email is, or losing time re-creating documents you simply cannot find!

Just ask yourself: are you really satisfied with how quickly you find any type of electronic data inside the company?

Maybe not ... Today there is a new product, Geysir, able to find your data within seconds. There really is such a product, and this is no exaggeration!

You may find it hard to believe, but it's true: using Geysir you can find within 1 second everything you are searching for in an index of 800 GB of data files!

Convince yourself in the following ways:

Geysir FREE – use for free a limited version of Geysir
Skype Geysir Demo: request a free live demo of Geysir by Skype
Geysir TRIAL: request the Trial kit to test Geysir for free for 7 days

by sending us an email at julysoft@runbox.com, and give Geysir a chance.

You have nothing to lose; on the contrary, you will find the way to take the stress out of company data search as simply as a cat does...

Change your search experience with us and try Geysir,

July Soft Team – www.julysoft.net

Responsibility. Integrity. Passion. Pet lovers.

Geysir v. 1.0.0.7 launched! New features: Search Autocomplete Suggestions and Similarity "More Like This"!

24 August 2016 Julysoft Geysir Enterprise Search (0)

We have just launched version 1.0.0.7 of Geysir Enterprise Search.

As you can see in the above image, two brand new features have been introduced in 1.0.0.7:

1) Search Autocomplete Suggestions - very handy when users are exploring data and don't know in advance exactly what they need to find. One example is when they search for a term or name that they do not understand and try to find out what it means. The term has a context and associated keywords. This new feature does just this: it takes the user's current (incomplete) query and automatically finds the most important keywords associated with its context, thus helping the user better understand the initial term.

2) Similarity "More Like This...". More precisely, the Web UI now has a new button for each document, called "More Like This...". Starting from one document, this feature gets all similar documents in all categories. In contrast with Danube SIMILARITY, which is multi-node (it allows similarity across multiple Geysir instances), "More Like This" works only on the current instance; in compensation, it is available to both admin and normal users, while Danube SIMILARITY is only for admin users.

A very common scenario: a user types Invoice 34, an invoice for Client X. Just by clicking the "More Like This" command on the resulting document "Invoice 34", the user will instantly obtain ALL invoices of Client X!

"More Like This" is available in the following license types:
a) Basic
b) Professional
c) Enterprise
Danube SIMILARITY, by contrast, is available only in Enterprise Licenses!

Conclusion:

Enterprise Search should help users spend less time searching for the information they need and, at the same time, help them find answers to their questions. The Similarity feature helps users get a large collection of documents similar to a given one, which, as in our example above, is very often needed in any company, while the Search Autocomplete Suggestions feature helps users dive into and explore large collections of data the same way web search engines do...

Happy Searching,

July Soft Team

Why Successful Enterprise Search Requires Data De-Duplication tools like JULY SOFT KATLA!

21 August 2016 Julysoft Geysir Enterprise Search (0)

Executive Summary:

Before Investing in Enterprise Search, Invest in Maximizing Data Quality - this way you will get the most return from your investment!

The above image is in fact a summary of this article. It emphasizes that, as Step #1, before Step #2 (the Enterprise Search implementation), companies should invest time in increasing data quality, because data quality directly impacts the success and the return on investment of Enterprise Search tools!

Any Enterprise Search Implementation should be treated as a separate Project.

A project is limited in time, has measurable results, and has clear goals.

So how do we measure the success of an Enterprise Search implementation project?

The answer: there are many variables, but we will focus on two:

- Recall - the fraction of ALL RELEVANT DOCUMENTS that appear in the RETURNED RESULTS

- Precision - the fraction of the RETURNED RESULTS that are RELEVANT

In plain English, Recall is the capacity of the search engine to "remember" all documents relevant to a user's query, while Precision is its capacity to return a high concentration of relevant documents for that query. The higher the Recall and Precision, the better!

So, a successful implementation of Enterprise Search for a company can be measured by computing the values of both Recall and Precision for the most important or most frequent terms that employees use in day-to-day operations!
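
To make the two measures concrete, the Python sketch below computes them for a single query, given the list of returned documents and the list of documents a human judged relevant. The function name and the file names are invented for this example.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents returned / all documents returned
    recall    = relevant documents returned / all relevant documents
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 results returned, 3 documents are truly relevant.
returned = ["contract_ben.pdf", "contract_anna.pdf", "invoice_34.pdf", "memo.txt"]
relevant = ["contract_ben.pdf", "contract_anna.pdf", "contract_carl.pdf"]

precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=0.50 recall=0.67
```

Two of the four returned documents are relevant (precision 0.50), and two of the three relevant documents were found (recall 0.67). Repeating this for the most frequent queries gives a measurable picture of the implementation.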

We at July Soft develop and implement GEYSIR Enterprise Search. We help companies use their time, information and workforce more efficiently and effectively.

To see more details about benefits of Geysir you may visit this link.

Geysir's implementation success depends on its Recall and Precision, as mentioned before. Unfortunately, those 2 variables depend not only on our software's quality but also heavily on input data quality!

Data quality can be drastically improved by:

- Eliminating duplicated files / data

- Creating quality meta-data (implicit or explicit)

- Organizing data

- Eliminating old, useless data

- Etc.

Just as we offer a limited GEYSIR Free Version that you may request here, we also offer, for FREE, the July Soft KATLA File Organizer and Duplicate Removal Tool.

KATLA de-duplicates files, organizes data (e.g. splits it between Archive and Working data), creates implicit meta-data through data auto-organization, and more.

As the header image summarizes, it is worth increasing the quality and searchability of your data before investing in any Enterprise Search implementation project. This makes sure the Recall and Precision of the implementation are as high as possible, thus maximizing the return on your investment and making your organization more efficient.

Note: If you are a technical person or want to see in more detail WHY eliminating duplicates increases Recall and Precision, you need to be at ease with the following terms:

- Term Frequency

- Inverse Document Frequency

- TF-IDF Weighting

You can read about them on this Wikipedia page.
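
As a small illustration of one mechanism behind this, the Python sketch below uses the textbook definitions (tf = term count / document length, idf = log(N / df)) to show how duplicate copies of a document make its distinctive terms look common and therefore weigh less. This is generic TF-IDF, not Geysir's scoring, and the tiny corpus is made up.

```python
import math
from collections import Counter


def idf(term, docs):
    """Inverse document frequency: log(N / df), df = documents containing the term."""
    df = sum(1 for doc in docs if term in doc.split())
    return math.log(len(docs) / df) if df else 0.0


def tf_idf(term, doc, docs):
    """Term frequency in `doc` multiplied by the term's idf over `docs`."""
    words = doc.split()
    tf = Counter(words)[term] / len(words)
    return tf * idf(term, docs)


# A clean corpus, and the same corpus with three extra copies of one file.
clean = ["contract ben services", "invoice ben", "sales report", "meeting notes"]
duplicated = clean + ["contract ben services"] * 3

doc = "contract ben services"
print("clean corpus:     ", tf_idf("contract", doc, clean))
print("duplicated corpus:", tf_idf("contract", doc, duplicated))
```

In the clean corpus "contract" appears in 1 of 4 documents; with the copies it appears in 4 of 7, so its weight drops even though no new information was added. The copies also take up places in the result list that other relevant documents could have used.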

If you want more details, email us at iulia@runbox.com and we can give you a free live Geysir demo by Skype, lasting about 1 hour.

For a better search,

July Soft Team - www.julysoft.net

Responsibility. Integrity. Passion.

Geysir Enterprise Search now supports LAN Indexing

15 August 2016 Julysoft Geysir Enterprise Search (0)

Geysir currently supports quite a few important connectors:

Windows Local Disks, Folders, LAN Shares - which is Disk Indexer
Outlook PST backups - which is PST Outlook Indexer
Unix & Mac OS Folders - which is SSH Indexer
Web Sites (both Intranet and Internet) - which is Web Indexer
FTP sites (secured or not) - which is FTP Indexer
Mailboxes, using POP3 protocol - which is Mail Indexer
SQL Databases Binary Files - which is SQL Indexer
SharePoint web sites - which is SharePoint Indexer
Team Foundation Server sites / tasks / documents - which is Tfs Indexer

Today we have just launched Geysir Enterprise Search Server version 1.0.0.6 that has a brand new family member:

LAN IP, Hosts & IP Ranges Shares - which is LAN Indexer

Using this new connector is very easy: in a few seconds and clicks, any network administrator can set up Geysir to index HUNDREDS of LAN computers.

Why so fast? Simply because we support IP ranges and cascading settings, which drastically reduces the time needed to set up a LAN search project!

How does LAN Manager work?

It indexes the 1 to N IP addresses, hosts and/or IP ranges you set up for indexing. For every computer found in that collection, it crawls all shares set either at root level or for that particular device. We also support excluded hosts, so you may exclude servers, printers, etc.
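
The combination of single addresses, ranges and excluded hosts can be pictured with a few lines of Python. The sketch below is a generic illustration using the standard ipaddress module, not Geysir's implementation. The function name and all addresses are invented, and host names are left out for simplicity.

```python
from ipaddress import IPv4Address


def expand_targets(addresses=(), ranges=(), excluded=()):
    """Return the sorted hosts to crawl: single addresses plus every address
    in each inclusive (start, end) range, minus the excluded hosts."""
    targets = {IPv4Address(address) for address in addresses}
    for start, end in ranges:
        first, last = int(IPv4Address(start)), int(IPv4Address(end))
        targets.update(IPv4Address(number) for number in range(first, last + 1))
    targets -= {IPv4Address(address) for address in excluded}
    return [str(ip) for ip in sorted(targets)]


print(expand_targets(
    addresses=["192.168.1.50"],
    ranges=[("192.168.1.10", "192.168.1.13")],
    excluded=["192.168.1.12"],  # e.g. a printer
))
# ['192.168.1.10', '192.168.1.11', '192.168.1.13', '192.168.1.50']
```

One range line replaces a long list of individual machines, which is why setting up a LAN project takes so few steps.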

And because a picture is worth 1000 words, see below a capture of the Geysir LAN Indexer Setup Settings Dashboard:

As you may notice in the above picture, you can set many IP ranges in your search project, and for each one you may opt in to the root settings (Username, Password, Domain, Shares) or opt out and personalize the settings for that range, IP or host!

Just imagine how convenient it is to:

Be able to search within all important company documents, spread over hundreds of computers, in a few seconds
Be able to access a file from Bob's computer while his computer is shut down, and also see its change history
Allow your employees to search inside a whole LAN in a matter of seconds
Allow your employees to be more efficient by leveraging EXISTING LAN data that you already have today but that is so difficult to use!

We hope you find this article interesting. If you have any questions about Geysir Enterprise Search Server or the LAN Indexer in particular, feel free to:

Visit Geysir Official Web Page @ Geysir Enterprise Search
Ask us directly by email @ stelian

Happy Searching!

July Soft Team - www.julysoft.net

Responsibility. Integrity. Passion.
